I have had the pleasure of working with ExtraHop for several months. I have been blown away not only by the sheer amount of data the product analyzes but also by the manner in which it's displayed. The dashboards are visually appealing and succinct. Each dashboard page contains all of the data you need to quickly diagnose issues. In most cases, you shouldn't need to see one symptom and then drill down through other dashboards and menus to find what you need. ExtraHop isn't really a traditional monitoring tool. It is a wire data analytics tool. I will explain this as we dig into things. From an engineer's perspective, if you have, say, a Citrix performance issue, you can open the Citrix dashboards, look at a week's worth of data, spot a trend, and you'll likely see the cause of whatever issue you are having.
Here is a great example. I spent days troubleshooting a "Citrix slowness" issue which turned out to be caused by a storm of client calls for WPAD. The very first day the ExtraHop Discover appliance was set up, the very first thing we noticed was WPAD calls. I scratched my head..."hey, I thought that was resolved!" It turned out the fix had not been deployed to every machine in the org. We used this data and got it resolved immediately. We got value from the very first moment.
The screenshot above displays data spanning only a few minutes. The dashboards and UI are very slick and intuitive. If you see what looks like a spike on one of the charts, you can highlight that part of the timeline to display only the data from the period of the spike. You can also adjust the time range at any point in the top left-hand corner of the page by selecting a start date and time and an end date and time, which gives you instant access to historical data. The product can also highlight deviations to put that historical data to use. Perhaps the data shows you normally have 200 aborted Citrix app launch attempts in a given day. If you spike to 3000, that is a deviation you should be alarmed by! It likely indicates a problem.
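To make the idea of a deviation concrete, here is a minimal sketch in Python. It is not how ExtraHop computes deviations; it simply assumes you have a list of daily counts and flags a day that sits far above the historical mean. The function name, the threshold of three standard deviations, and the sample numbers are all made up for illustration.

```python
from statistics import mean, stdev


def is_deviation(history, today, threshold=3.0):
    """Return True when today's count is more than `threshold`
    standard deviations above the mean of the historical counts."""
    if len(history) < 2:
        return False  # not enough history to judge
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: anything above the baseline stands out.
        return today > baseline
    return (today - baseline) / spread > threshold


# Hypothetical daily counts of aborted Citrix app launch attempts.
history = [190, 205, 198, 210, 200, 195, 202]

print(is_deviation(history, 3000))  # a spike far above the usual ~200
print(is_deviation(history, 207))   # ordinary day-to-day variation
```

A simple mean and standard deviation is only one way to define "normal". A real baseline would probably also account for time of day and day of week.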
Right now, I primarily work with XenApp for published applications and VMware Horizon for virtual desktops. The data provided proved invaluable during a very busy migration period. When embarking on a project of any kind, the data can be used as a performance baseline. For example, if you are about to go on an AD and Group Policy cleaning expedition, you could look at performance metrics before and after. You can figure out your baseline performance across the board for a variety of actions.
For our environment, we set up three ExtraHop appliances in our DC. Port mirroring was enabled to allow ExtraHop to sniff all of the packets across our environment. This is incredibly powerful, as it sees everything. That means it's not just displaying data for our Citrix servers or our Horizon virtual desktops. We are getting data on Active Directory, DNS, storage and pretty much every facet of our environment. From a Citrix admin perspective, this is huge.
People who don't work with XenApp may not realize how much poor performance or issues with storage, network, DNS etc. can impact XenApp users. If there are DNS issues, connections may fail. If there are storage issues, PVS may lose its stores. Both DNS and storage issues could also cause slowness within the apps themselves. If you rely entirely on a Citrix-specific tool such as Director, you won't see the full picture. It will tell you that servers have become unregistered or connections are failing but, in most cases, it won't be able to tell you why. ExtraHop can.
This isn't just for Citrix admins either. Right after setting this up, we could see a bunch of AD errors that appeared to be related to a user who had left the company but had left themselves logged into a machine. A process was running in the user's context but failing as it couldn't authenticate. On its own that would have had minimal impact, but that, along with accounts getting locked and other general account-related issues, can accumulate. ExtraHop enabled us to do some really crucial house cleaning!
VMware Horizon uses Active Directory Lightweight Directory Services (AD LDS), so it is very sensitive to AD replication problems. ExtraHop can help to figure those problems out too. While we set up the physical appliances, ExtraHop also has a virtual appliance which can be run within the hypervisor if you just want data from within your virtual platform.
Something which may seem small, but to me is actually one of the most useful features, is the fact that you can get historical packet captures. How many times have you had to troubleshoot a Citrix launch issue and had to set up Wireshark, do a capture and possibly involve your network team? On a virtual desktop with slowness issues in particular, trying to install Wireshark can compound the problem, and it can take a long time just to get to the point of being able to perform the capture. With ExtraHop, I don't need to hop on a remote session with a user, set up a tool and perform a capture. I can just search for their machine in the search bar and export a capture for a specified time window.
ExtraHop provides wire data analytics from layer 2 to layer 7. Above is a screenshot showing, at a glance, the various layer 7 protocols, both in and out. You can see I have quite a lot of CIFS transactions.
I can go into CIFS even further. Above you can see CIFS errors on a given XenApp session host. ExtraHop can show errors even at the file level, making it easy to remediate them.
Something which wasn't even on our radar when we were looking at ExtraHop is its security benefits. Above, you can see a screenshot of a ransomware dashboard. It detected quite a few events. Don't panic: it doesn't mean you have ransomware in your environment. It is simply detecting a pattern similar to ransomware. This data can be used by your InfoSec team to investigate. In the case above, this was my SCCM distribution point downloading updates which did not have a file extension. A server downloading thousands of files without a known extension can be reason for alarm.
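The pattern described above (one machine moving thousands of files with no recognizable extension) can be sketched as a simple count per machine. This is only an illustration of the idea, not ExtraHop's detection logic. The event format, the list of known extensions, the host names and the limit of 1000 are all assumptions made up for the example.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative allow-list only; a real one would be much longer.
KNOWN_EXTENSIONS = {".docx", ".xlsx", ".pdf", ".exe", ".cab", ".txt"}


def count_unknown_transfers(events, known=KNOWN_EXTENSIONS):
    """Count, per machine, file transfers whose extension is missing
    or not in the known set. `events` is an iterable of
    (machine, file_path) tuples."""
    counts = Counter()
    for machine, path in events:
        # Normalize Windows separators so the suffix is parsed correctly.
        suffix = PurePosixPath(path.replace("\\", "/")).suffix.lower()
        if suffix not in known:
            counts[machine] += 1
    return counts


def flag_machines(events, limit=1000):
    """Return machines with at least `limit` unknown-extension transfers."""
    counts = count_unknown_transfers(events)
    return sorted(m for m, n in counts.items() if n >= limit)


# Hypothetical data: a distribution point pulling extension-less content
# and a desktop saving one ordinary document.
events = [("sccm-dp01", f"/content/pkg/{i:08x}") for i in range(1500)]
events.append(("desktop42", "\\\\fs01\\home\\report.docx"))

print(flag_machines(events))
```

As with the SCCM example, a hit here is a prompt to investigate rather than proof of an infection.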
Remember WannaCry!? Surrrrrrrrrreeee, you do! I reached out to an ExtraHop engineer while in a meeting with our different IT departments on the Friday that will now live in IT infamy! I had a simple question: I can see our SMB traffic already, so is there a way for me to filter it to show only SMB v1 traffic? An hour later, we had our dashboard. We feverishly disabled SMB v1 across the organization, safe in the knowledge that we could look at our historical data to see what might be affected by disabling it. We could weigh up the risks before taking action.
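For anyone curious how SMB v1 can be told apart from later versions on the wire, here is a rough sketch. It is not the dashboard the ExtraHop engineer built for us. It relies on the fact that an SMB v1 message starts with the bytes 0xFF "SMB" while SMB v2 and later start with 0xFE "SMB", and it assumes each TCP payload begins with the 4-byte transport header used on port 445 followed by exactly one SMB message. The function names and sample packets are hypothetical.

```python
SMB1_MAGIC = b"\xffSMB"  # protocol identifier at the start of an SMB v1 header
SMB2_MAGIC = b"\xfeSMB"  # protocol identifier for SMB v2 and later


def smb_family(tcp_payload: bytes) -> str:
    """Classify a TCP payload as 'smb1', 'smb2+' or 'unknown'.
    Assumes a 4-byte transport header precedes the SMB message."""
    if len(tcp_payload) < 8:
        return "unknown"
    magic = tcp_payload[4:8]
    if magic == SMB1_MAGIC:
        return "smb1"
    if magic == SMB2_MAGIC:
        return "smb2+"
    return "unknown"


def smb1_clients(packets):
    """Return a sorted list of client IPs seen sending SMB v1 messages.
    `packets` is an iterable of (client_ip, tcp_payload) tuples."""
    return sorted({ip for ip, payload in packets if smb_family(payload) == "smb1"})


# Hypothetical payloads: transport header, magic bytes, then padding.
transport_header = b"\x00\x00\x00\x23"
packets = [
    ("10.0.0.5", transport_header + SMB1_MAGIC + bytes(31)),
    ("10.0.0.9", transport_header + SMB2_MAGIC + bytes(60)),
    ("10.0.0.5", transport_header + SMB1_MAGIC + bytes(31)),
]

print(smb1_clients(packets))
```

Keep in mind that a client may open with an SMB v1 negotiate message and still end up on a newer dialect, so a list like this is a starting point for checking what might break, not a final verdict.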
I know every vendor hopped on the ransomware bandwagon during the attacks, but I can honestly say ExtraHop was an awesome asset. Not only could we see what in our environment used SMB v1, they also provided an updated Ransomware bundle with specific logic for the WannaCry patterns. You can see more on that HERE. We set up alerts so that we would know about the very first instance detected on a machine and could take action. Luckily, it didn't come to that!
I hope the takeaway from this is that ExtraHop gathers and analyzes so much data that it's a tool which can be used by all IT departments within an organization. It doesn't necessarily have to replace your existing teams' monitoring and troubleshooting tools, but it can certainly provide additional benefits that traditional monitoring tools cannot.