I recently acquired vCenter Operations Enterprise. I had VMware run a PoC to make sure it met my requirement of a consolidated view of what is really going on in my converged infrastructure. Although I have fairly extensive knowledge of vCOps, I am very interested in getting an under-the-covers look at how some of the data is calculated. I am also interested in how best to monitor my operation and which KPIs I should really be looking at. Then I want to learn how best to interpret them. Let’s see what this session has to offer...
1. Lol, the first slide says it’s not just black magic! I like this session already.
2. With virtualization, capacity is now fluid. I agree.
3. Invisible walls: VM CPU and memory issues may not be resolved by adding more, because contention can play a significant role. Proper troubleshooting is required.
4. With VMware View you have to monitor end users, not VMs. I agree here as well, since a user may move between virtual desktops. End-user experience is important.
5. The key thing for vCOps to do for me is to take the tons of metrics I have and present the end calculation to me. Am I green or red?
6. Dynamic threshold analysis uses competing algorithms, meaning the system actually uses multiple methods to calculate the trend, checks which one is right more often, and then uses that method. Genius. The thresholds are recalculated every night.
7. A version change can cause the normal operation of a system to change. Thresholds may not catch this, but trends have a better chance. I liked this idea.
8. Trending noise to determine abnormalities: this is really going to help my environment, since we use a number of tools that all send emails for every little thing. Today we use our brains to get a feel for data center health. I declared this a broken model earlier in the year.
9. Alerts should be an indication of a real problem. Yes, yes, yes! Do not alert on every threshold that is reached. Yes, please. May I subscribe to your newsletter?
10. Root cause determination in this product is really root metric determination. It isn’t telling you what the problem was, just which monitored metric was the starting point of the issue. E.g., we saw disk latency go to 100 ms before the app crashed.
11. Workload is demand divided by entitlement.
12. Right-sizing is a concept I always support, but VM admins and I seem to be alone on the front line of this.
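
To make items 6 and 11 concrete for myself, here is a minimal Python sketch. It is not how vCOps implements anything. The two forecasters (`last_value`, `moving_average`), the scoring by absolute error, and every function name are my own assumptions, chosen only to illustrate the idea of "run several methods, keep the one that is right more often" and the workload ratio from the slide.

```python
from statistics import mean


def workload(demand, entitlement):
    """Workload as stated in the session: demand divided by entitlement."""
    if entitlement <= 0:
        raise ValueError("entitlement must be positive")
    return demand / entitlement


# Two hypothetical competing forecasters (illustrative, not the product's algorithms).
def last_value(history):
    """Predict that the next sample equals the most recent one."""
    return history[-1]


def moving_average(history, window=3):
    """Predict the next sample as the mean of the last `window` samples."""
    return mean(history[-window:])


def pick_best_forecaster(series, forecasters, warmup=3):
    """Replay the series, total each forecaster's absolute error,
    and return the name of the one with the smallest total."""
    errors = {name: 0.0 for name in forecasters}
    for i in range(warmup, len(series)):
        history = series[:i]  # only data seen before sample i
        for name, predict in forecasters.items():
            errors[name] += abs(predict(history) - series[i])
    return min(errors, key=errors.get)


FORECASTERS = {"last_value": last_value, "moving_average": moving_average}

if __name__ == "__main__":
    noisy = [10, 12, 8, 10, 12, 8, 10, 12, 8]   # oscillates around 10
    trending = [1, 2, 3, 4, 5, 6, 7, 8]         # steady climb
    print(pick_best_forecaster(noisy, FORECASTERS))
    print(pick_best_forecaster(trending, FORECASTERS))
    print(workload(demand=2.0, entitlement=4.0))
```

In this toy version the averaging method wins on the noisy series and the last-value method wins on the steadily climbing one, which is the point of letting the methods compete per metric instead of picking one up front. A nightly job, as described in the session, would simply rerun the selection on the latest history.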






