Last night I spent two hours talking to a couple of developers from the Site Recovery Manager group. They turned out to be excellent people who listened to my experiences with SRM. I felt empowered as a small voice for the SRM user base. They were already well aware of some of the issues I raised; others seemed to spark their interest.
IP customization obviously needs work. An enterprise-class user interface is needed, and it should not look like a spreadsheet. We talked about policy- and rule-based IP customization. This would work well in my environment, since I change the 2nd and 3rd octets. For example, in site A a machine may have the IP address 10.20.30.40. In site B this machine would have its IP changed to 10.40.130.40, because the 10.40.30.0/24 network is already in use there as a server VLAN (VLAN 30). A policy or rule, possibly in the inventory mappings, would let me say that if the 2nd octet is 20, the destination IP should always use 40 for the 2nd octet and have 100 added to the 3rd octet. Then I would never have to do anything else for IPs. Of course, exceptions and additional expressions would be awesome.
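To make the rule concrete, here is a minimal sketch in Python of the mapping described above. It is not anything SRM provides; the function name `remap_ip` and the choice to leave non-matching addresses unchanged are my own assumptions for illustration.

```python
import ipaddress


def remap_ip(address: str) -> str:
    """Apply the example rule: if the 2nd octet is 20, use 40 for the
    2nd octet and add 100 to the 3rd octet. (Hypothetical helper.)"""
    # IPv4Address validates the input and raises ValueError on garbage.
    octets = [int(part) for part in str(ipaddress.IPv4Address(address)).split(".")]
    if octets[1] != 20:
        return address  # assumption: addresses the rule does not match stay as-is
    third = octets[2] + 100
    if third > 255:
        raise ValueError(f"3rd octet overflows when remapping {address}")
    octets[1] = 40
    octets[2] = third
    return ".".join(str(octet) for octet in octets)


print(remap_ip("10.20.30.40"))  # 10.40.130.40
```

The overflow check is the kind of exception handling a real rule engine would need: any site A subnet with a 3rd octet above 155 cannot follow the "add 100" rule and would need its own expression.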
Failback is another key point. Currently SRM is a one-way product. You set it all up, test your plan to prove success, and wait for an event that justifies hitting that red run button. Now you are up and running in a new and strange place, but with no easy way back. To get back, you have to create your IP customizations and build your protection group and recovery plans after the fact. The ability to do all of this ahead of time needs to be there. The recovery plan for your protection group should have both a failover section and a failback section. Your configurations should be stored in both sites. It would be nice to be able to test the failback as well, though I would think this could get complicated.
Deep recovery plan testing is something you can do now, but it requires considerable development on the customer end. Of course I am talking about proving your recovery plan beyond simply having the virtual machine power up. A virtual machine by itself is useless! It's what we have running in them that matters most. Exchange, Citrix, DNS, Active Directory, Oracle, SQL, or any number of other applications and services you may use: these are what you are really after. You can write your own scripts to check that these function, then have the scripts return exit codes that let SRM report whether a VM had an error, and thus whether the DR plan did. It would be excellent if VMware provided these scripts for the default installations of major applications like Exchange or SQL. In addition, I would like a place for the VMware user community to post their own scripts. I would really like to see more than just checking that a process is running, but I will take anything. This would also lead us into scheduling DR tests in the middle of the night or on the weekend. We can sleep while the systems make sure the DR plan is good once a week, with the results sent in an email you check on Monday. I think this is the future of DR tests. If VMware doesn't do it, someone else will.
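As a rough idea of what such a check script could look like, here is a minimal Python sketch that tests whether services answer on their TCP ports and turns the result into an exit code. The host names and the `CHECKS` list are made up for illustration, and I am assuming the script would be called from a recovery plan step that treats a non-zero exit code as a failure.

```python
import socket
import sys

# Hypothetical checks: (description, host, TCP port). Replace with your own.
CHECKS = [
    ("SMTP on the mail server", "mail.example.test", 25),
    ("DNS on the domain controller", "dc1.example.test", 53),
    ("SQL Server listener", "sql1.example.test", 1433),
]


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and failed lookups
        return False


def run_checks(checks, probe=port_open):
    """Run every check; return 0 if all pass, 1 if any fail."""
    failures = [name for name, host, port in checks if not probe(host, port)]
    for name in failures:
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failures else 0

# A real script would finish with: sys.exit(run_checks(CHECKS))
```

A port check is only one small step beyond "is the process running". A deeper test would actually send a mail, resolve a name, or run a query, but the exit-code plumbing would stay the same.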
Test bubbles are also something I would like to see improved. Currently the test bubble is there just to allow testing to work on a single ESX host. You can create an isolated VLAN (OSI layer 2) to allow VMs to communicate across ESX hosts. However, there is no built-in solution for routing. We have tackled this at my shop with a Linux machine running in the recovery site as a VM. It has 4 virtual NICs, and I have created 3 isolated VLANs for testing my servers. The fourth virtual NIC is available if needed to allow internet connectivity to the machines for testing. This gives me a 3-subnet test bubble with routing inside of it. VMware could package up a quick appliance with a web interface where you could configure all of the subnets, and it would create the virtual NICs for you. There are obvious issues that could come up if you allow your test machines to communicate with production or the internet. An example would be an email server brought up in test that is allowed to connect to the internet and delivers email even though its production twin already did.
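The heart of what such an appliance would do is simple: take a list of subnets and give the router one NIC per subnet. Here is a small Python sketch of that planning step. The subnets, the `eth` interface names, and the idea that the router takes the first host address in each subnet are all assumptions of mine, not details of my actual setup or of any VMware product.

```python
import ipaddress


def plan_bubble(subnets, internet_uplink=False):
    """Assign one router NIC per isolated test subnet, plus an optional
    uplink NIC for internet access. (Hypothetical planning helper.)"""
    plan = []
    for index, cidr in enumerate(subnets):
        network = ipaddress.ip_network(cidr)
        plan.append({
            "nic": f"eth{index}",        # assumed interface naming
            "network": str(network),
            "gateway": str(network[1]),  # assumption: router uses the first host address
        })
    if internet_uplink:
        plan.append({"nic": f"eth{len(subnets)}", "network": "uplink", "gateway": None})
    return plan


# Hypothetical test subnets for illustration only.
bubble = plan_bubble(
    ["10.40.130.0/24", "10.40.131.0/24", "10.40.132.0/24"], internet_uplink=True
)
for nic in bubble:
    print(nic)
```

Keeping the uplink as an explicit opt-in mirrors how I treat the fourth NIC: it stays disconnected unless a test really needs the internet, which avoids surprises like the duplicate email problem.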
I would like to give a big THANK YOU to the developers standing at the SRM station in the VMware booth. It was a pleasure speaking to you. I really do appreciate your work on SRM. The product is wonderful and appears to have an excellent future.