Oct 4

I’ve been researching Tivoli Storage Manager a bit lately and have some good information to report. The vStorage APIs for Data Protection, or VADP for short, are VMware’s replacement for VMware Consolidated Backup (VCB). In some ways this is completely new, and in others it is going to be very familiar.

You still have to use a proxy either way. For full image backups you use a hardware (physical) proxy and VCB. For file-level backups you can use a Windows virtual machine running the 6.2.1 client to back up other Windows VMs. In all honesty, the last row of the table in IBM’s support document is incredibly vague: http://www-01.ibm.com/support/docview.wss?uid=swg21426059.

If you are only interested in file-level backups of your VMs, you will likely be happy with TSM’s VADP integration to date. You’ll want the 6.2.1 Windows client installed on a Windows VM; I used Windows 2008 R2. This proxy machine needs some configuration. You cannot use the GUI to run the actual backup; it has to be run from the command line with “dsmc backup vm”, just as with VCB. Once it starts, you should see a snapshot created in vCenter for the machine you are backing up. Soon after, you will also see the proxy machine reconfigured with that machine’s VMDK mounted.

To see some of the settings in the GUI, open the 6.2.1 GUI (I’m using Windows) and click Edit > Client Preferences > VM Backup. On the right you will see where to point it to vCenter, list the VMs to back up, and select the backup type. Once this is done you can operate this new proxy VM just as you did your VCB box. Some nice features are the ability to back up all Windows VMs and the ability to exclude specific VMs with a minus-VM entry (“-vm ”).
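The selection logic described above (all Windows VMs, minus any excluded with a minus-VM entry) can be sketched in Python. This is a minimal illustration, not how the TSM client implements it: the `VirtualMachine` inventory record, `select_vms`, and `build_backup_command` are hypothetical names, and passing the VM names to `dsmc backup vm` as one comma-separated argument is an assumption to check against the 6.2.1 client documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualMachine:
    name: str
    guest_os: str  # hypothetical inventory field, e.g. "Windows Server 2008 R2"


def select_vms(inventory, excluded):
    """Mimic an "All-Windows" selection minus explicitly excluded VMs."""
    excluded_lower = {name.lower() for name in excluded}
    return [
        vm.name
        for vm in inventory
        if vm.guest_os.lower().startswith("windows")
        and vm.name.lower() not in excluded_lower
    ]


def build_backup_command(vm_names):
    """Build a 'dsmc backup vm' argument list for the selected VMs.

    Assumption: the client accepts the VM names as one comma-separated
    argument. Verify the exact syntax for your client level.
    """
    if not vm_names:
        raise ValueError("no VMs selected for backup")
    return ["dsmc", "backup", "vm", ",".join(vm_names)]


if __name__ == "__main__":
    inventory = [
        VirtualMachine("web01", "Windows Server 2003"),
        VirtualMachine("sql01", "Windows Server 2008 R2"),
        VirtualMachine("lnx01", "Red Hat Enterprise Linux 5"),
        VirtualMachine("test01", "Windows Server 2008 R2"),
    ]
    vms = select_vms(inventory, excluded=["test01"])
    print(build_backup_command(vms))
    # prints ['dsmc', 'backup', 'vm', 'web01,sql01']
```

Building an argument list rather than a shell string means it could be handed to `subprocess.run` on the proxy without any quoting issues.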

Some missing features include automatically detecting new VMs and backing them up. The “All-Windows” selection in TSM will pick up new VMs, but the backup will fail because they do not have a node on the TSM server, and your proxy node has not been granted authority to back them up. I am experimenting with some ways to script this, but hopefully it is already on Tivoli’s roadmap.
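One way to script the gap described above is to compare the VMs the proxy can see against the nodes already registered and generate the missing administrative commands. This is a hypothetical sketch: `missing_node_commands` is an illustrative name, it assumes each node is named after its VM (upper-cased), and how you obtain the two lists and the initial password is left to you. `REGISTER NODE` and `GRANT PROXYNODE` are standard TSM administrative commands, but check their parameters (for example the policy domain) against your server level before feeding the output to `dsmadmc`.

```python
def missing_node_commands(discovered_vms, registered_nodes, proxy_node, initial_password):
    """Return admin commands for VMs that have no TSM node yet.

    Assumes a hypothetical naming convention: node name == VM name, upper-cased.
    """
    registered = {node.upper() for node in registered_nodes}
    commands = []
    # Sort so the generated script is deterministic from run to run.
    for vm in sorted(discovered_vms):
        node = vm.upper()
        if node in registered:
            continue
        commands.append(f"register node {node} {initial_password}")
        # Let the proxy (agent) node back up on behalf of the new node (target).
        commands.append(f"grant proxynode target={node} agent={proxy_node.upper()}")
    return commands
```

It deliberately skips VMs that already have a node, so a node registered earlier without the proxy grant would still need a manual `GRANT PROXYNODE`.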

Now, if you want image-level backups, you are in for a surprise: TSM does not currently support VADP for image-level backups. I do not currently do image-level backups in production (“that’s what replication is for,” I always said), but I am now thinking of doing them simply to have another method of protecting data.

If you have any issues, please feel free to post a comment. I would love to hear how others are using TSM and VADP.


Mar 28

VMware Consolidated Backup is going to continue to mature into a great solution for backing up VMs. Some early adopters may have experienced problems with centralized storage. If you have more than one path to your centralized storage, VMware’s documentation has recommended that you disable all inactive paths. To me this does not seem acceptable, since part of the reason you paid for highly available storage was for it to be “HIGHLY AVAILABLE”. This was a big issue for me.

VMware VCB 1.1.0 apparently works fine with PowerPath. I have configured this and am currently backing up about 20 guests. I am using Windows 2003 SP2 and PowerPath 5.1.0. The exact VCB build I am using is VMware-vcb-64559, along with Tivoli Storage Manager 5.5.0.4.

I am unsure whether VCB with PowerPath is officially supported yet. I cannot find any documentation on VMware’s site saying it is supported, nor anything in the release notes or the forums. VMware support recommended it to me to see if it resolved a disk access issue we seemed to be hitting at random. I’m happy to say that after installing PowerPath I am able to back up all of the VMs I have in vmlist so far. I would still try it in a lab environment for a while first. I’d be interested to hear whether it works for anyone else.


Mar 14

Every morning I pick up my BlackBerry and read an email from TSM Operational Reporting. If you are responsible for a Tivoli Storage Manager installation, you should be looking at a TSM Operational Reporting report at some point during your day as well. TSMOR has two key reports that can be emailed to you. The daily and hourly reports work in tandem to give you a wide range of information when you need it and focused information when there is a problem. In addition, TSMOR can store the current report as well as previous ones, in HTML format, in a directory. Until TSMOR became available, there was hardly any way to easily see the health of your TSM environment.

TSM Operational Reporting’s daily report is about as complete a report as you can get. It begins with a general summary of the target TSM server; some of the items monitored are shown below.

These are simple counts or numbers:

    Administrative Schedules – Success, Error, Fail, Missed
    Client Schedules – No errors, Skipped files, Warnings, Errors, Failed, Missed
    Total GB – Backed up, Restored, Archived, Retrieved
    Database and Log utilization
    DB Cache Hit Ratio
    Diskpool Utilization
    Scratch and Unavailable Volumes
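The first pass over these counts is essentially a triage: anything failed, missed, or in error needs attention before moving on, while warnings and skipped files can wait for the detail sections. The sketch below assumes you have already pulled the numbers out of the report into a hypothetical `ScheduleCounts` record; TSMOR itself does not expose this structure, and the field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class ScheduleCounts:
    """Schedule counts from the daily summary (hypothetical field names)."""
    admin_failed: int = 0
    admin_missed: int = 0
    admin_error: int = 0
    client_errors: int = 0
    client_failed: int = 0
    client_missed: int = 0
    client_warnings: int = 0
    client_skipped_files: int = 0


URGENT_SUFFIXES = ("failed", "missed", "error", "errors")
REVIEW_SUFFIXES = ("warnings", "skipped_files")


def triage(counts):
    """Split non-zero counts into (urgent, review_later) dictionaries."""
    fields = vars(counts)
    urgent = {k: v for k, v in fields.items() if v and k.endswith(URGENT_SUFFIXES)}
    review = {k: v for k, v in fields.items() if v and k.endswith(REVIEW_SUFFIXES)}
    return urgent, review
```

Two empty dictionaries mean the previous backup cycle was clean and you can move on to other tasks.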

Then it gives you detailed information on the Administrative Schedules and Client Schedules, such as what missed or failed. At this point I have a solid idea of the health and success of the previous backup cycle. From here I can move on to troubleshooting problems if needed, or safely move on to other tasks if everything was successful. While scrolling down I pass some impressive-looking, but for me rarely useful, graphs of different load summaries. I’ll list them to see if any catch your attention.

    Session Load Summary
    Tape Mount Load Summary
    Migration Load Summary
    Reclamation Load Summary
    Database Backup Load Summary
    Storage Pool Backup Load Summary
    Expiration Load Summary

These graphs are useful for more of a 50,000-ft overview of the load your server is under throughout the day. If I want a bit more detail on the clients, such as bytes transferred or node versions, I can simply scroll down. The Node Activity Summary is one section I watch frequently. It gives a list of nodes and their version of the TSM backup-archive client. This is currently very useful in my environment, as I am phasing out 5.3.x.x and moving to 5.5.0.4. Support for 5.3.x.x is not going to be available after April 30th, 2008.
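When you are tracking an upgrade like the 5.3 to 5.5 move described above, the Node Activity Summary boils down to a version comparison. A minimal sketch, assuming the node names and client levels have already been collected into a dictionary (the function names are hypothetical), compares dotted levels as integer tuples so that 5.10 sorts after 5.9, which plain string comparison would get wrong.

```python
def parse_version(version):
    """Turn a dotted client level such as '5.3.4.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def nodes_to_upgrade(node_versions, minimum="5.5.0.4"):
    """Return, sorted, the nodes whose client level is below the target level."""
    floor = parse_version(minimum)
    return sorted(
        node
        for node, version in node_versions.items()
        if parse_version(version) < floor
    )


if __name__ == "__main__":
    levels = {"FILESRV1": "5.3.4.0", "SQL01": "5.5.0.4", "WEB01": "5.4.1.2"}
    print(nodes_to_upgrade(levels))  # prints ['FILESRV1', 'WEB01']
```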

I recently had to turn off the next section. Activity Log Details is the output of your activity log for the past 24 hours. I have a few HSM for Windows clients that have been generating a lot of information in the activity log, so much in fact that it made my daily report fail. After disabling this section in the daily report, it runs just fine.

The Missed File Summary is useful at times. For example, a new agent deployed to all of your machines might have a file that is skipped on every one of them. This section shows the name of each skipped file and the number of times it was skipped. The next section, Missed File Details, is what I actually use to troubleshoot missed files. It gives you two key pieces of information: the node name and the UNC path to the file. The third piece of information is the time at which the file was skipped. This time can be useful if you know some other job doesn’t finish, or start, until a certain time and only then releases the file. The first two pieces of information should tell you whether you need to edit your include/exclude list.
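Grouping the detail rows by file name is essentially what makes the summary view useful: the same agent file skipped on every node shows up as one entry with a high count. A sketch, assuming the detail rows are available as `(node, unc_path, skipped_at)` tuples (a hypothetical layout), groups case-insensitively on the file name because the UNC path itself includes each machine’s name.

```python
import ntpath
from collections import Counter, defaultdict


def summarize_missed_files(records):
    """Count skipped files by name across nodes, most frequent first.

    records: iterable of (node, unc_path, skipped_at) tuples.
    Returns a list of (file_name, occurrences, sorted_node_names).
    """
    counts = Counter()
    nodes_by_name = defaultdict(set)
    for node, unc_path, _skipped_at in records:
        # ntpath parses Windows/UNC paths on any OS; lower-case because
        # Windows file names are case-insensitive.
        name = ntpath.basename(unc_path).lower()
        counts[name] += 1
        nodes_by_name[name].add(node)
    return [
        (name, occurrences, sorted(nodes_by_name[name]))
        for name, occurrences in counts.most_common()
    ]
```

A file with a high count spread across many nodes is a candidate for a global exclude; a single node’s entry usually points to a local lock or timing issue instead.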

The Session Summary section is awesome, but mostly used for bragging. If you are a backup administrator, the only bragging right that comes close to saying you can restore anything, and have, is how fast you can do it. This section lists, for each node:

    Objects Inspected, Backed Up, Updated, Rebound, Failed
    Bytes Moved
    Elapsed Time
    Aggregated Rate KB/Sec
    Percent Compressed

If you are like me, knowing how many objects moved, how fast they moved, and the total size of that data is very useful. For instance, for nodes with lots of objects it may be worth running the Tivoli journal engine. A slow aggregated rate combined with a large number of bytes moved can sometimes reveal network bottlenecks. Session Summary is available for backup and archive sessions as well as restores and retrieves. The last section, Timing Information, shows how long it took to gather the data for each section.
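The aggregated rate and the two rules of thumb above can be expressed as a small check. This is a sketch under assumptions: `SessionStats` and `review_flags` are hypothetical names, the rate is computed as bytes moved over total elapsed time (my understanding of how the aggregated rate is reported), and the object, size, and speed thresholds are placeholders you would tune for your own environment.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    node: str
    objects_inspected: int
    bytes_moved: int
    elapsed_seconds: float


def aggregate_rate_kb_per_sec(stats):
    """Bytes moved divided by total elapsed time, in KB/sec."""
    if stats.elapsed_seconds <= 0:
        return 0.0
    return stats.bytes_moved / 1024 / stats.elapsed_seconds


def review_flags(stats, many_objects=1_000_000, big_bytes=50 * 1024**3, slow_kb_s=5_000):
    """Apply the rules of thumb; every default threshold here is hypothetical."""
    flags = []
    if stats.objects_inspected >= many_objects:
        flags.append("consider the journal engine")
    if stats.bytes_moved >= big_bytes and aggregate_rate_kb_per_sec(stats) < slow_kb_s:
        flags.append("check for a network bottleneck")
    return flags
```

For example, a node that inspected 2,000,000 objects and moved 60 GB in 6 hours averages roughly 2,900 KB/sec and would trip both flags with these placeholder thresholds.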

I hope this review and summary has been informative. There are many other features I did not cover but may cover at a later time. In case you would like to research them on your own, I will point you in the right direction. You can create multiple daily and hourly reports. You can create your own custom SELECT statements to pull the data you need for your environment. You can also change any of the parameters that cause the hourly report to notify you or show errors. One I changed is the number of scratch tapes required to be considered healthy, from 5 to 3. I also have the hourly report email me only if there is a problem, such as running out of scratch tapes or a log filling up. You may also want to look at per-node notifications, which would be very handy in larger IT organizations where backups are done on servers you do not care about but someone else does.
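The hourly report’s email-only-on-problems behavior amounts to a threshold check. The sketch below is not TSMOR’s configuration format; it is a hypothetical Python equivalent using the scratch minimum of 3 mentioned above and an assumed 80% recovery log threshold. The scratch count itself could come from a custom SELECT, for example `select count(*) from libvolumes where status='Scratch'`, though you should verify the table and column names on your server.

```python
def hourly_problems(scratch_volumes, log_pct_utilized, min_scratch=3, max_log_pct=80.0):
    """Return problem descriptions; an empty list means no email is needed.

    min_scratch=3 mirrors the setting described above; max_log_pct is an
    assumed threshold for illustration.
    """
    problems = []
    if scratch_volumes < min_scratch:
        problems.append(
            f"only {scratch_volumes} scratch volume(s) left (minimum {min_scratch})"
        )
    if log_pct_utilized >= max_log_pct:
        problems.append(
            f"recovery log {log_pct_utilized:.0f}% utilized (threshold {max_log_pct:.0f}%)"
        )
    return problems


def should_email(problems):
    """Notify only when at least one problem was found."""
    return bool(problems)
```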

