In the newsgroups I see a lot of questions about how to back up and recover SharePoint. I decided it was time to put some of my ideas out here. This blog entry will cover options that come out of the box with SharePoint, with the exception of SharePoint Designer. It will talk about the different needs that DR addresses and it will show you how to use different features to meet those needs. For the most part these features work best with small to medium sized environments, but the information will be good for enterprise admins as well.
When we start talking about disaster recovery we need to decide on just what kind of disaster we are talking about. In this entry we will cover two types of disasters: content deletion and catastrophic failure. You will want to take different measures depending on the type of disaster you want to protect your environment from. We will address catastrophic failure first, then content recovery.
If you want to protect your SharePoint farm against catastrophic disaster you have two aspects of SharePoint to keep in mind: configuration and content. In the 2007 versions of SharePoint, Microsoft has added a facility for making backups with the intent of recovering from catastrophic failures. This can be found in the Central Administration web site on the Operations page under the "Backup and Restore" heading. If you click "Perform a Backup" you can see that quite a bit is covered. Since this is at the Farm level it includes any web applications you have as well as your Config and Search databases. Any part of your SharePoint environment that lives in SQL is covered here, as well as some information from your web front ends (WFEs). When you choose to do a backup here you are asked where you would like the backup to be saved. It is important to know that this backup process runs in two distinct steps. The first runs on the WFE that Central Admin is running on and the second runs on the SQL server. For your backup to be successful, both processes, and the accounts they run as, must have access to the directory where you point Central Admin. That is why the example is a UNC path instead of a local path. The WFE portion of the backup process runs as the Central Admin app pool identity. The SQL portion runs under the account that the SQL services are running as. Both accounts must have write access to the directory for your backups to be successful.
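Since a local path like C:\Backups points at a different disk on the WFE than on the SQL server, it can be worth sanity-checking the target before you kick off a backup. Here is a minimal Python sketch of that check. It is not part of SharePoint, the function name is_unc_path is my own invention for illustration, and it only looks at the shape of the path. It does not tell you whether either service account can actually write there.

```python
def is_unc_path(path: str) -> bool:
    """Return True if the path has the form of a network share (server + share)."""
    # Treat forward slashes like backslashes, as Windows does.
    normalized = path.replace("/", "\\")
    if not normalized.startswith("\\\\"):
        return False  # local paths such as C:\Backups end up here

    parts = normalized[2:].split("\\")
    # Device paths that start with \\?\ or \\.\ are not network shares.
    if parts[0] in ("?", "."):
        return False
    # A usable UNC path needs a non-empty server name and share name.
    return len(parts) >= 2 and bool(parts[0]) and bool(parts[1])
```

For example, is_unc_path(r"\\server\SPBackups") is True, while is_unc_path(r"C:\Backups") is False.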
After you have created a successful backup you can walk through the restore process. I encourage you to practice any recovery processes you have in place, but be careful when walking through this one: a restore will overwrite any content that is in place. It is a good idea to go through the restore process at least once to get familiar with it and then periodically to make sure your backups are working correctly.
If you're like me you're curious about how things work. After I ran my first backup I immediately jumped into the directory to see what was there. I found that each backup run is put into its own directory. Each of those directories has an SPBACKUP.XML file that is the table of contents for the backup. You can look through there and see which elements are being backed up. Most interesting are the entries for the objects in the "Microsoft.SharePoint.Administration.SPContentDatabase" class. These represent your Content Databases, as you may have guessed. If you continue looking at the properties for the object you will come across one that holds the database's name and SQL instance. The value for that parameter is the name of the file in the backup set. Since this part of the backup is essentially just a SQL database backup, you can take this file and restore it into SQL with SQL Management Studio if you would like. This makes it easy for you to restore the databases to different environments, or under different names.
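If you would rather not scroll through SPBACKUP.XML by hand, a few lines of Python can point you at the right entries. The sketch below makes no assumptions about how the file is laid out. It simply walks every element and reports the ones whose attributes or text mention the content database class name. The function name find_mentions is hypothetical, and you still need to read the surrounding properties yourself to find the backup file name.

```python
import xml.etree.ElementTree as ET

CONTENT_DB_CLASS = "Microsoft.SharePoint.Administration.SPContentDatabase"


def find_mentions(root, needle=CONTENT_DB_CLASS):
    """Return (tag, attributes, text) for every element that mentions needle."""
    hits = []
    for elem in root.iter():
        text = (elem.text or "").strip()
        # Look in every attribute value and in the element's own text.
        candidates = list(elem.attrib.values()) + [text]
        if any(needle in value for value in candidates):
            hits.append((elem.tag, dict(elem.attrib), text))
    return hits
```

To use it, load the file with ET.parse(path_to_spbackup_xml).getroot() and pass the result to find_mentions.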
The built-in backup procedure has two big shortcomings, as I see it. First, it does not get everything you probably want if you need to rebuild your environment, namely all your Configuration information. The Content is covered when your content databases are backed up. Configuration covers a wide variety of information, and it is spread out, so it is easy to lose track of bits of it. For instance, if you have added an icon for PDF files, this is part of your configuration. When you do this you copy a GIF file to the TEMPLATE\IMAGES directory of the 12 Hive and add an entry for that image in your DOCICON.XML file. A Central Admin backup will not get this change. For this reason I recommend a few supplemental processes to round out your backup. Start by using IISBACK.VBS to make a backup of your IIS settings, which are stored in two files: Metabase.xml and MBSchema.xml. Use a command similar to this:
iisback.vbs /backup /b SharePointBackup
This creates a backup of your Metabase and MBSchema files and saves them in %systemroot%\system32\inetsrv\MetaBack. You will want to back up the contents of this directory as well as a couple more. The directory C:\Program Files\Common Files\Microsoft Shared\web server extensions\12 is known as the '12 Hive'. It is the directory that SharePoint is installed in and where most of your SharePoint specific changes live. You will also want to add the Inetpub directory, normally at C:\InetPub. To package them all neatly I use a command line compression tool like 7-Zip to zip them up.
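If you do not have a command line zip tool handy, the packaging step can also be sketched in Python. To be clear, this swaps 7-Zip for Python's standard zipfile module. The function name zip_directories is hypothetical, and the sketch does nothing special about files that are locked by running services.

```python
import os
import zipfile


def zip_directories(directories, archive_path):
    """Write every file under each directory into one zip; return the file count."""
    count = 0
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for directory in directories:
            # Store paths relative to the parent folder so that each
            # top-level directory keeps its own name inside the archive.
            base = os.path.dirname(os.path.abspath(directory))
            for folder, _subdirs, files in os.walk(directory):
                for name in files:
                    full_path = os.path.abspath(os.path.join(folder, name))
                    archive.write(full_path, os.path.relpath(full_path, base))
                    count += 1
    return count
```

You would pass it the MetaBack directory, the 12 Hive and C:\InetPub, plus the name of the zip file you want to end up with.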
I mentioned the included backup had two shortcomings. The second is that since it is web based, it cannot be scripted to run regularly. Fortunately this is easily remedied with STSADM. STSADM can create backups that are compatible with Central Admin backups. Use the backup operation, but instead of giving it a URL and filename, give it your backup directory name and whether you want a full or differential backup. The command would look like this:
stsadm -o backup -directory \\server\SPBackups -backupmethod full
If you point STSADM at the same directory you point Central Admin at, the backups will integrate seamlessly. One can restore what the other backs up. Since STSADM is a command line utility it is a perfect candidate for automation. You can create a simple script file to run your farm level backups and schedule it to run every night or however often you want.
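A scheduled script does not have to be a batch file. As a rough sketch, here is how the farm backup command could be assembled in Python. The STSADM path assumes a default install with stsadm.exe in the BIN folder of the 12 Hive, so adjust it for your servers. The function name farm_backup_command is hypothetical.

```python
# Assumed default location of stsadm.exe; change this if your install differs.
STSADM = (
    r"C:\Program Files\Common Files\Microsoft Shared"
    r"\web server extensions\12\BIN\stsadm.exe"
)


def farm_backup_command(directory, method="full"):
    """Build the argument list for a farm-level STSADM backup."""
    if method not in ("full", "differential"):
        raise ValueError("method must be 'full' or 'differential'")
    return [STSADM, "-o", "backup", "-directory", directory,
            "-backupmethod", method]
```

Hand the returned list to subprocess.run(command, check=True) in a script, then schedule that script with the Windows Task Scheduler. You might run a full backup once a week and differentials on the other nights.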
These methods work well for restoring from catastrophes and they also work well for moving content or settings between test environments and production environments, or vice versa.
The previous methods work great if you want protection against a hardware failure. What if you just want to have some protection against your users (or your admins!) deleting content? The methods we discussed above would work, but they might be more than you need. In this section I will cover some ways to recover content.
Your first weapon against content deletion is the Recycle Bin. New in SharePoint 2007, it works in two stages, which gives you two layers of protection. The Recycle Bin is on by default and is configurable in Central Admin. Recovering items from the first stage can be done by regular site members. Once items are gone from the first stage, they can still be recovered from the second stage by a Site Collection Administrator.
Not everything is captured by the Recycle Bin, unfortunately. When folders are deleted they do not pass 'Go'; they just go away immediately. Webs and Sites are the same way. You need some way to protect against that, and you need a fallback if you have chosen not to enable the Recycle Bin.
SharePoint Designer (SPD) is the successor to FrontPage, and as the name suggests it is very SharePoint friendly. It has a lot of great SharePoint functionality, but this article will only cover its ability to back up and restore content. The beauty of this approach is that end users can take advantage of it, as a site can be backed up by farm admins as well as site members. This is a great option if you have adventurous users that like to push SharePoint to its limits. They can make a backup of their site before they make their changes. Open the site or web in SPD then go to Site in the Menu Bar. Choose Administration and finally "Backup Web Site." This will create a single CMP file backup of your site. This file can be restored back to its original location, a different location in the same site, or a different farm entirely.
As with the Central Admin backups, if you're curious you can break these backups into their elemental parts. The CMP file is just a CAB file. If you rename it as .CAB you can open it up and see inside. The MANIFEST.XML file is your roadmap to the contents. The object type "SPFile" is where you will find individual files. With this knowledge you can pull files out of your backup without having to restore it back to SharePoint, if you do not want to. Just find the document's entry in the Manifest.XML file, then copy out of the archive the file named in its FileValue attribute. Rename it to the original document name and you have your file back.
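Hunting through a large manifest by eye gets old fast, so here is a small Python sketch that builds the lookup for you. It relies on the FileValue attribute mentioned above. Reading the original name from a Name or Url attribute on the same element is my assumption about the manifest layout, so check it against your own Manifest.XML. The function name map_archive_files is hypothetical.

```python
import xml.etree.ElementTree as ET


def map_archive_files(root):
    """Map each FileValue (file name inside the CAB) to the document it holds."""
    mapping = {}
    for elem in root.iter():
        file_value = elem.get("FileValue")
        if file_value:
            # Assumption: the original name sits in Name, or failing that Url.
            mapping[file_value] = elem.get("Name") or elem.get("Url") or ""
    return mapping
```

Load the manifest with ET.parse(path_to_manifest).getroot(), pass it in, and you get back a dictionary that tells you which file in the archive to copy out and what to rename it to.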
If the Recycle Bin is not what you are looking for, and SPD does not get you where you want to go, then STSADM comes to your rescue. STSADM is a SharePoint admin's best friend. While STSADM has over 180 operations I will only cover four in this article: backup, restore, export and import. Import and Export replace the functionality that SMIGRATE had in the 2003 versions of SharePoint. Import and Export deal with subwebs. You can use them much like SPD to take snapshots of webs and restore them later if you want to. If you want to protect against accidental Site Collection deletion, use the 'backup' and 'restore' operations of STSADM. These work at the Site Collection level and will create a single file, full fidelity backup. As with any STSADM operation, if you have questions on usage, type 'stsadm -help' followed by the operation name. STSADM will give you usage for the command and maybe an example or two.
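All four of these operations take a URL and a file name, so they are easy to wrap. Here is a minimal Python sketch that builds the argument list for any of them. The function name stsadm_command is hypothetical, the example URL is made up, and the sketch assumes stsadm is on your PATH. It leaves out optional switches such as -overwrite.

```python
ALLOWED_OPERATIONS = ("backup", "restore", "export", "import")


def stsadm_command(operation, url, filename):
    """Build an STSADM argument list for a site collection or web operation."""
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError("unsupported operation: " + operation)
    # backup/restore act on a site collection URL, export/import on a web URL.
    return ["stsadm", "-o", operation, "-url", url, "-filename", filename]
```

For instance, stsadm_command("backup", "http://portal/sites/hr", r"D:\Backups\hr.bak") gives you the pieces of a site collection backup, and swapping in "export" with a subweb URL gives you a snapshot of just that web.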
If you need to restore a single document or folder you can restore the STSADM backup of the site collection to a different URL or a different farm and retrieve the document that way. If you do restore a second instance of a site collection it cannot be in the same Content Database as the original. STSADM preserves many list GUIDs and they cannot exist more than once in a Content DB. If you try to do this, STSADM will report back that there are no databases available. If you have a recovery or test environment, you can restore your backup there as well. Just make sure both farms are running the exact same version of SharePoint, right down to the patches. You will also need to have all the same software and web parts installed.
For small to medium sized environments, it is tough to beat STSADM site collection backups. You still need to schedule the task, and you need to keep a running list of the Site Collections you need to back up. The second part is easy to address. In his book, SharePoint 2007 Unleashed, Michael Noel has a chapter on Disaster Recovery. In that chapter he has a script that can be scheduled to run that takes the output of "stsadm -o enumsites" and creates a backup for each site collection listed. If that were not enough, the script will email you when it is finished. He was gracious enough to let me share that script with you all here. Go ahead and download it and take it for a spin. You will need to remove the TXT extension to get CSCRIPT to execute it properly. I think you will really like it. If you do, I encourage you to consider picking up his book. That script is just one example of the vast amount of SharePoint information that is in there.
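To give you a feel for the idea, here is a short Python sketch of the same approach. This is not Michael's script and it does not send email. It assumes that enumsites returns XML with one Site element per site collection, each carrying a Url attribute, so verify that against the output on your own farm. The function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def site_urls(enumsites_xml):
    """Pull the site collection URLs out of the enumsites output."""
    root = ET.fromstring(enumsites_xml)
    return [site.get("Url") for site in root.iter("Site") if site.get("Url")]


def backup_filename(url):
    """Turn a URL into a safe file name by replacing punctuation with underscores."""
    return re.sub(r"[^A-Za-z0-9]+", "_", url).strip("_") + ".bak"


def backup_commands(enumsites_xml, backup_dir):
    """Build one 'stsadm -o backup' argument list per site collection."""
    folder = backup_dir.rstrip("\\")
    return [
        ["stsadm", "-o", "backup", "-url", url,
         "-filename", folder + "\\" + backup_filename(url)]
        for url in site_urls(enumsites_xml)
    ]
```

Capture the output of "stsadm -o enumsites -url" for your web application, feed it to backup_commands along with your backup folder, and run each resulting command in turn.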
There are many disasters that can befall a SharePoint environment. Servers can burst into flames, or the CEO's administrative assistant can delete a folder full of important documents. Regardless of how you define disaster there are ways to protect yourself built in to SharePoint. I hope this blog entry has given you some ideas on how to protect your environment.
Let me know what you think. Leave me a comment below.
tk