Don't let the elephants trample you...
If you have data centers on opposite sides of the country or on the other side of the world, chances are you have experienced the effects of an elephant. I know you're probably thinking, “A what?” The elephant is a Long Fat Network, or LFN for short (pronounced “elephant”). The phenomenon is caused by the way TCP works, and it can really slow down your work.
TCP is a good protocol and has some very useful features which help reduce network congestion and automatically detect dropped packets. This is where the problem comes in. In order to do those things, TCP relies on acknowledgements: the sender can only have a limited amount of unacknowledged data in flight, and then it has to wait. Normally in a LAN this isn’t a big deal because the latency is low. In a WAN, however, latency goes up, so it takes much longer for these acknowledgements to get back to the sender and the sender spends more of its time waiting.
A lot of people misunderstand the difference between bandwidth and throughput. You may have a large pipe of 1Gbps or even 10Gbps connecting those faraway data centers, but you can still experience slow transfers because of the high latency. In essence, your actual throughput is nowhere near the available bandwidth. As an example, if you are trying to copy files between a data center in Dallas, TX and London, England, you will experience about 125ms of delay. Because of this delay, it doesn’t matter if you have a 10Mbps pipe or a 1Gbps pipe. With a standard 64KB TCP window (no window scaling), you are still limited to a throughput of about 4.19Mbps for each flow. This means there is a lot of extra bandwidth sitting there unused, and your transfers are going to take much longer than they need to.
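The 4.19Mbps figure can be reproduced with a little arithmetic. Here is a minimal Python sketch, assuming the 125ms is the round-trip time and the TCP window is the classic 65,535 bytes with no window scaling (neither is stated above, so treat both as assumptions). The function name is just for illustration.

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP flow: at most one window of data per round trip."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / rtt_seconds / 1_000_000


# Assumed values: 65,535-byte window (no window scaling), 125 ms round trip.
per_flow = max_tcp_throughput_mbps(65_535, 0.125)
print(f"{per_flow:.2f} Mbps per flow")  # prints "4.19 Mbps per flow"
```

Notice that the link speed never appears in the calculation, which is why a bigger pipe alone does not help a single flow.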
So now that you know what an elephant is, you're probably wondering why this matters to you. At work we have data centers around the world, and I am always having to design and configure things like log shipping, mirroring, and replication across data centers. Even for small databases it can take a long time to copy the backup to initialize the secondary servers in these configs. Luckily, there are a few things we can do to drastically reduce the time it takes to copy the backups over.
The first thing that probably came to mind is compression. If you are using SQL 2008 Enterprise, SQL 2008 R2 Standard, or SQL 2008 R2 Enterprise, you can use the built-in compression for your backups. Of course there are also third-party backup tools, or you can even compress the regular backup files using something like WinZip or WinRAR.
The other method I like to use in combination with compression is to split the backup into multiple files. In T-SQL you can back up to multiple files like so:
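-- Placeholder paths: listing two DISK targets stripes the backup across two files.
BACKUP DATABASE [AdventureWorks] TO DISK = N'D:\Backups\AdventureWorks_1.bak', DISK = N'D:\Backups\AdventureWorks_2.bak';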
So now that I have multiple files, I can copy both of them at the same time and double my effective throughput. Instead of the 4.19Mbps I mentioned before, I can now get 8.38Mbps, and the backup effectively copies in half the time. If I have the available bandwidth, I might even split the backup into four files for even faster throughput. In my experience, copying between two and four files at a time gives me the best throughput. Anything over four files did not seem to improve things much.
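If you would rather script the simultaneous copy than start each one by hand, something like the following Python sketch could work. The function name and the default of four workers are my own choices for illustration, and whether each copy really gets its own TCP flow depends on the file-copy protocol in use, so measure before relying on it.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_in_parallel(files, dest_dir, max_workers=4):
    """Copy several files at the same time and return the destination paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each worker thread copies one file; copy2 returns the new file's path.
        return list(pool.map(lambda f: shutil.copy2(f, dest), files))
```

You would call it with the list of backup files and the remote share as the destination, for example `copy_in_parallel([file1, file2], remote_share)`, where both file paths and the share are whatever applies in your environment.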
Network Engineers can be your friends!
Make sure you test to find out the optimal number of files for your situation. It is also a pretty good idea to get to know your network engineers and find out how much bandwidth you have. You want to make sure that you are playing nice with everyone else and not saturating the network. If you are interested in calculating the throughput for yourself, check out this handy calculator from Silver Peak. By working with your network engineers and leveraging a little info about how TCP works, you can save yourself a lot of time when transferring files across great distances.
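Once your network engineers tell you how much bandwidth you are allowed to use, a rough starting point for the number of files can be worked out before you test. This is only a sketch: the 20Mbps budget is a made-up number, the 4.19Mbps per-flow ceiling comes from the Dallas to London example, and the result is an upper bound to test against, not a prediction.

```python
def flows_within_budget(budget_mbps: float, per_flow_mbps: float) -> int:
    """Number of parallel flows that fit in the budget (never fewer than one)."""
    return max(1, int(budget_mbps // per_flow_mbps))


# Hypothetical budget of 20 Mbps with a 4.19 Mbps per-flow ceiling.
print(flows_within_budget(20, 4.19))  # prints 4
```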
If anyone out there has their own tips I would love to hear them!