
Don't let the elephants trample you...
If you have data centers on opposite sides of the country or on the other side of the world, chances are you have experienced the effects of an elephant. I know you're probably thinking, “A what?” The elephant is a Long Fat Network, or LFN (pronounced “elephant”) for short. The phenomenon is caused by the way TCP works, and it can really slow down your work.
TCP is a good protocol with some very useful features that help reduce network congestion and automatically detect dropped packets. This is where the problem comes in. To do those things, TCP relies on acknowledgements, and the sender can only have a limited amount of unacknowledged data (the TCP window) in flight at any time. Normally in a LAN this isn’t a big deal because the latency is low. In a WAN, however, latency increases, so it takes much longer for these acknowledgements to get back to the sender, and the sender sits idle while it waits.
A lot of people misunderstand the difference between bandwidth and throughput. You may have a large pipe of 1Gbps or even 10Gbps connecting those faraway data centers, but you can still experience slow transfers because of the high latency. In essence, your actual throughput is nowhere near the available bandwidth. As an example, if you are trying to copy files between a data center in Dallas, TX and London, England, you will experience about 125ms of delay. Because of this delay it doesn’t matter if you have a 10Mbps pipe or a 1Gbps pipe. You are still limited to a throughput of about 4.19Mbps for each flow. This means there is a lot of extra bandwidth sitting there unused, and your transfers are going to take much longer than they need to.
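To see where a number like 4.19Mbps can come from, here is a minimal sketch in Python. It assumes the classic 65,535-byte TCP window with no window scaling and treats the 125ms as round-trip time; the window size is my assumption, not something the calculator states here, and the function name is just illustrative. The idea is that a single flow can send at most one full window per round trip.

```python
def max_flow_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound for one TCP flow: one full window delivered per round trip."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / rtt_seconds / 1_000_000


# Assumed values: 65,535-byte window (no window scaling), 125ms round trip.
ASSUMED_WINDOW_BYTES = 65_535
RTT_SECONDS = 0.125

per_flow_mbps = max_flow_throughput_mbps(ASSUMED_WINDOW_BYTES, RTT_SECONDS)
print(f"{per_flow_mbps:.2f} Mbps per flow")  # prints "4.19 Mbps per flow"
```

Notice that the link bandwidth never appears in the calculation, which is why a bigger pipe alone does not speed up a single flow under these assumptions.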
So now that you know what an elephant is, you're probably wondering why this matters to you. At work we have data centers around the world, and I am always having to design and configure things like log shipping, mirroring, and replication across data centers. Even for small databases it can take a long time to copy the backup to initialize the secondary servers in these configs. Luckily there are a few things we can do to drastically reduce the time it takes to copy the backups over.
The first thing that probably came to mind is compression. If you are using SQL Server 2008 Enterprise, SQL Server 2008 R2 Standard, or SQL Server 2008 R2 Enterprise, you can use the built-in compression for your backups. Of course there are also third-party backup tools, or you can even compress the regular backup files using something like WinZip or WinRAR.
The other method I like to use in combination with compression is to split the backup into multiple files. In T-SQL you can back up to multiple files like so:
BACKUP DATABASE [AdventureWorks] TO DISK = 'c:\backups\AdventureWorks-part1.bak', DISK = 'c:\backups\AdventureWorks-part2.bak';
So now that I have multiple files, I can copy both of them at the same time and double my effective throughput. Instead of transferring at the 4.19Mbps I mentioned before, I can now get 8.38Mbps, and the backup effectively copies in half the time. If I have the available bandwidth, I might even split the backup into four pieces for even faster throughput. In my experience, copying between two and four files at a time gives me the best throughput. Anything over four files did not seem to improve things much.
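If you want to script the simultaneous copy rather than kick off each file by hand, something along these lines could work. It is a rough sketch in Python using only the standard library. The function name and the paths in the usage note are hypothetical, and whether each concurrent copy really ends up as its own TCP flow depends on the file-sharing protocol and the operating system, so treat that as an assumption to verify in your own testing.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_in_parallel(sources, dest_dir, max_flows=4):
    """Copy each source file on its own worker thread, up to max_flows at once."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def copy_one(src):
        # shutil.copy2 copies the file contents plus metadata such as timestamps.
        return Path(shutil.copy2(src, dest / Path(src).name))

    with ThreadPoolExecutor(max_workers=max_flows) as pool:
        # map() preserves input order and re-raises any copy error here.
        return list(pool.map(copy_one, sources))
```

A call might look like `copy_in_parallel([r"c:\backups\AdventureWorks-part1.bak", r"c:\backups\AdventureWorks-part2.bak"], r"\\remote-server\backups")`, where the remote share name is made up for illustration. The default of four workers simply mirrors the two-to-four range that worked best for me.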

Network Engineers can be your friends!
Make sure you test to find out the optimal number of files for your situation. It is also a good idea to get to know your network engineers and find out how much bandwidth you have. You want to make sure that you are playing nice with everyone else and not saturating the network. If you are interested in calculating the throughput for yourself, check out this handy calculator from Silver Peak. By working with your network engineers and leveraging a little info about how TCP works, you can save yourself a lot of time when transferring files across great distances.
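As a starting point for that testing, you can estimate how many simultaneous flows it would take to fill a link. This is only a back-of-the-envelope sketch in Python: it takes the 4.19Mbps per-flow figure from the example above as given, uses an illustrative function name, and ignores packet loss, disk speed, and everyone else sharing the pipe.

```python
import math


def flows_to_fill_link(link_mbps: float, per_flow_mbps: float) -> int:
    """Smallest number of equal flows whose combined rate reaches the link rate."""
    return math.ceil(link_mbps / per_flow_mbps)


PER_FLOW_MBPS = 4.19  # per-flow limit from the Dallas-to-London example

print(flows_to_fill_link(10, PER_FLOW_MBPS))    # 3
print(flows_to_fill_link(1000, PER_FLOW_MBPS))  # 239
```

On a big link the theoretical number is far higher than you would ever want to run, which is one more reason to agree on a sensible cap with your network engineers instead of trying to fill the pipe.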
If anyone out there has their own tips I would love to hear them!

Hi Jeremy,
Nice blog, found you via sqlserverpedia syndication.
Can you expand on the limit of ‘4.19Mbps for each flow’?
What is a ‘flow’ and what defines this limit?
thanks
rich
Rich,
Thank you very much! A flow is just a single network session. For example, if you go to My Computer and select multiple files to copy using Ctrl+C/Ctrl+V, the files are copied one at a time. The copy is single-threaded, so there is only one network flow or session at a time. On the other hand, if you use a tool like RichCopy, which is multi-threaded, you can copy multiple files at the same time, giving you multiple flows.
The limit is due to the nature of TCP. TCP requires acknowledgements that the packets arrived safely on the other side. On a fat network pipe covering long distances, the sending server actually gets slowed down because it is waiting for those acknowledgements before it can send more data. For more info, along with some improvements made in Server 2008, you should check out this Microsoft TechNet article.
The 4.19Mbps comes from the Silver Peak calculator I linked to above. To fill it in, you can use the sample data it provides, measure with ping, or talk to your network guys about your network characteristics. For my example I used 10Mbps WAN bandwidth, 125ms WAN latency, and 0% packet loss. The interesting thing here is that whether you enter 10Mbps, 1000Mbps, or even 10Gbps, each flow (or file) is still limited to 4.19Mbps.