(This is a guest post by Antoni Sawicki aka Tenox)
I often make copies of large data archives, typically many TB in size. I found that rsync transfer speed slows down over time, typically after a few GB, especially when copying large files, eventually reaching crawl speeds of just a few KB/s. The internet is littered with people asking the same question, or why rsync is slow in general. There really isn’t a good answer out there, so I hope this may help.
After doing some quick profiling I found out that the main culprit was rsync's advanced delta transfer algorithm. The algorithm is super awesome for incremental updates, as it will only transfer the changed parts of a file instead of the whole thing. However, when performing an initial copy it’s not only unnecessary but gets in the way: the CPU is spinning calculating checksums on chunks that never could have changed. As such…
Initial rsync copies should be performed with the -W option, for example:
$ rsync -avPW <src> <dst>
The -W or --whole-file option instructs rsync to perform full file copies and not use the delta transfer algorithm. As a result, the per-chunk checksum calculation is skipped and maximum transfer speeds can be easily achieved.
Long term, rsync could be patched to do a full file transfer if the file doesn’t exist at the destination.
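Until something like that exists in rsync itself, a wrapper script can approximate the idea at the directory level. The following is only a sketch under stated assumptions: pick_rsync_flags is a hypothetical helper name, it only works when the destination is a local path, and it decides once per destination directory rather than per file.

```shell
# Hypothetical helper: print the rsync flags to use for a given LOCAL destination.
# Assumption: a missing or empty destination means this is an initial copy.
pick_rsync_flags() {
    dst=$1
    if [ -d "$dst" ] && [ -n "$(ls -A "$dst" 2>/dev/null)" ]; then
        # Destination already has content: keep the delta transfer algorithm.
        echo "-avP"
    else
        # Nothing there yet: add -W to copy whole files.
        echo "-avPW"
    fi
}
```

It could then be used as rsync $(pick_rsync_flags <dst>) <src> <dst>.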
Also, while copying jumbo archives of many TB I don’t want to see every individual file being copied. Instead I want a percentage of the total archive size and the current transfer speed in MB/s. After some experiments I arrived at this weird combo:
$ rsync -aW --no-i-r --info=progress2 --info=name0 <src> <dst>
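For reference, here is the same command wrapped in a small shell function with each flag annotated. Treat it as a sketch: the function name bulk_copy is made up for illustration, and the --info options need rsync 3.1.0 or newer.

```shell
# Hypothetical wrapper around the command above; bulk_copy is an illustrative name.
bulk_copy() {
    src=$1
    dst=$2
    # -a                archive mode (recursive, preserves permissions, times, etc.)
    # -W                whole-file: skip the delta transfer algorithm
    # --no-i-r          short for --no-inc-recursive: build the complete file list
    #                   before transferring, so the total size is known up front
    #                   and the overall percentage is meaningful
    # --info=progress2  show progress for the whole transfer, not per file
    # --info=name0      do not print individual file names
    rsync -aW --no-i-r --info=progress2 --info=name0 "$src" "$dst"
}
```

Usage would be bulk_copy <src> <dst>. Note that scanning the full file list first means the transfer starts a bit later on very large trees.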
Outstanding–thanks so much for this solution!
Thanks.
Thank you so much, but why do you use this flag? --no-i-r
Didn’t do anything in my case. rsync doesn’t pass 30MB/s out of a 1Gbit FD connection
Check if you are using the -z (use compression) parameter. On fast networks it’s counterproductive most of the time, as compression takes more time than the uncompressed file transfer would need.