(This is a guest post by Antoni Sawicki aka Tenox)
I often find myself replicating large data archives, typically many TB in size. I found that rsync transfers slow down over time, typically after a few hundred MB and especially when copying large files, eventually crawling along at just a few KB/s. The internet is littered with people asking the same question, or asking why rsync is slow in general. There really isn't a good answer out there, so I hope this helps.
I decided to get to the bottom of it. After some quick profiling I found that the main culprit was rsync's advanced delta-transfer algorithm. The algorithm is super awesome for incremental updates, as it transfers only the changed parts of a file instead of the whole thing. However, when performing an initial copy it is not only unnecessary but actually gets in the way: the CPU spins calculating checksums on chunks that could never have changed. As such…
Initial rsync copies should be performed with the -W option, for example:
$ rsync -avPW src dst
The --whole-file (-W) option instructs rsync to copy whole files and not use the delta-transfer algorithm. As a result, no per-chunk checksums are calculated and maximum transfer speeds can easily be achieved.
Long term, rsync could be patched to do a full-file transfer whenever the file doesn't exist at the destination.
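Until something like that exists, the choice can be made in a script instead. Below is a minimal sketch, assuming a POSIX shell; the sync_archive function and the RSYNC override variable are hypothetical names for illustration, not part of rsync. It passes -W only when the destination path doesn't exist yet, which is a coarse, whole-tree approximation of the per-file behaviour described above.

```shell
#!/bin/sh
# Hypothetical wrapper: use whole-file mode (-W) for an initial copy and let
# rsync use its delta-transfer algorithm for later incremental updates.
# Heuristic (an assumption, not rsync behaviour): if the destination path does
# not exist yet, there is nothing to diff against, so skip delta transfer.

# The rsync binary can be overridden, e.g. to dry-test the wrapper.
RSYNC=${RSYNC:-rsync}

sync_archive() {
    # $1 = source, $2 = destination (passed to rsync unchanged,
    # so trailing-slash semantics are the same as plain rsync)
    if [ -e "$2" ]; then
        # Destination already populated: delta transfer may save bandwidth.
        "$RSYNC" -avP "$1" "$2"
    else
        # Fresh destination: copy whole files, no per-chunk checksums.
        "$RSYNC" -avPW "$1" "$2"
    fi
}

# Usage (hypothetical paths):
#   sync_archive /data/archive/ /mnt/backup/archive/
```

Because the check looks only at the destination path as given, a partially copied destination is treated as an update for every file, including new ones; per-file selection is exactly what a wrapper cannot do and rsync itself would have to.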
While copying jumbo archives of many TB, I don't want to see every individual file being copied. Instead, I want to see the percentage of the total archive transferred and the current transfer speed in MB/s. After some experimenting I arrived at this weird combo:
$ rsync -aW --no-i-r --info=progress2 --info=name0 <src> <dst>
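Here is the same combo with each flag annotated. It is wrapped in a hypothetical copy_with_total_progress function (that name and the RSYNC override variable are illustrative, not part of rsync); the flags themselves are standard rsync options, and --info requires rsync 3.1.0 or newer.

```shell
#!/bin/sh
# Copy a large archive showing one overall progress line instead of a
# per-file listing. Requires rsync >= 3.1.0 for --info.
RSYNC=${RSYNC:-rsync}

copy_with_total_progress() {
    # -a               archive mode (-rlptgoD): recurse and preserve symlinks,
    #                  permissions, times, group, owner and devices
    # -W               --whole-file: skip the delta-transfer algorithm
    # --no-i-r         --no-inc-recursive: build the complete file list before
    #                  transferring, so the percentage covers the whole archive
    #                  rather than only the part scanned so far
    # --info=progress2 report statistics for the transfer as a whole
    # --info=name0     do not print the name of each transferred file
    "$RSYNC" -aW --no-i-r --info=progress2 --info=name0 "$1" "$2"
}

# Usage (hypothetical paths):
#   copy_with_total_progress /data/archive/ /mnt/backup/archive/
```

The trade-off of --no-i-r is that rsync must scan the entire source tree before the first byte is sent, so the transfer starts later on very large trees, but the reported percentage no longer jumps around as new directories are discovered.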