An idea: rsyncfs, an rsync-based real-time replicated filesystem

Wed Apr 13 17:51:50 GMT 2005

On 4/12/05, Lester Hightower wrote:

> The actual replication happens in user-land with rsync as the transport.
> I think rsync will have to be tweaked a little to make this work, but
> given all the features already in rsync I don't think this will be a big
> deal.  I envision an rsync running on Host A like:
> 
> # rsync --constant --from0 --files-from-fifo=/vol/0/chglog.fifo ...
> 
> that will be communicating with an "rsync --constant ..." on the other
> end.  The --constant flag is my way of stating that both rsyncs should
> become daemons and plan to "constantly" exchange syncing information until
> killed -- that is, this is a "constant" rsync, not just one run.

Lester:

Something like this is very high on my list of products I wish I had. 
I frequently use rsync to replicate data on a near real-time basis. 
My biggest pain point here is replicating filesystems with many
(millions of) small files.  The time rsync spends traversing these
directories is immense.

There have been discussions in the past of making an rsync that would
replicate the contents of a raw device directly, saving the time spent
checking each small file:

http://lists.samba.org/archive/rsync/2002-August/003545.html
http://lists.samba.org/archive/rsync/2003-October/007466.html

It seems that the consensus on the list at the time was that rsync
is not the best utility for this, since it's designed to transfer many
files rather than just one really big "file" (the contents of the
device).

Although the more recent of the above discussions took place almost 18
months ago, I have seen no sign of an rsync-a-device utility.  If it
exists, it might be the solution to what you propose, and it would
work on more than Linux.

To achieve your goal with this proposed utility you would simply do
something like this:

+ for each device
++ make a snapshot if your LVM supports it
++ transfer the diffs to the remote device
+ go back and do it all again
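To make the "transfer the diffs" step concrete, here is a minimal Python sketch of a block-level device sync.  It assumes the snapshot has already been made, that source and destination are the same size, and it treats both as plain files; the network transport is stood in for by passing the destination's list of block hashes in memory.  block_hashes, diff_blocks and apply_blocks are hypothetical names for illustration, not part of rsync.

```python
import hashlib


def block_hashes(path, block_size):
    """Return the SHA-1 digest of each fixed-size block of path, in order."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            hashes.append(hashlib.sha1(block).digest())
    return hashes


def diff_blocks(src_path, dst_hashes, block_size):
    """Yield (index, data) for every source block the destination lacks.

    dst_hashes stands in for what the remote side would send back:
    one digest per block of its copy of the device.
    """
    with open(src_path, "rb") as f:
        index = 0
        while True:
            block = f.read(block_size)
            if not block:
                break
            if (index >= len(dst_hashes)
                    or hashlib.sha1(block).digest() != dst_hashes[index]):
                yield index, block
            index += 1


def apply_blocks(dst_path, blocks, block_size):
    """On the receiving side, write each (index, data) pair at its offset."""
    with open(dst_path, "r+b") as f:
        for index, data in blocks:
            f.seek(index * block_size)
            f.write(data)
```

Note that the destination still hashes its whole device on every pass, so this scheme saves network bandwidth rather than disk reads.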

If the appropriate permissions were in place, this could be done
entirely in user mode, which is a great advantage for portability.  As
you touched on in your original message, knowing what has changed since
the last run would greatly reduce the amount of data that
needs to be read on the source side.  In my experience, sequential
reads like this, even on large devices, don't take a huge amount of
time compared with accessing large numbers of files.  If there were
only a few files on a mostly empty volume, the performance difference
would be more substantial.  ;-)
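As a sketch of how "knowing what's changed" would cut source-side reads: suppose something recorded a log of (offset, length) writes to the device since the last pass.  That write log is a hypothetical assumption here, and how it would be collected is left open.  The reader could then map those extents to block numbers and read only those blocks.  dirty_blocks and read_dirty are illustrative names only.

```python
def dirty_blocks(writes, block_size):
    """Map (offset, length) write extents to the sorted list of touched blocks.

    writes is a hypothetical change log; zero-length writes are ignored.
    """
    dirty = set()
    for offset, length in writes:
        if length <= 0:
            continue
        first = offset // block_size
        last = (offset + length - 1) // block_size
        dirty.update(range(first, last + 1))
    return sorted(dirty)


def read_dirty(path, indices, block_size):
    """Read only the listed blocks, returning (index, data) pairs."""
    out = []
    with open(path, "rb") as f:
        for index in indices:
            f.seek(index * block_size)
            out.append((index, f.read(block_size)))
    return out
```

A write that straddles a block boundary marks both blocks dirty, which is why the last block is computed from offset + length - 1.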

Another way to eliminate the kernel dependency would be to combine the
inode walk done by the "dump" utility with the rsync algorithm to
reduce the amount of file data transferred.  The inode walk would be
filesystem-specific, but could be done in user space using existing
interfaces.
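For reference, the part of the rsync algorithm such a dump-style walker would reuse is the weak rolling checksum backed by a strong hash.  The sketch below follows the weak checksum from the original rsync technical report (sums a and b taken modulo 2^16), uses MD5 as the strong hash purely for illustration, indexes only whole blocks of the old data, and works on in-memory byte strings.  weak_checksum, roll, delta and patch are hypothetical names, not rsync's internal API.

```python
import hashlib

M = 1 << 16  # modulus for the weak checksum sums


def weak_checksum(block):
    """Return (a, b): a = sum of bytes, b = position-weighted sum, both mod M."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(sig, out_byte, in_byte, block_size):
    """Slide the window one byte: drop out_byte, append in_byte, in O(1)."""
    a, b = sig
    a = (a - out_byte + in_byte) % M
    b = (b - block_size * out_byte + a) % M
    return a, b


def delta(old, new, block_size):
    """Describe new as ('copy', old_block_index) and ('data', bytes) ops."""
    table = {}
    for idx in range(len(old) // block_size):  # whole blocks only
        block = old[idx * block_size:(idx + 1) * block_size]
        table.setdefault(weak_checksum(block), []).append(
            (hashlib.md5(block).digest(), idx))
    ops, literal, i, n = [], bytearray(), 0, len(new)
    sig = weak_checksum(new[:block_size]) if n >= block_size else None
    while i + block_size <= n:
        match = None
        candidates = table.get(sig)
        if candidates:  # weak hit: confirm with the strong hash
            strong = hashlib.md5(new[i:i + block_size]).digest()
            match = next((idx for d, idx in candidates if d == strong), None)
        if match is not None:
            if literal:
                ops.append(("data", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            i += block_size
            if i + block_size <= n:
                sig = weak_checksum(new[i:i + block_size])
        else:
            literal.append(new[i])
            if i + block_size < n:
                sig = roll(sig, new[i], new[i + block_size], block_size)
            i += 1
    literal.extend(new[i:])
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, block_size):
    """Rebuild new from old plus the ops produced by delta()."""
    out = bytearray()
    for kind, value in ops:
        if kind == "copy":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

The rolling update is what lets the sender test a match at every byte offset without rehashing the whole window, so an insertion near the start of a file only costs the inserted bytes.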

  -- Steve