directory replication between two servers

Eric Ziegast ziegast at vix.com
Wed Jul 3 11:11:02 EST 2002


> I have two Linux servers with an rsync server running on both. I am
> currently replicating directories between them with the command rsync -avz ....
> My requirement is: if I make any changes on the first server, say server
> A, I want to see the changes on the second server immediately, something
> similar to MySQL database replication. How can I do that?

... a vague question.  It depends on the application.

In high-availability environments it's best to do the replication in the
application so that the application can deal with or work around any
failure conditions.  In the case of a database, database replication
methods work better than depending on the filesystem.  The filesystem does
not know the state of transactions within the database.

Imagine this: Instead of having your client application write to one
filesystem, have it write to two filesystems before saying the write
was completed or committed.  If one system fails, the other is updated
just as well as the failed filesystem (caveat: I'm ignoring race
conditions!).
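
A minimal shell sketch of that idea, assuming the "client application"
is just a script dropping files into two directories.  The function
name dual_write and the replica paths are hypothetical, and like the
description above it ignores race conditions.  It reports success only
after both copies are in place:

```shell
#!/bin/sh
# Sketch: copy a file into two replica directories and report success
# only if both copies landed. dual_write and the replica paths are
# hypothetical names; race conditions are ignored, as in the text.

dual_write() {
	src=$1
	replica_a=$2
	replica_b=$3
	name=$(basename "$src")

	for dir in "$replica_a" "$replica_b"
	do
		# Write to a temporary name, then rename, so a reader never
		# sees a half-written file in either replica.
		cp "$src" "$dir/.$name.tmp" || return 1
		mv "$dir/.$name.tmp" "$dir/$name" || return 1
	done
	return 0
}
```

Usage would look something like
"dual_write report.txt /mnt/serverA/data /mnt/serverB/data".  If the
first copy succeeds and the second fails, the replicas differ and the
function returns non-zero; deciding what to do then (retry, roll back,
page someone) is still the application's job.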


If you need read-write access on both local and remote servers and have
partitioned data sets (i.e. don't need to depend on block-level locking),
consider having both servers use a dedicated high-availability network
attached storage server (HA solution).  Both can access an NFS server,
or the second server can mount the filesystem from the first server (not
an HA solution).


If you need read-write access on one server and need to replicate data
to a read-only server _and_ if the replication process can be asynchronous,
running rsync repeatedly in a loop can work.

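	# "source" and "destination" are placeholders for your own paths.
	while true
	do
		if ! rsync -avz source destination; then
			# Placeholder: alert a human here (mail, pager, ...).
			echo "rsync failed; get help" >&2
		fi
	done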

If you know where your applications are doing writes, you might limit
your replication to the subdirectories or files on which writes are
performed to shorten the scan rsync does on each pass.  Note, though,
that rsync-based replication methods are efficient only in network
traffic, not on the disks or filesystems.  Imagine reading _all_ of your
data over and over and over and over again when only a few blocks might
change periodically.
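
Limiting replication to the subdirectories that take writes might look
roughly like the sketch below.  The function name replicate_write_dirs
and every path or host in the usage line are hypothetical; it also
assumes the parent directory already exists on the destination side.

```shell
#!/bin/sh
# Sketch: replicate only the subdirectories that receive writes.
# replicate_write_dirs is a hypothetical helper, not part of rsync.

replicate_write_dirs() {
	src_root=$1
	dest_root=$2
	shift 2

	status=0
	for sub in "$@"
	do
		# Trailing slashes make rsync copy the directory contents
		# rather than nesting the directory inside the destination.
		rsync -az "$src_root/$sub/" "$dest_root/$sub/" || status=1
	done
	return $status
}
```

Usage would be along the lines of
"replicate_write_dirs /srv/app backuphost:/srv/app uploads spool",
called from inside the loop in place of the single rsync.  Each pass
then scans only those subdirectories, and the function returns non-zero
if any of the individual rsync runs failed.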


If you need read-write access on one server and need to replicate data
to a read-only server and need synchronous operation (i.e., the
write must be completed on the remote server before returning to the
local server), then you need operating-system-level or storage-level
replication products.

    Veritas:
	It's not available on Linux yet, but Volume Replicator performs
	block-level incremental copies to keep two OS-level filesystems
	in sync.  $$

	File Replicator is based (interestingly enough) on rsync, and
	runs under a virtual filesystem layer.  It is only as reliable
	as a network-wide NFS mount, though.  (I haven't seen it used
	much on a WAN.)  $$

    Andrew File System (AFS)
	This advanced filesystem has methods for replication
	built in, but has a steep learning curve for making them
	work well.  I don't see support for Linux, though. $

    Distributed File System (DFS)
	Works a lot like AFS, built for DCE clusters, commercially
	supported (for Linux too)  $$$

    NetApp, Procom (et al.):
	Several network-attached-storage providers have replication
	methods built into their products.  The remote side is kept
	up to date, but integrity of the remote data depends on the
	application's use of snapshots.  $$$

    EMC, Compaq, Hitachi (et al.):
	Storage companies have replication methods and best practices
	built into their block-level storage products.   $$$$


Another alternative (cheaper, too) is to just use a database, period.
People who worry about data storage, data integrity, failover, and
replication have put a lot of thought into their database products.
If you can modify your application to depend on a database and not
a filesystem, you may be better off in the long run.  Lazy people use
filesystems as their database.  It works just fine up to the point
where you need to worry about real-time replication.

Again, it really depends on the application.

If others know of other replication methods or distributed filesystem
work, feel free to chime in.

--
Eric Ziegast



