directory replication between two servers

Mon Jul 8 14:49:10 EST 2002

On Wednesday 03 July 2002 20:10, Eric Ziegast wrote:
> > I have two Linux servers with an rsync server running on both. Now I am
> > replicating directories on both servers with the command rsync -avz ....
> > My requirement is: if I make any changes on the first server, say server
> > A, I want to see the changes on the second server immediately, something
> > similar to MySQL database replication. How can I do that?
>
> ... a vague question.  It depends on the application.
>
> In high-availability environments it's best to do the replication in the
> application so that the application can deal with or work around any
> failure conditions.  In the case of a database, database replication
> methods work better than depending on the filesystem.  The filesystem does
> not know the state of transactions within the database.
>
> Imagine this: Instead of having your client application write to one
> filesystem, have it write to two filesystems before saying the write
> was completed or committed.  If one system fails, the other is updated
> just as well as the failed filesystem (caveat: I'm ignoring race
> conditions!).
>
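
A rough sketch of the dual-write idea quoted above, assuming both
filesystems are mounted locally. The function name write_both and its
arguments are made up for illustration. Like the description, it ignores
race conditions, and it does not force the data out to disk.

```shell
#!/bin/sh
# Hypothetical helper: write stdin to the same file name under two roots
# and report success only if both copies were written.
# Usage: some_command | write_both /mnt/fs1 /mnt/fs2 data.txt
write_both() {
    primary_root=$1
    secondary_root=$2
    name=$3

    # Buffer the input once so both copies get identical content.
    tmp=$(mktemp) || return 1
    cat > "$tmp" || { rm -f "$tmp"; return 1; }

    status=0
    for root in "$primary_root" "$secondary_root"; do
        partial="$root/$name.tmp.$$"
        # Copy under a temporary name, then rename, so a reader never
        # sees a half-written file under the final name.
        if cp "$tmp" "$partial" && mv "$partial" "$root/$name"; then
            :
        else
            rm -f "$partial"
            status=1
        fi
    done

    rm -f "$tmp"
    return "$status"
}
```

The caller treats the write as committed only when write_both returns 0.
A non-zero status means at least one copy is missing or stale and needs
repair before that filesystem can be trusted again.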
>
> If you need read-write access on both local and remote servers and have
> partitioned data sets (i.e. don't need to depend on block-level locking),
> consider having both servers use a dedicated high-availability network
> attached storage server (HA solution).  Both can access an NFS server,
> or the second server can mount the filesystem from the first server (not
> an HA solution).
>
>
> If you need read-write access on one server and need to replicate data
> to a read-only server _and_ if the replication process can be asynchronous,
> doing multiple rsyncs can work.
>
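> 	# Repeat the sync forever; report each failure and keep going.
> 	while true
> 	do
> 		if ! rsync -avz source destination; then
> 			echo "rsync failed: get help" >&2
> 		fi
> 		sleep 60	# pause between passes (the interval is an assumption)
> 	done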
>
> If you know where your applications are doing writes, you might limit
> your replication to the subdirectories or files on which writes are
> performed to help speed up rsync's scan for changes.  Note, though, that
> rsync-based replication methods are efficient only in network traffic,
> not on the disks or filesystems.  Imagine reading _all_ of your
> data over and over and over and over again when only a few blocks might
> change periodically.
>
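
Limiting the replication to the directories that actually receive writes
could look something like the sketch below. The function name
sync_subdirs, the RSYNC override variable, and the example paths are
assumptions for illustration; only the rsync options -a and -z are real.

```shell
#!/bin/sh
# RSYNC may be overridden (for example with "echo") to see what would run.
RSYNC=${RSYNC:-rsync}

# Hypothetical helper: sync only the named subdirectories.
# Usage: sync_subdirs SRC_ROOT DEST_ROOT SUBDIR...
sync_subdirs() {
    src_root=$1
    dest_root=$2
    shift 2

    failed=0
    for sub in "$@"; do
        # The trailing slash on the source makes rsync copy the contents
        # of the directory into the same-named directory on the other side.
        if ! "$RSYNC" -az "$src_root/$sub/" "$dest_root/$sub/"; then
            echo "rsync failed for $sub" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

For example, "sync_subdirs /data serverB:/data spool uploads" scans only
/data/spool and /data/uploads instead of the whole tree. The parent
directories are assumed to exist already on the destination.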
>
> If you need read-write access on one server and need to replicate data
> to a read-only server and need synchronous operation (i.e.: the
> write must be completed on the remote server before returning to the
> local server), then you need operating-system-level or storage-level
> replication products.
>
>     Veritas:
> 	It's not available on Linux yet, but Volume Replicator performs
> 	block-level incremental copies to keep two OS-level filesystems
> 	in sync.  $$
>
> 	File Replicator is based (interestingly enough) on rsync, and
> 	runs under a virtual filesystem layer.  It is only as reliable
> 	as a network-wide NFS mount, though.  (I haven't seen it used
> 	much on a WAN.)  $$
>
>     Andrew File System (AFS)
> 	This advanced filesystem has methods for replication
> 	built in, but has a steep learning curve for making them
> 	work well.  I don't see support for Linux, though. $
>
>     Distributed File System (DFS)
> 	Works a lot like AFS, built for DCE clusters, commercially
> 	supported (for Linux too)  $$$
>
>     NetApp, Procom (et al.):
> 	Several network-attached-storage providers have replication
> 	methods built into their products.  The remote side is kept
> 	up to date, but integrity of the remote data depends on the
> 	application's use of snapshots.  $$$
>
>     EMC, Compaq, Hitachi (et al.):
> 	Storage companies have replication methods and best practices
> 	built into their block-level storage products.   $$$$
>

If your problem is just the automatic launch of synchronization, you
should take a look at fam and imon (http://oss.sgi.com/projects/).
These two SGI projects for Linux provide a way of triggering
actions on file and inode alterations.

This is a good way of starting a replication or copy if you don't have
too many concurrent writes. But I know that these extensions are not the
best security practice (you can do many things by triggering actions
on file alteration :-).
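
To show the "sync only when something changed" idea without relying on
fam or imon, here is a sketch that polls with find -newer instead. It is
a stand-in, not the fam/imon interface, and the function names and the
stamp-file convention are made up for this example.

```shell
#!/bin/sh
# Succeed if anything under DIR was modified after the stamp file.
# Usage: changed_since STAMP DIR
changed_since() {
    [ -n "$(find "$2" -newer "$1" -print | head -n 1)" ]
}

# Run COMMAND only if DIR changed since the last successful run.
# Usage: sync_if_changed STAMP DIR COMMAND...
sync_if_changed() {
    stamp=$1
    dir=$2
    shift 2

    # Take the new timestamp before syncing, so that writes made while
    # the sync is running are picked up by the next pass.
    new_stamp=$(mktemp) || return 1

    if [ ! -e "$stamp" ] || changed_since "$stamp" "$dir"; then
        if "$@"; then
            mv "$new_stamp" "$stamp"
            return 0
        fi
        rm -f "$new_stamp"
        return 1
    fi

    rm -f "$new_stamp"
    return 0
}
```

Called from cron or a loop, for example as
"sync_if_changed /var/tmp/sync.stamp /data rsync -az /data/ serverB:/data/",
this skips the network transfer when nothing changed, but unlike an
event-driven trigger it still walks the directory tree on every pass.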

>
> Another alternative (cheaper, too) is to just use a database, period.
> People who worry about data storage, data integrity, failover, and
> replication have put a lot of thought into their database products.
> If you can modify your application to depend on a database and not
> a filesystem, you may be better off in the long run.  Lazy people use
> filesystems as their database.  It works just fine up to the point
> where you need to worry about real-time replication.
>
> Again, it really depends on the application.
>
> If others know of other replication methods or distributed filesystem
> work, feel free to chime in.