Storage compression patch for Rsync (unfinished)

jw schultz jw at pegasys.ws
Wed Jan 15 11:24:00 EST 2003


On Wed, Jan 15, 2003 at 11:50:27AM +0100, Harald Fielker wrote:
> Hi,
> 
> I am using Rsync to make backups of a MySQL database. The MySQL files can 
> be compressed to about 1/10 of their size and I want to make use of this fact.
> 
> Rsync currently doesn't support saving files in a compressed state. I 
> personally think this should be a feature of the filesystem (in the sense of 
> "synchronised files"), but currently no such filesystem is available for 
> Linux.

e2compr is not dead.  See http://www.alizt.com/

> Here is my idea:
> 
> We will have two new options:
> 
> -X : specifies a compression program (e.g. gzip, bzip2, ...); the default 
> compressor is "gzip"
> -Z : activates storage file compression.

Why two options?  Just specify the compressor and that
enables compression.
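A minimal sketch of that single-option approach, in C. The names struct storage_opts, parse_storage_opts() and storage_compression_enabled() are hypothetical and are not rsync's actual option handling in options.c. The point is only that compression is on exactly when a compressor was named, so no separate switch is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option state: storage compression is on iff a compressor was named. */
struct storage_opts {
    const char *compressor;   /* NULL means "store files as-is" */
};

static int storage_compression_enabled(const struct storage_opts *o)
{
    return o->compressor != NULL;
}

/* Accepts "-X prog" or "-Xprog"; the last -X wins.
 * Returns 0 on success, -1 if -X is the final argument with no program. */
static int parse_storage_opts(int argc, char **argv, struct storage_opts *o)
{
    o->compressor = NULL;
    for (int i = 1; i < argc; i++) {
        if (strncmp(argv[i], "-X", 2) != 0)
            continue;                      /* not ours; a real parser handles the rest */
        if (argv[i][2] != '\0')
            o->compressor = argv[i] + 2;   /* -Xbzip2 */
        else if (i + 1 < argc)
            o->compressor = argv[++i];     /* -X bzip2 */
        else
            return -1;                     /* dangling -X */
    }
    return 0;
}
```

Bundled short options such as -aX are not handled here. A real patch would add one entry to the existing option table rather than scan argv by hand.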

> If "-Z" is enabled, every name (files, directories, links, ...) gets the 
> extension ".rsc". 

And .rsc stands for what, rsync?  Even Windows has overcome
the three-letter extension limit.

> For a regular file, there is a header section and a data section. The 
> header section will store the following attributes:
> 
> - magic number
> - unpacked size
> - packed size
> - compression program (e.g. gzip, bzip2, ...)
> - magic number

So you add yet another compressed file format.  There's
something the world is crying out for.
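For concreteness, the proposed header could be laid out as below. This is a sketch only. The proposal names the fields but gives neither their sizes nor their byte order, so several details are invented for illustration: the magic value 0x52534331 ("RSC1"), the 64-bit big-endian sizes, the 16-byte compressor field, and the helper names rsc_header_encode()/rsc_header_decode(). The decoded unpacked_size is what the size checks in item 2 below would have to read.

```c
#include <stdint.h>
#include <string.h>

/* All layout choices here are hypothetical; the proposal does not fix them. */
#define RSC_MAGIC     0x52534331u   /* "RSC1", an invented value */
#define RSC_NAME_LEN  16            /* compressor name, NUL-padded, max 15 chars */
#define RSC_HDR_LEN   (4 + 8 + 8 + RSC_NAME_LEN + 4)

struct rsc_header {
    uint64_t unpacked_size;
    uint64_t packed_size;
    char compressor[RSC_NAME_LEN];
};

/* Write the low n bytes of v in big-endian order. */
static void put_be(unsigned char *p, uint64_t v, int n)
{
    for (int i = n - 1; i >= 0; i--) { p[i] = (unsigned char)(v & 0xff); v >>= 8; }
}

static uint64_t get_be(const unsigned char *p, int n)
{
    uint64_t v = 0;
    for (int i = 0; i < n; i++) v = (v << 8) | p[i];
    return v;
}

/* Leading magic, unpacked size, packed size, compressor name, trailing magic. */
static void rsc_header_encode(const struct rsc_header *h, unsigned char out[RSC_HDR_LEN])
{
    unsigned char *p = out;
    put_be(p, RSC_MAGIC, 4);                 p += 4;
    put_be(p, h->unpacked_size, 8);          p += 8;
    put_be(p, h->packed_size, 8);            p += 8;
    memcpy(p, h->compressor, RSC_NAME_LEN);  p += RSC_NAME_LEN;
    put_be(p, RSC_MAGIC, 4);
}

/* Returns -1 if either magic number is wrong, i.e. this is not an .rsc header. */
static int rsc_header_decode(const unsigned char in[RSC_HDR_LEN], struct rsc_header *h)
{
    const unsigned char *p = in;
    if (get_be(p, 4) != RSC_MAGIC) return -1;
    p += 4;
    h->unpacked_size = get_be(p, 8);         p += 8;
    h->packed_size = get_be(p, 8);           p += 8;
    memcpy(h->compressor, p, RSC_NAME_LEN);
    h->compressor[RSC_NAME_LEN - 1] = '\0';  /* never trust the stored terminator */
    p += RSC_NAME_LEN;
    if (get_be(p, 4) != RSC_MAGIC) return -1;
    return 0;
}
```

A fixed-size, fixed-endian header like this can be read without decompressing anything. That is its only real advantage over a standard format, and it costs a file type nobody else can read.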

> After the header section comes the file data, compressed with the program 
> the user gave us with "-X".
> 
> Every action in rsync will work, with some exceptions:
> 
> 1) Every file object has the extension .rsc. 
> 2) For simple checks on files (size, etc.), the file size must be read from 
> the .rsc header.
> 3) The local file needs to be decompressed when it is accessed for reading.
> 4) The local file needs to be compressed after it was modified or created. A 
> header section needs to be added.
> 5) The file stats (atime/ctime/mtime) will be applied to the .rsc file in the 
> normal way.
> 
> Problems/ideas:
> 
> 1) On Unix this limits file names to 255 - strlen(".rsc") characters, but 
> this should be a very, very rare case; when it happens we will disable 
> compression for that single file.

Rsync already has issues with tempfile names.  This suffix is
shorter.
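The naming rule proposed in item 1 might be sketched as follows: append ".rsc", but store the file uncompressed under its own name if the suffixed name would no longer fit. The 255-byte limit is hard-coded here as an assumption. A real implementation would ask pathconf(dir, _PC_NAME_MAX). rsc_storage_name() is a hypothetical helper and checks only a single path component.

```c
#include <stddef.h>
#include <string.h>

#define RSC_SUFFIX    ".rsc"
#define RSC_NAME_MAX  255   /* assumed NAME_MAX; query pathconf() in real code */

/* Writes the on-disk name for one path component into out (size outlen).
 * Returns 1 if ".rsc" was appended (store compressed),
 *         0 if the name was too long and is kept as-is (store uncompressed),
 *        -1 if out is too small to hold the result. */
static int rsc_storage_name(const char *name, char *out, size_t outlen)
{
    size_t n = strlen(name);
    size_t sfx = strlen(RSC_SUFFIX);
    int compress = (n + sfx <= RSC_NAME_MAX);
    size_t need = n + (compress ? sfx : 0) + 1;   /* +1 for the NUL */

    if (need > outlen)
        return -1;
    memcpy(out, name, n);
    if (compress)
        memcpy(out + n, RSC_SUFFIX, sfx + 1);     /* copies the NUL too */
    else
        out[n] = '\0';
    return compress;
}
```

One wrinkle: a source file whose own name already ends in .rsc becomes ambiguous. The receiver would then have to rely on the header's magic number rather than on the suffix.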

> 2) Rsync will need a new option for decompressing and stat'ing the .rsc file 
> tree (single file or recursive).
> 
> We should also offer options for validating .rsc files and converting a tree 
> to an .rsc file tree.
> 
> I am sending some compressor patches. I am very new to the rsync source, so 
> here is a list of what I did:
> 
> options.c
> - added -X and -Z options (-Z is passed through to the server when using 
> user@host.foo:/directory) 
> 
> flist.c:
> The extension ".rsc" is added to every file/directory (in -Z mode).
> 
> rsync.c:
> finish_transfer() now does the compression (in -Z mode) before setting the 
> file's stats. That means the compressed file has the same stat as the 
> uncompressed file.
> 
> receiver.c:
> I added two new functions: 
> - storage_decompress: decompresses an .rsc file to a temporary file, e.g. 
> for calculating sums (note: a delete function is missing!)
> 
> - storage_decompress_update_stats: updates a given stat structure 
> with the decompressed file size of the .rsc file.
> 
> 
> Currently, transferring and compressing new files works, but the receiver 
> doesn't make use of the stats that storage_decompress_update_stats produces. 
> I don't know if I am calling it in the right place. I also don't know whether 
> the sum is always calculated for a file; if it is, we need to store the MD4 
> sum in the .rsc header.

While the idea of rsyncing with compression is mildly
attractive, I can't say I care for the new compression
format.  It would be better just to use the standard gzip or
another existing format.  If you are going to create a new file type,
you could at least discuss storing the blocksums in it so
that the receiver wouldn't have to generate them.
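Here is one concrete reason the standard gzip format is enough for the size checks. RFC 1952 ends every gzip member with ISIZE, the uncompressed length modulo 2^32, stored little-endian in the last four bytes. A minimal sketch follows, assuming a single-member .gz file smaller than 4 GiB uncompressed. gzip_isize() is a hypothetical helper, not an rsync or zlib function.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 0 and stores ISIZE (uncompressed length mod 2^32) if f starts with
 * the gzip magic bytes; returns -1 otherwise or on any I/O error.
 * Caveats: for multi-member files only the last member's size is reported,
 * and sizes of 4 GiB or more wrap around. */
static int gzip_isize(FILE *f, uint32_t *isize)
{
    unsigned char b[4];

    if (fseek(f, 0, SEEK_SET) != 0 || fread(b, 1, 2, f) != 2)
        return -1;
    if (b[0] != 0x1f || b[1] != 0x8b)            /* ID1, ID2 per RFC 1952 */
        return -1;
    if (fseek(f, -4, SEEK_END) != 0 || fread(b, 1, 4, f) != 4)
        return -1;
    *isize = (uint32_t)b[0] | (uint32_t)b[1] << 8 |
             (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;   /* little-endian */
    return 0;
}
```

Because this reads only six bytes, a quick size comparison stays cheap without inventing a header. Stat structures could be patched from it much as storage_decompress_update_stats intends.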

Finally, I didn't even look at your patch because it was not
text/plain.  Unless absolutely necessary, patches should be
either inline or text/plain attachments.


-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw at pegasys.ws

		Remember Cernan and Schmitt
