Storage compression patch for Rsync (unfinished)

Harald Fielker fielker at informatik.fh-augsburg.de
Wed Jan 15 10:51:01 EST 2003


Hi,

I am using Rsync to make backups of a MySQL database. The MySQL files can 
be compressed to about a tenth of their size, and I want to make use of this 
fact.

Rsync currently doesn't support saving files in a compressed state. I 
personally think this should be a feature of the filesystem (in the sense of 
"synchronised files"), but currently no such filesystem is available for 
Linux.

Here is my idea:

We will have two new options:

-X : specifies the compression program (e.g. gzip, bzip2); the default 
compressor is "gzip"
-Z : activates storage file compression

If "-Z" is enabled, every name (files, directories, links, ...) gets the 
extension ".rsc".

A regular file consists of a header section and a data section. The header 
section will store the following attributes:

- magic number
- unpacked size
- packed size
- compression program (e.g. gzip, bzip2, ...)
- magic number

After the header section comes the file data, compressed with the program 
the user gave us with "-X".
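
To make the layout concrete, here is a minimal sketch of such a header in 
Python. Everything specific in it is an assumption for illustration and is 
not taken from the patch: the magic value "RSC1", the little-endian 
fixed-width fields, the 16-byte compressor name, and the helper names 
pack_rsc/unpack_rsc. Only gzip is handled.

```python
import gzip
import struct

# Hypothetical layout: magic, unpacked size, packed size, compressor, magic.
MAGIC = b"RSC1"                        # assumed value, not from the patch
HEADER = struct.Struct("<4sQQ16s4s")   # assumed field widths (40 bytes total)


def pack_rsc(data, compressor="gzip"):
    """Return header + compressed payload for `data` (gzip only here)."""
    if compressor != "gzip":
        raise ValueError("this sketch only implements gzip")
    payload = gzip.compress(data)
    header = HEADER.pack(MAGIC, len(data), len(payload),
                         compressor.encode("ascii"), MAGIC)
    return header + payload


def unpack_rsc(blob):
    """Check both magic numbers and both sizes, then return the data."""
    magic1, unpacked, packed, name, magic2 = HEADER.unpack_from(blob)
    if magic1 != MAGIC or magic2 != MAGIC:
        raise ValueError("bad magic number")
    if name.rstrip(b"\0") != b"gzip":
        raise ValueError("unsupported compressor")
    payload = blob[HEADER.size:HEADER.size + packed]
    if len(payload) != packed:
        raise ValueError("truncated payload")
    data = gzip.decompress(payload)
    if len(data) != unpacked:
        raise ValueError("unpacked size mismatch")
    return data
```

With a fixed-size header like this, the unpacked size can be read without 
touching the compressed data, which is what the size checks would need.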

Every action in rsync will still work, with some exceptions:

1) Every file object has the extension .rsc. 
2) Simple checks on files (size, etc.) must take the .rsc header into 
account: the file size has to be read from the header.
3) The local file needs to be decompressed when it is accessed for reading.
4) The local file needs to be compressed after it was modified or created. A 
header section needs to be added.
5) The file stats (atime/ctime/mtime) will be applied to the .rsc file in 
the normal way.

Problems/ideas:

1) On Unix this limits us to file names of at most 255 - strlen(".rsc") 
characters. This should be a very rare case, so we will disable compression 
for any such file.

2) Rsync will need a new option for decompressing and running stat on a .rsc 
file tree (single file or recursive).

We should also offer options for validating .rsc files and for converting an 
existing tree into a .rsc file tree.
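
The fallback from problem 1 above (a name that leaves no room for ".rsc" 
stays uncompressed) could look roughly like this in Python. The function 
name stored_name is hypothetical, and the sketch assumes a limit of 255 
bytes per path component and UTF-8 encoded names:

```python
RSC_SUFFIX = ".rsc"
NAME_MAX = 255  # assumed per-component limit in bytes


def stored_name(name):
    """Return (name on disk, compressed?) for one path component."""
    encoded_len = len(name.encode("utf-8"))  # assumes UTF-8 file names
    if encoded_len + len(RSC_SUFFIX) > NAME_MAX:
        # Too long for the suffix: keep the name and skip compression.
        return name, False
    return name + RSC_SUFFIX, True
```

A side effect of this rule is that a reader of the tree can tell from the 
missing suffix that the file was stored uncompressed.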

I am sending some compressor patches. I am very new to the rsync source, so 
here is a list of what I did:

options.c:
- added the -X and -Z options (-Z is passed through to the server when using 
user@host.foo:/directory) 

flist.c:
extension ".rsc" is added to every file/directory (in -Z mode)

rsync.c:
finish_transfer() now compresses the file in -Z mode before the file stats 
are applied. That means the compressed file has the same stats as the 
uncompressed file.

receiver.c:
I added two new functions: 
- storage_decompress: decompresses a .rsc file to a tmp file, e.g. for 
calculating sums (note: a function to delete the tmp file is still missing!)

- storage_decompress_update_stats: updates a given stat structure with the 
decompressed file size of the .rsc file.
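
As an illustration of the tmp file handling in storage_decompress, 
including the cleanup step that is still missing, here is a small Python 
sketch. It is not the code from the patch: the name decompressed_tmp and the 
payload_offset parameter (the header length, with header parsing left out) 
are assumptions, and the payload is assumed to be a gzip stream.

```python
import contextlib
import gzip
import os
import shutil
import tempfile


@contextlib.contextmanager
def decompressed_tmp(rsc_path, payload_offset=0):
    """Yield the path of a tmp file holding the decompressed payload.

    payload_offset is the (hypothetical) header length to skip.
    The tmp file is deleted when the with-block exits.
    """
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp")
    try:
        # Wrap fd first so it is closed even if opening rsc_path fails.
        with os.fdopen(fd, "wb") as dst, open(rsc_path, "rb") as src:
            src.seek(payload_offset)
            with gzip.GzipFile(fileobj=src, mode="rb") as gz:
                shutil.copyfileobj(gz, dst)
        yield tmp_path
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.remove(tmp_path)
```

A caller would compute its sums inside "with decompressed_tmp(path, n) as 
tmp:" and would not have to remember to delete the tmp file afterwards.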


Currently, transferring new files and compressing them works, but the 
receiver doesn't make use of the stats that storage_decompress_update_stats 
provides. I don't know if I am calling it in the right place. I also don't 
know if the sum is always calculated for a file. If it is, we need to store 
the md4 sum in the .rsc header.




-- 
Bye,
Harald
Email:             fielker at informatik.fh-augsburg.de ICQ: #15582696
A cool os:         www.linux.org
PGP Finger-print:  C2 8F 7B 55 7B 9B 8C 7E  48 35 48 21 8A DF 01 66
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rsync-compress-2.5.5.patch.gz
Type: application/x-gzip
Size: 4991 bytes
Desc: not available
Url : http://lists.samba.org/archive/rsync/attachments/20030115/894a8ee3/rsync-compress-2.5.5.patch.bin

