rsync's internal "virtual file system"

Ben Escoto bescoto at stanford.edu
Sun Nov 24 07:20:00 EST 2002


>>>>> "JWS" == jw schultz <jw at pegasys.ws>
>>>>> wrote the following on Sat, 23 Nov 2002 15:42:17 -0800

  JWS> I suspect that XML might be excessive but not by much if simple
  JWS> storage is all that is needed.  The issues would be what to
  JWS> name the file(s) -- it would have to have some sort of magic
  JWS> name.  Do we create one extension file per real file (as
  JWS> needed), one per directory, or one for the whole tree.  If one
  JWS> per tree, how do you deal with subtree transfers?  If the
  JWS> extension file is for many files you may want a way to
  JWS> accelerate access which might be difficult to deal with in XML.
  JWS> Finally, you will need to deal with blobs, the necessitated
  JWS> coding of which may limit the value of text files.

Yes, with a text or XML file that contained data on many files, the
easiest way to get information about a particular file would be to
read the whole file from the beginning.  However, a text or XML
file would compress very well, so reading the (compressed) data off a
disk would be fairly quick, and once in memory a text or XML file can
probably be scanned at hundreds of MB/sec.  Since rsync processes
files sequentially, once the appropriate spot in the file was found,
the necessary data could be read off without much inefficiency.
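To make the sequential-access argument concrete, here is a minimal
Python sketch.  It assumes a hypothetical format, not anything rsync
defines: one line per file, "<path>\t<metadata>", sorted by path and
gzip-compressed, with no tabs or newlines inside either field.
Because lookups arrive in the same order as the records, the reader
never rewinds, so a pass over the whole tree decompresses the file at
most once.  write_metadata() and SequentialMetadataReader are
illustrative names only.

```python
import gzip
import os
import tempfile

# Hypothetical record format (an assumption, not an rsync format):
# one line per file, "<path>\t<metadata>", sorted by path, with no
# tabs or newlines inside either field.

def write_metadata(path, records):
    """Write (file_path, metadata) pairs, sorted, to a gzipped text file."""
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for name, meta in sorted(records):
            out.write(f"{name}\t{meta}\n")


class SequentialMetadataReader:
    """Answer lookups that arrive in ascending path order.

    Each lookup resumes where the previous one stopped, so a full pass
    over the tree decompresses and scans the file at most once.  Asking
    for a path that sorts before an earlier lookup returns None.
    """

    def __init__(self, path):
        self._f = gzip.open(path, "rt", encoding="utf-8")
        self._pending = None  # line read ahead but not yet consumed

    def lookup(self, name):
        while True:
            if self._pending is not None:
                line, self._pending = self._pending, None
            else:
                line = self._f.readline()
            if not line:
                return None  # end of file: no record for name
            key, _, meta = line.rstrip("\n").partition("\t")
            if key == name:
                return meta
            if key > name:
                # Overshot: name has no record; keep this line for later.
                self._pending = line
                return None
            # key < name: record for a file nobody asked about; skip it.

    def close(self):
        self._f.close()


if __name__ == "__main__":
    tmp = os.path.join(tempfile.mkdtemp(), "meta.gz")
    write_metadata(tmp, [("b.txt", "acl=rw"), ("a.txt", "acl=r")])
    reader = SequentialMetadataReader(tmp)
    print(reader.lookup("a.txt"), reader.lookup("b.txt"))
    reader.close()
```

The only state kept between lookups is the open stream and one
read-ahead line, which is what makes a single forward pass enough.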

    Also, depending on the format of the text file, since the records
would be listed in order, it would be possible with seeking to find a
record in log(n) time.  But then the whole file couldn't be
gzip-compressed in the straightforward way, I believe.
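A sketch of the log(n) lookup, assuming the same hypothetical line
format stored uncompressed: a binary search over byte offsets, where
each probe seeks to an offset, skips forward to the next line start,
and compares that line's path.  It depends on the file being
seekable, which a single gzip stream is not without decompressing
from the start.  write_sorted() and find_record() are illustrative
names, not an existing API.

```python
import os

# Same hypothetical "<path>\t<metadata>" line format, but stored
# uncompressed so that arbitrary offsets can be seeked to directly.

def write_sorted(path, records):
    """Write (file_path, metadata) pairs, sorted, as UTF-8 lines."""
    with open(path, "wb") as out:
        for name, meta in sorted(records):
            out.write(f"{name}\t{meta}\n".encode("utf-8"))


def _next_line_start(f, off):
    """Offset of the first line starting at or after byte offset off."""
    if off == 0:
        return 0
    f.seek(off - 1)
    f.readline()  # consume through the first newline at or after off - 1
    return f.tell()


def _record_at(f, start):
    """(path, metadata) as bytes for the line at start, or (None, None) at EOF."""
    f.seek(start)
    line = f.readline()
    if not line:
        return None, None
    key, _, meta = line.rstrip(b"\n").partition(b"\t")
    return key, meta


def find_record(f, name):
    """Binary-search a sorted file opened in "rb" mode: O(log size) seeks."""
    target = name.encode("utf-8")  # UTF-8 byte order matches str order
    lo, hi = 0, f.seek(0, os.SEEK_END)
    # Find the smallest offset whose next line has a path >= target.
    while lo < hi:
        mid = (lo + hi) // 2
        key, _ = _record_at(f, _next_line_start(f, mid))
        if key is None or key >= target:
            hi = mid
        else:
            lo = mid + 1
    key, meta = _record_at(f, _next_line_start(f, lo))
    return meta.decode("utf-8") if key == target else None
```

For example, after write_sorted("meta.txt", records), the file opened
with open("meta.txt", "rb") can be passed to find_record() for any
path, in any order.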

    So my impression is that a compressed (gzipped) text/XML file
format would be serviceable.  In practice it seems there would be a
few seconds of wasted scanning time for small accesses (e.g. a single
file in the middle of the tree), but this would be acceptable in most
cases.  A more complicated format could make arbitrary accesses
faster, but what would this format be, and would it still allow the
data to be compressed?
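One possible answer to that question, offered here as an assumption
rather than anything proposed in this thread: compress the records in
independent blocks and keep a small uncompressed index of the first
path in each block.  A lookup binary-searches the index and
decompresses only one block, so the data stays compressed while
arbitrary accesses stay cheap.  build(), lookup() and BLOCK_RECORDS
are hypothetical names.

```python
import bisect
import zlib

# Hypothetical block-compressed layout (an assumption, not an rsync
# format): sorted "<path>\t<metadata>" lines are grouped into blocks,
# each block is compressed independently with zlib, and a small index
# records each block's first path plus its (offset, length) in data.

BLOCK_RECORDS = 2  # tiny for illustration; real files would use far more


def build(records, block_records=BLOCK_RECORDS):
    """Return (data, firsts, extents) for (path, metadata) records."""
    records = sorted(records)
    data = bytearray()
    firsts, extents = [], []
    for i in range(0, len(records), block_records):
        chunk = records[i:i + block_records]
        text = "".join(f"{p}\t{m}\n" for p, m in chunk).encode("utf-8")
        comp = zlib.compress(text)
        firsts.append(chunk[0][0])
        extents.append((len(data), len(comp)))
        data += comp
    return bytes(data), firsts, extents


def lookup(data, firsts, extents, name):
    """Binary-search the index, then decompress a single block."""
    i = bisect.bisect_right(firsts, name) - 1  # last block starting <= name
    if i < 0:
        return None  # name sorts before every record
    off, length = extents[i]
    block = zlib.decompress(data[off:off + length]).decode("utf-8")
    for line in block.rstrip("\n").split("\n"):
        path, _, meta = line.partition("\t")
        if path == name:
            return meta
    return None
```

The block size is the knob here: larger blocks compress better but
cost more to decompress per lookup, while smaller blocks make random
access cheaper at some loss of compression.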


-- 
Ben Escoto