I would like to add features to rsync: tags and saving local modifications

Grzegorz Borowiak gborowiak at gmail.com
Fri Aug 21 22:12:41 UTC 2015


Hello!

My name is Grzegorz Borowiak and I am a programmer.
I work for a company which uses rsync internally, to distribute our
continuously changing development environment.
The environment is several gigabytes in size and consists of over
100000 files, most of them binary, so VCSes like git and subversion
are not an option, whereas rsync handles it very efficiently.

However, I would like to add some features that we need. They are
generic enough to be useful to others, so I would like to implement
them in a way that allows them to be contributed back.

Feature 1: tags

Our environment is large but modularised, i.e. every file in it
belongs to some module. Not every user needs every module, so the
rsync download is parametrised by checking or unchecking modules.
Currently this is implemented with filters, which include or exclude
files by their path or, in some cases, by substrings in file names.

To make modularisation more straightforward, and not limited by the
need to distinguish files by path or name, I propose to introduce
the concept of tags. Every file could be tagged with a string stored
as an xattr (for example, user.rsync.tag=TAG), and the downloading
rsync invocation could take a parameter --tag=TAG. This option could
be specified more than once. rsync, when invoked this way, would
affect:
- all files without any tag
- all files whose tag matches any of the specified tags
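
The selection rule above can be illustrated with a minimal Python
sketch. It is not rsync code (rsync is written in C); the function
names read_tag and is_selected are made up for illustration, and
os.getxattr is only available on Linux.

```python
import os
from typing import Iterable, Optional

TAG_XATTR = "user.rsync.tag"  # attribute name proposed in this message


def read_tag(path: str) -> Optional[str]:
    """Return the file's tag, or None if it has none.

    Assumption: any OSError (no such attribute, xattrs unsupported)
    is treated as "untagged". os.getxattr exists on Linux only.
    """
    try:
        return os.getxattr(path, TAG_XATTR).decode()
    except OSError:
        return None


def is_selected(tag: Optional[str], wanted: Iterable[str]) -> bool:
    """Proposed rule: untagged files always pass; a tagged file passes
    only if its tag was given with one of the --tag options."""
    return tag is None or tag in set(wanted)
```

With --tag=gui --tag=base, a file would be transferred when
is_selected(read_tag(path), ["gui", "base"]) is true.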

Another approach would be to allow multiple tags per file. This would
be achieved by setting or unsetting xattrs like user.rsync.tag.TAG. If
a file is tagged with "a" and "b", it has the xattrs user.rsync.tag.a
and user.rsync.tag.b. This would allow a finer division and the use
of logical expressions: --tag-expr='a || (b && !c)' would select all
files which have tag "a", or which have tag "b" but not "c".
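
To show how such an expression could be evaluated against a file's
tag set, here is a small Python sketch of a recursive-descent
evaluator. The grammar is my assumption: "!" binds tighter than "&&",
which binds tighter than "||" (as in C), and tag names consist of
letters, digits, "_", "." and "-". The names tags_from_xattrs and
evaluate are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Set

TAG_PREFIX = "user.rsync.tag."  # per-tag xattr prefix proposed above
_TOKEN = re.compile(r"\s*(\|\||&&|!|\(|\)|[A-Za-z0-9_.-]+)")


def tags_from_xattrs(names: Iterable[str]) -> Set[str]:
    """Extract tag names from a list of xattr names (e.g. os.listxattr)."""
    return {n[len(TAG_PREFIX):] for n in names if n.startswith(TAG_PREFIX)}


def tokenize(expr: str) -> List[str]:
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = _TOKEN.match(expr, pos)
        if not m:
            raise ValueError(f"bad character at offset {pos} in {expr!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str, tags: Set[str]) -> bool:
    """Evaluate a tag expression; a bare name is true if the tag is set."""
    tokens = tokenize(expr)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def take() -> str:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        pos += 1
        return tok

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom() -> bool:
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if tok in ("||", "&&", ")"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in tags

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return result
```

A file carrying user.rsync.tag.b and user.rsync.tag.c would be checked
with evaluate("a || (b && !c)", {"b", "c"}).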

rsync already uses xattrs for storing metadata in fake super mode, so
it seems a natural way to implement tags.

In both approaches, the filtering could be integrated with filter
rules. If a modifier "t" were appended after "+", "-", "H", "S", "P"
or "R", rsync would treat the following expression not as a path
matching pattern, but as a tag or a logical combination of tags. For
example:

"+t base" would include all files with tag "base"
"Ht gui" would hide all files with tag "gui"
"Ht a && !b" would hide all files tagged with "a" but not "b"

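How such rules might be split into an action, the "t" modifier and an
operand is sketched below in Python. This is only an illustration of
the proposed syntax, not rsync's real filter parser: it ignores all
existing modifiers, and the names Rule and parse_rule are hypothetical.

```python
from typing import NamedTuple

ACTIONS = "+-HSPR"  # rule prefixes named in the proposal


class Rule(NamedTuple):
    action: str   # one of ACTIONS
    by_tag: bool  # True when the proposed "t" modifier is present
    operand: str  # path pattern, or tag expression when by_tag is set


def parse_rule(text: str) -> Rule:
    """Split 'PREFIX[t] OPERAND' into its parts.

    Assumption: the prefix and the operand are separated by one space,
    and "t" is the only modifier this sketch understands.
    """
    head, sep, operand = text.strip().partition(" ")
    operand = operand.strip()
    if not sep or not operand:
        raise ValueError(f"rule needs an operand: {text!r}")
    if not head or head[0] not in ACTIONS:
        raise ValueError(f"unknown rule prefix in {text!r}")
    modifiers = head[1:]
    if modifiers not in ("", "t"):
        raise ValueError(f"unsupported modifiers {modifiers!r}")
    return Rule(head[0], modifiers == "t", operand)
```

For "Ht a && !b" this yields action "H", by_tag set, and the operand
"a && !b", which would then be handed to a tag-expression evaluator
instead of the path matcher.
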
Feature 2: saving local modifications

Our users frequently make local modifications. These are always lost
when they rsync to a newer version.

I would like to make it possible to detect these modifications and
back up the affected files. There is already the --backup option, but
it is insufficient, as it saves too many files, including those which
were not locally modified.

To solve this problem, I would like to use xattrs again and introduce
user.rsync.md5sum, which would store the md5sum of the file. When
a file is about to be overwritten or deleted by rsync, rsync first
calculates its md5sum, and if it differs from the value in the xattr,
the file is saved to backup. If a file has no md5sum xattr at all, it
is also saved to backup, as it was certainly created locally.
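
The decision described above is small enough to sketch in Python. The
helper names file_md5 and needs_backup are hypothetical; the recorded
value is assumed to have been read from the proposed user.rsync.md5sum
xattr as a hex string, or to be None when the xattr is absent.

```python
import hashlib
from typing import Optional


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 of a file, read in chunks so large binaries fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_backup(current_md5: str, recorded_md5: Optional[str]) -> bool:
    """Back up when no checksum was recorded (file created locally) or
    when the content no longer matches the recorded checksum."""
    return recorded_md5 is None or recorded_md5 != current_md5
```

Before overwriting or deleting a file, the receiver would evaluate
needs_backup(file_md5(path), recorded) and copy the file aside only
when the result is true.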

Another method, quicker and less demanding but imperfect, would be to
create a special file after each downloading rsync run, which would
serve as a timestamp, and to treat all files with a newer mtime as
locally modified.
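
A Python sketch of this timestamp variant follows. The function names
scan_mtimes and modified_since are hypothetical, and the marker's
mtime is assumed to be read with os.stat on the special file.

```python
import os
from typing import Dict, List


def scan_mtimes(root: str) -> Dict[str, float]:
    """Map every regular file or symlink under root to its mtime."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            result[path] = os.lstat(path).st_mtime
    return result


def modified_since(marker_mtime: float, mtimes: Dict[str, float]) -> List[str]:
    """Paths whose mtime is strictly newer than the marker file's mtime."""
    return sorted(p for p, m in mtimes.items() if m > marker_mtime)
```

This shows why the method is imperfect: a local edit that leaves the
old mtime in place (or restores it) would go unnoticed, which the
md5sum approach would catch.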


And here go my questions:
- are any of the above features already implemented in some form,
  or currently being implemented?
- for feature 1, which solution would you prefer: single or multiple
  tagging?
- for feature 1, is it a good idea to extend filter rules to handle
  tags, or is it better to stay with standalone arguments?
- for feature 2, which solution would you prefer: md5sum, timestamp,
  or both (they can be implemented together)?
- 'fake super' uses user.rsync.%stat xattr; is the percent sign a part
  of some convention, which my xattrs should also follow?
- did I miss something?
- do you have other ideas how to provide these features?
- what are the coding guidelines for rsync development?


