Estimating backup usage with dir-merge filter
Paul Dugas
paul at dugasenterprises.com
Fri Oct 7 05:40:32 MDT 2011
On Thu, Oct 6, 2011 at 6:49 PM, Henri Shustak <henri.shustak at gmail.com> wrote:
>>> It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use
>>> `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats
>>> against the results of running the backup using the filters, so you don't need to filter them again.
>>
>> I got that but neglected to respond to the whole group. My mistake.
>> The backups are being performed using BackupPC to a central server
>> where compression and de-duplication are done. While it's true that
>> the actual storage on the backup server being consumed by each user is
>> less because of these, I don't have any problem hiding this from them
>> and telling them what their uncompressed and duplicated usage is
>> instead. It has more of an effect that way, if you know what I
>> mean.
>>
>>> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command
>>> with '--list-only', and post-process that list.
>>
>> I've been tinkering with using --verbose and --dry-run then parsing
>> the total size out of the last line of the output and I think I'm
>> close. Curiously, when I don't include the --filter option as a
>> baseline, I'm not getting the same results as "du".
>>
>> $ du -sb . | awk '{print $1}'
>> 508625653
>>
>> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
>> 506037893
>>
>> The difference is minimal and probably negligible for this purpose but
>> I'm still curious where it's coming from. Maybe there are some sparse
>> files in there somewhere.
>
> Do you have the same discrepancy if you use the --stats option?
Yes. With --stats, the last line of the output reports the same total
as the earlier "Total file size:" line in the additional output.
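
One way to narrow down the gap, as a rough sketch rather than a confirmed
diagnosis: "du -sb" also adds in the apparent size of every directory
entry, while rsync's total, as far as I know, covers file data only. That
may account for some of the difference alongside (or instead of) sparse
files; it has not been checked against this tree. The helper names below
are made up for illustration, and the first one assumes GNU find.

```shell
#!/bin/sh
# Hypothetical helpers for comparing "du -sb" with rsync's reported total.

# Sum the apparent sizes of regular files only (directories and symlinks
# are skipped). Assumes GNU find, which provides -printf.
regular_file_bytes() {
    find "$1" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Read "rsync --stats" output on stdin and print the byte count from the
# "Total file size:" line. Newer rsync versions print thousands
# separators, so any commas are stripped.
stats_total_bytes() {
    awk '/^Total file size:/ { gsub(/,/, "", $4); print $4 }'
}

# Example usage (not run here):
#   regular_file_bytes .
#   rsync --dry-run --stats -a . /tmp/does_not_exist | stats_total_bytes
#   du -sb . | awk '{print $1}'
```

If the first two numbers agree with each other but not with du, the
difference is probably in what du counts beyond file contents. Matching
this against a --filter run would still need the rsync output, since find
knows nothing about the dir-merge rules.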
Paul