Estimating backup usage with dir-merge filter
Henri Shustak
henri.shustak at gmail.com
Thu Oct 6 16:49:15 MDT 2011
>> It sounds like you missed the point of Kevin's message (in the other fork of this thread). The point wasn't to use
>> `du`, it was that you can run your stats against the backed-up files, not the source. Then you're only running stats
>> against the results of running the backup using the filters, so you don't need to filter them again.
>
> I got that but neglected to respond to the whole group. My mistake.
> The backups are being performed using BackupPC to a central server
> where compression and de-duplication are done. While it's true that
> the actual storage each user consumes on the backup server is
> lower because of these, I don't have any problem hiding this from them
> and instead telling them what their uncompressed, non-deduplicated
> usage would be. It has more of an effect that way, if you know what I
> mean.
>
>> If that doesn't make sense or isn't possible (backups are on some remote server), then just use your rsync command
>> with '--list-only', and post-process that list.
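Post-processing a `--list-only` listing could look roughly like the shell function below. It is a minimal sketch, assuming each listing line has the usual layout (permissions, size, date, time, name) and that regular files are the lines whose permissions string starts with `-`. Newer rsync releases may print sizes with comma digit grouping, so commas are stripped before adding. The function name `sum_list_only` is hypothetical.

```shell
# Hypothetical helper: total the sizes of regular files in an
# `rsync --list-only` listing read from standard input.
# Assumes each line looks like: PERMS SIZE DATE TIME NAME
sum_list_only() {
    awk '
        $1 ~ /^-/ {              # regular files only; skip dirs, symlinks, devices
            size = $2
            gsub(/,/, "", size)  # drop digit grouping such as 1,234,567
            total += size
        }
        END { printf "%.0f\n", total }
    '
}
```

For example, `rsync -r --list-only --filter='dir-merge .rsync-filter' ./ | sum_list_only`, where the `.rsync-filter` merge-file name is only an illustration; substitute whatever filter rules the real backup uses.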
>
> I've been tinkering with using --verbose and --dry-run, then parsing
> the total size out of the last line of the output, and I think I'm
> close. Curiously, even when I leave out the --filter option (as a
> baseline), I'm not getting the same result as "du".
>
> $ du -sb . | awk '{print $1}'
> 508625653
>
> $ rsync --dry-run --verbose -a . /tmp/does_not_exist | tail -1 | awk '{print $4}'
> 506037893
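A slightly more defensive variant of the `tail -1 | awk '{print $4}'` pipeline matches rsync's closing summary line by its text instead of its position, and strips any comma digit grouping from the number. This is a sketch, assuming the summary line begins with `total size is` followed by the byte count as the fourth field (the field the pipeline above already uses); `rsync_total_size` is a hypothetical name.

```shell
# Hypothetical helper: pull the byte count out of rsync's closing
# "total size is N  speedup is X" line, read from standard input.
# Works whether or not N is printed with comma digit grouping.
rsync_total_size() {
    awk '
        /^total size is / {
            size = $4
            gsub(/,/, "", size)
            last = size          # keep the value from the last matching line
        }
        END { if (last != "") print last }
    '
}
```

Usage: `rsync --dry-run --verbose -a . /tmp/does_not_exist | rsync_total_size`.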
>
> The difference is minimal and probably negligible for this purpose but
> I'm still curious where it's coming from. Maybe there are some sparse
> files in there somewhere.
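One way to narrow down where the gap comes from is to total only regular files and compare that figure with both numbers above. For reference, GNU `du -b` is shorthand for `--apparent-size --block-size=1`, so it reports apparent sizes, and its total also counts the directory entries themselves. The sketch below assumes GNU `find` (for `-printf '%s'`); `sum_regular_files` is a hypothetical name.

```shell
# Hypothetical helper: sum the apparent sizes of regular files under a
# directory (default: current directory). Requires GNU find for -printf.
# Directories, symlinks and other special files are not counted.
sum_regular_files() {
    find "${1:-.}" -type f -printf '%s\n' |
        awk '{ total += $1 } END { printf "%.0f\n", total }'
}
```

Running `sum_regular_files .` alongside the `du` and `rsync` commands above gives a third figure to compare against.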
Do you have the same discrepancy if you use the --stats option?
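With `--stats`, the figure to compare is the "Total file size" line. The function below is a minimal sketch for extracting it, assuming the line reads `Total file size: N bytes` (possibly with comma digit grouping, and without `-h`/`--human-readable` scaling); `rsync_stats_total` is a hypothetical name.

```shell
# Hypothetical helper: extract the "Total file size" figure, in bytes,
# from `rsync --stats` output read from standard input.
rsync_stats_total() {
    awk '
        /^Total file size:/ {
            size = $4            # fields: "Total" "file" "size:" N "bytes"
            gsub(/,/, "", size)  # drop digit grouping such as 1,234,567
            print size
        }
    '
}
```

Usage: `rsync -a --dry-run --stats . /tmp/does_not_exist | rsync_stats_total`.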
------------------------------------
This email is protected by LBackup
http://www.lbackup.org