[clug] file reorgs: moving files & maintaining directory structures
steve jenkin
sjenkin at canb.auug.org.au
Sat Aug 26 04:54:35 UTC 2017
I’m in the middle of a re-org of files that have built up over the years (>10) in a Download directory.
I could use ‘cp -a’ or ‘rsync’ to clone the whole directory, as is, but that wouldn’t re-organise the hierarchy or just move old files to a backup location.
There’s ~100GB and ~100,000 files on a desktop computer, which means I have to partition the data and faff about.
My first step was to identify and remove duplicate copies of files. De-duplicating before backup / reorg seemed a good idea :)
Then I pushed software and code / packages into more appropriate places, deleted old temporary files.
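A minimal sketch of checksum-based duplicate detection, for anyone wanting to try the same first step. It is not the script I actually used: the function name find_dupes is made up for illustration, and it assumes GNU coreutils (sha1sum, and uniq with -w and --all-repeated).

```shell
# find_dupes DIR: print groups of files under DIR that share a SHA-1 checksum.
# Hypothetical helper name; assumes GNU coreutils.
find_dupes() {
    # sha1sum prints "<40 hex chars>  <path>", so sorting puts identical
    # checksums on adjacent lines and uniq can compare just the first 40 chars.
    find "$1" -type f -exec sha1sum {} + |
        sort |
        uniq -w 40 --all-repeated=separate
}
```

Usage would be something like: find_dupes ~/Download > dupes.txt. Each group of identical files is separated by a blank line. File names containing a backslash or newline get escaped by sha1sum and would not be grouped correctly by this sketch.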
My first action has been to identify the oldest half of the files. ‘find -mtime’ wasn’t suitable - it presumes you know the day number.
I tried ‘-newer’ against a timestamped file - wasn’t what I wanted either…
Discovered ‘stat’ and that it’d print a/c/m-time in Unix epoch seconds - good for sorting.
From that, it was easy to create a file-list of the oldest half, then use tar to copy files.
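Roughly, the file-list step can be sketched like this. The function name oldest_half and the three-column layout (mtime, size, path) are assumptions for illustration, chosen so that the path lands in field 3; it needs GNU stat for --printf and will mis-handle file names containing newlines.

```shell
# oldest_half DIR: list the oldest half of the files under DIR, oldest first,
# as "mtime<TAB>size<TAB>path", with paths relative to DIR.
# Hypothetical helper; assumes GNU stat and no newlines in file names.
oldest_half() {
    tmp=$(mktemp) || return 1
    # %Y = mtime in Unix epoch seconds, %s = size in bytes, %n = file name
    (cd "$1" && find . -type f -exec stat --printf '%Y\t%s\t%n\n' {} +) |
        sort -n > "$tmp"
    total=$(wc -l < "$tmp")
    # Integer division: with an odd number of files the middle one is left out.
    head -n "$((total / 2))" "$tmp"
    rm -f "$tmp"
}
```

Something like: oldest_half "$DN" > file-list would then give a list whose third tab-separated field is the path relative to $DN.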
‘tar’ & ‘cpio’ create the whole directory hierarchy, whereas ‘mv’ doesn’t.
‘cp -a’ doesn’t take a file list from STDIN. You’re supposed to use ‘xargs’ somehow :(
‘rsync’ with "--remove-source-files" & "--prune-empty-dirs" (only discovered that in the research) almost did what I wanted.
I wanted to be able to feed it, like tar & cpio, a list of the files I wanted to copy or move, but have never known how to do that.
> cut -f3 file-list | sort | tar -T - -cf - -C $DN | tar -xvpf - -C $v
I really would’ve liked to have used something like this, to avoid a) the pipeline and unnecessary processes and b) all the bugs in tar.
> cut -f3 file-list | tr '\n' '\0' | xargs -0 sexy-mv-cmd -t $v
GNU mv has the -t option, which _almost_ does everything I wanted.
> -t, --target-directory=DIRECTORY
> move all SOURCE arguments into DIRECTORY
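The missing piece is that ‘mv -t’ drops every file straight into DIRECTORY instead of recreating the sub-directories. A minimal sketch of a wrapper that fills the gap is below. The name mv_keep_tree is hypothetical (no such command exists), it reads one relative path per line rather than NUL-separated names, and it runs one mkdir and one mv per file, so it doesn’t save on processes.

```shell
# mv_keep_tree SRC DEST: read paths relative to SRC from stdin, one per line,
# and move each file to the same relative path under DEST.
# Hypothetical helper, not an existing command; breaks on newlines in names.
mv_keep_tree() {
    mkt_src=$1
    mkt_dest=$2
    # The "|| [ -n ... ]" clause picks up a final line without a trailing newline.
    while IFS= read -r mkt_rel || [ -n "$mkt_rel" ]; do
        # Recreate the file's parent directory under DEST, then move the file.
        mkdir -p "$mkt_dest/$(dirname "$mkt_rel")" &&
            mv -- "$mkt_src/$mkt_rel" "$mkt_dest/$mkt_rel"
    done
}
```

Used as: cut -f3 file-list | mv_keep_tree "$DN" "$v". Emptied directories stay behind in the source tree and have to be cleaned up separately.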
Questions:
1. Are there good DeDuplication tools you can recommend based on SHA1 or MD5? I had to invent my own :(
2. What tools have other people used for this sort of work? [Reorganising and partitioning by date or size]
cheers
steve
--
Steve Jenkin, IT Systems and Design
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA
mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin