[clug] file reorgs: moving files & maintaining directory structures

steve jenkin sjenkin at canb.auug.org.au
Sat Aug 26 04:54:35 UTC 2017


I’m in the middle of a re-org of files that have built up over the years (>10) in a Download directory.

I could use ‘cp -a’ or ‘rsync’ to clone the whole directory, as is, but that wouldn’t re-organise the hierarchy or just move old files to a backup location.
There’s ~100GB and ~100,000 files on a desktop computer, which means I have to partition the data and faff about.

My first step was to identify and remove duplicate copies of files. De-duplicating before backup / reorg seemed a good idea :)
Then I pushed software and code / packages into more appropriate places, deleted old temporary files.
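
For reference, a minimal sketch of checksum-based duplicate detection. It is not the home-grown tool mentioned above; the function name `list_dupes` is hypothetical, and it assumes GNU coreutils (`sha1sum`, and `uniq` with `-w` and `--all-repeated`) and file names without newlines or backslashes.

```shell
# list_dupes DIR: print "digest  path" lines for every file under DIR whose
# SHA-1 digest is shared with at least one other file. Groups of identical
# files are separated by a blank line.
list_dupes() {
    # sha1sum prints a 40-character hex digest first, so comparing only the
    # first 40 characters of each sorted line groups identical content.
    find "$1" -type f -exec sha1sum {} + | sort | uniq -w 40 --all-repeated=separate
}
```

Something like `list_dupes ~/Download | less` would let you review the groups before deleting anything.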

My next action has been to identify the oldest half of the files. ‘find -mtime’ wasn’t suitable - it presumes you already know the cutoff age in days.
I tried ‘-newer’ against a timestamped file - wasn’t what I wanted either…
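
For comparison, a small sketch of selecting by calendar date with find's `-newermt` test, which avoids both the day count and the reference file. It still needs a known cutoff date, so it does not solve the "oldest half" problem on its own. The function name `not_newer_than` is hypothetical, and GNU find is assumed.

```shell
# not_newer_than DIR DATE: list regular files under DIR whose modification
# time is at or before DATE (e.g. '2012-01-01').
not_newer_than() {
    # -newermt DATE matches files modified after DATE; "!" negates it.
    find "$1" -type f ! -newermt "$2"
}
```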

Discovered ‘stat’ and that it’d print a/c/m-time in Unix epoch seconds - good for sorting.
From that, it was easy to create a file-list of the oldest half, then use tar to copy files.
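
A minimal sketch of how such a list could be built, assuming GNU find, stat, sort and head, and no tabs or newlines in file names. The function name `oldest_half` and the three-column layout (mtime, size, path) are assumptions; the only thing taken from this post is that the path ends up in the third tab-separated field.

```shell
# oldest_half DIR: print "mtime<TAB>size<TAB>path" for the oldest half of the
# regular files under DIR, oldest first. Paths are relative to DIR.
oldest_half() {
    list=$(mktemp) || return 1
    # %Y = mtime in epoch seconds, %s = size in bytes, %n = file name
    (cd "$1" && find . -type f -exec stat --printf='%Y\t%s\t%n\n' {} +) |
        sort -n > "$list"
    total=$(wc -l < "$list")
    # Integer division: with an odd number of files the middle one is left out.
    head -n "$((total / 2))" "$list"
    rm -f "$list"
}
```

Running `oldest_half "$DN" > file-list` would then give a list that `cut -f3` can turn back into plain paths.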

‘tar’ & ‘cpio’ create the whole directory hierarchy, whereas ‘mv’ doesn’t.
‘cp -a’ doesn’t take a file list from STDIN. You’re supposed to use ‘xargs’ some way :(
‘rsync’ with "--remove-source-files" & "--prune-empty-dirs" (only discovered that in the research) almost did what I wanted.
I wanted to be able to feed it, like tar & cpio, a list of the files I wanted to copy or move, but have never known how to do that.
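
One possible way to feed rsync a list is its `--files-from` option, which reads paths relative to the source directory and keeps their directory structure at the destination. A hedged sketch, with a hypothetical wrapper name `rsync_move_list`, assuming one path per line with no newlines in file names:

```shell
# rsync_move_list SRC DEST: move the files named on stdin (paths relative to
# SRC, one per line) into DEST, recreating their parent directories.
rsync_move_list() {
    # --files-from=- reads the list from stdin; --remove-source-files deletes
    # each source file after it has been transferred. Source directories that
    # become empty are left behind.
    rsync -a --files-from=- --remove-source-files "$1"/ "$2"/
}
```

Usage would be along the lines of `cut -f3 file-list | rsync_move_list "$DN" "$v"`. rsync also has `--from0` for NUL-separated lists.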

> cut -f3 file-list | sort | tar -T - -cf - -C $DN | tar -xvpf - -C $v

I really would’ve liked to use something like this, to avoid a) the pipeline and its unnecessary processes and b) all the bugs in tar.

> cut -f3 file-list | tr '\n' '\0' | xargs -0 sexy-mv-cmd -t $v

GNU mv has the -t option, which _almost_ does everything I wanted.
>  -t, --target-directory=DIRECTORY
>               move all SOURCE arguments into DIRECTORY
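
The gap with ‘mv -t’ is that every source lands directly in the target directory, so the hierarchy is flattened. A rough sketch of what the wished-for command might look like as a shell function; `mv_tree` is a made-up name, and it assumes one relative path per line with no newlines in file names.

```shell
# mv_tree SRC DEST: read paths relative to SRC on stdin (one per line) and
# move each file to the same relative path under DEST, creating parent
# directories as needed. Stops at the first failure.
mv_tree() {
    src=$1
    dest=$2
    while IFS= read -r rel; do
        mkdir -p "$dest/$(dirname "$rel")" || return 1
        mv -- "$src/$rel" "$dest/$rel" || return 1
    done
}
```

For example: `cut -f3 file-list | mv_tree "$DN" "$v"`. It runs one mkdir and one mv per file, so it is slower than a single tar or rsync run, but within one filesystem each mv is only a rename.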


Questions:

1. Are there good de-duplication tools you can recommend based on SHA1 or MD5? I had to invent my own :(

2. What tools have other people used for this sort of work? [Reorganising and partitioning by date or size]

cheers
steve

--
Steve Jenkin, IT Systems and Design 
0412 786 915 (+61 412 786 915)
PO Box 38, Kippax ACT 2615, AUSTRALIA

mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin



