[Samba] cleaning up duplicate files on the file server
Aaron Kincer
kincera at gmail.com
Mon Feb 5 20:31:59 GMT 2007
Dealing with data duplication is not always particularly easy. What I
would suggest is the following:
1) Identify the duplicates with the oldest modification date
2) Notify your users that you are making changes and to be on the
lookout for any problems
3) Change the file permissions so that they can't be accessed by anyone
other than you
4) If after some predetermined length of time (measured in months
preferably) nobody has complained, delete the duplicates
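
As a rough illustration of step 1, here is a minimal sketch that groups files by content checksum and prints every copy except the most recently modified one in each group. That is only one reading of "oldest modification date". The function name list_stale_duplicates is made up for this example. It assumes GNU find (for -printf), sha256sum, and file names without tabs or newlines; fdupes could serve as the duplicate finder instead.

```shell
# list_stale_duplicates DIR  (hypothetical helper, not part of any package)
# Prints each file under DIR whose content matches another file,
# skipping the newest copy in every group of identical files.
list_stale_duplicates() {
    tab=$(printf '\t')
    # Emit "mtime<TAB>path" for every regular file (GNU find extension).
    find "$1" -type f -printf '%T@\t%p\n' |
    while IFS="$tab" read -r mtime path; do
        # Prefix each record with the checksum of the file's content.
        sum=$(sha256sum < "$path" | cut -d' ' -f1)
        printf '%s\t%s\t%s\n' "$sum" "$mtime" "$path"
    done |
    # Group by checksum; within a group, newest modification time first.
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2nr |
    # Print every record whose checksum repeats the previous one,
    # i.e. all copies except the newest in each group.
    awk -F'\t' '$1 == prev { print $3 } { prev = $1 }'
}
```

Something like list_stale_duplicates /share/ > /root/dupes.txt would give you a list to review by hand before touching any permissions.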
Changing the permissions offers you an easy way to simulate deleting
without actually deleting. You could issue a command to dump the ACLs
for each file into a log by using a modified form of the command I've
posted in the past for setting the archive bit of files that have been
modified. Here it is for your convenience:
/usr/bin/find /share/ -name '*' -mtime 0 -exec setfattr \
    --name=user.DOSATTRIB --value=0x30783230 {} \;
You could replace the find command with your duplicate finder and change
setfattr to getfacl. With some fancy footwork, you should be able to
redirect the output into a text file in case you have to restore
permissions to their previous state. Of course, you could also use this
command to set permissions on all of the files by using setfacl.
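
To make the getfacl/setfacl idea a bit more concrete, here is a minimal sketch, not a tested procedure. The function names (backup_acls, lock_files, restore_acls) and the list file with one path per line are made up for this example. It assumes you run it as root, so that mode 000 still leaves the files readable to you, and that no file name contains a newline.

```shell
# backup_acls LISTFILE LOGFILE  (hypothetical helper)
# Dump owner, group, mode and ACL of every path named in LISTFILE.
backup_acls() {
    while IFS= read -r path; do
        # --absolute-names keeps the leading "/" so the log can be
        # restored from any working directory.
        getfacl --absolute-names -- "$path"
    done < "$1" > "$2"
}

# lock_files LISTFILE  (hypothetical helper)
# Simulate deletion: drop extended ACL entries, then clear all mode bits.
lock_files() {
    while IFS= read -r path; do
        setfacl --remove-all -- "$path"
        chmod 000 -- "$path"
    done < "$1"
}

# restore_acls LOGFILE  (hypothetical helper)
# Put back the permissions recorded by backup_acls.
restore_acls() {
    setfacl --restore="$1"
}
```

The order matters: run backup_acls first and check that the log file looks sane, then lock_files. If somebody complains, restore_acls with that same log should bring the old permissions back.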
Just a suggestion. If any shell gurus out there can offer better or
clearer advice, please do so.
James A. Dinkel wrote:
> I imagine we can save some space on our file server by cleaning up all
> the files that are saved multiple times by different people. There is
> already the fdupes command in linux that will scan a directory tree and
> report what files have duplicates. This could be easily scripted to
> turn those duplicate files into symlinks to one file.
>
>
>
> The problem I see, then, is what would happen if someone tries to
> change a duplicate file that they think is their own copy. Of course,
> everyone with a symlink to that file would get the changes, which is not
> what I would want. What it would need is some sort of copy-on-edit
> mechanism, so when the file is changed, instead of changing the original
> file, the symlink is replaced with the edited version of the file.
>
>
>
> Does this make sense? Has anyone else thought about this, or found an
> elegant solution to this?
>
>
>
> James Dinkel
>
> Network Engineer
>
> Butler County of Kansas
>
>
>
> There are 10 types of people in the world: those who understand binary,
> and those who don't.