[clug] The first de-duplicated filesystem, late 1990s?

steve jenkin sjenkin at canb.auug.org.au
Mon Jan 14 18:07:55 MST 2013


I've been looking at different filesystem designs and the "Plan 9"
archival system didn't just store daily snapshots as I thought...

[I'd also missed that the read/write 'fossil' filesystem took
frequent, user-accessible snapshots as well.]

Venti also de-duplicated data, because blocks were stored by 'score'
(hash-value).
That also gives filesystems a neat internal check:
  does the hash of the retrieved block match the requested 'score'?
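
A minimal sketch of that idea in Python, not Venti's actual code: the
BlockStore class and its in-memory dict are hypothetical stand-ins for
the real on-disk store. It shows why storing by score de-duplicates
for free, and how the read-side check works.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: blocks are keyed by their SHA-1 'score'."""

    def __init__(self):
        self._blocks = {}  # score (hex string) -> block bytes

    def write(self, data: bytes) -> str:
        score = hashlib.sha1(data).hexdigest()
        # Identical data always yields the same score, so a repeated
        # write stores nothing new.
        self._blocks.setdefault(score, data)
        return score

    def read(self, score: str) -> bytes:
        data = self._blocks[score]
        # The internal check: does the retrieved block hash to the
        # score that was asked for?
        if hashlib.sha1(data).hexdigest() != score:
            raise IOError("block does not match score " + score)
        return data


store = BlockStore()
a = store.write(b"hello venti")
b = store.write(b"hello venti")    # duplicate write, same score
c = store.write(b"another block")  # different data, different score
```

Here a and b are the same score and only two blocks end up stored. If
a stored block is damaged, read() raises instead of handing back bad
data.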

The man page notes that the stored blocks could also be compressed
to save drive space.
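
A rough sketch of how compression can sit underneath hash addressing.
Assumptions here: zlib stands in for whatever compressor Venti really
uses, the put/get helpers are made up for illustration, and the score
is taken over the uncompressed data so the address does not depend on
how the block happens to be stored.

```python
import hashlib
import zlib

blocks = {}  # score (hex string) -> compressed block bytes


def put(data: bytes) -> str:
    # Score is computed on the uncompressed data (an assumption of
    # this sketch), so compression stays invisible to callers.
    score = hashlib.sha1(data).hexdigest()
    if score not in blocks:
        blocks[score] = zlib.compress(data)
    return score


def get(score: str) -> bytes:
    data = zlib.decompress(blocks[score])
    # Same check as for an uncompressed store: verify after decompressing.
    if hashlib.sha1(data).hexdigest() != score:
        raise IOError("block does not match score " + score)
    return data


block = b"A" * 8192  # a highly compressible 8 KB block
score = put(block)
```

The caller still reads back the full 8192 bytes via get(score), while
the stored copy is much smaller.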

Could this be the first de-duplicated file system?

This was around the time that NetApp did snapshots with their patented
WAFL design...
[The definitive RAID paper by Patterson, Gibson, Katz was 1988/9.]

That was some productive decade in Storage Design/Research!

cheers
steve

<http://swtch.com/plan9port/man/man8/venti.html>
<http://en.wikipedia.org/wiki/Venti>
<http://static.usenix.org/events/fast02/quinlan.html> [2002 Paper]

"Venti is a network storage system that permanently stores data blocks.

 A 160-bit SHA-1 hash of the data (called score by Venti) acts as the
address of the data.

 This enforces a write-once policy since no other data block can be
found with the same address: the addresses of multiple writes of the
same data are identical, so duplicate data is easily identified and the
data block is stored only once.

 Data blocks cannot be removed, making it ideal for permanent or backup
storage.
 Venti is typically used with Fossil to provide a file system with
permanent snapshots."


==> Venti still lives on in 'Inferno', a cross-platform hosted variant of Plan 9

<http://www.vitanuova.com/inferno/man/2/venti.html>

-- 
Steve Jenkin, Info Tech, Systems and Design Specialist.
0412 786 915 (+61 412 786 915)
PO Box 48, Kippax ACT 2615, AUSTRALIA

sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin

