[clug] Anyone want to give a talk for this week's PSIG meeting?

Tue Mar 11 03:09:06 MDT 2014

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/03/14 09:00, Paul Wayper wrote:
> Hi,
> 
> Currently there is no talk scheduled for this week's PSIG meeting.
> 
> If you'd like to give a talk at this week's meeting, or a future
> meeting, please drop me a line.

I've got a topic of my own I'd like to ask people about, if that's OK...

There are lots of situations where a program needs to make a relatively
small change to a file that involves adding or deleting data.  The obvious
one is changing the metadata inside a music or video file, but everyday
operations such as editing documents, applying patches to text files and
editing graphics or audio files involve the same basic process of inserting
or removing data within a file.
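To make the cost concrete, here is a minimal Python sketch of inserting or deleting bytes in the middle of a file using only the ordinary seek/read/write interface.  The function names are illustrative, not from any existing library, and `io.BytesIO` stands in for a real file opened with `open(path, "r+b")`.

```python
import io
from typing import BinaryIO


def insert_bytes(f: BinaryIO, offset: int, data: bytes) -> None:
    """Insert `data` at `offset` in a file opened for update ("r+b").

    The plain seek/read/write interface has no "insert", so every byte after
    `offset` is read into memory and written back out, shifted along.
    """
    f.seek(offset)
    tail = f.read()      # everything after the insertion point
    f.seek(offset)
    f.write(data)        # the new bytes...
    f.write(tail)        # ...followed by the old tail in its new position


def delete_bytes(f: BinaryIO, offset: int, length: int) -> None:
    """Remove `length` bytes at `offset` by moving the tail back and truncating."""
    f.seek(offset + length)
    tail = f.read()      # everything after the deleted range
    f.seek(offset)
    f.write(tail)        # move it back over the deleted bytes
    f.truncate()         # cut at the current position, dropping the leftover end


if __name__ == "__main__":
    buf = io.BytesIO(b"hello world")   # stands in for open(path, "r+b")
    insert_bytes(buf, 5, b",")
    print(buf.getvalue())              # b'hello, world'
    delete_bytes(buf, 5, 1)
    print(buf.getvalue())              # b'hello world'
```

The work done is proportional to the size of the tail after the edit point, not to the size of the change itself.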

And yet it's not something that the file system supports.  This is largely
a historical legacy: most file systems are optimised to write fixed-length
blocks of data, usually 512 bytes but increasingly 4096 bytes.  It's easy
to pass this problem on to user-space, and it's also easier to write a file
system that way when you have only 128KB of memory to fit your operating
system and user-space into.
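A rough worked example of why fixed-size blocks make this expensive: inserting even one byte at offset X shifts every later byte, so with a plain rewrite every logical block from the one containing X to the end of the file ends up with different contents.  The helper below is hypothetical and ignores any file-system-specific tricks; it just counts those blocks, assuming a 4096-byte block size by default.

```python
def blocks_rewritten_by_insert(file_size: int, offset: int, inserted: int,
                               block_size: int = 4096) -> int:
    """Count the logical blocks whose contents change when `inserted` bytes
    are spliced in at `offset` and the rest of the file is shifted along.

    Every byte from `offset` onwards moves, so every block from the one
    containing `offset` up to the last block of the new, longer file changes.
    """
    if not 0 <= offset <= file_size:
        raise ValueError("offset must lie within the file")
    if inserted <= 0:
        return 0
    new_size = file_size + inserted
    first = offset // block_size            # block containing the insertion point
    last = (new_size - 1) // block_size     # final block of the new file
    return last - first + 1


if __name__ == "__main__":
    mib = 1024 * 1024
    # 10 bytes near the start of a 100 MiB file: the whole file plus one new block
    print(blocks_rewritten_by_insert(100 * mib, 100, 10))        # 25601
    # the same 10 bytes appended at the end: just one block
    print(blocks_rewritten_by_insert(100 * mib, 100 * mib, 10))  # 1
```

In general an append touches only the last one or two blocks, while an insert near the start touches essentially all of them.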

The usual way of solving this is to read the file into memory, alter it
there, and write a new one out.  In the process you can accidentally
overwrite your only copy of the data with junk, you can be fooled into
overwriting something that shouldn't be touched (e.g. /etc/passwd), and
you can leave temporary files lying around undeleted.  It also means that
every user-space program that does this kind of thing - and, when you
think about it, there are an awful lot of them - has to re-implement the
same process.  While reinventing the wheel is fun, and each program might
optimise the operation for its own workload, it means that we keep
debugging the same problems.
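For comparison, here is a minimal Python sketch of the read/alter/write-a-new-copy pattern with the pitfalls above handled, assuming a POSIX system where `os.replace()` is an atomic rename within one file system.  `rewrite_file` and its `transform` callback are hypothetical names used only for illustration.

```python
import os
import shutil
import tempfile
from typing import Callable


def rewrite_file(path: str, transform: Callable[[bytes], bytes]) -> None:
    """Replace the contents of `path` with transform(old contents).

    - The new data goes to a temporary file in the same directory, so the
      original is untouched until a complete new copy exists.
    - os.replace() renames the temporary file over `path` in one step.  If
      `path` is a symlink, the link itself is replaced, so the write does not
      land on whatever file the link points at.
    - The temporary file is removed if anything fails before the rename.
    """
    with open(path, "rb") as f:
        new_data = transform(f.read())

    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".rewrite-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())       # data is on disk before we rename
        shutil.copymode(path, tmp_path)  # mkstemp creates 0600; keep the original mode
        os.replace(tmp_path, path)       # atomic on POSIX within one file system
        # A fully durable version would also fsync the containing directory.
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Even this short version has to get permissions, fsync and clean-up right, which is exactly the kind of detail each program ends up debugging on its own.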

As a proof of concept, I wanted to make a generic library that would provide
an interface to a truly random-access file structure.  You could read,
insert, overwrite and delete arbitrary data at arbitrary places in the
file.  Underneath, the library keeps track of where all these chunks of
information are in an actual file.  The on-disk format may look nothing
like the raw data you see when reading the file from start to finish, but
that's OK - the program using the library is essentially saying "you
handle the actual storage, I'll just put data in where I like".  It's no
different, conceptually, from a bitmap image not necessarily being stored
on disk as rows and columns of RGB pixels.
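One existing structure that fits this description is the piece table used by some text editors: data is only ever appended to a backing store, and the logical file is described by a list of (start, length) pieces pointing into it.  Below is a minimal in-memory Python sketch under that assumption; `PieceFile` and its methods are hypothetical names, not the proposed library's API, and a real implementation would keep both the store and the piece list on disk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Piece:
    start: int   # offset of this piece's bytes in the append-only store
    length: int  # how many bytes it contributes to the logical file


class PieceFile:
    """A logical byte sequence built from slices ("pieces") of an append-only store.

    New data is only ever appended to `store`; insert, delete and overwrite
    just edit the piece list, so no existing bytes are moved or copied.
    """

    def __init__(self, initial: bytes = b"") -> None:
        self.store = bytearray(initial)
        self.pieces: List[Piece] = [Piece(0, len(initial))] if initial else []

    def __len__(self) -> int:
        return sum(p.length for p in self.pieces)

    def _split(self, offset: int) -> int:
        """Make sure a piece starts at logical `offset`; return that piece's index."""
        if offset < 0:
            raise IndexError("negative offset")
        pos = 0
        for i, p in enumerate(self.pieces):
            if offset == pos:
                return i
            if offset < pos + p.length:
                cut = offset - pos
                self.pieces[i:i + 1] = [Piece(p.start, cut),
                                        Piece(p.start + cut, p.length - cut)]
                return i + 1
            pos += p.length
        if offset == pos:
            return len(self.pieces)   # boundary at the very end of the file
        raise IndexError("offset past end of file")

    def insert(self, offset: int, data: bytes) -> None:
        if not data:
            return
        i = self._split(offset)
        self.pieces.insert(i, Piece(len(self.store), len(data)))
        self.store += data

    def delete(self, offset: int, length: int) -> None:
        if length <= 0:
            return
        first = self._split(offset)
        last = self._split(offset + length)
        del self.pieces[first:last]

    def overwrite(self, offset: int, data: bytes) -> None:
        # Replace existing bytes; anything past the current end extends the file.
        self.delete(offset, min(len(data), len(self) - offset))
        self.insert(offset, data)

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        pos, end = 0, offset + length
        for p in self.pieces:
            lo, hi = max(offset, pos), min(end, pos + p.length)
            if lo < hi:   # this piece overlaps the requested range
                out += self.store[p.start + lo - pos:p.start + hi - pos]
            pos += p.length
        return bytes(out)
```

Inserts and deletes here only split or drop entries in the piece list, so their cost does not depend on how much data follows the edit point.  The linear scans in `_split` and `read` would probably need to become a tree or similar index for large files, and deleted bytes stay in the store until some form of compaction rewrites it.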

What I'd like to do at the PSIG meeting is talk about what kind of
operations could be supported by this library, how the underlying storage
might work and what limitations I'm going to have to overcome.

Any interest in this?

Have fun,

Paul
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlMe0rIACgkQu7W0U8VsXYL/KACgpP4VUcXVwTPv/LT1vfWFsmR6
vrwAnjKbQ/6adSzfzPFmtor/fsTBAaYr
=jVd4
-----END PGP SIGNATURE-----