[clug] Anyone want to give a talk for this week's PSIG meeting?

Tue Mar 11 22:23:17 MDT 2014

On 11/03/14 20:09, Paul Wayper wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 10/03/14 09:00, Paul Wayper wrote:
>> Hi,
>>
>> Currently there is no talk scheduled for this week's PSIG meeting.
>>
>> If you'd like to give a talk at this week's meeting, or a future
>> meeting, please drop me a line.
>
> I've got a topic of my own I'd like to ask people about, if that's OK...
>
> There are lots of situations where a program needs to make a relatively
> small change to a file that involves adding or deleting data.  The obvious
> one is changing metadata inside a music or video file, but even
> standard operations like document editing, applying patches to text files,
> and editing graphics or audio files all come down to inserting
> or removing data within a file.
>
> And yet it's not something that is supported by the file system.  This is
> a historical legacy - most file systems are optimised to
> write fixed-length blocks of data, usually 512 bytes but increasingly 4096
> bytes.  It's easy to pass this problem on to user-space.  It's also easier to
> write your file system like this when you have 128 KB of memory to fit your
> operating system and user-space into.
>
> The usual way of solving this is to read your file into memory, alter it
> there, and write a new one out.  In the process, you can accidentally
> overwrite your only copy of your data with something that's junk, you can be
> fooled into overwriting something that shouldn't be touched (e.g.
> /etc/passwd), and you can leave temporary files around and not get to delete
> them.  It also means that each user-space program that does this kind of
> process - and, when you think about it, there are an awful lot of them - has
> to re-implement it.  While reinventing the wheel is fun, and
> each one might optimise its operations to its own workload, it means that we
> keep debugging the same problems.
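
For reference, the read/alter/write pattern described above is usually made
safer by writing the new contents to a temporary file in the same directory
and then renaming it over the original. A minimal Python sketch, assuming a
POSIX-like file system; insert_bytes_safely is a hypothetical helper name, not
part of any existing library, and it does not preserve the original file's
permissions or ownership, nor does it guard against symlinks:

```python
import os
import tempfile


def insert_bytes_safely(path, offset, data):
    """Insert `data` at `offset` by rewriting the whole file (hypothetical helper)."""
    with open(path, "rb") as src:
        original = src.read()
    if not 0 <= offset <= len(original):
        raise ValueError("offset outside file")
    updated = original[:offset] + data + original[offset:]

    # Write the new contents to a temporary file in the same directory,
    # so the final rename stays on one file system.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(updated)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Swap the new file into place in a single rename step.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave the temporary file behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Even in this small form, the whole file is read and rewritten to insert a
couple of bytes, which is the cost the proposal is trying to avoid.
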
>
> As a proof of concept, I wanted to make a generic library that would provide
> an interface to a truly random access file structure.  You could read,
> insert, overwrite and delete arbitrary data from arbitrary places in the
> file.  Underneath, the library keeps track of where all these chunks of
> information are in an actual file.  The on-disk format may look nothing like
> the raw data you see if reading the file from start to finish, but that's OK
> - the program using the library is essentially saying "you handle the actual
> storage, I'll just put data in where I like".  It's no different,
> conceptually, from a bitmap not necessarily being rows and columns of pixels
> in RGB format.
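
One way the "library keeps track of the chunks" idea could work is a piece
table, a structure some text editors use. The following is only an in-memory
sketch of that technique, not Paul's design; the class name PieceTable and its
method names are made up for illustration. The logical file is a list of
pieces, each pointing into either the untouched original data or an
append-only buffer of added data:

```python
class PieceTable:
    """Logical byte sequence stored as pieces of two underlying buffers."""

    def __init__(self, original=b""):
        self.original = bytes(original)  # never modified after construction
        self.added = bytearray()         # append-only store for inserted data
        # Each piece is (buffer_name, start, length).
        self.pieces = []
        if self.original:
            self.pieces.append(("original", 0, len(self.original)))

    def __len__(self):
        return sum(length for _, _, length in self.pieces)

    def _split(self, offset):
        """Ensure a piece boundary at logical `offset`; return its piece index."""
        if not 0 <= offset <= len(self):
            raise IndexError("offset outside logical file")
        pos = 0
        for i, (buf, start, length) in enumerate(self.pieces):
            if offset == pos:
                return i
            if offset < pos + length:
                head = offset - pos
                self.pieces[i:i + 1] = [(buf, start, head),
                                        (buf, start + head, length - head)]
                return i + 1
            pos += length
        return len(self.pieces)

    def insert(self, offset, data):
        i = self._split(offset)
        if data:
            self.pieces.insert(i, ("added", len(self.added), len(data)))
            self.added += data

    def delete(self, offset, length):
        if length < 0:
            raise ValueError("length must be non-negative")
        begin = self._split(offset)
        end = self._split(offset + length)
        del self.pieces[begin:end]

    def read(self):
        bufs = {"original": self.original, "added": self.added}
        return b"".join(bytes(bufs[b][s:s + n]) for b, s, n in self.pieces)
```

Inserts and deletes only touch the piece list; no stored bytes are moved. An
overwrite is a delete followed by an insert at the same offset. How the piece
list itself would be kept on disk is left open here.
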
>
> What I'd like to do at the PSIG meeting is talk about what kind of
> operations could be supported by this library, how the underlying storage
> might work and what limitations I'm going to have to overcome.
>
> Any interest in this?
>
> Have fun,
>
> Paul

Hi Paul,

I'm sort of interested. I am thinking, though, that inserting and
deleting bytes in a file without reference to the structure of the
data may have limited usefulness? If the file stores internal offsets
and you just add or delete bytes, then every offset that points past
the change will be incorrect?
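
To make the offset concern concrete, here is a small Python sketch. The record
layout (a 4-byte little-endian payload offset, then a title, then the payload)
and all function names are invented for illustration and do not correspond to
any real file format:

```python
import struct


def build_record(title, payload):
    """Hypothetical layout: 4-byte payload offset, title bytes, payload bytes."""
    offset = 4 + len(title)
    return struct.pack("<I", offset) + title + payload


def read_payload(blob):
    """Follow the stored offset to find the payload."""
    (offset,) = struct.unpack_from("<I", blob, 0)
    return blob[offset:]


def naive_insert(blob, position, data):
    """Insert bytes with no knowledge of the record structure."""
    return blob[:position] + data + blob[position:]


def structured_insert_title(blob, data):
    """Append to the title and fix up the stored offset to match."""
    (offset,) = struct.unpack_from("<I", blob, 0)
    header = struct.pack("<I", offset + len(data))
    return header + blob[4:offset] + data + blob[offset:]
```

With blob = build_record(b"Song", b"AUDIO"), a naive_insert at the end of the
title leaves the stored offset pointing at the inserted bytes, so read_payload
no longer returns just the payload; structured_insert_title keeps it correct
because it knows which field to update.
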

Surely this is why we have "standards" like XML, JSON and databases
for structuring data?

On the other hand, if you are not inserting or deleting bytes, just
changing chunks "in place", then the existing seek() and write() API
calls will do just fine?
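
A short Python sketch of the difference, using only the standard file object
methods seek(), read() and write(); the two function names are hypothetical:

```python
def overwrite_in_place(path, offset, data):
    """Replace len(data) bytes starting at `offset`; nothing after it moves."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def insert_by_shifting(path, offset, data):
    """Insert `data` at `offset`; everything after it has to be rewritten."""
    with open(path, "r+b") as f:
        f.seek(offset)
        tail = f.read()   # read everything from the insertion point onwards
        f.seek(offset)
        f.write(data)
        f.write(tail)     # write the old tail back, shifted by len(data)
```

The overwrite touches only the bytes being changed (unless it runs past the
end of the file, in which case the file grows). The insert has to read and
rewrite the whole tail, and a crash between the two writes would leave the
file damaged.
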

Or am I missing something?

Bob Edwards.

> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iEYEARECAAYFAlMe0rIACgkQu7W0U8VsXYL/KACgpP4VUcXVwTPv/LT1vfWFsmR6
> vrwAnjKbQ/6adSzfzPFmtor/fsTBAaYr
> =jVd4
> -----END PGP SIGNATURE-----
>