[PATCH 00/14] tdb: Update pytdb API to match what is provided by libtdb

Sat Oct 2 07:41:08 MDT 2010

Jelmer,

First of all I'm sorry it took me so long to reply.

On Tue, Sep 28, 2010 at 10:05:22AM +0200, Jelmer Vernooij wrote:
> On Mon, 2010-09-27 at 12:35 +0400, Kirill Smelkov wrote:
> > On Mon, Sep 20, 2010 at 01:16:49PM +0400, Kirill Smelkov wrote:
> > > Hi Jelmer,
> > > 
> > > On Sun, Sep 19, 2010 at 10:16:18AM -0700, Jelmer Vernooij wrote:
> > > > Hi Kirill,
> > > > 
> > > > On Sun, 2010-09-19 at 13:53 +0400, Kirill Smelkov wrote: 
> > > > > Rusty, Jelmer,
> > > > > 
> > > > > The subject says it all. Not 100% complete, but near.
> > > > Thanks for the patches. I've applied most of the Python ones. I'm not at
> > > > all convinced we should match the C API in the Python API though, I
> > > > rather think we should let the needs of our Python users drive what we
> > > > expose. Some of the worst Python bindings I've seen were created by
> > > > simply mapping every C function one on one to Python.
> > > > 
> > > > Is there any particular reason why some of these functions should be
> > > > exposed? Why do you need low-level locking?
> > > 
> > > Thanks for applying some patches and sorry I've not described my context
> > > initially...
> > > 
> > > In this case I myself is tdb python user - I use tdb in embedded system
> > > for internal database to which many programms "connect" simultaneously
> > > to read/write it.
> > > 
> > > That's why I need locking, and better, to avoid lock contention, the
> > > chainlock_* family variants.
> > > 
> > > Also, sometimes it is not important to write data to db immediately, so
> > > to minize latencies, apps keep to-be-written queue internally until they
> > > know they can write to some chain, or start transaction - that's why I
> > > need *_nonblock variants.
> > > 
> > > Same for reading - once initially read, it's not that important to get
> > > up-to-date values immediately, that's why I'd also use
> > > tdb_chainlock_read_nonblock().
> > > 
> > > And to make life a bit more interesting, db is stored on compact flash
> > > -- various types, from various vendors, so with various types of flash
> > > translation layers (FTL) -- so inevitably with bugs in FTL with respect
> > > to sudden power failures, so I'm preparing to have corrupt tdb one day 
> > > 
> > > http://ozlabs.org/~rusty/index.cgi/tech/2009-10-20.html
> > > http://lwn.net/Articles/349970/
> > > 
> > > That's why I'd also like to have debugging routines (dump_all,
> > > print_freelist, etc..,), and tdb_check (not yet done, should I?), and
> > > also tdb_fd and tdb_repack come for completness (doesn't tdb_repack
> > > complement tdb_wipe_all() which has python bindings?).
> > > 
> > > And we don't have shutdown sequence - normal shutdown is poweroff...
> > > 
> > > 
> > > Hope this clarifies my rationale about why we should expose more
> > > functionality in pytdb.
> > 
> > Silence...
> > 
> > Jelmer, others, what I'm maybe doing wrong here? I just wanted to use
> > tdb from python without major constraints compared to C version.
> Sorry, as Andrew mentioned most of us were at a conference last week so
> I haven't had much time to look at your patches again.
> 
> With regard to the chainlock functions; I can see the use in exposing
> these, but am not convinced mapping them one to one from the C functions
> is necessarily the best idea. The header warns to use the chainlock
> functions with care;

That warning dates back to 2000 (see 7e4c4721). To me it's like
low-level locking should be always used with care, because it's much
more easier to use one global lock. But should this prevent exposing
locking functionality?

> have the bindings for them been tested extensively?

Not yet - I have only a prototype which uses them, but please read
below.

> Does using these functions from Python not cause unexpected segfaults in
> some situations? With some more unit tests I'd be happy to accept those
> patches.

I've ported tdbtorture to verify this, and yes, they don't produce
segfaults and also tdb_check says OK.

However, please note - I've got occasional infrequent torture failures
with -k (kill random) for _both_ C tdbtorture and python one.

I'll send updated patches shortly. Would pytdbtorture be enough or maybe
you've meant some other approach to test it?

> With regards to some of the other functions, I don't think completeness
> is a valid reason for adding bindings per se. I really don't see why
> tdb_fd would need to be exposed on the Python level (or at all) for
> example.

I thought it's just a matter of consistency and completenes. Many
python's file-like objects have .fileno(), so if C tdb itself exposes
tdb_fd(), why don't we expose it to Python?

Maybe someone crazy enough would want to get fd from open tdb, dup it,
lseek it to file start, and then sendfile it across network.

I agree, this example is maybe a bit artificial, but my point here is
that bindings should not set up policy of what is doable and what is
not. If C tdb exposes some functionality, so should bindings unless
there are strong reaseong _not_ to do it, imho.

> The more functions are exposed the harder it becomes to find
> something by browsing the API and the harder it becomes to change that
> API.

To me it was exactly vice versa - I've browsed C tdb api, and then
discovered that a lot of functions were missing in pytdb. By the way,
the only description we have is docs/README, so I assume those who want
to use tdb start from there, and then those who turn to pytdb are like
me.

> You mention corruption, but that would be due to bugs in tdb. I
> don't understand why we'd need to expose any other function than
> tdb_check in this regard.

I've mentioned corruption not because of bugs in tdb, but because of
storage I'm going to keep my tdbs is unreliable -- flashes don't have
sectors/blocks/etc, they have eraseblock (e.g. 128K).

So for flashes which mimic IDE or SATA drives, there is on-card
controller which translates block-language into what flash understands.

And when e.g. host writes new sector into flash, that controller first
_erases_ much more bigger eraseblock, and only then writes back old
eraseblock conntent with sector.

Now, if power suddenly goes down in the middle, we are potentially
getting corruption - becase write operation touches sectors block layer
though it was keeping intact.

This property means journaling does not work reliably on flash storage
with FTL - this is described in LWN article I've mentioned.

> See my reply about dump_all() etc writing to stdout. I think if we
> expose these functions in Python we should at least make them write to a
> file-like object, not to stdout. That will also make testing easier.

Yes, agree. Also let's postpone dump_all() and friends until we sort out
more important ones, e.g. locking, check, traverse, log and
pytdbtorture.

Thanks,
Kirill

P.S. I'm not sure, but I'll probably be offline for the next 2 weeks...