[PATCH 00/14] tdb: Update pytdb API to match what is provided by libtdb

Sat Oct 2 10:14:44 MDT 2010

Hi Kirill,

On Sat, 2010-10-02 at 17:41 +0400, Kirill Smelkov wrote:
> First of all I'm sorry it took me so long to reply.
No worries, thanks for your reply and your patience.

> On Tue, Sep 28, 2010 at 10:05:22AM +0200, Jelmer Vernooij wrote:
> > On Mon, 2010-09-27 at 12:35 +0400, Kirill Smelkov wrote:
> > > On Mon, Sep 20, 2010 at 01:16:49PM +0400, Kirill Smelkov wrote:
> > > Jelmer, others, what I'm maybe doing wrong here? I just wanted to use
> > > tdb from python without major constraints compared to C version.
> > Sorry, as Andrew mentioned most of us were at a conference last week so
> > I haven't had much time to look at your patches again.
> > 
> > With regard to the chainlock functions; I can see the use in exposing
> > these, but am not convinced mapping them one to one from the C functions
> > is necessarily the best idea. The header warns to use the chainlock
> > functions with care;
> That warning dates back to 2000 (see 7e4c4721). To me it's like
> low-level locking should be always used with care, because it's much
> more easier to use one global lock. But should this prevent exposing
> locking functionality?
Ok, fair enough. Perhaps I'm being overly cautious, I just got a bit
worried reading that comment. I'd just like to make sure it's not too
easy for users to shoot themselves in the foot. 

> > Does using these functions from Python not cause unexpected segfaults in
> > some situations? With some more unit tests I'd be happy to accept those
> > patches.
> I've ported tdbtorture to verify this, and yes, they don't produce
> segfaults and also tdb_check says OK.

> However, please note - I've got occasional infrequent torture failures
> with -k (kill random) for _both_ C tdbtorture and python one.
> 
> I'll send updated patches shortly. Would pytdbtorture be enough or maybe
> you've meant some other approach to test it?
Sorry for being unclear.  What I meant was that I'd like to see some
more unit tests for these functions - e.g. testing that trying to
chainlock the same key multiple times does the right thing, attempting
to remove a chainlock where no exists works, etc.

Either way, thanks for providing pytdbtorture, it's nice to have an
extra test.

> > With regards to some of the other functions, I don't think completeness
> > is a valid reason for adding bindings per se. I really don't see why
> > tdb_fd would need to be exposed on the Python level (or at all) for
> > example.
> I thought it's just a matter of consistency and completenes. Many
> python's file-like objects have .fileno(), so if C tdb itself exposes
> tdb_fd(), why don't we expose it to Python?
As I mentioned earlier, I think it's a bad idea to map all C functions
one to one to Python functions. E.g. we don't have tdb.traverse() either, we have 
Python iterators. 

At the very least, if we're trying to match file-like objects in Python
objects we should name this method "fileno" for consistency, not "fd".

> Maybe someone crazy enough would want to get fd from open tdb, dup it,
> lseek it to file start, and then sendfile it across network.

> I agree, this example is maybe a bit artificial, but my point here is
> that bindings should not set up policy of what is doable and what is
> not. If C tdb exposes some functionality, so should bindings unless
> there are strong reaseong _not_ to do it, imho.
I think it should be the other way around. There should be a reason for
having a method there. I would like to understand why it is in the C
bindings before blindly adding Python bindings for it.

> > The more functions are exposed the harder it becomes to find
> > something by browsing the API and the harder it becomes to change that
> > API.
> 
> To me it was exactly vice versa - I've browsed C tdb api, and then
> discovered that a lot of functions were missing in pytdb. By the way,
> the only description we have is docs/README, so I assume those who want
> to use tdb start from there, and then those who turn to pytdb are like
> me.
I don't see how methods that you don't need to use being missing in the
Python API could be an issue.

> > You mention corruption, but that would be due to bugs in tdb. I
> > don't understand why we'd need to expose any other function than
> > tdb_check in this regard.
> I've mentioned corruption not because of bugs in tdb, but because of
> storage I'm going to keep my tdbs is unreliable -- flashes don't have
> sectors/blocks/etc, they have eraseblock (e.g. 128K).
> 
> So for flashes which mimic IDE or SATA drives, there is on-card
> controller which translates block-language into what flash understands.
> 
> And when e.g. host writes new sector into flash, that controller first
> _erases_ much more bigger eraseblock, and only then writes back old
> eraseblock conntent with sector.
> 
> Now, if power suddenly goes down in the middle, we are potentially
> getting corruption - becase write operation touches sectors block layer
> though it was keeping intact.
> 
> This property means journaling does not work reliably on flash storage
> with FTL - this is described in LWN article I've mentioned.
Sure, I understand that. What I'm interested in is how you think those
methods will help you deal with those issues.

> > See my reply about dump_all() etc writing to stdout. I think if we
> > expose these functions in Python we should at least make them write to a
> > file-like object, not to stdout. That will also make testing easier.
> Yes, agree. Also let's postpone dump_all() and friends until we sort out
> more important ones, e.g. locking, check, traverse, log and
> pytdbtorture.
Works for me.

I'll merge the patches on which I haven't commented.

Cheers,

Jelmer
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 836 bytes
Desc: This is a digitally signed message part
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20101002/b0b2ca55/attachment.pgp>