code duplication in ccan (was: Re: snprintf on SunOS)

Thu Jun 14 20:10:13 MDT 2012

On Fri, 15 Jun 2012 00:36:57 +0200, Jelmer Vernooij <jelmer at samba.org> wrote:
> Hi Rusty,

Hi Jelmer!

        I'm delighted we're actually discussing this stuff BTW!
Sometimes it seems hard to get other people's attention on these kind of
details.

> On Sun, Jun 10, 2012 at 01:02:50PM +0930, Rusty Russell wrote:
> > Why don't we just roll ccan into samba-util?  Either move ccan/ under
> > util, or leave it where it is and just make samba-util pull in ccan.
> That seems reasonable. We would have to make sure samba-util builds
> without the rest of Samba so it can be a dependency of tdb, but that's
> doable (can we name it libtutil?). There is still some overlap that
> would be nice to avoid though (not just because of the amount of code
> that's duplicated, but also because it leads to multiple ways to do
> the same thing, which makes it harder to read code).

One option is to move the ccan.

> > If you think of CCAN as a separate project, the question becomes: should
> > we help CCAN, or just do our own samba-specific code?
> If we want to get our utility code from an external source we could
> also look at using e.g. glib.  That's got a fair amount of
> functionality, and is present on a lot of systems already. One of the
> advantages would also be that we would no longer have to worry about
> maintaining that code.  Instead, we could ditch parts of samba-util
> that provide the same functionality.

I really dislike glib; it's more a framework you have to adapt to than a
set of things we can cherry-pick from.  If we were writing Samba from
scratch, perhaps.

> I don't really understand why ccan was (silently) introduced as a
> separate directory into Samba - it overlaps with something we already
> have in Samba, and it wasn't discussed AFAIK.

Actually, I asked Tridge about it.  It seemed sensible to put it in the
lib/ dir rather than under tdb2, so others could use it.

> ccan as is is not really appropriate to build as a shared library - it
> is more like a code collection from which you can cherrypick what you
> like.

Yes!  It's built as a shared lib because that's how Samba builds things,
not because it's sane.

> In its current form it has extra overhead with each module (a copy of
> the license and a description) and a test system that's different from
> the rest of Samba.

In the ccan repo we symlink the licenses.  We could probably do that
here (it's mainly 10 copies of the GPL).  The description is
documentation: that's not overhead!

See below on the test system integration...

> It took us a long time to go from two sets of utility functions in
> Samba 3 and Samba 4 that had diverged back to a single set that is now
> known as libsamba-util. This feels like going a step back.

Understood.  However, I see it as an opportunity to do some real
unification.  So let's look at that!

> > OTOH, I'd like SAMBA to unify on ccan/likely and ccan/time; the former
> > because it has nice infrastructure for measuring branches which
> > CCAN_LIKELY_DEBUG is set, and the latter because Samba doesn't have a
> > unified time infrastructure; there are pieces all over the tree.
> Note that samba already has a definition of likely() somewhere in
> lib/util.

Gah!!!  You keep saying vague things like this.  If it's worth
discussing, it's worth reading the code first!

There are *five* definitions of likely() in the tree.  Four are
potentially buggy, because they treat the condition differently for
gcc vs other compilers.

I want to unify all of them.  Let's discuss how!

> Samba has time functionality in lib/util/time.h, in what way is the
> version in ccan better?

And we also have a complete duplication of functionality in tevent :(

That's a bit hard to solve, since it's in the ABI, but we should
back-end them onto common code, IMHO.  Pulling in libutil is overkill
though, because it has samba-specific functions like NTTIME stuff.

But it would be easy if lib/util/time.h used ccan/time for the generic
stuff.  So could tevent.

I think ccan/time could use improvement, too.  Mainly, it would be nice
just to use timespec everywhere, even on systems where we don't get ns
accuracy.  Frobbing between different time formats is a recipe for
confusion.  I think Samba would be neater for such a change.  What do
you think?

I do think that we should pass structures by copy rather than pointer,
as it makes it far easier to chain things, eg: 

   end = time_add(time_now(), time_from_msec(500));

We should also consider some inlining for the trivial stuff, too.

I think if we tried to unify them we'd understand much better what a
great set of time primitives would look like.  Put those in ccan/time.
Put the Samba-specific stuff in lib/util/time.h.

> > I think ccan/build_assert and ccan/cast would also be minor wins.
> 
> cast seems to do the same thing as discard_const? I would really like
> to avoid getting into the situation where some source files use
> discard_const and others use cast_const, while the two do the same thing.

There are 3 copies of discard_const() in Samba.  At least they're all
the same.

We should kill discard_const().  It just suppresses a -Wcast-qual
warning, but at the cost of type safety:

        void my_func(const int *intptr)
        {
                char *foo = discard_const(intptr);
        }

vs:

        void my_func(const int *intptr)
        {
                char *foo = cast_const(intptr);
        }

The latter gives a warning, since cast_const(intptr) gives an "int *",
not a "void *".

> > It makes sense also to share some of samba-util with others, by turning
> > them into CCAN modules, eg. dlinklist.
> Why not just expose samba-util as a shared library? That seems like an
> easier way of sharing it with others than CCAN.

Because you'd have to maintain an ABI.  And that would stop us from
fixing bugs or removing functions; you really don't want a bag of utils
to be a shared library!

Or we could keep changing the ABI, making it useless for anyone outside
Samba anyway.

> > Well, ccan and tdb2/ntdb share the same test infrastructure.  I'd
> > actually like to make that more common: having infrastructure for
> > such low-level unit tests in Samba would be a significant benefit.
> I don't see why we need to come up with a new infrastructure for
> low-level unit tests for that, the existing code can be used for that
> as well.

Specifically smbtorture?  I don't think we can :(

We want these lowlevel unit tests to depend on as little as possible,
and we want them run first.  Little test programs are nice for that;
easy to write, easy to run.

The talloc testsuite.  How would we unify this?

The tdb testsuite?  It uses direct includes, which is a very powerful
unit testing technique which is much harder to do in a framework like
smbtorture.

ldb also seems to have its own testsuite.

Let's unify the test infrastructure, but we need to do it intelligently.
My preference would be to treat unit tests separately: standalone little
programs which test individual things, and get run first, then
smbtorture for higher-level tests?  If we standardize on naming, we can
make nice helpers to find, build, and run them.

I'd like to write more unit tests for Samba code: it's nicer to get a
local failure than have smbtorture report some high-level failure
halfway through a 2 hour test run :(

Cheers,
Rusty.