efficient rpc io using vstr

Michael B Allen mba2000 at ioplex.com
Tue Jan 20 08:57:21 GMT 2004


Martin Pool said:
>> I don't like it for a variety of reasons. Mainly IMHO C string
>> abstractions are unnecessary because of the powerful C feature that is
>> pointer arithmetic.
>
> I think you mean, "powerful way for people to take over your machine." :-)

True. True.

> If there is a way to get away from the security problems of char*
> buffers without giving up too much speed or flexibility then I think
> that's a good thing.

I totally agree you have to have at least a convention to keep folks from
doing inappropriate things.

>> More practically, one problem is that vstr has nothing
>> to do with strings really. It's an I/O buffering strategy that tries to
>> hide pointer arithmetic from the user by managing pointers to data that
>> doesn't change. This might be useful to Samba for packet processing.
>
> Well, that was what I was using it for here.

First, I'm not a vstr guru. I had some dialog with the author and he
implemented my csv.c FSM parser[1] for vstr. So I know of vstr's
limitations regarding string handling, but it might very well turn out that
the principles used in vstr fold into your async IO strategy well.

>> I don't really know. But using it for character string manipulation
>> might give you some problems because a) it is completely oblivious
>> to internationalization
>
> It looks like there is some support but I haven't looked closely.  Is
> it much harder to iconv things on top of Vstr than on top of char*?

Probably not. You could certainly use iconv on top of vstr. The problem is
you'll *have to*. You will not really be taking advantage of the benefits
of using vstr in the first place.

My understanding of the way vstr works is that it just tracks pointers to
memory and the length of interest at that memory. It provides wrappers for
common functions that can operate on disjoint, possibly unterminated
strings. So if the representation of that memory is the same as what is
needed internally then no copying is necessary. Good. But if the
representation is different then it quietly makes a copy. I could very
well be a little off base here. I didn't look at vstr's internals closely.
I'm sure the author would be happy to clarify.
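
To make the "pointers plus lengths" idea concrete, here is a rough sketch
of a fragment list in C. The struct and function names are made up for
illustration and are not vstr's real types or API. It only shows that
length and search can work across disjoint, unterminated pieces, and that
a copy happens only when a contiguous buffer is actually required.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment descriptor; NOT vstr's real structure. */
struct frag {
    const char *ptr;  /* start of the bytes of interest (not owned) */
    size_t len;       /* number of bytes; no '\0' terminator needed */
};

/* Total length of the logical string formed by n fragments. */
size_t
frags_len(const struct frag *f, size_t n)
{
    size_t total = 0, i;

    for (i = 0; i < n; i++)
        total += f[i].len;
    return total;
}

/* Find the first occurrence of byte c across all fragments.
 * Returns the logical offset, or (size_t)-1 if c is not present. */
size_t
frags_chr(const struct frag *f, size_t n, int c)
{
    size_t off = 0, i;

    for (i = 0; i < n; i++) {
        const char *p = f[i].len ? memchr(f[i].ptr, c, f[i].len) : NULL;
        if (p)
            return off + (size_t)(p - f[i].ptr);
        off += f[i].len;
    }
    return (size_t)-1;
}

/* Copy the fragments into one contiguous buffer. This is the step
 * that costs something, so it is done only on request.
 * Returns bytes written, or (size_t)-1 if dst is too small. */
size_t
frags_flatten(const struct frag *f, size_t n, char *dst, size_t dsize)
{
    size_t off = 0, i;

    for (i = 0; i < n; i++) {
        if (f[i].len > dsize - off)
            return (size_t)-1;
        memcpy(dst + off, f[i].ptr, f[i].len);
        off += f[i].len;
    }
    return off;
}
```

Two fragments pointing at "hello" and "world" inside one buffer behave
like a 10-byte string without either piece being copied or terminated.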

The problem is that with Samba the buffer representation of objects and
the representation used internally are rarely the same. Character strings,
which as we all know are UCS-2LE on the wire, will need to be converted
(unless it's a simple plain string and you're using UCS-2LE internally) and
ultimately these things need to be converted to the 8-bit locale encoding
anyway. Add to that the fact that you frequently need to change the
string slightly, such as switching '\' for '/' or canonicalizing a path, and
vstr just ends up creating copies of little memory fragments.

>> and b) certain manipulation is going to cause allocation and copying
>> of possibly large numbers of little fragments.
>
> Plain C strings can cause a great deal of allocation and copying too.
> The question is, will Vstr be worse?  Indeed, this example addresses a
> particular case of receiving on a nonblocking socket where it is quite
> hard to avoid extensive copying in plain C.

Actually this is where vstr *might* help. From a throughput standpoint you
would like to read in whatever data is in the socket buffer. If you read
in 5 SMBs in one read you *might* be able to use vstr to manage multiple
SMBs in the same buffer. But that's just a vague idea. I don't know if you
can really do that with vstr.
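
Independent of vstr, the "several SMBs from one read" idea can be sketched
in plain C. This is only an illustration: it assumes each message is
preceded by a 4-byte header whose last three bytes are a big-endian payload
length, which is roughly how SMB is framed over TCP but should be treated
as an assumption here. The names split_messages and struct msg_ref are
hypothetical. The routine records where each complete message lives in the
receive buffer and copies nothing.

```c
#include <stddef.h>

/* Hypothetical reference to one message inside the receive buffer. */
struct msg_ref {
    const unsigned char *data;  /* points into the caller's buffer */
    size_t len;                 /* payload length, header excluded */
};

/* Walk buf[0..n) and record up to max complete messages in out[].
 * Assumed framing: byte 0 is a type, bytes 1..3 are the big-endian
 * payload length. *consumed receives the number of bytes covered by
 * the complete messages; anything after that is a partial message
 * that needs more data from the socket.
 * Returns the number of complete messages found. */
size_t
split_messages(const unsigned char *buf, size_t n,
               struct msg_ref *out, size_t max, size_t *consumed)
{
    size_t off = 0, count = 0;

    while (count < max && n - off >= 4) {
        size_t len = ((size_t)buf[off + 1] << 16) |
                     ((size_t)buf[off + 2] << 8) |
                      (size_t)buf[off + 3];

        if (len > n - off - 4)
            break;  /* header seen but payload incomplete */
        out[count].data = buf + off + 4;
        out[count].len = len;
        count++;
        off += 4 + len;
    }
    *consumed = off;
    return count;
}
```

After a read, the caller would process the returned messages in place and
move the unconsumed tail to the front of the buffer (or keep an offset)
before the next read.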

>> That can totally eliminate a lot of strlen/strcpy kind of work which
>> is what vstr is designed to deal with.
>
> Can you explain how suba helps?

Well the part you quoted and suba are two different techniques. I believe
the phrase I used was "copy as you parse". In this case an example is
worth a thousand DWORDS :) Consider this path canonicalization routine:

  http://www.ioplex.com/~miallen/libmba/dl/src/path.c

So after iconv-ing the string off the buffer you can canonicalize it right
into a new buffer that is a valid path ready to use with the host API.
Another good example of this technique is the csv routine cited. Notice
the sentinel points make these routines a lot safer.
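
For readers who don't want to open path.c, here is a much smaller sketch
of the same "copy as you parse" style. It is not the libmba routine; the
name canon_copy and its exact behavior are made up for illustration. It
reads from src up to the sentinel slim, writes to dst up to the sentinel
dlim, switches '\' to '/' and collapses repeated separators in one pass,
so there is no strlen, no strcpy and no second scan.

```c
#include <stddef.h>

/* Hypothetical helper, NOT the libmba path.c routine.
 * Copies [src, slim) into [dst, dlim), converting '\\' to '/' and
 * collapsing runs of separators into one. Stops at '\0' or at slim,
 * whichever comes first. dst is always '\0' terminated.
 * Returns the length written, or -1 if dst is too small
 * (in which case dst is left as an empty string). */
int
canon_copy(const char *src, const char *slim, char *dst, char *dlim)
{
    char *start = dst;
    int prev_sep = 0;

    if (dst >= dlim)
        return -1;  /* no room even for the terminator */

    while (src < slim && *src != '\0') {
        char c = (*src == '\\') ? '/' : *src;

        src++;
        if (c == '/') {
            if (prev_sep)
                continue;  /* skip duplicate separator */
            prev_sep = 1;
        } else {
            prev_sep = 0;
        }
        if (dst + 1 >= dlim) {  /* keep one byte for '\0' */
            *start = '\0';
            return -1;
        }
        *dst++ = c;
    }
    *dst = '\0';
    return (int)(dst - start);
}
```

Given the input dir\sub//file.txt this produces dir/sub/file.txt. Neither
pointer can run past its limit no matter what the input looks like, which
is the safety the sentinels buy you.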

Using suba to assist string processing would be a lot more like what
you're doing right now. Currently I believe you're using "pstring" and
other "scratch pad" techniques to do work with minimal need for
deinitialization. Suba is a very plain and small circular linked list
memory allocator. It's used just like the stdlib allocator but it's
lockless and has two key features that the stdlib allocator doesn't have
-- one of which is pertinent to this discussion.

First, it can be initialized with stack memory:

int
myfn(int len)
{
    unsigned char tmp[0xFFF];
    struct allocator *suba = suba_init(tmp, 0xFFF, 1, 0);
    str_t *str;

    /* make lots of strings */

    str = suba_alloc(suba, len * sizeof(*str), 0);

    /* and just return
     * no free-ing necessary */
    return 0;
}

I believe your talloc code serves the same purpose? Can talloc use stack
memory?

Anyway the point is, if you're going to make a lot of copies of stuff you
might as well use a temporary allocator. Suba is ~20% faster than malloc.
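
If you want to see the stack-memory trick without pulling in libmba, the
sketch below is a bare "bump" allocator. To be clear, this is a different
and much simpler technique than suba (which is a circular linked list
allocator that can also free cells); the arena names are hypothetical. It
assumes a C11 compiler for _Alignof and max_align_t, and that the caller
hands it suitably aligned memory.

```c
#include <stddef.h>

/* Hypothetical bump allocator; this is NOT how suba works inside. */
struct arena {
    unsigned char *base;  /* caller-supplied memory, e.g. on the stack */
    size_t size;          /* total bytes available */
    size_t used;          /* bytes handed out so far */
};

void
arena_init(struct arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Hand out n bytes, or NULL when the arena is exhausted. Offsets are
 * rounded up to the strictest fundamental alignment, so the result is
 * usable for any type as long as base itself is aligned that way. */
void *
arena_alloc(struct arena *a, size_t n)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;

    if (start > a->size || n > a->size - start)
        return NULL;
    a->used = start + n;
    return a->base + start;
}
```

Usage mirrors the suba example above: declare something like
"_Alignas(max_align_t) unsigned char tmp[0xFFF];" in the function, call
arena_init on it, allocate as needed and simply return. Everything goes
away with the stack frame, so there is nothing to free.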

Mike

[1] http://www.ioplex.com/~miallen/libmba/dl/src/csv.c



More information about the samba-technical mailing list