efficient rpc io using vstr

Michael B Allen mba2000 at ioplex.com
Tue Jan 20 08:31:01 GMT 2004


Martin Pool said:
> On 19 Jan 2004, Michael B Allen <mba2000 at ioplex.com> wrote:
>> Martin Pool said:
>> > Here's a sketch of the kind of packet buffering using vstr that I
>> > mentioned to tridge yesterday.  More discussion is in the comments.
>> >
>> > A sample input file is here
>> >
>> >   http://sourcefrog.net/projects/vstr-readpacket/
>> >
>> > I haven't benchmarked it other than running it under strace.
>>
>> I've had discussions with the author of vstr about his library.
>> Personally I don't like it for a variety of reasons. Mainly, IMHO, C
>> string abstractions are unnecessary because of the powerful C feature
>> that is pointer arithmetic.
>
> I think you mean, "powerful way for people to take over your machine." :-)

True. True.

> If there is a way to get away from the security problems of char*
> buffers without giving up too much speed or flexibility then I think
> that's a good thing.

Yup. I agree you have to have at least a convention to keep folks from
doing inappropriate things.

>> More practically, one problem is that vstr has nothing
>> to do with strings really. It's an I/O buffering strategy that tries to
>> hide pointer arithmetic from the user by managing pointers to data that
>> doesn't change. This might be useful to Samba for packet processing.
>
> Well, that was what I was using it for here.

First, I'm not a vstr guru. I had some dialog with the author and he
implemented my csv.c FSM parser for vstr. So I know of vstr's limitations
regarding string handling, but it might very well turn out that the
principles used in vstr fold into your async IO strategy well.

>> I don't really know. But using it for character string manipulation
>> might give you some problems because a) it is completely oblivious
>> to internationalization
>
> It looks like there is some support but I haven't looked closely.  Is
> it much harder to iconv things on top of Vstr than on top of char*?

Probably not. You could certainly use iconv on top of vstr. The problem is
you'll *have to*. You will not really be taking advantage of the benefits
of vstr.

My understanding of the way vstr works is that it just tracks pointers to
memory and the length of interest at that memory. It provides wrappers for
common functions that can operate on disjoint, possibly unterminated
strings. So if the representation of that memory is the same as what is
needed internally then no copying is necessary. Good. But if the
representation is different then it quietly makes a copy. I could very
well be a little off base here. I didn't look at vstr's internals closely.
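
To make the "pointers plus lengths" idea concrete, here is a minimal sketch
of comparing a run of disjoint, unterminated fragments against a C string
without copying anything. This is NOT vstr's API; struct frag and
frags_equal are names I made up purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment: a pointer into memory owned elsewhere plus a
 * length. The bytes are not NUL-terminated. */
struct frag {
    const char *ptr;
    size_t len;
};

/* Compare the concatenation of n fragments against the C string s without
 * copying. Returns 1 if they are equal, 0 otherwise. */
int
frags_equal(const struct frag *v, size_t n, const char *s)
{
    size_t slen;
    size_t i;

    assert(s != NULL);
    slen = strlen(s);

    for (i = 0; i < n; i++) {
        /* a fragment longer than what is left of s cannot match */
        if (v[i].len > slen || memcmp(v[i].ptr, s, v[i].len) != 0)
            return 0;
        s += v[i].len;
        slen -= v[i].len;
    }
    return slen == 0; /* s must be fully consumed too */
}
```

As long as the bytes are already in the form you need, nothing is copied.
The moment the representation has to change, a copy is unavoidable.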

The problem is, with Samba, the buffer representation of objects and the
representation used internally are rarely the same. In particular,
character strings, which as we all know are UCS-2LE on the wire, will need
to be converted unless it's a simple string and you're using UCS-2LE
internally. But usually things ultimately need to be converted to the 8bit
locale encoding anyway. Compound that with the fact that you frequently
need to change the string slightly, such as switching '\' for '/' or
canonicalizing a path, and you really end up just copying everything. Thus
vstr probably won't help.

>> and b) certain manipulation is going to cause allocation and copying
>> of possibly large numbers of little fragments.
>
> Plain C strings can cause a great deal of allocation and copying too.
> The question is, will Vstr be worse?  Indeed, this example addresses a
> particular case of receiving on a nonblocking socket where it is quite
> hard to avoid extensive copying in plain C.

Actually this is where vstr *might* help. From a throughput standpoint you
would like to read in whatever data is in the socket buffer. If you read
in 5 SMBs in one read you *might* be able to use vstr to manage multiple
SMBs in the same buffer. But that could get really hairy.
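
A rough sketch of the bookkeeping involved, in plain C rather than vstr. It
assumes each message is preceded by a 4-byte header whose last three bytes
carry the payload length in network byte order (as with SMB directly over
TCP). struct msg_view and split_messages are hypothetical names, not
anything from Samba or vstr.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of one message inside a larger receive buffer. */
struct msg_view {
    const unsigned char *data; /* points into the receive buffer, no copy */
    size_t len;
};

/* Walk buf[0..n) and record up to max complete messages in out[].
 * Stores in *used the number of bytes covered by complete messages;
 * whatever follows is a partial message that needs more data.
 * Returns the number of complete messages found. */
size_t
split_messages(const unsigned char *buf, size_t n,
               struct msg_view *out, size_t max, size_t *used)
{
    size_t off = 0, count = 0;

    assert(buf != NULL && out != NULL && used != NULL);

    while (count < max && n - off >= 4) {
        /* assumed framing: 24-bit big-endian length in bytes 1..3 */
        size_t len = ((size_t)buf[off + 1] << 16) |
                     ((size_t)buf[off + 2] << 8) |
                     (size_t)buf[off + 3];
        if (n - off - 4 < len)
            break; /* header is here but the payload is incomplete */
        out[count].data = buf + off + 4;
        out[count].len = len;
        count++;
        off += 4 + len;
    }
    *used = off;
    return count;
}
```

The hairy part is everything after *used: the partial tail has to be kept
around (or moved) until the next read completes it, while the views handed
out earlier may still point into the same buffer.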

>> That can totally eliminate a lot of strlen/strcpy kind of work which
>> is what vstr is designed to deal with.
>
> Can you explain how suba helps?

Well the part you quoted and suba are two different techniques. I believe
the phrase I used was "copy as you parse". In this case an example is
worth a thousand DWORDS :) Consider this path canonicalization routine:

  http://www.ioplex.com/~miallen/libmba/dl/src/path.c

So after iconv-ing the string off the buffer you can canonicalize it right
into a new buffer that is a valid path ready to use with the host API.
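
To illustrate "copy as you parse" in miniature, here is a much simplified
sketch. It is not the path.c routine linked above; copy_path is a
hypothetical helper that only switches '\' for '/' and collapses repeated
separators while copying, and it assumes the input has already been
converted to an 8bit encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src into dst (capacity dn) in a single pass, switching '\' for '/'
 * and collapsing runs of separators. Returns the number of characters
 * written, not counting the terminator, or -1 if dst is too small. */
int
copy_path(const char *src, char *dst, size_t dn)
{
    size_t o = 0;
    int prev_sep = 0;

    assert(src != NULL && dst != NULL);
    if (dn == 0)
        return -1;

    for (; *src; src++) {
        char c = (*src == '\\') ? '/' : *src;
        if (c == '/' && prev_sep)
            continue;          /* drop repeated separators */
        if (o + 1 >= dn)
            return -1;         /* keep room for the terminator */
        dst[o++] = c;
        prev_sep = (c == '/');
    }
    dst[o] = '\0';
    return (int)o;
}
```

The point is that the one copy you cannot avoid also does the parsing work,
so there is no separate strlen/strcpy pass afterwards.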

Now suba would be a lot more like what you're doing right now. Currently I
believe you're using pstrings and talloc as a scratch pad to do work with
minimal deinitialization necessary. Suba is a very simple circular linked
list memory allocator but with two key features, one of which is
pertinent to this discussion. First, it can be initialized with stack
memory:

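#include <mba/suba.h>

int
myfn(void)
{
    unsigned char mem[0xFFF];
    /* carve the allocator out of stack memory */
    struct allocator *suba = suba_init(mem, sizeof(mem), 1, 0);
    void *obj;

    if (suba == NULL)
        return -1;

    /* allocate memory */
    obj = suba_alloc(suba, 10, 1);
    if (obj == NULL)
        return -1;

    /* and just return; no freeing necessary */
    return 0;
}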
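
Why nothing needs freeing is easiest to see with something even simpler
than suba. The following is a hypothetical bump allocator over
caller-supplied memory, not suba's implementation (suba keeps a circular
linked list of cells and can free individual objects); struct arena,
arena_init and arena_alloc are names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bump allocator over caller-supplied memory; not suba. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

void
arena_init(struct arena *a, void *mem, size_t size)
{
    assert(a != NULL && mem != NULL);
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Hand out the next n bytes, rounded up so offsets from base stay a
 * multiple of sizeof(void *). Returns NULL when the arena is exhausted. */
void *
arena_alloc(struct arena *a, size_t n)
{
    const size_t align = sizeof(void *);
    size_t need;
    void *p;

    assert(a != NULL);
    if (n > (size_t)-1 - (align - 1))
        return NULL;           /* rounding up would overflow */
    need = (n + align - 1) & ~(align - 1);
    if (need > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += need;
    return p;
}
```

When the backing array lives on the stack, everything handed out simply
disappears when the function returns, which is the property being relied
on in the suba example.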

