efficient rpc io using vstr

Martin Pool mbp at samba.org
Tue Jan 20 09:00:42 GMT 2004

On 20 Jan 2004, Michael B Allen <mba2000 at ioplex.com> wrote:

> True. True.
> > If there is a way to get away from the security problems of char*
> > buffers without giving up too much speed or flexibility then I think
> > that's a good thing.
> Yup. I agree you have to have at least a convention to keep folks from
> doing inappropriate things.

On the Vstr web page, James says that a convention alone is not enough
to save you, because nonconventional uses can slip through.  Samba has
taken some technical measures to make that harder, but it still happens.

> > It looks like there is some support but I haven't looked closely.  Is
> > it much harder to iconv things on top of Vstr than on top of char*?
> Probably not. You could certainly use iconv on top of vstr. The problem is
> you'll *have to*. You will not really be taking advantage of the benefits
> of vstr.
> My understanding of the way vstr works is that it just tracks pointers to
> memory and the length of interest at that memory. It provides wrappers for
> common functions that can operate on disjoint, possibly unterminated
> strings. So if the representation of that memory is the same as what is
> needed internally then no copying is necessary. Good. But if the
> representation is different then it quietly makes a copy. I could very
> well be a little off base here. I didn't look at vstr's internals
> closely.

I think of Vstr as primarily a layer for allocating byte arrays, and
secondarily a string library.  You can do things with its allocations
that don't make sense in C strings: split them, delete parts, and have
embedded \0s.
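To make that concrete without leaning on Vstr's real API, here is a minimal C sketch of a length-counted byte buffer. The `bytebuf` type and its functions are hypothetical, and only an analogy: Vstr itself stores data as a chain of nodes rather than one flat array. Because the length travels with the data, an embedded \0 is just another byte, and deleting or splitting a range is bounded by `len` rather than by a terminator.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-counted byte buffer: an analogy for a Vstr, not its
   real layout. Bytes may include '\0'; len is the only source of truth. */
struct bytebuf {
    unsigned char *data;
    size_t len;
};

/* Append n bytes from src. Returns 0 on success, -1 on failure. */
static int bytebuf_append(struct bytebuf *b, const void *src, size_t n)
{
    unsigned char *p;

    if (n == 0)
        return 0;
    if (n > SIZE_MAX - b->len)
        return -1;                          /* total length would wrap */
    p = realloc(b->data, b->len + n);
    if (p == NULL)
        return -1;
    memcpy(p + b->len, src, n);
    b->data = p;
    b->len += n;
    return 0;
}

/* Delete n bytes starting at 0-based offset pos. Returns 0 or -1. */
static int bytebuf_del(struct bytebuf *b, size_t pos, size_t n)
{
    if (pos > b->len || n > b->len - pos)
        return -1;                          /* range out of bounds */
    if (n == 0)
        return 0;
    memmove(b->data + pos, b->data + pos + n, b->len - pos - n);
    b->len -= n;
    return 0;
}

/* Split at offset pos: bytes [pos, len) move into a new buffer *tail. */
static int bytebuf_split(struct bytebuf *b, size_t pos, struct bytebuf *tail)
{
    tail->data = NULL;
    tail->len = 0;
    if (pos > b->len)
        return -1;
    if (pos == b->len)
        return 0;                           /* nothing to move */
    if (bytebuf_append(tail, b->data + pos, b->len - pos) != 0)
        return -1;
    b->len = pos;                           /* keep allocation, shorten view */
    return 0;
}
```

Appending "ab\0cd" gives a 5-byte buffer even though strlen() on it stops at 2, which is exactly the case plain C strings cannot represent.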

> The problem is, with Samba, the buffer representation of objects and the
> representation used internally is rarely the same. In particular,
> character strings, which as we all know are UCS-2LE on the wire, will need
> to be converted unless it's a simple string and you're using UCS-2LE
> internally, but usually things ultimately need to be converted to the 8-bit
> locale encoding anyway. Compounded with the fact that you frequently need
> to change the string slightly, such as switching '\' for '/' or
> canonicalizing a path, you really end up just copying everything. Thus vstr
> probably won't help.

Right, in all those cases you do need to copy.  Vstr does not address
that particular part of the problem.  On the other hand, I think it
will not hurt either, and might make it better.  For example, rather
than convert_string_allocate, call

  Vstr_base *unix_vstr = convert_vstr(CH_UCS2, CH_UNIX, wire_vstr);

The body of that function would be much like the current one, except
that instead of mallocing a buffer it would build a new vstr.
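As a rough sketch of that conversion step, here is a plain C version built on POSIX iconv(3) that returns a freshly allocated, length-counted result rather than a NUL-terminated string. `convert_alloc` and `struct owned_buf` are hypothetical stand-ins, not Samba's convert_string_allocate and not a Vstr call; a Vstr version would append the converted bytes to a new Vstr_base instead. The charset names accepted by iconv_open() vary by platform, so "UCS-2LE" and "UTF-8" are assumptions.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>

/* Hypothetical owned, length-counted result (a stand-in for a new vstr). */
struct owned_buf {
    char *data;
    size_t len;
};

/* Convert in_len bytes from charset `from` to charset `to` into a newly
   allocated buffer. Returns 0 on success; on failure returns -1 and leaves
   *out empty. Both charsets are assumed stateless (e.g. UTF-8), so no
   final shift-state flush is performed. */
static int convert_alloc(const char *to, const char *from,
                         const char *in, size_t in_len, struct owned_buf *out)
{
    iconv_t cd;
    char *buf, *inp = (char *)in;   /* iconv(3) takes a non-const char ** */
    size_t in_left = in_len, cap = in_len * 2 + 16, used = 0;

    out->data = NULL;
    out->len = 0;
    cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;
    buf = malloc(cap);
    if (buf == NULL) {
        iconv_close(cd);
        return -1;
    }
    for (;;) {
        char *outp = buf + used;
        size_t out_left = cap - used;
        size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
        char *grown;

        used = (size_t)(outp - buf);
        if (rc != (size_t)-1)
            break;                        /* all input converted */
        if (errno != E2BIG) {             /* EILSEQ/EINVAL: malformed input */
            free(buf);
            iconv_close(cd);
            return -1;
        }
        grown = realloc(buf, cap * 2);    /* output full: grow and retry */
        if (grown == NULL) {
            free(buf);
            iconv_close(cd);
            return -1;
        }
        buf = grown;
        cap *= 2;
    }
    iconv_close(cd);
    out->data = buf;
    out->len = used;
    return 0;
}
```

The caller owns `out->data` and frees it; the length comes back explicitly, so the converted bytes never have to be re-scanned with strlen().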

To fix up the path separators:

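  size_t pos = 1;   /* Vstr positions are 1-based; 0 means "not found" */
  while (pos <= path_vstr->len &&
         (pos = vstr_srch_chr_fwd(path_vstr, pos, path_vstr->len - pos + 1,
                                  '\\')) != 0) {
    vstr_sub_buf(path_vstr, pos, 1, "/", 1);  /* backslash -> slash */
    ++pos;                                    /* resume after this char */
  }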

I don't think that is enormously harder than plain char*.

Speaking of canonicalizing paths: this is something that is
persistently difficult to do in plain C strings without either
overflowing or getting the wrong result.  There have been many holes
in Apache, ftp daemons, IIS, and other programs because of trying to
do it in low-level pointer arithmetic.
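To show the kind of care this takes even in plain C, here is a hedged sketch of a canonicalizer that reads an explicit input length and writes into a caller-supplied buffer of fixed capacity. `canon_path` is hypothetical and deliberately simple: it treats both '/' and '\' as separators, drops empty and "." components, resolves ".." by backing up one component, refuses any ".." that would climb above the root, and fails rather than truncating when the result does not fit. It does not handle drive letters, symlinks, or multibyte encodings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical canonicalizer. Reads exactly in_len bytes of in (no NUL
   needed) and writes a NUL-terminated, rooted path into out, which has
   room for out_cap bytes. Both '/' and '\\' count as separators. Returns
   the result length, or -1 if ".." would climb above the root or the
   result would not fit. */
static ptrdiff_t canon_path(const char *in, size_t in_len,
                            char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    if (out_cap < 2)
        return -1;                           /* no room for "/" + NUL */
    out[o++] = '/';
    while (i < in_len) {
        size_t start, n, sep;

        while (i < in_len && (in[i] == '/' || in[i] == '\\'))
            i++;                             /* skip a run of separators */
        start = i;
        while (i < in_len && in[i] != '/' && in[i] != '\\')
            i++;                             /* find end of this component */
        n = i - start;
        if (n == 0 || (n == 1 && in[start] == '.'))
            continue;                        /* empty or "." component */
        if (n == 2 && in[start] == '.' && in[start + 1] == '.') {
            if (o == 1)
                return -1;                   /* refuse to escape the root */
            while (out[o - 1] != '/')
                o--;                         /* drop the last component */
            if (o > 1)
                o--;                         /* and the '/' before it */
            continue;
        }
        sep = (o > 1);                       /* '/' needed before component? */
        if (sep + n + 1 > out_cap - o)
            return -1;                       /* would overflow out: fail */
        if (sep)
            out[o++] = '/';
        memcpy(out + o, in + start, n);
        o += n;
    }
    out[o] = '\0';
    return (ptrdiff_t)o;
}
```

Every write into `out` is preceded by a capacity check against `out_cap - o`, and ".." only ever backs up over bytes this function itself wrote, which are the two places hand-rolled canonicalizers tend to go wrong.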

> >> and b) certain manipulation is going to cause allocation and copying
> >> of possibly large numbers of little fragments.
> >
> > Plain C strings can cause a great deal of allocation and copying too.
> > The question is, will Vstr be worse?  Indeed, this example addresses a
> > particular case of receiving on a nonblocking socket where it is quite
> > hard to avoid extensive copying in plain C.
> Actually this is where vstr *might* help. From a throughput standpoint you
> would like to read in whatever data is in the socket buffer. If you read
> in 5 SMBs in one read you *might* be able to use vstr to manage multiple
> SMBs in the same buffer. But that could get really hairy.

It's not even just throughput: if you want a nonblocking server then
you need to be able to accept and hold partial packets, and similarly
for output.  That implies some kind of buffering.  I think doing it
through vstrs is pretty clean.  What did you think of the main() in
that file?
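The buffering described above can be sketched without Vstr as a receive buffer that accumulates whatever each nonblocking read returns and hands out only complete frames. The framing is an assumption for illustration (a 4-byte big-endian length prefix), not the exact NetBIOS/SMB header, and `rxbuf`, `rx_feed`, and `rx_next` are hypothetical names. A Vstr-based version might avoid the memmove() by deleting consumed bytes from the front of the string.

```c
#include <stdint.h>
#include <string.h>

#define RX_CAP 65536   /* assumed per-connection receive limit */

/* Hypothetical receive buffer for one nonblocking connection. */
struct rxbuf {
    unsigned char data[RX_CAP];
    size_t len;                          /* bytes currently held */
};

/* Append n bytes just returned by a nonblocking read. Returns 0, or -1 if
   they do not fit (the caller would then drop the connection). */
static int rx_feed(struct rxbuf *rx, const void *bytes, size_t n)
{
    if (n > RX_CAP - rx->len)
        return -1;
    memcpy(rx->data + rx->len, bytes, n);
    rx->len += n;
    return 0;
}

/* Frames are assumed to be a 4-byte big-endian payload length followed by
   the payload. If a whole frame is buffered, copy its payload to out, store
   its length in *out_len, remove it from rx, and return 1. Return 0 if more
   bytes are needed, or -1 if the frame could never fit. */
static int rx_next(struct rxbuf *rx, unsigned char *out, size_t out_cap,
                   size_t *out_len)
{
    uint32_t plen;
    size_t total;

    if (rx->len < 4)
        return 0;                            /* header still partial */
    plen = ((uint32_t)rx->data[0] << 24) | ((uint32_t)rx->data[1] << 16) |
           ((uint32_t)rx->data[2] << 8) | (uint32_t)rx->data[3];
    if (plen > out_cap || plen > RX_CAP - 4)
        return -1;                           /* oversized: reject */
    total = 4 + (size_t)plen;
    if (rx->len < total)
        return 0;                            /* payload still partial */
    memcpy(out, rx->data + 4, plen);
    *out_len = plen;
    /* Slide any bytes of the next frame(s) down to the front. */
    memmove(rx->data, rx->data + total, rx->len - total);
    rx->len -= total;
    return 1;
}
```

A server loop would call rx_feed() after each read that returns data, then call rx_next() until it returns 0, so one read carrying several packets (or a fraction of one) is handled the same way.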

> >> That can totally eliminate a lot of strlen/strcpy kind of work which
> >> is what vstr is designed to deal with.
> >
> > Can you explain how suba helps?
> Well the part you quoted and suba are two different techniques. I believe
> the phrase I used was "copy as you parse". In this case an example is
> worth a thousand DWORDS :) Consider this path canonicalization routine:
>   http://www.ioplex.com/~miallen/libmba/dl/src/path.c
> So after iconv-ing the string off the buffer you can canonicalize it right
> into a new buffer that is a valid path ready to use with the host API.

I'm not sure I understand your point.  I don't think it's necessary to
avoid char*s entirely, just that sometimes using a buffer API can be ...

> Now suba would be a lot more like what you're doing right now. Currently I
> believe you're using pstrings and talloc as a scratchpad to do work with
> minimal deinitialization necessary. Suba is a very simple circular
> linked-list memory allocator, but with two key features, one of which is
> pertinent to this discussion. First, it can be initialized with stack
> memory:

And second?

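> #include <mba/suba.h>
>
> int
> myfn(void)
> {
>     unsigned char mem[0xFFF];
>     struct allocator *suba = suba_init(mem, sizeof mem, 1, 0);
>     void *obj;
>
>     /* allocate memory from the stack-backed pool */
>     if (suba == NULL || (obj = suba_alloc(suba, 10, 1)) == NULL)
>         return -1;
>     /* ... use obj ... */
>     /* and just return; no freeing necessary */
>     return 0;
> }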

That's very clever.  On the other hand, there is no protection against
overflow, which I think ought to be a major consideration for any
network code.
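To make the overflow concern concrete, here is a hedged sketch of a scratch allocator over caller-supplied (for example, stack) memory. It is a simple bump allocator, not suba's circular-linked-list design, and `arena`, `arena_init`, and `arena_alloc` are hypothetical names. It can refuse requests that would run past the end of the pool, but nothing in it stops code from writing past the end of an individual allocation; that still needs an API that carries lengths along with the data.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator over a caller-supplied buffer. */
struct arena {
    unsigned char *base;
    size_t cap;
    size_t used;
};

static void arena_init(struct arena *a, void *mem, size_t cap)
{
    a->base = mem;
    a->cap = cap;
    a->used = 0;
}

/* Return n bytes aligned for any object type, or NULL if the pool is
   exhausted. Nothing is freed individually; the whole arena goes away
   when its backing memory does (e.g. when the function returns). */
static void *arena_alloc(struct arena *a, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    uintptr_t p = (uintptr_t)(a->base + a->used);
    size_t pad = (size_t)((align - p % align) % align);

    if (pad > a->cap - a->used || n > a->cap - a->used - pad)
        return NULL;                 /* request would overrun the pool */
    a->used += pad + n;
    return a->base + a->used - n;
}
```

Backed by something like `_Alignas(max_align_t) unsigned char mem[4096];` inside a function, the arena disappears on return just as in the suba example, with no freeing.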


More information about the samba-technical mailing list