efficient rpc io using vstr
Martin Pool
mbp at samba.org
Tue Jan 20 09:00:42 GMT 2004
On 20 Jan 2004, Michael B Allen <mba2000 at ioplex.com> wrote:
> True. True.
>
> > If there is a way to get away from the security problems of char*
> > buffers without giving up too much speed or flexibility then I think
> > that's a good thing.
>
> Yup. I agree you have to have at least a convention to keep folks from
> doing inappropriate things.
On the Vstr web page, James says that a convention alone is not enough
to save you, because nonconventional uses can slip through. Samba has
done some technical things to make it harder but it still happens.
> > It looks like there is some support but I haven't looked closely. Is
> > it much harder to iconv things on top of Vstr than on top of char*?
>
> Probably not. You could certainly use iconv on top of vstr. The problem is
> you'll *have to*. You will not really be taking advantage of the benefits
> of vstr.
>
> My understanding of the way vstr works is that it just tracks pointers to
> memory and the length of interest at that memory. It provides wrappers for
> common functions that can operate on disjoint, possibly unterminated
> strings. So if the representation of that memory is the same as what is
> needed internally then no copying is necessary. Good. But if the
> representation is different then it quietly makes a copy. I could very
> well be a little off base here. I didn't look at vstr's internals
> closely.
I think of Vstr as primarily a layer for allocating byte arrays, and
secondarily a string library. You can do things with its allocations
that don't make sense in C strings: split them, delete parts, and have
embedded \0s.
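To make that concrete, here is a minimal sketch in plain C of a
length-counted byte buffer that tolerates embedded \0s and lets you
delete a range. The struct and the bytebuf_* names are hypothetical and
made up for this illustration; this is not how Vstr is implemented (a
single fixed array is assumed here only to keep it short), it just
shows the kind of operation that has no meaning for a C string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer, for illustration only (not Vstr). */
struct bytebuf {
    unsigned char data[64];
    size_t len;             /* bytes in use; no terminator is needed */
};

/* Append n bytes. Returns 0 on success, -1 if they would not fit. */
static int bytebuf_add(struct bytebuf *b, const void *src, size_t n)
{
    if (n > sizeof(b->data) - b->len)
        return -1;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Delete n bytes starting at 0-based offset off.
 * Returns 0 on success, -1 if the range is not inside the buffer. */
static int bytebuf_del(struct bytebuf *b, size_t off, size_t n)
{
    if (off > b->len || n > b->len - off)
        return -1;
    memmove(b->data + off, b->data + off + n, b->len - off - n);
    b->len -= n;
    return 0;
}
```

Because the length is carried alongside the data, a \0 in the middle is
just another byte, and every operation can check its range before
touching memory.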
> The problem is, with Samba, the buffer representation of objects and the
> representation used internally is rarely the same. In particular,
> character strings, which as we all know are UCS-2LE on the wire, will need
> to be converted unless it's a simple string and you're using UCS-2LE
> internally, but usually things ultimately need to be converted to the 8-bit
> locale encoding anyway. Compounded with the fact that you frequently need
> to change the string slightly, such as switching '\' for '/' or
> canonicalizing a path, you really end up just copying everything. Thus vstr
> probably won't help.
Right, in all those cases you do need to copy. Vstr does not address
that particular part of the problem. On the other hand, I think it
will not hurt either, and might make it better. For example, rather
than calling convert_string_allocate, one could call something like
    Vstr_base *unix_vstr = convert_vstr(CH_UCS2, CH_UNIX, wire_vstr);
The contents of that function would be similar to what they are at
present; rather than calling malloc, it would make a new vstr.
To fix up the path:
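    size_t pos = 1;
    /* Vstr positions are 1-based, and the search length is counted
       from pos, so it shrinks as pos advances. */
    while ((pos = vstr_srch_chr_fwd(path_vstr, pos,
                                    path_vstr->len - pos + 1, '\\')) != 0) {
        vstr_sub_buf(path_vstr, pos, 1, "/", 1);
    }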
I don't think that is enormously harder than plain char*.
Speaking of canonicalizing paths: this is something that is
persistently difficult to do in plain C strings without either
overflowing or getting the wrong result. There have been many holes
in Apache, ftp daemons, IIS, and other programs because of trying to
do it in low-level pointer arithmetic.
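As a rough illustration of what the careful version looks like in plain
C, here is a sketch that canonicalizes an absolute '/'-separated path
into a caller-supplied buffer and checks the remaining space before
every write. The name canon_path and its return convention are invented
for this example; it is not taken from Samba or from any of the
programs mentioned above, and it only handles "//", "." and "..".

```c
#include <stddef.h>
#include <string.h>

/* Canonicalize an absolute path into out[outsz].
 * Collapses "//" and ".", resolves "..", and never climbs above "/".
 * Returns 0 on success, -1 if src is not absolute or out is too small. */
static int canon_path(const char *src, char *out, size_t outsz)
{
    size_t o = 0;   /* bytes written so far; o < outsz always holds */

    if (src[0] != '/' || outsz < 2)
        return -1;
    while (*src != '\0') {
        size_t n;

        while (*src == '/')             /* skip runs of separators */
            src++;
        n = strcspn(src, "/");          /* length of next component */
        if (n == 0)
            break;
        if (n == 1 && src[0] == '.') {
            /* "." names the current directory: nothing to emit */
        } else if (n == 2 && src[0] == '.' && src[1] == '.') {
            /* drop the last component already emitted, if any */
            while (o > 0 && out[--o] != '/')
                ;
        } else {
            /* need '/' + n bytes now, plus room for the final '\0' */
            if (n + 1 >= outsz - o)
                return -1;
            out[o++] = '/';
            memcpy(out + o, src, n);
            o += n;
        }
        src += n;
    }
    if (o == 0)
        out[o++] = '/';
    out[o] = '\0';
    return 0;
}
```

Even in this small case there are three separate places where an
off-by-one would either overflow the output or produce the wrong path,
which is the sort of thing a buffer API can check once, centrally.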
> >> and b) certain manipulation is going to cause allocation and copying
> >> of possibly large numbers of little fragments.
> >
> > Plain C strings can cause a great deal of allocation and copying too.
> > The question is, will Vstr be worse? Indeed, this example addresses a
> > particular case of receiving on a nonblocking socket where it is quite
> > hard to avoid extensive copying in plain C.
>
> Actually this is where vstr *might* help. From a throughput standpoint you
> would like to read in whatever data is in the socket buffer. If you read
> in 5 SMBs in one read you *might* be able to use vstr to manage multiple
> SMBs in the same buffer. But that could get really hairy.
It's not even just throughput: if you want a nonblocking server then
you need to be able to accept and hold partial packets, and similarly
for output. That implies some kind of buffering. I think doing it
through vstrs is pretty clean. What did you think of the main() in
that file?
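For readers without that file to hand, the buffering requirement can be
sketched in plain C as follows. This is not the code from that file and
it does not use Vstr. It assumes, purely for illustration, that each
packet is preceded by a 4-byte big-endian length, and the rxbuf_* names
are hypothetical. Bytes are appended as they arrive, and a packet is
handed out only once all of it is present.

```c
#include <stddef.h>
#include <string.h>

#define RXBUF_CAP 256
#define HDR_LEN   4     /* assumed framing: 4-byte big-endian length */

struct rxbuf {
    unsigned char data[RXBUF_CAP];
    size_t len;         /* bytes received but not yet consumed */
};

/* Append whatever a nonblocking read returned.
 * Returns 0 on success, -1 if the buffer is full. */
static int rxbuf_feed(struct rxbuf *rx, const void *src, size_t n)
{
    if (n > RXBUF_CAP - rx->len)
        return -1;
    memcpy(rx->data + rx->len, src, n);
    rx->len += n;
    return 0;
}

/* If a whole packet is buffered, copy its payload to out and consume it.
 * Returns the payload length, -1 if more data is needed, or -2 if the
 * advertised length can never fit in out or in the receive buffer. */
static long rxbuf_next(struct rxbuf *rx, unsigned char *out, size_t outsz)
{
    unsigned long plen;

    if (rx->len < HDR_LEN)
        return -1;
    plen = ((unsigned long)rx->data[0] << 24) |
           ((unsigned long)rx->data[1] << 16) |
           ((unsigned long)rx->data[2] << 8)  |
            (unsigned long)rx->data[3];
    if (plen > outsz || plen > RXBUF_CAP - HDR_LEN)
        return -2;
    if (rx->len - HDR_LEN < plen)
        return -1;
    memcpy(out, rx->data + HDR_LEN, plen);
    /* slide any following packets down to the front */
    memmove(rx->data, rx->data + HDR_LEN + plen,
            rx->len - HDR_LEN - plen);
    rx->len -= HDR_LEN + plen;
    return (long)plen;
}
```

Note that this flat-array version copies twice per packet (once out,
once to slide the remainder down). Avoiding that kind of copying is the
part where a segmented buffer could plausibly do better.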
> >> That can totally eliminate a lot of strlen/strcpy kind of work which
> >> is what vstr is designed to deal with.
> >
> > Can you explain how suba helps?
>
> Well the part you quoted and suba are two different techniques. I believe
> the phrase I used was "copy as you parse". In this case an example is
> worth a thousand DWORDS :) Consider this path canonicalization routine:
>
> http://www.ioplex.com/~miallen/libmba/dl/src/path.c
>
> So after iconv-ing the string off the buffer you can canonicalize it right
> into a new buffer that is a valid path ready to use with the host API.
I'm not sure I understand your point. I don't think char*s have to be
avoided entirely, just that sometimes using a buffer API can be
safer or faster.
> Now suba would be a lot more like what you're doing right now. Currently I
> believe you're using pstrings and talloc as a scratch pad to do work with
> minimal deinitialization necessary. Suba is a very simple circular linked
> list memory allocator but with two key features, one of which is
> pertinent to this discussion. First, it can be initialized with stack
> memory:
And second?
>
> int
> myfn(void)
> {
>     unsigned char mem[0xFFF];
>     struct allocator *suba = suba_init(mem, 0xFFF, 1, 0);
>     void *obj;
>
>     /* allocate memory */
>     obj = suba_alloc(suba, 10, 1);
>     /* and just return; no freeing necessary */
>     return 0;
> }
That's very clever. On the other hand, there is no protection against
overflow, which I think ought to be a major consideration for any
network code.
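To illustrate the kind of check I mean, here is a sketch of a trivial
bump allocator over caller-supplied (for example stack) memory that
refuses a request rather than running off the end of the array. It is
not suba and says nothing about how suba behaves internally; the
arena_* names are hypothetical, and it assumes the backing memory is
already suitably aligned. It also only guards against exhausting the
backing array, not against a caller writing past an individual
allocation.

```c
#include <stddef.h>

#define ARENA_ALIGN sizeof(void *)

/* Hypothetical bump allocator over caller-supplied memory. */
struct arena {
    unsigned char *base;
    size_t size;
    size_t used;
};

static void arena_init(struct arena *a, void *mem, size_t size)
{
    a->base = mem;
    a->size = size;
    a->used = 0;
}

/* Returns n bytes from the arena, or NULL if they do not fit.
 * Nothing is ever freed individually; the memory goes away when
 * the backing array goes out of scope. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t off = a->used;
    size_t rem = off % ARENA_ALIGN;

    if (rem != 0) {                     /* round the offset up */
        if (ARENA_ALIGN - rem > a->size - off)
            return NULL;
        off += ARENA_ALIGN - rem;
    }
    if (n > a->size - off)              /* would run past the end */
        return NULL;
    a->used = off + n;
    return a->base + off;
}
```

The comparisons are written as subtractions from the remaining space so
that a very large n cannot wrap around and slip past the check.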
--
Martin