[linux-cifs-client] [PATCH] cifs: Fix insufficient memory allocation for nativeFileSystem field

Jeff Layton jlayton at redhat.com
Thu Apr 9 14:40:47 GMT 2009


On Thu, 09 Apr 2009 19:59:13 +0530
Suresh Jayaraman <sjayaraman at suse.de> wrote:

> Steve French wrote:
> > On Tue, Apr 7, 2009 at 8:15 AM, Suresh Jayaraman <sjayaraman at suse.de> wrote:
> >> Jeff Layton wrote:
> >>> On Mon, 06 Apr 2009 22:33:09 +0530
> >>> Suresh Jayaraman <sjayaraman at suse.de> wrote:
> >>>
> >>>> Steve French wrote:
> >>>>> I don't think that we should be using these size assumptions
> >>>>> (multiples of UCS stringlen). A new UCS helper function should be
> >>>>> created that calculates how much memory would be needed for a
> >>>>> converted string - and we need to use this before we do the malloc and
> >>>>> string conversion. In effect a strlen and strnlen function that takes
> >>>>> a target code page argument. For strings that will never be more than
> >>>>> a hundred bytes this may not be needed, and we can use the length
> >>>>> assumption, but since mallocs in kernel can be so expensive I would
> >>>>> rather calculate the actual string length needed for the target.
> >>>> Ah, ok. I thought of writing a little function based on
> >>>> cifs_strncpy_to_host() and adding a comment like below:
> >>>>
> >>>> /* UniStrnlen() returns length in 16 bit Unicode characters
> >>>>  * (UCS-2) with base length of 2 bytes per character. A UTF-8
> >>>>  * character can be up to 8 bytes maximum, so we need to
> >>>>  * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
> >>>>  * UTF-8 string */
> >>>>
> >>> I think you'll have to basically do the conversion twice. Walk the
> >>> string once, convert each character to determine its length, and then
> >>> discard it. Get the total and allocate that many bytes (plus the null
> >> Thanks for explaining. It seems adding a new UCS helper that computes
> >> length in bytes like the below would be good enough and make use of it
> >> to compute length for memory allocation.
> >>
> >>> termination), and do the conversion again into the buffer.
> >> Do we still need this conversion again?
> >>
> >>
> >> diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> >> index 14eb9a2..0396bdc 100644
> >> --- a/fs/cifs/cifs_unicode.h
> >> +++ b/fs/cifs/cifs_unicode.h
> >> @@ -159,6 +159,23 @@ UniStrnlen(const wchar_t *ucs1, int maxlen)
> >>  }
> >>
> >> �/*
> >> + * UniStrnlenBytes: Return the length in bytes of a UTF-8 string
> >> + */
> >> +static inline size_t
> >> +UniStrnlenBytes(const unsigned char *str, int maxlen)
> >> +{
> >> +	size_t nbytes = 0;
> >> +	wchar_t *uni;
> >> +
> >> +	while (*str++) {
> >> +		/* convert each char, find its length and add to nbytes */
> >> +		if (char2uni(str, maxlen, uni) > 0)
> >> +			nbytes += strnlen(uni, NLS_MAX_CHARSET_SIZE);
> >> +	}
> >> +	return nbytes;
> >> +}
> >> +
> >> +/*
> >>
> >> We would still be needing the version (UniStrnlen) that returns length
> >> in characters also.
> >>
> >>> I'm not truly convinced this is really necessary though. You have to
> >>> figure that kmalloc is a power-of-two allocator. If you kmalloc 17
> >>> bytes, you get 32 anyway. You'll probably end up using roughly the same
> >>> amount of memory that you would have had you just estimated the size.
> > 
> > Shaggy made the comment that the string length calculation probably
> > won't matter (exact size vs. estimate) for most cases in cifs since
> > small allocations off the slab are fairly fast and it doesn't hurt to
> > overallocate by this amount. Although in the typical case a Unicode
> > string will usually shrink when converted to UTF-8, we obviously have
> > to allow for the maximum size conversion.
> > 
> > 
> 
> OTOH, felix-suse at fefe.de pointed me to the utf-8 man page:
> 
> * UTF-8 encoded UCS characters may be up to six bytes long, however the
> Unicode standard specifies no characters above 0x10ffff, so Unicode
> characters can only be up to four bytes long in UTF-8.

Don't they mean 3 bytes there?
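
A character that fits in a single UCS-2 code unit tops out at U+FFFF, which
is at most 3 bytes in UTF-8; the 4-byte sequences only start above U+FFFF,
where UCS-2 needs a surrogate pair anyway. A quick userspace illustration of
the per-code-point arithmetic (just a sketch to show the point, not anything
destined for the patch):

#include <stdio.h>

/* bytes needed to encode one Unicode code point in UTF-8 */
static int utf8_len(unsigned long cp)
{
	if (cp < 0x80)
		return 1;
	if (cp < 0x800)
		return 2;
	if (cp < 0x10000)
		return 3;	/* anything that fits in one UCS-2 unit */
	return 4;		/* above the BMP: a surrogate pair in UCS-2 */
}

int main(void)
{
	printf("U+0041  -> %d byte(s)\n", utf8_len(0x0041));  /* 'A'       */
	printf("U+20AC  -> %d byte(s)\n", utf8_len(0x20AC));  /* euro sign */
	printf("U+10400 -> %d byte(s)\n", utf8_len(0x10400)); /* two UCS-2 units */
	return 0;
}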

> 
> Going by this, length * 2 (original code) might still be sufficient?
> 

I think the safest thing is still to just calculate the exact length of
the buffer before allocating. It's hard to imagine that it'll have a
significant performance impact.

If we later find that it does, then we can look at optimizing those cases
for speed instead of size, but at least at that point we're working
with code that has the buffers sufficiently sized.
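
For illustration, one possible shape for such a helper (a rough sketch only,
not the patch that was posted; the function name is made up, and it assumes
the nls_table uni2char() interface): walk the UCS-2 string once and let the
target codepage say how many bytes each character converts to, then allocate
exactly that plus the NUL.

#include <linux/nls.h>
#include <linux/types.h>
#include <asm/byteorder.h>

/* hypothetical helper: bytes "from" needs once converted to "codepage" */
static int ucs2_converted_bytes(const __le16 *from, int maxwords,
				const struct nls_table *codepage)
{
	unsigned char tmp[NLS_MAX_CHARSET_SIZE];
	int i, charlen, outlen = 0;

	for (i = 0; i < maxwords && from[i]; i++) {
		charlen = codepage->uni2char(le16_to_cpu(from[i]), tmp,
					     NLS_MAX_CHARSET_SIZE);
		if (charlen > 0)
			outlen += charlen;
		else
			outlen++;	/* charlen <= 0: assume one replacement byte (e.g. '?') */
	}
	return outlen;
}

The nativeFileSystem allocation could then be sized from the return value
plus one, rather than from a multiple of the UCS-2 length.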

-- 
Jeff Layton <jlayton at redhat.com>

