[linux-cifs-client] [PATCH] cifs: Fix insufficient memory allocation for nativeFileSystem field

Tue Apr 7 15:22:34 GMT 2009

On Tue, 7 Apr 2009 09:59:46 -0500
Steve French <smfrench at gmail.com> wrote:

> On Tue, Apr 7, 2009 at 8:15 AM, Suresh Jayaraman <sjayaraman at suse.de> wrote:
> > Jeff Layton wrote:
> >> On Mon, 06 Apr 2009 22:33:09 +0530
> >> Suresh Jayaraman <sjayaraman at suse.de> wrote:
> >>
> >>> Steve French wrote:
> >>>> I don't think that we should be using these size assumptions
> >>>> (multiples of UCS stringlen).    A new UCS helper function should be
> >>>> created that calculates how much memory would be needed for a
> >>>> converted string - and we need to use this before we do the malloc and
> >>>> string conversion.  In effect a strlen and strnlen function that takes
> >>>> a target code page argument.  For strings that will never be more than
> >>>> a hundred bytes this may not be needed, and we can use the length
> >>>> assumption, but since mallocs in kernel can be so expensive I would
> >>>> rather calculate the actual string length needed for the target.
> >>> Ah, ok. I thought of writing a little function based on
> >>> cifs_strncpy_to_host() and adding a comment like below:
> >>>
> >>> /* UniStrnlen() returns length in 16 bit Unicode  characters
> >>>  * (UCS-2) with base length of 2 bytes per character. An UTF-8
> >>>  * character can be up to 8 bytes maximum, so we need to
> >>>  * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
> >>>  * UTF-8 string */
> >>>
> >>
> >> I think you'll have to basically do the conversion twice. Walk the
> >> string once and convert each character determine its length and then
> >> discard it. Get the total and allocate that many bytes (plus the null
> >
> > Thanks for explaining. It seems adding a new UCS helper that computes
> > length in bytes like the below would be good enough and make use of it
> > to compute length for memory allocation.
> >
> >> termination), and do the conversion again into the buffer.
> >
> > Do we still need this conversion again?
> >
> >
> > diff --git a/fs/cifs/cifs_unicode.h b/fs/cifs/cifs_unicode.h
> > index 14eb9a2..0396bdc 100644
> > --- a/fs/cifs/cifs_unicode.h
> > +++ b/fs/cifs/cifs_unicode.h
> > @@ -159,6 +159,23 @@ UniStrnlen(const wchar_t *ucs1, int maxlen)
> >  }
> >
> >  /*
> > + * UniStrnlenBytes: Return the length in bytes of a UTF-8 string
> > + */
> > +static inline size_t
> > +UniStrnlenBytes(const unsigned char *str, int maxlen)
> > +{
> > +       size_t nbytes = 0;
> > +       wchar_t *uni;
> > +
> > +       while (*str++) {
> > +               /* convert each char, find its length and add to nbytes */
> > +               if (char2uni(str, maxlen, uni) > 0)
> > +                       nbytes += strnlen(uni, NLS_MAX_CHARSET_SIZE);
> > +       }
> > +       return nbytes;
> > +}
> > +
> > +/*
> >
> > We would still be needing the version (UniStrnlen) that returns length
> > in characters also.
> >
> >>
> >> I'm not truly convinced this is really necessary though. You have to
> >> figure that kmalloc is a power-of-two allocator. If you kmalloc 17
> >> bytes, you get 32 anyway. You'll probably end up using roughly the same
> >> amount of memory that you would have had you just estimated the size.
> 
> Shaggy made the comment that the string length calculation probably
> won't matter (exact size vs. estimate) for most cases in cifs since
> small allocations off the slab are fairly fast and it doesn't hurt to
> overallocate by this amount.    Although for the typical cases a
> Unicode string usually will shrink when converted to UTF-8 obviously
> we have to allow for the maximum size conversion.
> 
> 
> Except for long lived strings, for temporary Unicode strings
> conversions that start with a Unicode string length of 256 wchars long
> or shorter, probably is no point in calculating the string length
> since the slab allocation for the worst case target is fast enough.
> Obviously for path lengths though it can make a huge difference
> (\\server\share\directory1\directory2\directory3\ etc.) and we ought
> to calculate the exact length.
> 

The only reason not to calculate the entire string length before the
allocation is to avoid the overhead of walking the string twice. The
only place this is likely to make a significant difference is the
readdir codepath (since that involves a lot of string conversion).
That's also the place where we have to store significant numbers of
pathnames. Therefore, I submit that there's little benefit in skipping
the length calculation anywhere. The places where we can do it are too
small and infrequent to matter.

For simplicity's sake I suggest that we just have everything just
calculate the exact string length and do the allocation that way.
Trying to get clever is more likely to lead us back into problems where
the buffer is too short.

-- 
Jeff Layton <jlayton at redhat.com>