[linux-cifs-client] [PATCH] cifs: Fix insufficient memory allocation for nativeFileSystem field

Mon Apr 6 17:03:09 GMT 2009

Steve French wrote:
> I don't think that we should be using these size assumptions
> (multiples of UCS stringlen).    A new UCS helper function should be
> created that calculates how much memory would be needed for a
> converted string - and we need to use this before we do the malloc and
> string conversion.  In effect a strlen and strnlen function that takes
> a target code page argument.  For strings that will never be more than
> a hundred bytes this may not be needed, and we can use the length
> assumption, but since mallocs in kernel can be so expensive I would
> rather calculate the actual string length needed for the target.

Ah, ok. I thought of writing a little function based on
cifs_strncpy_to_host() and adding a comment like below:

/* UniStrnlen() returns length in 16 bit Unicode  characters
 * (UCS-2) with base length of 2 bytes per character. An UTF-8
 * character can be up to 8 bytes maximum, so we need to
 * allocate (len/2) * 4 bytes (or) (4 * len) bytes for the
 * UTF-8 string */

and make the callers use this. May be Steve's idea is a better one.
However, I could find any utility function inside kernel source that
calculates length of a UTF-8 string. It's not clear how can I find the
length of a UTF-8 string.

Thanks,

> On Mon, Apr 6, 2009 at 9:30 AM, simo <idra at samba.org> wrote:
>> On Mon, 2009-04-06 at 10:08 -0400, Jeff Layton wrote:
>>> On Mon, 06 Apr 2009 14:02:25 +0000
>>> simo <idra at samba.org> wrote:
>>>
>>>> On Mon, 2009-04-06 at 09:22 -0400, Jeff Layton wrote:
>>>>>> True. Seems I was influenced by a comment in fs/cifs/sess.c
>>>>>>
>>>>>> 313 � � � � /* UTF-8 string will not grow more than four times as
>>>>> big as
>>>>>> UCS-16 */
>>>>>>
>>>>> That looks like that's wrong (or at least potentially so). AFAICT,
>>>>> UTF-8
>>>>> allows up to 6 bytes per character. I suppose that it's possible that
>>>>> none of the characters allowed in UCS-16 will ever translate to a
>>>>> character that's more than 4 bytes, but I'd like to see that confirmed
>>>>> before we depend on it.
>>>> I wonder whats UCS-16 tho, UCS-16 does not exist :)
>>>>
>>>> It may be either UTF16 or UCS2.
>>>> Both these charsets have a base length of 2 bytes per character. UCS2 is
>>>> limited to 65535 values, while UTF-16 is a multi-word charset.
>>>>
>>>> If the comment above is to be read as "UTF-8 string will not grow more
>>>> than four times as big as UCS-2/UTF-16" then what it is saying is that
>>>> at maximum an UTF-8 chars can be 4 words (or 8 bytes long).
>>>> IIRC UTF-8 chars length is maximum 6 bytes, so an 8 byte per char max
>>>> estimate seem correct.
>>>>
>>>> If "length" in the code was the length in bytes, and not the number of
>>>> characters, of an UCS-2/UTF-16 string than 4*length should, indeed be
>>>> long enough.
>>>>
>>> Ahh, you're correct. I guess I'm accustomed to thinking about lengths in
>>> bytes. I guess though this means that 4*length is allocating too
>>> much...wouldn't 3*length then be right (assuming NULL termination is
>>> also accounted for) ?
>>>
>>> Regardless of the math, I'd like to see all of this moved into some
>>> nice, well commented helper functions instead of being open-coded all
>>> over the place. It's just too easy to get this stuff wrong. Let's solve
>>> this in a way that makes it easier in the future.
>> Yes, this is the way to go.
>>

-- 
Suresh Jayaraman