[Samba] Conversion error: Illegal multibyte sequence

Jeremy Allison jra at samba.org
Tue Sep 10 12:03:46 MDT 2013


On Tue, Sep 10, 2013 at 07:44:39PM +0200, Volker Lendecke wrote:
> On Tue, Sep 10, 2013 at 09:48:57AM -0700, Jeremy Allison wrote:
> > It's an old, old check back from when SJIS and EUC were
> > common multi-byte systems.
> > 
> > SJIS especially has the property that the second byte
> > can contain a value <127 as part of the 2-byte char
> > set. So if CH_UNIX is set to a char set with such a
> > property we can't walk it as bytes, but must see if
> > a pair of values [0] (> 0x80) [1] (any value) can be
> > converted into a valid multi-byte char, in which case
> > we ignore it (otherwise we might look at the second
> > byte value of ':' or something and consider it invalid).
> > 
> > I thought about removing this and re-writing it, but
> > it made my brain hurt (and might break some very old
> > systems :-). So moving to next_codepoint() which checks
> > the next char len without causing the conversion error
> > messages seemed the simplest fix :-).
> 
> Thanks! +1 from me.

Actually - your question made me think about this
some more and I think I can easily simplify this,
because no character whose encoding is longer than one
byte can be one of the invalid characters (which are
all single-byte ASCII, < 0x80).
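
To make that reasoning concrete, here is a minimal sketch of the idea.
It is not the patch itself and does not use Samba's real helpers: the
names sjis_char_len() and name_is_legal() are hypothetical, the reserved
character list is an assumption, and the Shift_JIS lead-byte ranges are
hard-coded purely for illustration (real code would ask the charset
layer for the length of the next character).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "length of next character" helper.
 * Assumes Shift_JIS/CP932 lead-byte ranges; illustration only. */
static size_t sjis_char_len(const unsigned char *s)
{
    unsigned char c = s[0];
    bool lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);

    /* A lead byte directly before the terminator is treated as 1 byte. */
    return (lead && s[1] != '\0') ? 2 : 1;
}

/* Returns true if name contains none of the reserved ASCII characters.
 * The reserved set below is an assumption made for this sketch. */
static bool name_is_legal(const char *name)
{
    const unsigned char *p = (const unsigned char *)name;

    while (*p != '\0') {
        size_t len = sjis_char_len(p);

        if (len > 1) {
            /* A multi-byte character can never be one of the reserved
             * ASCII characters, so skip it whole without looking at
             * its trailing byte. */
            p += len;
            continue;
        }
        if (strchr("\\/:*?\"<>|", *p) != NULL) {
            return false;
        }
        p += 1;
    }
    return true;
}
```

With this walk, the two-byte sequence 0x83 0x5C (whose second byte is
the same value as '\') is skipped as one character, while a real
single-byte '\' or ':' is still rejected.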

So here is the fix I'd like to commit to master, and
then I'll create a bug and back-port for 4.1.0, 4.0.next
and 3.6.next.

Please re-review (sorry :-).

Jeremy

