[PATCH] fix util/rfc1738.c

Tue Feb 20 23:20:42 UTC 2018

G'Day Douglas,

I can feel the burn from here!

Reviewed-by: Andrew Bartlett <abartlet at samba.org>

Please push!

Andrew Bartlett

On Wed, 2018-02-21 at 09:59 +1300, Douglas Bagnall via samba-technical
wrote:
> We have a file called rfc1738.c that implements a few functions for
> handling URL-style percent-encoding. It has a few unfortunate issues
> and shows little sign of having read the eponymous RFC, which in any
> case was obsoleted by RFC 3986 in 2004.
> 
> This set of patches introduces tests and fixes many problems,
> simplifying code along the way. Unused functions are removed, bizarre
> encodings are avoided, and error detection is introduced.
> 
> Percent-encoding involves escaping bytes as a percent symbol followed
> by two hex digits; for example "%" can be encoded as "%25", and "A" as
> "%41". RFC 1738 and RFC 3986 agree on this but differ on the canonical
> form of an escaped string: while "%" is necessarily encoded and "A" is
> normally not, "!" ("%21") is normally encoded in RFC 3986 but not in
> RFC 1738. The proposed patch follows the newer convention; our current
> behaviour is idiosyncratic. More serious is an apparent confusion with
> printf syntax, where we treat "%%" as a valid way of escaping "%". It
> is not. For decoding we are using sscanf(), which accepts strings like
> "% a" and "%-5" as escapes.
> 
> Errors in an encoded string have been silently ignored by the decoding
> routine. The proposed solution provides an API that may look odd:
> `char * rfc1738_unescape(char *s)` returns NULL on error, otherwise a
> pointer to the end of the string (like strtok, but for other reasons).
> Consider this kind of thing:
> 
>  char *s = "innocent message%00secret coda";
>  char *e = rfc1738_unescape(s);
>  if (e == NULL) {
>      //error
>  }
>  /* s is now "innocent message\00secret coda" */  printf("%s", s);
>  /* "innocent message" */  so if this might matter, you can go
> 
>  if (e == NULL || e - s != strlen(s)) {
>      //error
>  }
> 
> There are currently a number of encoding functions, but only one of
> them is used (or indeed usable), so the patch removes the others.
> 
> How did we get in this mess? At the top of rfc1738.c:
> 
> > * This file imported from the Squid project.  [...]
> > * - Andrew Bartlett Oct-2009
> 
> So, what were Squid thinking? Well, they were in a bit of flux. Some
> commits following our October 2009 fork:
> 
> 2009-11-04 Improve %nn parser
> 2009-11-04 regression in %nn change, failed to handle trailing %
> 2009-11-05 Another regression in %nn cleanup
> 2009-11-10 Polish rfc1738 library code. Add cppunit tests.
> 2010-03-20 Better fix for fromhex()
> 2010-03-20 Cleanup rfc1738.c
> 2010-03-20 Correct buffer calculation for snprintf
> 2010-04-09 Bug 2899: Restore lost rfc1738_unescape() data type
> 2011-03-05 RFC 1738 encoder upgraded action flags
> 2011-08-28 Bug 3295: broken escaping in rfc1738_do_escape
> 
> Which is *another* good argument for keeping your git master in a good
> state: someone could fork any bit of your most terrible code at any
> moment.
> 
> As it happens (and this why I was looking here), this is not the worst
> fork-and-forget percent-encoding library I have come across. Possibly
> because the format is so obviously simple and foolproof, nobody ever
> bothers writing tests or reading the spec or even writing their own
> routines, preferring to copy them from the last person who refused to
> think about it.
> 
> As a side-effect or preliminary step, these patches move some routines
> for fast safe accurate hex-digit parsing out of librpc/ndr/uuid.c into
> lib/util. I hope to use them in other places too.
> 
> Douglas
-- 
Andrew Bartlett
https://samba.org/~abartlet/
Authentication Developer, Samba Team         https://samba.org
Samba Development and Support, Catalyst IT   
https://catalyst.net.nz/services/samba