[PATCH] fix util/rfc1738.c

Wed Feb 21 22:22:30 UTC 2018

Hi Douglas,

Not related to your patches, but just to thank you for your
mails -> I am having tons of fun reading them :)

Keep up your awesome work!

Cheers,
Kamen

On 02/20/2018 10:59 PM, Douglas Bagnall via samba-technical wrote:
> We have a file called rfc1738.c that implements a few functions for
> handling URL-style percent-encoding. It has a few unfortunate issues
> and shows little sign of having read the eponymous RFC, which in any
> case was obsoleted by RFC 3986 in 2004.
>
> This set of patches introduces tests and fixes many problems,
> simplifying code along the way. Unused functions are removed, bizarre
> encodings are avoided, and error detection is introduced.
>
> Percent-encoding involves escaping bytes as a percent symbol followed
> by two hex digits; for example "%" can be encoded as "%25", and "A" as
> "%41". RFC 1738 and RFC 3986 agree on this but differ on the canonical
> form of an escaped string: while "%" is necessarily encoded and "A" is
> normally not, "!" ("%21") is normally encoded in RFC 3986 but not in
> RFC 1738. The proposed patch follows the newer convention; our current
> behaviour is idiosyncratic. More serious is an apparent confusion with
> printf syntax, where we treat "%%" as a valid way of escaping "%". It
> is not. For decoding we are using sscanf(), which accepts strings like
> "% a" and "%-5" as escapes.
>
> Errors in an encoded string have been silently ignored by the decoding
> routine. The proposed solution provides an API that may look odd:
> `char * rfc1738_unescape(char *s)` returns NULL on error, otherwise a
> pointer to the end of the string (like strtok, but for other reasons).
> Consider this kind of thing:
>
>   char *s = "innocent message%00secret coda";
>   char *e = rfc1738_unescape(s);
>   if (e == NULL) {
>       //error
>   }
>   /* s is now "innocent message\00secret coda" */  printf("%s", s);
>   /* "innocent message" */  so if this might matter, you can go
>
>   if (e == NULL || e - s != strlen(s)) {
>       //error
>   }
>
> There are currently a number of encoding functions, but only one of
> them is used (or indeed usable), so the patch removes the others.
>
> How did we get in this mess? At the top of rfc1738.c:
>
>> * This file imported from the Squid project.  [...]
>> * - Andrew Bartlett Oct-2009
> So, what were Squid thinking? Well, they were in a bit of flux. Some
> commits following our October 2009 fork:
>
> 2009-11-04 Improve %nn parser
> 2009-11-04 regression in %nn change, failed to handle trailing %
> 2009-11-05 Another regression in %nn cleanup
> 2009-11-10 Polish rfc1738 library code. Add cppunit tests.
> 2010-03-20 Better fix for fromhex()
> 2010-03-20 Cleanup rfc1738.c
> 2010-03-20 Correct buffer calculation for snprintf
> 2010-04-09 Bug 2899: Restore lost rfc1738_unescape() data type
> 2011-03-05 RFC 1738 encoder upgraded action flags
> 2011-08-28 Bug 3295: broken escaping in rfc1738_do_escape
>
> Which is *another* good argument for keeping your git master in a good
> state: someone could fork any bit of your most terrible code at any
> moment.
>
> As it happens (and this why I was looking here), this is not the worst
> fork-and-forget percent-encoding library I have come across. Possibly
> because the format is so obviously simple and foolproof, nobody ever
> bothers writing tests or reading the spec or even writing their own
> routines, preferring to copy them from the last person who refused to
> think about it.
>
> As a side-effect or preliminary step, these patches move some routines
> for fast safe accurate hex-digit parsing out of librpc/ndr/uuid.c into
> lib/util. I hope to use them in other places too.
>
> Douglas