[PATCH] fix util/rfc1738.c

Tue Feb 20 20:59:50 UTC 2018

We have a file called rfc1738.c that implements a few functions for
handling URL-style percent-encoding. It has a few unfortunate issues
and shows little sign of having read the eponymous RFC, which in any
case was obsoleted by RFC 3986 in 2004.

This set of patches introduces tests and fixes many problems,
simplifying code along the way. Unused functions are removed, bizarre
encodings are avoided, and error detection is introduced.

Percent-encoding involves escaping bytes as a percent symbol followed
by two hex digits; for example "%" can be encoded as "%25", and "A" as
"%41". RFC 1738 and RFC 3986 agree on this but differ on the canonical
form of an escaped string: while "%" is necessarily encoded and "A" is
normally not, "!" ("%21") is normally encoded in RFC 3986 but not in
RFC 1738. The proposed patch follows the newer convention; our current
behaviour is idiosyncratic. More serious is an apparent confusion with
printf syntax, where we treat "%%" as a valid way of escaping "%". It
is not. For decoding we are using sscanf(), which accepts strings like
"% a" and "%-5" as escapes.

Errors in an encoded string have been silently ignored by the decoding
routine. The proposed solution provides an API that may look odd:
`char * rfc1738_unescape(char *s)` returns NULL on error, otherwise a
pointer to the end of the string (like strtok, but for other reasons).
Consider this kind of thing:

 char *s = "innocent message%00secret coda";
 char *e = rfc1738_unescape(s);
 if (e == NULL) {
     //error
 }
 /* s is now "innocent message\00secret coda" */  printf("%s", s);
 /* "innocent message" */  so if this might matter, you can go

 if (e == NULL || e - s != strlen(s)) {
     //error
 }

There are currently a number of encoding functions, but only one of
them is used (or indeed usable), so the patch removes the others.

How did we get in this mess? At the top of rfc1738.c:

> * This file imported from the Squid project.  [...]
> * - Andrew Bartlett Oct-2009

So, what were Squid thinking? Well, they were in a bit of flux. Some
commits following our October 2009 fork:

2009-11-04 Improve %nn parser
2009-11-04 regression in %nn change, failed to handle trailing %
2009-11-05 Another regression in %nn cleanup
2009-11-10 Polish rfc1738 library code. Add cppunit tests.
2010-03-20 Better fix for fromhex()
2010-03-20 Cleanup rfc1738.c
2010-03-20 Correct buffer calculation for snprintf
2010-04-09 Bug 2899: Restore lost rfc1738_unescape() data type
2011-03-05 RFC 1738 encoder upgraded action flags
2011-08-28 Bug 3295: broken escaping in rfc1738_do_escape

Which is *another* good argument for keeping your git master in a good
state: someone could fork any bit of your most terrible code at any
moment.

As it happens (and this why I was looking here), this is not the worst
fork-and-forget percent-encoding library I have come across. Possibly
because the format is so obviously simple and foolproof, nobody ever
bothers writing tests or reading the spec or even writing their own
routines, preferring to copy them from the last person who refused to
think about it.

As a side-effect or preliminary step, these patches move some routines
for fast safe accurate hex-digit parsing out of librpc/ndr/uuid.c into
lib/util. I hope to use them in other places too.

Douglas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: rfc1738.patch
Type: text/x-patch
Size: 38605 bytes
Desc: not available
URL: <http://lists.samba.org/pipermail/samba-technical/attachments/20180221/2b5dcf6e/rfc1738-0001.bin>