[PATCH] RFC: shell-safe version of string_sub()?

Steve Langasek vorlon at netexpress.net
Wed Oct 25 22:22:28 GMT 2000


On Wed, 25 Oct 2000, Peter Samuelson wrote:

> > I was thinking more about UTF-8. Unix syscalls all use char * for
> > string arguments; where they use unsigned char * UTF-8 can be used

> OK, well, I don't know the details of UTF-8 although I think I know the
> basic idea (variable-length encoding, like Huffman compression).
> However, unless it uses things like \0 (which you say it doesn't), I
> believe it should be safe for sh_string_sub().  Assuming your shell
> doesn't have a problem with 8-bit characters, which it might, I haven't
> checked.  (bash, as an example, has two input modes in interactive
> mode: either it can register 8-bit characters as themselves, or as key
> sequences involving 'meta' keys.  I assume that in non-interactive mode
> it always inputs them as regular 8-bit characters ... but I'm not
> sure.)

UTF-8 is explicitly designed to be backwards-compatible with
string-handling functions written for ASCII, provided they are 8-bit clean.
No problems with \0: every byte of a multibyte sequence has its high bit
set, so \0, the control characters and the rest of 7-bit ASCII never
represent anything other than themselves...
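
A quick way to convince yourself of this, as a rough sketch only: the
Python 3 snippet below encodes a string one character at a time and collects
any byte below 0x80 that turns up inside a multibyte sequence. The helper
name and the sample string are made up for illustration, and nothing here
reflects how sh_string_sub() itself is written.

```python
def ascii_bytes_in_multibyte(text):
    """Return every byte below 0x80 that occurs inside a multibyte UTF-8 sequence."""
    stray = []
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            # Lead and continuation bytes are all expected to be >= 0x80.
            stray.extend(b for b in encoded if b < 0x80)
    return stray


# Hypothetical input mixing non-ASCII text with shell metacharacters.
sample = "na\u00efve \u65e5\u672c\u8a9e `id` $HOME"
raw = sample.encode("utf-8")

# No NUL, backtick, '$' or other ASCII byte hides inside a multibyte sequence,
# so a byte-wise scan sees exactly the metacharacters the text contains.
assert ascii_bytes_in_multibyte(sample) == []
assert raw.count(b"`") == sample.count("`")
assert raw.count(b"$") == sample.count("$")
assert b"\0" not in raw
```

If that holds for the input in question, a byte-oriented routine that
escapes shell metacharacters should not be confused by UTF-8 text, as long
as the shell itself passes 8-bit characters through untouched.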

Steve Langasek
postmodern programmer





More information about the samba-technical mailing list