[clug] awk or Perl regex question
fj.whittle at gmail.com
Sun Jul 21 03:29:02 UTC 2019
In Perl I'd be more tempted to extract the part needed and print it,
rather than delete everything else. I'm assuming Unicode input, so as
to match the ’ (U+2019) in the O’SHEA example. Should that be O'SHEA,
with a plain ASCII apostrophe? Steve did say single quote... maybe the
data came from MS Office?
This will do it in one regex, for more than just ANSI:
perl -CS -nE '/(?=\S) \b (?<surname> [\p{Lu}\x{2019}\s]+) (?<=\S) \s* $/x
    and say $+{surname}' < names.txt
\p{Lu} matches any character whose Unicode general category is
Uppercase_Letter, and \x{2019} is ’ (RIGHT SINGLE QUOTATION MARK).
-CS turns on UTF-8 encoding for the standard I/O streams (STDIN,
STDOUT and STDERR).
Of course this will still only work for writing systems where uppercase
is even a thing. If any of your names are in e.g. Chinese (and not
pinyin) you're out of luck, because it will give you the whole name.
Or in Perl6:
perl6 -ne '/« $<surname> = <:Lu + [’\s]>+ » \s* $/
    and put $<surname>' < names.txt
(« is a start of word assertion, » end of word)
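
For anyone who'd rather check the logic without Perl, here is a rough
Python sketch of the same "extract the surname and print it" idea. It is
not a translation of the regex above: it splits the line on whitespace
and walks backwards over the words, keeping those made up only of
uppercase letters (Unicode category Lu) and apostrophes. The function
names are made up for illustration, accepting the ASCII apostrophe as
well as ’ is my assumption, and runs of blanks inside the surname are
collapsed to a single space.

```python
import unicodedata

# Assumption: accept both the ASCII apostrophe and U+2019 (’).
APOSTROPHES = {"'", "\u2019"}


def is_surname_word(word: str) -> bool:
    """True if the word holds only Lu letters and apostrophes,
    with at least one Lu letter."""
    has_upper = False
    for ch in word:
        if unicodedata.category(ch) == "Lu":
            has_upper = True
        elif ch not in APOSTROPHES:
            return False
    return has_upper


def extract_surname(full_name: str) -> str:
    """Return the trailing run of surname words, or "" if there is none."""
    surname = []
    for word in reversed(full_name.split()):
        if not is_surname_word(word):
            break
        surname.append(word)
    return " ".join(reversed(surname))


if __name__ == "__main__":
    for line in ["John Paul O\u2019SHEA", "Mary/Maria DE SMETS"]:
        print(extract_surname(line))
```

A line with no trailing uppercase word gives an empty string, so the
caller can decide whether to skip it or report it.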
– Francis
Overcomplicating things since forever ago
On Sun, 2019-07-21 at 07:23 +1000, Kim Holburn via linux wrote:
> Does this do what you want?
>
> perl -p -e 's/\b[A-Z][a-z]+\b//g;s#^[/\s]*##;' < names.txt
>
> I sent this but it never seemed to have arrived. Perhaps filtered by
> AV?
>
> > On 2019/Jul/20, at 6:08 pm, steve jenkin via linux
> > <linux at lists.samba.org> wrote:
> >
> > In awk, I’m trying to remove First Names from Full Name strings.
> > There might be multiple first names and alternatives separated by a
> > ‘/’.
> >
> > Surnames are UPPERCASE and occur at the end of the string, and may
> > contain a single quote (O’SHEA) or a blank (DE SMETS).
> >
> > Currently I’ve got a working version doing two different subs: the
> > first is unanchored, the second is anchored to the start of the
> > string (^).
> >
> > sub(/Mc[A-Z][a-z]* /, "", A[1]);
> > sub(/^([A-Z][a-z\047]*[ /])+/, "", A[1]);
> >
> > I’ve tried this regex, unanchored and not, with ‘?’ for 0 or 1
> > repeats of the group or ‘*’ for 0 or more repeats.
> >
> > (Mc)?([A-Z][a-z\047]*[ /])+
> >
> > Any suggestions for other things to try?
> >
> > --
> > Steve Jenkin, IT Systems and Design
> > 0412 786 915 (+61 412 786 915)
> > PO Box 38, Kippax ACT 2615, AUSTRALIA
> >
> > mailto:sjenkin at canb.auug.org.au
> > http://members.tip.net.au/~sjenkin
>
> --
> Kim Holburn
> IT Network & Security Consultant
> T: +61 2 61402408 M: +61 404072753
> mailto:kim at holburn.net aim://kimholburn
> skype://kholburn - PGP Public Key on request