[clug] Substring repetition detection

David Ananian-Cooper davidac17 at yahoo.com
Wed Jun 25 17:27:03 EST 2003


hey

On Wed, 25 Jun 2003 16:37, David Price wrote:
> I haven't tried this, so I don't know if it would work, but...
>
> Record beside each letter, the number of characters since each
> previous occurrence of that letter.  eg:
>
> a  b  c  d  e  f  g  h  d  e  f  g  h  d  e  f  g  h  i  j  k
>                         5  5  5  5  5  5  5  5  5  5
>                                        10 10 10 10 10
>
> Then look for continuous sequences of the same number, where the
> length of the sequence is >= the number.  So here, the 10s can't
> count, since there are only 5 of them, but the 5s can count because
> there are 10 of them.  The period of repetition is the number of the
> sequence.  Presumably, you would want the longest sequences possible,
> so you'd start with the highest number, check if it works, then work
> your way down.
>
> I have no idea if this would work for all possible problems, but I
> can't think of anything it wouldn't work for at the moment.
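
A minimal Python sketch of the idea quoted above, not David Price's own
code. It makes one simplifying assumption: it records only the distance to
the nearest previous occurrence of each letter (the row of 5s in the
example), not every previous occurrence, so it will miss a repeating unit
that contains the same letter twice. The names gaps and candidate_runs are
made up for illustration.

```python
from itertools import groupby

def gaps(seq):
    """Distance back to the nearest previous occurrence of each item (None if unseen)."""
    last_seen = {}
    out = []
    for i, item in enumerate(seq):
        out.append(i - last_seen[item] if item in last_seen else None)
        last_seen[item] = i
    return out

def candidate_runs(seq):
    """Return (start, length, gap) for each maximal run of equal gaps with length >= gap."""
    found = []
    pos = 0
    for gap, group in groupby(gaps(seq)):
        length = len(list(group))
        if gap is not None and length >= gap:
            found.append((pos, length, gap))
        pos += length
    # Longest period first, as the quoted description suggests.
    return sorted(found, key=lambda run: run[2], reverse=True)

print(candidate_runs("abcdefghdefghdefghijk"))  # [(8, 10, 5)]
```

Each tuple is the index where the run of equal numbers starts, how long the
run is, and the number itself (the period).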

sounds like this would work - you'd just have to be careful with sequences which
repeat like 1 and a half times - you'd have to make sure you ignore the last half
sequence

e.g.

a b c d a b c d a b z f g
- - - - 4 4 4 4 4 4 - - -

here the run of 4s has length 6, and 6 >= 4, so it passes the test - but the
last 2 4's should not be considered as part of the repeating pattern
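
A rough Python sketch of ignoring that last partial copy, assuming the run
of equal numbers has already been found and is described by where it
starts, how long it is, and the number in it. The function name
whole_repeats and its arguments are made up for illustration.

```python
def whole_repeats(seq, start, length, gap):
    """Trim a run of equal gaps down to whole copies of the repeating unit.

    start, length: position and length of the run of equal numbers.
    gap: the number in the run, i.e. the period.
    Returns (unit, copies, end), where end is the index just past the
    last complete copy.
    """
    first = start - gap                 # the first copy sits just before the run
    copies = (length + gap) // gap      # floor division drops the partial copy
    unit = seq[first:first + gap]
    return unit, copies, first + copies * gap

# The example above: a run of six 4s starting at index 4.
print(whole_repeats("abcdabcdabzfg", 4, 6, 4))  # ('abcd', 2, 8)
```

The periodic stretch covers the run plus the one copy before it, so dividing
that total by the period and rounding down gives the number of complete
copies; the trailing "a b" falls outside them.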

david ac
