[clug] Cleanfeed. Other Consequences.
Alex Satrapa
grail at goldweb.com.au
Tue Oct 28 01:29:24 GMT 2008
On 28/10/2008, at 10:13 , Daniel Pittman wrote:
> Paul Matthews <plm at netspace.net.au> writes:
>
>> There is another set of /*possible*/ consequences that no-one has
>> touched on with the Cleanfeed proposals.
>>
>> Does a false positive block rate of 3% mean that 3% of Amazon's book
>> catalogue will be un-viewable?
>
> Almost certainly not.
Unless Amazon is white-listed, you can expect that a random sample of
pages from Amazon's book catalogue will end up being blocked. "Like
Water for Chocolate", "Brokeback Mountain", "Les Miserables": these
are a few titles that come to mind that deal with sex, (homo)sexuality
and child slavery, and that I recall being blocked by a program I was
trying a few years ago ("My family My rules" IIRC). The blocked pages
weren't on Amazon, though, so I can't be sure that Amazon would be
affected.
> The problem is that you are assuming the "false positive" rate is
> completely randomly distributed; in practice it means that a proportion
> of sites are completely unavailable despite being innocent, so 3 percent
> of *websites*, not three percent of *content of all websites*.
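To make the two readings of a 3% false positive rate concrete, here is a
rough sketch in Python. The site names and page counts are made up, and
the random draw is only a stand-in for whatever a real filter decides; the
3% figure is the only number taken from this thread.

```python
import random

FALSE_POSITIVE_RATE = 0.03  # the 3% figure discussed in this thread


def blocked_by_site(sites, rate, rng):
    """Site-level model: each site is blocked or allowed as a whole."""
    result = {}
    for name, page_count in sites.items():
        site_blocked = rng.random() < rate
        result[name] = page_count if site_blocked else 0
    return result


def blocked_by_page(sites, rate, rng):
    """Page-level model: every page is judged independently."""
    return {
        name: sum(1 for _ in range(page_count) if rng.random() < rate)
        for name, page_count in sites.items()
    }


if __name__ == "__main__":
    rng = random.Random(2008)  # fixed seed so runs are repeatable
    # Hypothetical sites: name -> number of pages.
    sites = {"bookshop.example": 10_000, "blog.example": 50}
    print("by site:", blocked_by_site(sites, FALSE_POSITIVE_RATE, rng))
    print("by page:", blocked_by_page(sites, FALSE_POSITIVE_RATE, rng))
```

In the site-level model a large catalogue is either entirely reachable or
entirely gone; in the page-level model roughly 3% of its pages go missing
while the rest stay visible.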
You are assuming that the blocking is done by site, when it is in fact
done by page - the filter checks the content in the page as it is
being received and makes a decision on whether or not the page is
"safe".
If a web page expresses any information or opinion about sex,
sexuality, death, or fitness for Government, you can expect it would
end up triggering some rules in your filtering software of choice.
These rules are complex, and thus not easily predictable. So for lack of
omniscience, the occurrence of false positives ends up looking random.
When reviewing pages after the fact it is sometimes obvious why the
filter blocked them, but most of the time you have to check which rules
were triggered, only to find out that there are a whole bunch of slang
words for genitalia or faeces that you never knew existed.
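A minimal sketch of this kind of per-page rule matching, in Python. The
rule names and word lists are invented for illustration and are not taken
from any real product, which would ship far larger and mostly opaque lists.

```python
import re

# Hypothetical rule set: rule name -> trigger words.
RULES = {
    "sexual-content": ["sex", "sexuality"],
    "violence": ["death", "slavery"],
}


def triggered_rules(page_text, rules=RULES):
    """Return the sorted names of all rules whose words appear in the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return sorted(name for name, terms in rules.items() if words & set(terms))


def is_safe(page_text, rules=RULES):
    """A page is treated as "safe" only if no rule is triggered."""
    return not triggered_rules(page_text, rules)


if __name__ == "__main__":
    blurb = "A novel about poverty, death and redemption in France."
    print(is_safe(blurb), triggered_rules(blurb))
```

An innocent book blurb is blocked here because of a single word, and the
only way to learn why is to ask which rules fired.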
Alex