[clug] Cleanfeed. Other Consequences.

Alex Satrapa grail at goldweb.com.au
Tue Oct 28 01:29:24 GMT 2008

On 28/10/2008, at 10:13 , Daniel Pittman wrote:

> Paul Matthews <plm at netspace.net.au> writes:
>> There are another set of /*possible*/ consequences that no-one has
>> touched on with the Cleanfeed proposals.
>> Does a false positive block rate of 3% mean that 3% of Amazon's book
>> catalogue will be un-viewable?
> Almost certainly not.

Unless Amazon is white-listed, you can expect that a random sample of
pages from Amazon's book catalogue will end up being blocked. "Like
Water for Chocolate", "Brokeback Mountain", "Les Miserables": these
are a few titles dealing with sex, (homo)sexuality and child slavery
that I recall being blocked by a program I was trying a few years ago
("My family My rules" IIRC), but they weren't on Amazon, so I can't be
sure that Amazon would be affected.

> The problem is that you are assuming the "false positive" rate is
> completely randomly distributed; in practice it means that a proportion
> of sites are completely unavailable despite being innocent, so 3 percent
> of *websites*, not three percent of *content of all websites*.

You are assuming that the blocking is done by site, when it is in fact
done by page: the filter checks the content of the page as it is
being received and decides whether or not the page is blocked.
If a web page expresses any information or opinion about sex,
sexuality, death, or fitness for Government, you can expect it to
end up triggering some rules in your filtering software of choice.
These rules are complex and therefore not easily predictable. So, for
lack of omniscience, the occurrence of false positives ends up looking
random. When reviewing pages after the fact, it is sometimes obvious
why the filter blocked them; most of the time, though, you have to
check which rules were triggered, only to find out that there are a
whole bunch of slang words for genitalia or faeces that you never knew
existed.
