[clug] Re: linux Digest, Vol 70, Issue 48

Michael Tomkins michft at gmail.com
Tue Oct 28 13:05:32 GMT 2008


On 28/10/2008, at 7:46 PM, linux-request at lists.samba.org wrote:

>
>> Going to www.theage.com.au involves 106 http requests, with a 3% error
>> rate the chance of getting the whole page is only 97%^106=3.96%.
>
> Unlikely.
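
The arithmetic in the quoted estimate is easy to check. The sketch
below assumes every request fails independently with the same 3%
probability (a simplification, since real false positives tend to
clump), and the function name is made up for illustration.

```python
def chance_of_full_page(requests: int, error_rate: float) -> float:
    """Probability that every request succeeds, assuming independent failures."""
    return (1.0 - error_rate) ** requests


# 106 requests and a 3% error rate are the figures from the quoted post.
p = chance_of_full_page(106, 0.03)
print(f"Chance of a complete page: {p:.2%}")
```

Under that assumption the chance of a complete page comes out at
roughly 4%.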



As someone who works behind such a clean-feed style filter, I can
speak from experience. Today I went and looked at a number of sites
that were partially blocked. I believe I average 20-50 false positives
a day, and the false positives tend to clump. The sheer complexity of
a modern web page means a number of bits come from different sites,
and if ONE is blocked it can:
- stop an image loading,
- stop a script from completing, or
- leave the browser confused, so it shows a blank page AFTER
"finishing".


Google has ads that are blocked, Debian/Microsoft/IBM have forums
that are blocked, and deity help you if you ever need to see a wiki
for that code example / computing problem. Wikipedia's page-relative
search is blocked because it uses a remote page redirect.
http://en.wikipedia.org/wiki/Special:Search?search=Term&go=Go  OK
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Term&ns0=1&fulltext=Search  Blocked


Oh, but that's OK because the benevolent master has thought of all
that and won't block the "good" stuff. Anyway, we are working on
fixing the rest.


So website X has a page with a script that loads ad A then loads item  
B. The filter will just block A, that's the bad stuff, then the  
script doesn't work and the item B is never seen. The page generally  
hangs. (And I know this because I also can view the web from home).
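
The failure mode can be sketched in a few lines of Python. Everything
here is hypothetical: the host names, the blocklist and the functions
are stand-ins for a page script and a filter, not how any real filter
or browser is implemented.

```python
class Blocked(Exception):
    """Raised when the (hypothetical) filter refuses a request."""


# Made-up host standing in for ad A.
BLOCKLIST = {"ads.example.com"}


def fetch(host: str) -> str:
    """Pretend to fetch content from a host, unless the filter blocks it."""
    if host in BLOCKLIST:
        raise Blocked(host)
    return f"content from {host}"


def page_script(loaded: list) -> None:
    """The page's script: load ad A, then item B, with no error handling."""
    loaded.append(fetch("ads.example.com"))      # ad A, blocked by the filter
    loaded.append(fetch("content.example.com"))  # item B, never reached


def render_page() -> list:
    """Run the script the way a browser would: it simply stops on failure."""
    loaded = []
    try:
        page_script(loaded)
    except Blocked:
        pass  # the script dies here; nothing after the blocked item runs
    return loaded


loaded = render_page()
print(loaded)
```

Item B is perfectly fetchable on its own, but because the script dies
on ad A the list of loaded items stays empty.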


For example, I can load Google Maps in about 2 seconds from home; it
takes 1 minute to time out at work. That is not an 86% throughput. The
filter looks at every image, and because maps.google is dynamic they
are all "new". You can't do banking at lunchtime because bank X loads
a state blocker so you don't click withdraw/deposit twice. Everything
on the page is "new", and you hit the 1 minute timeout waiting for it
to work because the filter looks at it all, IN SEQUENCE. With the
extra lunchtime congestion the net just doesn't work (mostly because
of people loading the SAME login page 2-3 times, pages whose scripts
have hung, or formatting/images/layout that just doesn't make it
through).
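
A rough model of why sequential inspection hurts dynamic pages, as a
Python sketch. The numbers are invented: 30 items and 2 seconds per
inspection are picked only so the total lands near the one-minute
timeout, not measured from any real filter, and the function is
hypothetical.

```python
def sequential_delay(items, inspect_seconds, already_seen=frozenset()):
    """Total delay when unseen items are inspected one after another."""
    return sum(inspect_seconds for item in items if item not in already_seen)


# Hypothetical page: 30 images, each costing 2.0 s to inspect.
tiles = [f"tile-{i}" for i in range(30)]

# Static page: the filter has seen every image before, so nothing is re-inspected.
static_page = sequential_delay(tiles, 2.0, already_seen=frozenset(tiles))

# Dynamic page: every image is "new", so each one waits for the previous one.
dynamic_page = sequential_delay(tiles, 2.0)

print(static_page, dynamic_page)
```

With these made-up figures the static page adds no delay while the
dynamic one adds a full minute, which is the gap between a page that
loads and one that times out.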


List of other things that don't work with said filter:
- Scripts that load sequential things. (Website: which browser's page
should I send you? Work: Blocked)
- Pages that load things then format (separate CSS, or just
JavaScript then CSS).
- Pages that rely on fast server round-trips (forum cookies for
"read" for example).
- Pages that are too new or too old for the filter. (Yes, it expects
HTML, not <insert new thing here>.)
- Things that arrive out of order. (The filter appears to process a
page in order, one item at a time.)


Do not believe the PHBs*. There are some problems that technology
cannot solve. Yes, a pig can be made to fly with enough thrust. This
is a bad idea because the landings are splat city for the pigs, and
the pigs get nervous mid-flight.


Yes, you can filter EVERY packet on the internet. This will not make
you friends or influence people, because:
- it does not make you omnipotent. Information is not Understanding.
- the cost will be HUGE (unless I'm tendering for the project, then  
watch the overruns).
- you will only ever find what is programmed.
- you will only ever find what you found before.
- you will only ever find what the committee you appoint is looking  
for. (Spam this week, child porn the next.)


Mr Rudd, how many of those voters that gave you government are going
to like someone telling them what they can and cannot look at on the
net? I think you like China and their great firewall a little too much.


"Computers are useless, they cannot ask questions" - Picasso


PS looking for a non-PHB workplace.
*Pointy Haired Bosses
--
Michael Tomkins
gmail michft
html mich431.net
+61 (0) 408 172 142




