[clug] Re: linux Digest, Vol 70, Issue 48
Michael Tomkins
michft at gmail.com
Tue Oct 28 13:05:32 GMT 2008
On 28/10/2008, at 7:46 PM, linux-request at lists.samba.org wrote:
>
>> Going to www.theage.com.au involves 106 http requests, with a 3%
>> error rate the chance of getting the whole page is only
>> 97%^106=3.96%.
>
> Unlikely.
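The arithmetic in the quoted claim is easy to check. A minimal sketch, assuming every one of the 106 requests fails independently with the same 3% probability (that independence is the quoted poster's assumption, not a measured fact; the function name is made up for illustration):

```python
def chance_whole_page_loads(requests: int, error_rate: float) -> float:
    """Probability that every request succeeds, assuming independent failures."""
    return (1.0 - error_rate) ** requests


# 106 requests and a 3% error rate are the figures from the quoted message.
p = chance_whole_page_loads(106, 0.03)
print(f"{p:.2%}")
```

Under that assumption the result is a little under 4%. If failures clump rather than occur independently, the real figure would differ.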
As someone who works behind such a clean-feed style filter, I can say
that today I went and looked at a number of sites that were partially
blocked. I believe I average 20-50 false positives a day, and the false
positives tend to clump. The sheer complexity of a modern web page
means a number of bits come from different sites, and if ONE is
blocked it can:
- stop an image loading,
- stop a script from completing, or
- leave the browser confused, so it shows a blank page AFTER
"finishing".
Google has ads that are blocked, Debian/Microsoft/IBM have forums
that are blocked, and deity help you if you ever need to see a Wiki for
that code example / computing problem. Wikipedia's page-relative
search is blocked because it uses a remote page redirect.
http://en.wikipedia.org/wiki/Special:Search?search=Term&go=Go OK
http://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=Term&ns0=1&fulltext=Search Blocked
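To compare the two URLs above, here is a small sketch using Python's standard urllib.parse. It only pulls the URLs apart; which part the filter actually objects to is not visible from the outside, so this is a way to see the differences, not an explanation of the block. The helper name describe is made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

ok_url = "http://en.wikipedia.org/wiki/Special:Search?search=Term&go=Go"
blocked_url = (
    "http://en.wikipedia.org/w/index.php"
    "?title=Special%3ASearch&search=Term&ns0=1&fulltext=Search"
)


def describe(url):
    """Return (host, path, decoded query parameters) for a URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.path, parse_qs(parts.query)


ok = describe(ok_url)
blocked = describe(blocked_url)
print(ok)
print(blocked)
```

Both requests go to the same host and ask for the same Special:Search page; they differ in the path (/wiki/... versus /w/index.php) and in the extra query parameters.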
Oh, but that's OK because the benevolent master has thought of all that
and won't block the "good" stuff. Anyway, we are working on fixing the
rest.
So website X has a page with a script that loads ad A and then loads item
B. The filter will just block A, since that's the bad stuff; then the
script doesn't work and item B is never seen. The page generally
hangs. (And I know this because I can also view the web from home.)
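The ad A / item B case can be sketched as a toy model. This is an illustration under assumptions, not how any real browser or filter is implemented: the host names are placeholders, and the script is assumed to load its items strictly in order and to stop at the first failure.

```python
def run_page_script(steps, blocked_hosts):
    """Load items in order; stop at the first one the filter blocks.

    Returns (items that loaded, name of the item the script stuck on or None).
    """
    loaded = []
    for name, host in steps:
        if host in blocked_hosts:
            # The script never gets past this point, so later items are lost.
            return loaded, name
        loaded.append(name)
    return loaded, None


# Hypothetical hosts, for illustration only.
steps = [("ad A", "ads.example"), ("item B", "site-x.example")]

loaded, stuck_on = run_page_script(steps, blocked_hosts={"ads.example"})
print(loaded, stuck_on)
```

Blocking only the ad host leaves nothing loaded: item B is never requested, even though the filter has no objection to it.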
For example, I can load Google Maps in about 2 seconds from home; it
takes 1 min to time out at work. That is not an 86% throughput. The
filter looks at every image, and because maps.google is dynamic they are
all "new". You can't do banking at lunchtime because bank X loads a
state blocker so you don't click withdraw/deposit twice. Everything
on the page is "new" and it's 1 min to timeout for it to work because
the filter looks at it all, IN SEQUENCE. With the extra lunch
congestion the net just doesn't work (mostly because of people loading
the SAME login page 2-3 times, or the page whose script has hung, or the
formatting/images/layout that just doesn't make it through).
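A rough model of why "everything is new" hurts when items are inspected one at a time. Only the one-minute timeout comes from the text above; the item count, the per-item inspection time, and the idea that previously seen items skip inspection are all assumptions made for illustration.

```python
TIMEOUT_S = 60.0  # the one-minute timeout mentioned above


def sequential_wait(item_count, inspect_s, seen_fraction=0.0):
    """Total wait if the filter inspects each unseen item one after another.

    seen_fraction is the assumed share of items the filter already knows
    and therefore lets through without inspecting again.
    """
    new_items = item_count * (1.0 - seen_fraction)
    return new_items * inspect_s


# Hypothetical figures: 120 items per page, 0.6 s of inspection each.
all_new = sequential_wait(120, 0.6)
mostly_seen = sequential_wait(120, 0.6, seen_fraction=0.9)

print(all_new > TIMEOUT_S)      # dynamic page, everything "new"
print(mostly_seen > TIMEOUT_S)  # static page, mostly seen before
```

With these made-up numbers the fully dynamic page overruns the timeout while the mostly static one does not, which matches the pattern described for maps and banking pages.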
List of other things that don't work with said filter:
- Scripts that load sequential things. (Website: which browser's page
should I send you? Work: Blocked)
- Pages that load things then format (separate CSS, or just
Javascript then CSS).
- Pages that rely on fast server round-trips (forum cookies for
"read" for example).
- Pages that are too new or too old for the filter. (yes it expects
html, not <insert new thing here>)
- Things that arrive out of order. (The filter appears to process a
page in order, one item at a time.)
Do not believe the PHBs*. There are some problems that technology
cannot solve. Yes, a pig can be made to fly with enough thrust. This is
a bad idea because the landings are splat city on the pigs, and the
pigs get nervous mid-flight.
Yes, you can filter EVERY packet on the internet. This will not make
you friends or influence people, because:
- it does not make you omnipotent. Information is not Understanding.
- the cost will be HUGE (unless I'm tendering for the project, then
watch the overruns).
- you will only ever find what is programmed.
- you will only ever find what you found before.
- you will only ever find what the committee you appoint is looking
for. (Spam this week, child porn the next.)
Mr Rudd, how many of those voters that gave you government are going
to like someone telling them what they can and cannot look at on the
net? I think you like China and their great firewall a little too much.
"Computers are useless, they cannot ask questions" - Picasso
PS looking for a non-PHB workplace.
*Pointy Haired Bosses
--
Michael Tomkins
gmail michft
html mich431.net
+61 (0) 408 172 142