[clug] googlebot doing funny things in logs

Angus Gratton gus at projectgus.com
Tue Jun 14 20:53:58 MDT 2011


Hi Hal,

That's interesting, in a somewhat creepy way for you.

On Wed, 2011-06-15 at 11:55 +1000, Hal Ashburner wrote:
> Why on earth is google asking for my mythtv web interface in a pointed 
> way like it knows it's there?
> 

Well, it certainly knows you are there:

http://www.google.com.au/search?q=site:ashburner.info&hl=en&prmd=ivns&ei=QRz4TdDXAuTZiAK85an9DA&start=0&sa=N&filter=0&biw=1400&bih=698

The first possibility I thought of is that someone linked to those pages.
However, searching link:ashburner.info/mythweb doesn't show anything.

The second possibility I can think of is that you (or someone who accesses
your MythWeb) use the Google Toolbar with the Google History feature[1],
and it sent those URLs to Google as they were visited.

I think that's particularly likely given one action URL
(?RESET_SKIN&RESET_TMPL) is recorded there as well.

I actually thought Chrome sent this kind of URL info too, but according
to the Chrome privacy policy that doesn't seem likely. Chrome will send
URLs that 404, plus some other info.[2]

Given that Google seems to have indexed a lot of the URLs, but with no
content and probably not all of the available URLs, my guess would be
that someone has visited those pages and their browser history has
gotten back to Google somehow.


> But why are engos, programming web indexing bots trying mess with my
> myth settings? Push the door to see if it opens and collect stats is my
> best guess. Anyone seen anything similar or have better ideas about what
> is going on?

I can see why it's creepy, but I don't think it's targeted and I doubt
they tried /mythweb out of the blue (can you see it trying any other
speculative URLs?)[3]. I'd predict Googlebot just has a list of URLs and
is going to see if it can index them (which it can, given that there's
no /robots.txt there).

That is, provided that at some point someone agreed to whatever Google
product is sending those browsing URLs back in the first place. If you're
sure that never happened, then...
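
On the missing /robots.txt mentioned above: here is a rough sketch of how
a well-behaved crawler would treat a rule for that path, using Python's
standard urllib.robotparser. The robots.txt content and the allowed()
helper are my own illustrative assumptions, not anything taken from your
server. Only the /mythweb path comes from your log.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for illustration).
# Only the /mythweb/ path is taken from the log line in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /mythweb/
"""


def allowed(user_agent: str, path: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(allowed("Googlebot", "/mythweb/settings/database"))
    print(allowed("Googlebot", "/index.html"))
    # With an empty robots.txt, nothing is disallowed.
    print(allowed("Googlebot", "/mythweb/settings/database", ""))
```

Keep in mind robots.txt is only advisory. The 401 in your log is what is
actually keeping the settings page closed.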


- Angus

[1]
http://www.google.com/support/toolbar/bin/answer.py?hl=en&answer=78184

[2]
http://www.google.com/chrome/intl/en/privacy.html

[3] If it makes you feel any better, my http logs show no speculative
requests for /mythweb
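
If you want to run the same check on your own logs, something like the
following sketch would do. It assumes the Apache "combined" log format
(which is what your quoted line looks like). The googlebot_hits() name
and the regex are mine, and it only matches on the user agent string,
which anyone can fake.

```python
import re

# Apache "combined" log format, matching the line quoted in this thread.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def googlebot_hits(lines, prefix="/mythweb"):
    """Yield (time, path, status) for requests under prefix whose
    user agent claims to be Googlebot."""
    for line in lines:
        m = LOG_RE.match(line)
        if m and "Googlebot" in m["agent"] and m["path"].startswith(prefix):
            yield m["time"], m["path"], int(m["status"])


# The log line from the original post, rejoined onto one line.
SAMPLE = (
    '66.249.72.240 - - [13/Jun/2011:01:57:11 +1000] '
    '"GET /mythweb/settings/database HTTP/1.1" 401 343 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
)

if __name__ == "__main__":
    for hit in googlebot_hits([SAMPLE]):
        print(hit)
```

To scan a real file, pass an open file object instead of [SAMPLE]. The
log location depends on your distro and config.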


> 66.249.72.240 - - [13/Jun/2011:01:57:11 +1000] "GET 
> /mythweb/settings/database HTTP/1.1" 401 343 "-" "Mozilla/5.0 
> (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
> But why are engos, programming web indexing bots trying mess with my 
> myth settings? Push the door to see if it opens and collect stats is my 
> best guess. Anyone seen anything similar or have better ideas about what 
> is going on?
> 
> I'm a little weirded out by it, in truth.
> 
> Hal
> (One of the dreaded CLUG "list-only" members ;-) The list is brilliant 
> imho).



