[clug] googlebot doing funny things in logs

Hal Ashburner hal at ashburner.info
Tue Jun 14 19:55:30 MDT 2011

I changed servers over the weekend and, after restoring the MythTV
database, forgot to put the usual
User-agent: *
Disallow: /
in a robots.txt file.
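For anyone who wants to check what that two-line robots.txt would have told a well-behaved crawler, here's a minimal sketch using Python's standard-library urllib.robotparser. The path comes from the log line further down. The example.com host and the `allowed` helper are hypothetical and purely illustrative.

```python
from urllib.robotparser import RobotFileParser

# The "usual" two-line robots.txt that didn't get restored.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    allowed() is a hypothetical helper written for this example.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so nothing is fetched over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not the real server.
    url = "http://example.com/mythweb/settings/database"
    print(allowed("Googlebot/2.1", url))      # False: "Disallow: /" covers every path
    print(allowed("Googlebot/2.1", url, ""))  # True: no rules at all means everything is allowed
```

The second call roughly mirrors the situation described here: with no rules to obey, a polite crawler treats the whole site as fair game. A 401 from Apache then becomes the only thing stopping it.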
I was just glancing through some logs and in amongst one or two 
seemingly fairly unsophisticated attempts at entry, google bot made an 

It first asked for robots.txt, which seems like good manners (it
wasn't present).
Then it asked for robots.txt again.
Only then did it ask for

I think I've got all the MythWeb stuff safely behind password
protection, so I'm guessing it got nothing out of those; the log says
Apache happily returned a 401 (Unauthorized).

But on an intellectual level, WTF?

Later it asked for /mythweb/settings/weather.

OK, I'm not a professional web sysadmin by any means, so I'll ask those
here who might also run Myth and know more about the specifics.
Why on earth is Google asking for my MythTV web interface in such a
pointed way, as if it knows it's there?

- - [13/Jun/2011:01:57:11 +1000] "GET /mythweb/settings/database HTTP/1.1" 401 343 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
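Rather than eyeballing the logs, you could pull out every request that claims to be Googlebot and targets MythWeb. Here's a rough Python sketch. It assumes Apache's "combined" log format, which the line above appears to be, minus its leading client address. The `googlebot_hits` and `parse_line` helpers are hypothetical, and so is any log path you pass in. Note that the User-Agent string is self-reported, so a match only means the client *claims* to be Googlebot.

```python
import re
import sys
from typing import Iterable, Iterator, NamedTuple, Optional

# Matches the part of an Apache "combined" line from the [timestamp] onward:
# [time] "METHOD path protocol" status bytes "referer" "user-agent"
# Using search() means the leading host/ident/user fields can be missing.
COMBINED = re.compile(
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class Hit(NamedTuple):
    time: str
    method: str
    path: str
    status: int
    agent: str


def parse_line(line: str) -> Optional[Hit]:
    """Parse one log line; return None if it doesn't look like combined format."""
    m = COMBINED.search(line)
    if m is None:
        return None
    return Hit(m["time"], m["method"], m["path"], int(m["status"]), m["agent"])


def googlebot_hits(lines: Iterable[str], prefix: str = "/mythweb") -> Iterator[Hit]:
    """Yield requests whose user-agent mentions Googlebot and whose path starts with prefix."""
    for line in lines:
        hit = parse_line(line)
        if hit is not None and "Googlebot" in hit.agent and hit.path.startswith(prefix):
            yield hit


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python googlebot_hits.py /path/to/access.log  (path is your own)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for hit in googlebot_hits(log):
            print(hit.time, hit.status, hit.method, hit.path)
```

Matching from the `[timestamp]` onward with `search()` keeps the sketch working on lines like the one quoted above, where the client address is missing. Lines whose request field is just `-` simply don't match and are skipped.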

Google employs some of the rudest and most dishonest recruiters who have
ever contacted me. OK, I can make peace with them over that, because
recruiters are what they are due to the structure of what they do.
Sure, it'd be nice if Google disrupted that industry or did
something different in its recruiting...
But why are engos, programming web-indexing bots, trying to mess with my
Myth settings? My best guess: push the door to see if it opens, and
collect stats. Has anyone seen anything similar, or have better ideas
about what is going on?

I'm a little weirded out by it, in truth.

(One of the dreaded CLUG "list-only" members ;-) The list is brilliant 
