[clug] googlebot doing funny things in logs

Edward C. Lang edlang at edlang.org
Thu Jun 16 18:01:32 MDT 2011


On Thu, Jun 16, 2011 at 01:08:47PM +1000, Hal Ashburner wrote:
> But back to the original point: how does Google even know that
> /mythweb exists? Nothing links to it, it's not my usual location
> for it, it is (and I believe always has been) behind a password,
> and until I forgot it during the machine changeover on the weekend
> there was a robots.txt disallowing everything for any crawler that
> is remotely polite, which googlebot claims to be and usually seems
> to be.
> 1) I must have originally had it placed at /mythweb and linked from
> my front page and have forgotten I did this over 2 years ago while
> exposing it via the firewall.
> 2) I must have not had a robots.txt at that time as well as now.
> 3) I must have let the password protection down at that time.
> 4) Googlebot must have proceeded with its normal practice of
> following links and indexing pages while all of these things were
> simultaneously in effect, remembered all this about my domain for
> over 2 years, and then followed the memory of links rather than
> actual links, keeping that memory in its index even when refused
> access.

Possibly stupid question, because I've never used MythTV: when
installed, does it ever contact or load remote services, or otherwise
leak the fact that it's installed?

 - Edward.



