[clug] googlebot doing funny things in logs

Hal Ashburner hal at ashburner.info
Thu Jun 16 18:14:42 MDT 2011


Scrapes TV guides. In Aus it often uses a program called shepherd. That
shouldn't leak its existence to any third party, AFAICT.
On 17/06/2011 10:01 AM, "Edward C. Lang" <edlang at edlang.org> wrote:
> Hi,
>
> On Thu, Jun 16, 2011 at 01:08:47PM +1000, Hal Ashburner wrote:
>> But back to the original point how does google even know that
>> /mythweb exists, given nothing links to it, it's not my usual
>> location for it, it is, and I believe always has been behind a
>> password, and until I forgot it on the machine changeover on the
>> weekend there was a robots.txt disallowing everything from anyone if
>> they're remotely polite - which googlebot claims to be and usually
>> seems to be.
>>
>> 1) I must have originally had it placed at /mythweb and linked from
>> my front page and have forgotten I did this over 2 years ago while
>> exposing it via the firewall.
>> 2) I must also have had no robots.txt at that time, just as now.
>> 3) I must have let the password protection down at that time.
>> 4) Googlebot must have proceeded to do "its normal practice"
>> following links and indexing pages with all of these things
>> simultaneously in effect and also remembered all this about my
>> domain for over 2 years, then followed the memory of links rather
>> than actual links, while keeping the memory of links in its index
>> when refused access.
>
> Possibly stupid question, because I've never used mythtv: when
> installed, does it ever use or load remote services? Or leak the fact
> that it's installed?
>
> - Edward.
>
> --
>
> http://edlang.org/

