[clug] googlebot doing funny things in logs
hal at ashburner.info
Tue Jun 14 21:08:06 MDT 2011
On 15/Jun/11 12:53 PM, Angus Gratton wrote:
> Hi Hal,
> That's interesting, in a somewhat creepy way for you.
> On Wed, 2011-06-15 at 11:55 +1000, Hal Ashburner wrote:
>> Why on earth is google asking for my mythtv web interface in a pointed
>> way like it knows it's there?
> Well, it certainly knows you are there:
Most of these pages I didn't actually know for certain existed (and some
I didn't know about at all), and I'm pretty sure I've never visited them.
I've learned something about mythweb features just by looking through
the 4 pages of links google lists there.
Nobody else has access.
> First possibility I thought is someone linked to those pages. However,
> searching link:ashburner.info/mythweb doesn't show anything.
> Second possibility I can think of is you (or someone who accesses your
> Mythweb) uses the Google Toolbar with the Google History feature, and
> sent those URLs to Google as they were visited.
I really don't /think/ so. I do use chrome and have 2 android devices.
> I think that's particularly likely given one action URL
> (?RESET_SKIN&RESET_TMPL) is recorded there as well.
> I actually thought Chrome sent this kind of URL info too, but according
> URLs that 404, some other info.
> Given google seems to have indexed a lot of the URLs, but with no
> content and probably not all the available URLs then my guess would be
> that - someone's visited those pages and their browser history has
> gotten back to Google, somehow.
Someone who isn't me? I don't think anyone else has used it...
>> But why are engos, programming web indexing bots, trying to mess with my
>> myth settings? Push the door to see if it opens and collect stats is
>> my best guess. Anyone seen anything similar or have better ideas about
>> what is going on?
> I can see why it's creepy, but I don't think it's targeted and I doubt
> they tried /mythweb out of the blue (can you see it trying any other
> speculative URLs?) I'd predict GoogleBot just has a list of URLs and
> it's going to see if it can index them (which it can, given that there's
> no /robots.txt there.)
> That is provided at some point someone agreed to whatever Google product
> is sending those browsing URLs back in the first place. If you're sure
> that never happened, then...
Well not 100% sure but I'm very sure I can't think of anything right now...
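On the point above about there being no /robots.txt: a minimal sketch of
what such a file might contain, and how a crawler that honours it would
read it. The rule text and the allowed() helper are assumptions made for
illustration, not anything currently on the server. Note that robots.txt
is only a request to well-behaved crawlers; it is not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents asking all crawlers to stay out of
# anything under /mythweb (assumed path, taken from the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /mythweb
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network fetch is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://ashburner.info/", "http://ashburner.info/mythweb/"):
        print(url, allowed("Googlebot", url))
```

With a file like this served at the site root, a compliant crawler would
still be free to fetch the front page but would skip the mythweb URLs.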