[clug] googlebot doing funny things in logs
hal at ashburner.info
Wed Jun 15 21:08:47 MDT 2011
On 16/Jun/11 9:54 AM, Alex Satrapa wrote:
> On 15/06/2011, at 16:47 , Scott Ferguson wrote:
>> But please do implement .htaccess files, and check those permissions -
>> there's enough malware servers out there infecting Windows machines that
>> then slow my firewall down with their incessant attempts to spread their
>> diseases. ;-p
> I'd also suggest the following advice:
> Don't leave stuff on an Internet-facing host that you don't want to be accessible over the Internet. Your home network is not too small to matter. Your home network is not too small to be noticed.
> It's really simple: someone out there already knows a vulnerability which you and your OS publisher haven't heard of yet. If you start putting complex applications intended for individual use on an Internet-facing host, chances are you're opening a vulnerability which will end up being exploited by someone like lulzsec. The more junk you have installed on the Internet-facing host — regardless of whether it's listening to connections or just installed and "doing nothing — the more opportunities an intruder has of using your machine for their own purposes.
> There's a lot more to securing a machine than simply installing a firewall and DROPping every packet you don't like.
Tee hee! There's even more to securing a machine than that! :P
You've actually got to unplug it completely from the network: as Pwn2Own
has shown, just because you're running no services doesn't mean you
can't be hacked. (Ooohhhh, sorry ESR, cracked. Now let me explain to the
world that you reckon cracked means hacked, while hacked means something
that doesn't lead the conversation to ESR being a gun nut who writes
articles with titles like "Sex tips for geeks".) ;-)
Then, because of wireless, Bluetooth, IR, Van Eck phreaking, etc., you've
actually got to switch the machine off, unplug it and take out the
battery if there is one.
But this still leaves the physical access attack vector, so you've got
to make sure it can't be switched on. With an axe.
Drives could be swapped out to another machine - so they'd better be
unreadable too. Encrypted isn't good enough because they could torture
the password out of you even if the encryption scheme is "technically"
Or, option B: weigh a reasonable assessment of the risk and the cost
against the value of the service, try to minimise the first two to some
reasonable degree, then make your trade-off. There are those who do this
professionally, some on this list, who can answer questions about the
practicalities of it better than me. Reckon you're one of them, Alex!
So are you recommending that nobody run services visible to the web
unless they are experts who are willing to spend more than N hours a
week securing it?
Mythweb == evil ? ssh tunnel and use a curses interface (write it if it
doesn't exist) : ssh tunnel and use mythweb invisible to the web as
htdigest isn't remotely good enough;
DMZ the machine from the LAN and keep nothing on the disks other than TV
Something else? What say you? I'd be interested to hear your (and
But back to the original point: how does Google even know that /mythweb
exists, given that nothing links to it, it's not my usual location for
it, it is (and I believe always has been) behind a password, and, until
I forgot it in the machine changeover at the weekend, there was a
robots.txt disallowing everything to anyone remotely polite, which
Googlebot claims to be and usually seems to be?
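For reference, a robots.txt that disallows everything is just `User-agent: *` followed by `Disallow: /`. Here is a minimal sketch of how a polite crawler would read it, using Python's standard `urllib.robotparser` module. The helper name `polite_crawler_may_fetch` and the example.org URL are hypothetical, for illustration only. It shows that such a file keeps Googlebot out of /mythweb, while a missing robots.txt permits everything:

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every crawler to stay out of the whole site.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""


def polite_crawler_may_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if a crawler honouring robots.txt may fetch url.

    Hypothetical helper for illustration: robots_txt is the file's text,
    or an empty string when the site serves no robots.txt at all.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://example.org/mythweb/"
    # With the disallow-all file in place, Googlebot should stay out.
    print(polite_crawler_may_fetch(DISALLOW_ALL, "Googlebot", url))  # False
    # With no robots.txt (e.g. forgotten in a machine changeover), anything goes.
    print(polite_crawler_may_fetch("", "Googlebot", url))  # True
```

robots.txt is purely advisory, though. It only keeps out crawlers that choose to honour it, so the password protection is what actually guards the page.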
1) I must have originally placed it at /mythweb, linked it from my
front page and exposed it via the firewall over 2 years ago, and have
since forgotten I did this.
2) I must not have had a robots.txt at that time, just as I don't now.
3) I must have let the password protection lapse at that time.
4) Googlebot must have proceeded, as is its normal practice, to follow
links and index pages while all of these things were simultaneously in
effect, remembered all this about my domain for over 2 years, then
followed the remembered links rather than actual links, and kept those
remembered links in its index even when refused access.
And there's basically no enquiry I can make with Google about it.
Any one of those, I'd say absolutely fair enough: I'm a goose, I make
mistakes, and the mythweb setup was an experimental diversion, much as
the whole MythTV thing was and is.
Two of them, yeah why not, coins land heads 4 times in a row, sure.
Three, sure, slap my head about being a bigger goose than I thought but
we all have bad days, right? And I probably was having at least one or
two of those 2 years ago now I think about it.
Three, plus Google having an index of my link structure from over 2 years
ago? Well, okay. It *is* possible. Interesting if that's what it is,
though, huh? I'd also have thought all that occurring simultaneously was
unlikely. If there's no alternative explanation, I guess I'd have been
doing some wrong thinking about that too.
As it is, there's no damage done. Presumably the Goog will one day
"forget" the no-longer-relevant links listed on its page for my domain,
given they're not even indexed, but it might take more than 2 years.
It's just a bit weird. So yeah, the "best" explanation I have is
"iSuck" and Google being odd.
And if nobody else sees anything like it in the world, I guess we can
safely say that Googlebot is not conducting "research", or I can have
paranoid fantasies about being specifically targeted by Googlebot, which,
I'd have to say, is very unlikely. A lot less likely than them
acknowledging and apologising for their repeated telephone script: "our
engineers were brainstorming and suggested you'd be ideal for Google, and
Google wants you - but please read this ad and then passionately make
your case for why Google should deign to consider you." They then go on
to refuse to put me in touch with one of these engineers, who they claim
are friends of mine and personally recommended me, to discuss why they
would think such a thing and /whether/ I'm any kind of fit for the Goog,
which I might well not be at all. Common sense tells you it's just a
bit of webstalking by paid placement consultants, who are not Google
employees, priming applications. They're paid a commission of 25% of
salary for applicants who get through weeks of interviews, still want
the gig, and get hired: a good-probability, high-paying raffle ticket
for a couple of phone calls and some web searching. If you toy with them
long enough, they'll admit to the initial scripted dishonesty. The
placement consultant industry sucks very hard, as most of us are aware,
and Google are no different to the norm for large, poisonous corporations
on that front, even if they do claim to be different on other fronts.
I'd love it if they tackled that wart on their face head on, as it were,
and disrupted the odiousness of the industry. Wouldn't we all? Some also
want ponies, ponies that defecate world peace... ;-)
Thanks all for your thoughts.