Clever girl...
7pm, 17th November 2006 - Geek, Interesting, Web, Security, Sysadmin
Just when you've got one in your sights, the other two attack you from the side.
I was surprised a couple of days ago to find a significant number of entries in the slow queries log on one of our web servers. Looking through them, we discovered that they were caused almost exclusively by someone searching for a fairly unusual string. It looked like this: "<A HREF = "http://example.com">example.com</A>". For all you non-web-geeks out there, that is the HTML code that creates a link to the site example.com, and it would look like: example.com. A strange thing to enter into a search box, to be certain.
Why would anyone type that into a search box? My first thought was that it was referrer log spam, or some variation on it. The idea is that every request made to a website is stored in a log, and some websites publish those logs, or statistics derived from them, as part of the website itself. These statistics can be visited by real users, who may click on the link, or by search engine robots, which can increase the pagerank of the linked site. None of our websites do that, however, so that seemed unlikely to be the motivation.
I quickly realised that the motivation was probably a little more clever, but clearly unscrupulous, so we decided to block his IP address. Strangely enough, he seemed to be coming from a whole range of IP addresses: a class-C range to be precise, and pretty much at random within it. He also had a user-agent string of "Slurp". One quick reverse-DNS lookup later, we realised that this was the Yahoo! search engine's robot crawling our sites and doing these searches, and therefore that blocking the IP addresses was not a good idea.
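For anyone wanting to repeat that check in code, here is a minimal sketch of a reverse-DNS lookup followed by a forward lookup to confirm the answer. It is not what we ran at the time. The function name is_verified_crawler and the hostname suffix ".crawl.yahoo.net" are assumptions for illustration; check the crawler operator's own documentation for the hostnames it really uses.

```python
import socket

# ASSUMPTION: hostname suffix used by the crawler we want to recognise.
# Replace with whatever the crawler's operator documents.
CRAWLER_SUFFIXES = (".crawl.yahoo.net",)


def is_verified_crawler(ip, suffixes=CRAWLER_SUFFIXES,
                        reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves to a crawler hostname
    and that hostname resolves back to the same ip."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses)
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no reverse record, so nothing to verify

    if not hostname.lower().endswith(suffixes):
        return False  # reverse name is not in the crawler's domain

    try:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        addresses = forward(hostname)[2]
    except OSError:
        return False

    # The forward lookup guards against someone faking their reverse DNS.
    return ip in addresses
```

The reverse and forward parameters exist only so the lookups can be swapped out when testing without touching the network. A user-agent string alone proves nothing, since anyone can claim to be "Slurp".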
So why was Yahoo! doing searches for random bits of HTML on our sites? The answer lay on another site, found via Google, that carried a large list of links, each pointing to a search results page on one of our sites. The idea was similar to referrer log spam, but rather than creating a bot with a link in its referrer string, this one used search engine bots to attempt to insert links into our search results, get those pages indexed, and potentially increase the pagerank of the linked site. It's unlikely to fool real users, but they were not the target here; this was all about getting higher in Yahoo!'s and Google's search results pages.
We couldn't let this continue, and the easiest solution was simply to disallow robots from indexing search results pages. This had the added advantage of reducing the load placed on the server by running all those searches that no one was looking at anyway. Besides, no one wants to find a search results page linked from Google. If you are using a search engine to look for a particular topic, you want pages on that topic, not pages that redirect you to pages that redirect you to pages on that topic. From now on, all search results pages that I deal with will be disallowed to all bots. The bots themselves won't be doing any searches, anybody who links to those pages is likely to be up to no good, and there's no point in search engines indexing them anyway.
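As a rough illustration of the kind of rule involved, the sketch below holds a two-line robots.txt and uses Python's standard urllib.robotparser to check how a well-behaved bot would read it. The /search path and the helper name crawler_may_fetch are assumptions for the example, not the actual layout of our sites.

```python
from urllib.robotparser import RobotFileParser

# ASSUMPTION: search results live under /search on this site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""


def crawler_may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A search results page versus an ordinary page.
    print(crawler_may_fetch("Slurp", "http://example.com/search?q=spam"))
    print(crawler_may_fetch("Slurp", "http://example.com/about"))
```

This only helps with bots that honour robots.txt, which the big search engines do. It does nothing to stop a real visitor, or a rude bot, from running the same searches.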
To finish off, I thought I'd leave you with another quote from the same movie that the quote in the title is from: "It's a UNIX system! I know this!"