Been busy the last few weeks sorting out site stability after a WordPress 4 update caused issues. At least, I thought it was WordPress 4 causing it… The real culprit? Spider bots.
Did you know there are thousands of bots, like the ones Google uses, trawling sites? Some of them are legitimate; a huge number aren’t. All of them tie up resources and are annoying. Why would China’s top search engine want to know what I do for this town?
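If you’re wondering which bots are hitting your own site, a quick tally of the User-Agent strings in the server’s access log tells the story. A minimal sketch in Python, assuming the common Apache/Nginx “combined” log format (the sample lines below are made up):

```python
import re
from collections import Counter

# Tally User-Agent strings from a "combined"-format access log to see
# which crawlers hit the site hardest. The User-Agent is the last
# double-quoted field on each line (after the referrer).
UA_RE = re.compile(r'"[^"]*" "(?P<ua>[^"]*)"$')

def top_user_agents(lines, n=10):
    counts = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if match:
            counts[match.group("ua")] += 1
    return counts.most_common(n)

# Made-up sample lines; in practice you'd read your server's access log file.
sample = [
    '1.2.3.4 - - [10/Oct/2014:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Baiduspider/2.0"',
    '1.2.3.4 - - [10/Oct/2014:13:55:37 +0000] "GET /feed HTTP/1.1" 200 512 "-" "Baiduspider/2.0"',
    '5.6.7.8 - - [10/Oct/2014:13:55:38 +0000] "GET / HTTP/1.1" 200 766 "-" "Mozilla/5.0"',
]
print(top_user_agents(sample))  # Baiduspider leads with 2 hits
```

Anything near the top of that list that isn’t a browser or a search engine you care about is a candidate for the ban list.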
So they’ve mostly been banned. Of course, there are a few that constantly change their IP or user-agent string; no doubt I’ll spot them and dispense justice. The typical response with these search engines, some of which are legitimate, is to add a Disallow rule to robots.txt, which their bot will honour by stopping and going no further. Since the bad bots ignore this simple rule, whether through their constant variation, poor parsing, or only fetching robots.txt last, it doesn’t work very well.
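For contrast, here’s what a well-behaved crawler is supposed to do with those rules, sketched with Python’s standard urllib.robotparser. The bot names and paths here are just examples:

```python
# Sketch of how a *well-behaved* crawler honours robots.txt, using the
# standard library. Bad bots simply skip this step entirely.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.modified()  # record a "fetched" time; some Python versions refuse to answer without one
# Feed rules in directly for illustration; a real crawler would call
# rp.set_url("https://yoursite.example/robots.txt") and rp.read().
rp.parse([
    "User-agent: Baiduspider",
    "Disallow: /",
    "",
    "User-agent: *",
    "Disallow: /wp-admin/",
])

print(rp.can_fetch("Baiduspider", "/about/"))   # banned from the whole site
print(rp.can_fetch("Googlebot", "/wp-admin/"))  # path blocked for everyone
print(rp.can_fetch("Googlebot", "/about/"))     # allowed
```

The whole scheme depends on the crawler volunteering to run this check, which is exactly why it fails against the bad ones.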
WordPress has a number of plugins that will identify genuine spider bots and restrict the bad ones; these help if your site is being hit by them continuously.
Of course, better hosting will solve this problem for a while. But with multiple projects going on in the background and the site being restructured, do you really want every file and folder indexed for everyone to see?
From an SEO view, banning spider bots and crawlers is not ideal. But if it’s indexing for Russia, China, and other countries you hardly get any traffic from or deal with, then there’s not much reason to let them spider the site and drain server resources.
For an expansive list of spiders/crawlers, who they belong to, and where they originate from, see PerishablePress’s article on banning naughty user agents.
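Those blacklists generally boil down to Apache rules keyed on the User-Agent header. A hypothetical sketch for Apache 2.4 with mod_setenvif (the bot names here are placeholders, not a vetted list):

```apache
# Flag requests whose User-Agent matches a known-bad crawler.
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot"   bad_bot

# Refuse flagged requests (Apache 2.4 access control).
<IfModule mod_authz_core.c>
    <RequireAll>
        Require all granted
        Require not env bad_bot
    </RequireAll>
</IfModule>
```

You can check a rule is live with curl -A "Baiduspider" -I against your own site and looking for a 403 response.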