User Comments
If it’s a search spider, it is certainly behaving oddly. It has made 726 requests for just one page (/forum/rev/threads.asp). That’s definitely not normal! |
Further log analysis reveals that this is a spider, identified as msnbot/0.3 (+http://search.msn.com/msnbot.htm). So obviously I shouldn’t ban it. It’s annoying that it’s ignoring my “robots.txt” files, even though it is requesting them! I’m concerned by the unusual number of requests for “/forum/rev/threads.asp” (726 at last count). Can anyone offer any reason why the bot is getting hung up on that page? |
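For anyone wanting to run the same kind of check on their own logs, here is a minimal sketch of counting requests per page for one user agent. It assumes the log has already been parsed into (user agent, path) pairs; the function name `requests_per_page` and the sample entries are hypothetical, not taken from the actual log discussed here.

```python
from collections import Counter

def requests_per_page(entries, agent_substring):
    """Count requests per path for user agents containing agent_substring.

    `entries` is an iterable of (user_agent, path) pairs. Real log formats
    differ, so parsing the raw log into this shape is left to the reader.
    """
    needle = agent_substring.lower()
    return Counter(path for agent, path in entries if needle in agent.lower())

# Hypothetical sample data, for illustration only.
sample = [
    ("msnbot/0.3 (+http://search.msn.com/msnbot.htm)", "/forum/rev/threads.asp"),
    ("msnbot/0.3 (+http://search.msn.com/msnbot.htm)", "/forum/rev/threads.asp"),
    ("msnbot/0.3 (+http://search.msn.com/msnbot.htm)", "/robots.txt"),
    ("Mozilla/4.0 (compatible; MSIE 6.0)", "/forum/rev/threads.asp"),
]

counts = requests_per_page(sample, "msnbot")
# Show the pages the bot requested most often, busiest first.
for path, hits in counts.most_common():
    print(hits, path)
```

A page that dominates the count, as /forum/rev/threads.asp does above, is the first place to look for a crawler loop.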
People have been having all sorts of trouble with MSNbot ever since they started gearing up for the new launch (intended for early 2005?) back in late 2003. A few references for your reading pleasure:
But I think your problem is probably related to this:
Hope that helps! |
Wow! Thanks for the links, Lachlan. There is certainly some interesting reading there. So it seems that, as with all their software ventures, Microsoft can’t even deploy a bug-free web spider. I guess this should surprise nobody. I agree with your suspicion, Lachlan: I seem to have triggered the MSNbot’s “Fatal Flaw” with my forum system. So I have two options: I can exclude the bot from the forums, or I can mod_rewrite my way out of the “?” flaw. Great, more work to do over Christmas. Thank you so much, Microsoft. |
My suggestion would be to disallow it via robots.txt temporarily to save on bandwidth (after all, only the tech preview is in use at the moment, so it isn’t like you’ll lose results). Then, when you get a chance, do some testing and find a workaround. Obviously you’ll need to do that before they launch their own search results for real, but if you’re lucky they might get it fixed by then and you won’t have to worry about it. Unlikely, I know, but given the uncertain timeline for launch and the current testing (we have to assume a beta, surely), they may clear it up in the next month or two.
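As a rough illustration of the temporary disallow, the sketch below checks a robots.txt rule set with Python's standard `urllib.robotparser` before it goes live. The rules in `ROBOTS_TXT` are an assumption about what such a file might contain, not the file from the site in question, and the check only shows what a compliant crawler should do; it cannot make a misbehaving bot obey.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep msnbot out of /forum/, allow everyone else.
ROBOTS_TXT = """\
User-agent: msnbot
Disallow: /forum/

User-agent: *
Disallow:
"""

def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# A compliant msnbot should stay out of the forum but may crawl elsewhere.
print(parser.can_fetch("msnbot/0.3", "/forum/rev/threads.asp"))
print(parser.can_fetch("msnbot/0.3", "/index.html"))
# Other crawlers are unaffected by the msnbot-specific rule.
print(parser.can_fetch("Googlebot", "/forum/rev/threads.asp"))
```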
I can remove the “?” from forum queries, as that’s what seems to cause the problem. It’s funny, though, that no other bot hangs up in my forums. In the short term I can also use the “msnbot” META tag to keep the bot out of the forums. The links you posted were really useful, Lachlan, and I thank you again. |
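One way to picture removing the “?” is to fold the query parameters into the path. The sketch below is only an assumption about how that mapping could look: the function name `to_query_free_path`, the parameter names, and the key/value path layout are all hypothetical, and the server would still need a matching rewrite rule to turn such paths back into the original query.

```python
from urllib.parse import urlsplit, parse_qsl, quote

def to_query_free_path(url: str) -> str:
    """Fold query parameters into path segments so the URL contains no '?'.

    Hypothetical layout: /page?key=value becomes /page/key/value.
    """
    parts = urlsplit(url)
    segments = [parts.path.rstrip("/")]
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Percent-encode each piece so a '/' inside a value cannot add segments.
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join(segments)

# The query parameters here are invented for illustration.
print(to_query_free_path("/forum/rev/threads.asp?forum=3&page=2"))
```

URLs without a query string pass through unchanged, so the function can be applied to every link the forum generates.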
No worries, Jon. Glad I could help. Not to wax overly lyrical, but that’s what our little standards-based bit of the web is all about, right? Helping folks.
Standards are there to be obeyed, and people who flout them so flagrantly do not deserve the public service you offer. Anyone or anything that disobeys your robots.txt file should automatically be banned for a set period. Say a day or two. Perhaps longer. This is also a good way to catch malicious bots that purposefully descend into directories they think they are not allowed into. Having said that, you shouldn’t have publicly accessible directories that you don’t want to grant access to. Take a look at webstandards.org/robots.txt for an example of this. (http://webstandards.org/about/internal/) |
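A ban for a set period could be tracked along these lines. This is a minimal in-memory sketch, not anything described in the thread: the class name `TemporaryBanList`, the two-day duration, and the dates are assumptions, and deciding who has actually violated robots.txt is left to whatever log analysis is already in place.

```python
from datetime import datetime, timedelta

class TemporaryBanList:
    """Track clients banned for a fixed period (hypothetical helper)."""

    def __init__(self, duration: timedelta):
        self.duration = duration
        self._expires = {}  # client identifier -> time the ban ends

    def ban(self, client: str, now: datetime) -> None:
        """Ban the client from `now` until `now + duration`."""
        self._expires[client] = now + self.duration

    def is_banned(self, client: str, now: datetime) -> bool:
        """Return True while the ban is active; forget it once expired."""
        expiry = self._expires.get(client)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[client]
            return False
        return True

# Arbitrary example: a two-day ban starting at a made-up time.
bans = TemporaryBanList(timedelta(days=2))
start = datetime(2004, 12, 20, 12, 0)
bans.ban("badbot/1.0", start)
print(bans.is_banned("badbot/1.0", start + timedelta(days=1)))
print(bans.is_banned("badbot/1.0", start + timedelta(days=2)))
```

Passing the current time in explicitly keeps the sketch easy to test; a real deployment would also need to persist the list across server restarts.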
Hit with requests for what? Various pages? Then it looks like they’re building an index. Do you want to be on that index? Do people who use MSN Search deserve to read your site? 8^)