I was listening to Jeff Atwood and Joel Spolsky's podcast from a couple of weeks ago (podcast 22), and Jeff mentioned the Fight Club rule ("The first rule of Fight Club is: you do not talk about Fight Club") and how Stack Overflow is not intended to be a site that discusses itself. In other words, the first rule of Stack Overflow is that you do not talk about Stack Overflow on Stack Overflow. That is also why I'm posting this here instead of on Stack Overflow.
One of the reasons that Jeff doesn't want Stack Overflow discussion on Stack Overflow is that he doesn't want Google et al. indexing Stack Overflow pages that discuss how the site works. When a user finds Stack Overflow through a Google search, it should bring them to a page with a question and a solution to the problem they are trying to solve, not to a meta-discussion.
So how do you solve this problem? In my opinion, fairly easily: you generate the robots.txt file in the root of the site dynamically each time it's requested (instead of serving a static one). You query the database for all questions marked with the "stackoverflow" tag and add their URLs as Disallow entries in the robots.txt content that you return to the requesting search bot. That way the meta-discussion won't be indexed by the search bot.
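A minimal sketch of that idea, in Python with an in-memory-friendly SQLite connection. The table names (`questions`, `question_tags`), the column names, and the `/questions/<id>/<slug>` URL pattern are all assumptions made for illustration; nothing here reflects how Stack Overflow actually stores or routes its questions.

```python
import sqlite3


def meta_question_paths(conn: sqlite3.Connection, tag: str = "stackoverflow") -> list[str]:
    """Return URL paths of all questions carrying the given tag.

    Assumes a hypothetical schema:
      questions(id, slug) and question_tags(question_id, tag).
    """
    rows = conn.execute(
        "SELECT q.id, q.slug FROM questions q "
        "JOIN question_tags t ON t.question_id = q.id "
        "WHERE t.tag = ? ORDER BY q.id",
        (tag,),
    ).fetchall()
    # Hypothetical URL pattern: /questions/<id>/<slug>
    return [f"/questions/{qid}/{slug}" for qid, slug in rows]


def build_robots_txt(paths: list[str]) -> str:
    """Build robots.txt content with one Disallow line per path."""
    lines = ["User-agent: *"]
    lines.extend(f"Disallow: {path}" for path in paths)
    return "\n".join(lines) + "\n"
```

A request handler for /robots.txt would then return `build_robots_txt(meta_question_paths(conn))` as plain text. Note that this runs a database query on every robots.txt request unless the result is cached.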
After typing that, I thought of what is probably an even easier solution. When you retrieve the question from the DB and construct the page, just add <META name="robots" content="NOINDEX,NOFOLLOW" /> inside the page's <head> element if the question has a "stackoverflow" tag, and Google won't index that page.
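Sketched in Python, the per-page check could look something like this. The `render_head` function and its parameters are hypothetical stand-ins for whatever the real page template does; only the META tag itself comes from the idea above.

```python
from html import escape

META_DISCUSSION_TAG = "stackoverflow"
ROBOTS_META = '<META name="robots" content="NOINDEX,NOFOLLOW" />'


def render_head(title: str, tags: list[str]) -> str:
    """Render a <head> element, adding the robots META tag for meta-discussion.

    `title` and `tags` are assumed to come from the question record
    that was already loaded to build the page, so no extra query is needed.
    """
    parts = [f"<title>{escape(title)}</title>"]
    if META_DISCUSSION_TAG in tags:
        # Ask search bots not to index this page or follow its links.
        parts.append(ROBOTS_META)
    return "<head>\n  " + "\n  ".join(parts) + "\n</head>"
```

Because the tags are already in hand when the page is built, this adds no database work beyond what rendering the question needs anyway.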
Not sure why I got so carried away with dynamically generating the robots.txt page/file...
Yeah, your second idea is better since robots.txt is accessed <strong>a lot</strong>. Jeff has a demonstrated aversion to expensive queries if you follow his Twitter account.