Saturday, June 21, 2008

Inktomi Slurp Confirm 404

Roughly once every month or two, my web sites record requests for pages such as:

SlurpConfirm404.htm
SlurpConfirm404/weather/heavenbecauseofyou/hearzg.htm
SlurpConfirm404/thelmalouise/ALL-IMAGE-005-J/railways.htm

This is Yahoo checking that the web site correctly returns a 404 for pages that don't exist. To confirm that a request really is from Yahoo, do a lookup (whois, for example) on the IP address the request came from and check that the owner is INKTOMI CORPORATION.
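An ownership lookup like the one above is easy to do by hand. For a scripted check, a different technique is a forward-confirmed reverse DNS lookup, sketched below. This is not the lookup described above, and the hostname suffixes are assumptions on my part, so confirm them against your own logs before trusting the result. The `reverse` and `forward` parameters are hypothetical hooks that exist only so the function can be tried without touching the network.

```python
import socket

# Assumed hostname suffixes for Yahoo/Inktomi crawlers; verify these against
# your own logs or an ownership lookup before relying on them.
ASSUMED_CRAWLER_SUFFIXES = (".crawl.yahoo.net", ".inktomisearch.com")


def looks_like_yahoo_crawler(ip, reverse=socket.gethostbyaddr,
                             forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a single IP address."""
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record, or the lookup failed
    if not hostname.lower().endswith(ASSUMED_CRAWLER_SUFFIXES):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The hostname must resolve back to the IP we started with.
    return ip in addresses
```

Calling `looks_like_yahoo_crawler("72.30.215.10")` with the default resolvers performs two live DNS queries; the address here is only an illustration, not one taken from my logs.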

I assume this makes for better-indexed web sites, since search engines can rely on your site to return the appropriate error codes for pages that have moved or no longer exist.
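You can run the same kind of test against your own site. The sketch below requests a random page name that almost certainly does not exist and reports the status code that comes back. The `confirm404-` prefix and the function name are made up for illustration; this is not how Yahoo builds its probe URLs.

```python
import urllib.error
import urllib.request
import uuid


def status_for_missing_page(base_url, timeout=10):
    """Request a random, almost certainly nonexistent page; return the HTTP status."""
    url = f"{base_url.rstrip('/')}/confirm404-{uuid.uuid4().hex}.htm"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # A 2xx here means the site answers missing pages with a normal
            # page (a "soft 404"), possibly after following a redirect.
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; 404 is the answer we want.
        return err.code
```

For example, `status_for_missing_page("http://www.example.com")` should return 404 on a correctly configured site. Note that `urlopen` follows redirects, so a site that redirects missing pages to its home page will show up as 200.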

Here is some trivia about the Inktomi bot's visits to one of my sites. The first visit was recorded today (6/21/2008) at 6:31am ET and the last at 12:56pm, for a total of 9 visits. The shortest time between visits was 9 minutes and the longest 79 minutes, with an average of 48 minutes. Every request came from a different IP address, but they all fell in the 72.30.215.* block.
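Two of those figures are easy to check in code. The sketch below writes the 72.30.215.* block in CIDR form to test whether an address belongs to it, and recomputes the average gap from the first and last visit times. The function name is my own, and any address you pass in is your own data, not something from my logs.

```python
import ipaddress
from datetime import datetime

# The block reported above (72.30.215.*), written in CIDR form.
OBSERVED_BLOCK = ipaddress.ip_network("72.30.215.0/24")


def in_observed_block(ip):
    """True if the address falls inside 72.30.215.0/24."""
    return ipaddress.ip_address(ip) in OBSERVED_BLOCK


# Nine visits leave eight gaps between the first (6:31am) and last (12:56pm).
first = datetime(2008, 6, 21, 6, 31)
last = datetime(2008, 6, 21, 12, 56)
average_gap_minutes = (last - first).total_seconds() / 60 / (9 - 1)
print(average_gap_minutes)  # 48.125, i.e. about 48 minutes
```

The 385 minutes between the first and last visit divided by 8 gaps gives 48.125, which matches the 48-minute average.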

I've decided to tabulate the visits to see exactly how often Yahoo checks:

Yahoo/Slurp:
6/21/2008
10/15/2008

More visits:

2008-10-15T05:43:37Z /SlurpConfirm404/Honey.htm
2008-10-15T05:50:03Z /SlurpConfirm404/weather/heavenbecauseofyou/hearzg.htm
2008-10-15T05:52:43Z /SlurpConfirm404/fan/docs.htm
2008-10-15T05:58:59Z /SlurpConfirm404/dickg.htm
2008-10-15T06:30:23Z /SlurpConfirm404/thelmalouise/ALL-IMAGE-005-J/railways.htm
2008-10-15T06:43:15Z /SlurpConfirm404/pcwww.htm
2008-10-15T06:50:56Z /SlurpConfirm404/commandes.htm
2008-10-15T06:52:30Z /SlurpConfirm404/Patriotic/acarogicofla.htm
2008-10-15T06:59:11Z /SlurpConfirm404/adm_app/contact_me/ewebtur.htm
2008-10-15T07:03:34Z /SlurpConfirm404.htm
2008-10-15T07:04:00Z /SlurpConfirm404/narra/ArtEd/research_faculty.htm
2008-10-15T07:04:09Z /SlurpConfirm404/opinio/witzindex.htm
2008-10-15T07:28:16Z /SlurpConfirm404.htm
2008-10-15T07:30:11Z /SlurpConfirm404/wopabbswp/child-health/boys.htm
2008-10-15T07:42:09Z /SlurpConfirm404/wopsafiwp/wcwstinger/whoareyou.htm
2008-10-15T08:13:11Z /SlurpConfirm404/kurihara.htm
2008-10-15T10:40:58Z /SlurpConfirm404/diablo/WHW.htm
2008-10-16T06:30:39Z /SlurpConfirm404.htm
2008-10-16T07:56:24Z /SlurpConfirm404/MegRyan.htm
2008-10-16T08:48:33Z /SlurpConfirm404/handouts/loge/shackbar.htm
2008-10-16T09:01:04Z /SlurpConfirm404/aleapa_001.stats/manews.htm
2008-10-16T10:45:53Z /SlurpConfirm404/minority.htm
2008-10-16T11:44:04Z /SlurpConfirm404/BlushingAngel/ANIC/byblos.htm
2008-10-16T16:03:30Z /SlurpConfirm404/cgiwrap/viten/INFO465-gs.htm
2008-10-16T17:30:22Z /SlurpConfirm404/rant/business.demonizing_gat1a.htm
2008-10-16T18:36:13Z /SlurpConfirm404/tma.htm
2008-10-17T02:19:56Z /SlurpConfirm404/drodgers/ppv.htm
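A list like the one above can be summarized with a few lines of code. This is a minimal sketch that assumes each line is a UTC timestamp followed by a path, separated by whitespace; the function names are my own, and the sample uses only the first four lines of the list.

```python
from datetime import datetime, timezone


def parse_visit(line):
    """Split a 'TIMESTAMP PATH' log line into (datetime, path)."""
    stamp, path = line.split()
    when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return when, path


def gap_minutes(lines):
    """Minutes between consecutive visits, in log order."""
    times = [parse_visit(line)[0] for line in lines if line.strip()]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


sample = """\
2008-10-15T05:43:37Z /SlurpConfirm404/Honey.htm
2008-10-15T05:50:03Z /SlurpConfirm404/weather/heavenbecauseofyou/hearzg.htm
2008-10-15T05:52:43Z /SlurpConfirm404/fan/docs.htm
2008-10-15T05:58:59Z /SlurpConfirm404/dickg.htm
""".splitlines()

gaps = gap_minutes(sample)
print(f"shortest {min(gaps):.1f} min, longest {max(gaps):.1f} min, "
      f"average {sum(gaps) / len(gaps):.1f} min")
```

Feeding the whole list through `gap_minutes` gives the same shortest, longest and average figures I worked out by hand for the June visits.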

6 comments:

  1. Thank you, I was wondering what this entry was.
    I have seen it a few times in the last week, but it does in fact seem to be the Yahoo spider.

  2. Thanks for clearing this up!

  3. Just came across this after checking my log files. Interesting.

  4. Thanks also for this explanation! I found those Slurp Confirm 404s in the log files for my blog and my website for the first time and didn't know what the hell they were.

  5. This is stupid, I got about 10 of these today. What happens if I block the spider to keep my inbox from filling up with notifications about stupid pointless error pages?

  6. If you don't want your site indexed by Yahoo, you can block the spider using a directive in the robots.txt file in the root of your site. However, if you want search-engine-generated traffic, I would not suggest that you do this. Instead, why don't you set up an email filter that dumps those emails directly into the trash so you don't have to see them?

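For anyone who does want to go the robots.txt route from the last comment, here is a rough sketch of what the directive could look like, checked locally with Python's standard robots.txt parser. I am assuming `Slurp` is the user-agent token the crawler honors, and `ExampleBot` is a made-up name for any other crawler.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that turns away only Yahoo's crawler. "Slurp" is the
# user-agent token assumed here; check Yahoo's documentation to confirm it.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

slurp_allowed = parser.can_fetch("Slurp", "/SlurpConfirm404.htm")
others_allowed = parser.can_fetch("ExampleBot", "/index.htm")
print(slurp_allowed, others_allowed)  # False True
```

Keep in mind that this blocks Yahoo from the whole site, not just from the SlurpConfirm404 requests.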