Ignoring specific content on a page

July 28, 2006 in Google Mini, GSA, Spidering

If you want your Google Mini or Search Appliance to ignore part of a page, you can use special comment tags to stop that content being indexed (and therefore returned in the search results).

Surround the content you want ignored with the following tags:

<!-- googleoff: index --> <!-- googleon: index -->

So if you have

<!-- googleoff: index --> I like bees <!-- googleon: index -->

on your page and you search for ‘bees’, the page won’t come up, even if it has been spidered. The only people who will find out about your love of buzzing insects will be those who have found the page through other means.

This can be useful for excluding parts of your page that the appliance might find confusing. For instance, ‘H’ wants to exclude his breadcrumb trail.
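If you want a rough preview of what text is left for indexing once the tags are in place, something like the following Python sketch may help. It is only an approximation and not how the appliance itself works: the function name `strip_unindexed`, the sample page, and the tolerance for extra whitespace inside the comments are all assumptions made for illustration.

```python
import re

# Matches from a "googleoff: index" comment up to the next "googleon: index"
# comment, or to the end of the text if indexing is never switched back on.
# Assumption: flexible whitespace and case; the appliance may be stricter.
_GOOGLEOFF_INDEX = re.compile(
    r"<!--\s*googleoff:\s*index\s*-->.*?(?:<!--\s*googleon:\s*index\s*-->|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def strip_unindexed(html: str) -> str:
    """Return html with the googleoff/googleon index regions removed."""
    return _GOOGLEOFF_INDEX.sub("", html)


# Hypothetical page: the middle sentence is wrapped in the tags.
page = (
    "<p>Honey recipes</p> "
    "<!-- googleoff: index --> I like bees <!-- googleon: index --> "
    "<p>Contact us</p>"
)

visible = strip_unindexed(page)
print("bees" in visible)           # False
print("Honey recipes" in visible)  # True
```

Running your own templates through a check like this is a quick way to spot a missing googleon tag, since everything after an unclosed googleoff disappears from the output.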

Comments (2)


  1. Comment by Danny Dawson — July 28, 2006 @ 4:48 pm

    While this technique does seem to be the official method for excluding certain page content from appearing in GSA search results, there is another method which does not affect the amount of markup you serve to your regular visitors.

    As a GSA administrator, you have control over the GSA’s user-agent string. Even though it’s generally not a good idea to rely on user-agent sniffing for content delivery, in this case you’re the one with control over how the client (your GSA) identifies itself. As such, if you assign a unique user-agent to your GSA, you can then sniff for it server-side and omit only the bits of content you don’t want the GSA to see.

    For example, if you assign the user-agent “businessname-searchappliance”, you can use this PHP to omit content:
    $ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    if ( strpos($ua, 'businessname-searchappliance') === false ) {
        // Content to hide from the GSA goes in this block (other visitors still see it)
    }

  2. Comment by Michael — August 3, 2007 @ 7:04 pm

    A great tip; thanks!
