<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Avoiding session IDs when spidering</title>
	<atom:link href="http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/</link>
	<description>Google Search Appliance and Google Mini development</description>
	<pubDate>Mon, 08 Sep 2008 08:11:53 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Paul</title>
		<link>http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/#comment-510</link>
		<dc:creator>Paul</dc:creator>
		<pubDate>Thu, 15 Feb 2007 11:07:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/#comment-510</guid>
<description>If you don't exclude the session ID, it will spider the pages. I suggest setting the host load to 1, which might help it keep the same session ID, since it then effectively opens only one connection to spider the site rather than 4 (the default host load).

You can't get it to ignore part of the URL; it can only include or exclude whole URLs based on patterns.

If you have to have a session ID, then you may need to look at changing the CMS or whatever runs the site so it will always feed the same session ID to the Mini when it is spidering. 

Basically, spiders hate session IDs, so if you have to have one, you're always going to be in a bit of trouble. Hmm... you could set up a page of links to every page on your site, each with a session ID appended, then set that page as the place the Mini starts spidering. That might give it access to the whole site without it getting too confused. Unfortunately, this is just guesswork on my part; usually I deal with people who can get rid of the sessions, or with sites where we exclude them entirely.</description>
		<content:encoded><![CDATA[<p>If you don&#8217;t exclude the session ID, it will spider the pages. I suggest setting the host load to 1, which might help it keep the same session ID, since it then effectively opens only one connection to spider the site rather than 4 (the default host load).</p>
<p>You can&#8217;t get it to ignore part of the URL; it can only include or exclude whole URLs based on patterns.</p>
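<p>A rough Python illustration of that all-or-nothing effect. It assumes a hypothetical session parameter called <code>sessionid</code>, and it only mimics what an exclusion rule does. It is not the appliance&#8217;s own pattern syntax. Any URL carrying the parameter is skipped outright, and nothing rewrites it into a parameter-free address.</p>
<pre>
```python
import re

# Hypothetical session parameter name; substitute whatever your site uses.
SESSION_PATTERN = re.compile(r"[?&;]sessionid=", re.IGNORECASE)


def is_excluded(url):
    """Return True if an exclusion aimed at the session ID would match this URL.

    A matching URL is skipped as a whole; nothing strips the parameter
    and keeps the rest of the address.
    """
    return SESSION_PATTERN.search(url) is not None


for url in ["http://example.com/products", "http://example.com/products?sessionid=A1B2"]:
    print(url, "excluded" if is_excluded(url) else "crawled")
```
</pre>
<p>If every link on a site carries the session ID, every link matches, which is why excluding it can leave nothing to crawl.</p>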
<p>If you have to have a session ID, then you may need to look at changing the CMS or whatever runs the site so it will always feed the same session ID to the Mini when it is spidering. </p>
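<p>A minimal Python sketch of that CMS-side change. It assumes the crawler can be recognised by a User-Agent containing <code>gsa-crawler</code>, so check your own access logs before relying on that. It also uses a made-up fixed ID. <code>session_id_for</code> is a hypothetical hook and is not part of any particular CMS.</p>
<pre>
```python
import secrets

# Assumption: the appliance's crawler identifies itself with a User-Agent
# containing "gsa-crawler". Check your own access logs before relying on this.
CRAWLER_MARKER = "gsa-crawler"

# Hypothetical fixed session ID handed to every crawler request.
CRAWLER_SESSION_ID = "crawler0000"


def session_id_for(user_agent, current_id=None):
    """Pick the session ID the CMS should use for this request (hypothetical hook)."""
    if CRAWLER_MARKER in (user_agent or "").lower():
        # Same ID every time, so every link the crawler sees stays stable.
        return CRAWLER_SESSION_ID
    if current_id:
        # Normal visitors keep the session they already have.
        return current_id
    # New visitor: issue a fresh random 32-character hex ID.
    return secrets.token_hex(16)
```
</pre>
<p>With a stable ID, the same page always appears under the same URL to the crawler, so it stops seeing duplicates.</p>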
<p>Basically, spiders hate session IDs, so if you have to have one, you&#8217;re always going to be in a bit of trouble. Hmm&#8230; you could set up a page of links to every page on your site, each with a session ID appended, then set that page as the place the Mini starts spidering. That might give it access to the whole site without it getting too confused. Unfortunately, this is just guesswork on my part; usually I deal with people who can get rid of the sessions, or with sites where we exclude them entirely.</p>
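<p>The seed-page idea could be sketched in Python roughly as follows. The parameter name <code>sessionid</code>, the ID value and the page list are placeholders, and in practice the list would come from your own site map. The hypothetical helper appends the session parameter to each URL, or overwrites it if one is already present. The results can then be laid out as links on a single start page.</p>
<pre>
```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter name; use whatever your site expects.
SESSION_PARAM = "sessionid"


def with_session_id(url, session_id, param=SESSION_PARAM):
    """Return url with param=session_id set, replacing any existing value."""
    parts = urlsplit(url)
    # Keep every other query parameter, drop any old session value.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder page list; a real one would cover every page on the site.
pages = ["http://example.com/", "http://example.com/about", "http://example.com/news?page=2"]
for page in pages:
    # Each printed URL would become one link on the seed page.
    print(with_session_id(page, "crawler0000"))
```
</pre>
<p>Printing is only for illustration. You would write each URL into a link on the seed page and enter that page as the start URL.</p>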
]]></content:encoded>
	</item>
	<item>
		<title>By: Ty C.</title>
		<link>http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/#comment-509</link>
		<dc:creator>Ty C.</dc:creator>
		<pubDate>Wed, 14 Feb 2007 23:37:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.gsadeveloper.com/2006/04/20/avoiding-session-ids-when-spidering/#comment-509</guid>
		<description>What if the website requires the session ID? Is there a way to tell GSA to ignore a specific querystring parameter but still index the page? Otherwise the entire site will be ignored, won't it?</description>
		<content:encoded><![CDATA[<p>What if the website requires the session ID? Is there a way to tell GSA to ignore a specific querystring parameter but still index the page? Otherwise the entire site will be ignored, won&#8217;t it?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
