Comments for GSA Developer
Google Search Appliance and Google Mini development
Fri, 14 Mar 2014 15:00:46 +0000

Comment on Avoiding session IDs when spidering by Paul
Fri, 14 Mar 2014 15:00:46 +0000
/2006/04/20/avoiding-session-ids-when-spidering/comment-page-1/#comment-104197

Hi Manu, I haven’t used the latest version of the GSA software, but in the versions I have used, the GSA never sets the sessionId itself; it has always been created by the server being spidered.

If the sessionId assigned to the spider is not automatically put into the links on the page it is crawling, your server may be giving the GSA a new sessionId every time it follows one of those links, because the request URL would not carry the sessionId originally assigned to the spider.

So what I’m saying is that the way your pages are coded is most likely the reason you’re seeing lots of different sessionIds; the GSA is not creating new sessionIds deliberately.

It would be a lot easier if spiders would keep the same sessionId across a site, but then it’s up to us as developers to make our sites work that way, or just not set sessionIds where they’re not necessary. As spiders can’t tell what is changed on the site in reaction to their session, they are always going to prefer to spider without having a session set, as that’s the most likely state for a searcher to turn up at the site in, whether they use a GSA or other search engine to find the page.

All that said, it’s still very annoying to have to code around!
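
To illustrate why a fresh sessionId on every request makes one page look like many different URLs, here is a minimal Python sketch. It is not GSA code and not how any particular site does it; the parameter names `sessionId` and `dotcomsid` are taken from this thread, and the helper name `strip_session_params` is hypothetical. It simply removes session parameters from a query string so that the variants collapse to a single URL.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names (lowercased); adjust to whatever your site uses.
SESSION_PARAMS = {"sessionid", "dotcomsid"}


def strip_session_params(url, session_params=SESSION_PARAMS):
    """Return the URL with any session query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in session_params
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Two crawls of the same page, each handed a different session by the server.
first = strip_session_params("http://example.com/page?id=7&sessionId=abc123")
second = strip_session_params("http://example.com/page?id=7&sessionId=zzz999")
print(first == second)
```

A site could apply the same idea when it writes links, or skip creating a session at all for requests that do not need one.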

Comment on Avoiding session IDs when spidering by Manu Garg
Fri, 14 Mar 2014 04:23:16 +0000
/2006/04/20/avoiding-session-ids-when-spidering/comment-page-1/#comment-103988

Thanks for the info, this was very helpful. My question is how the GSA creates and manages sessions when it crawls URLs. In other words, the material explained above deals with sessions in the destination URLs. I’ve seen that the GSA also creates a new sessionId for each URL, even though all the URLs fall under a specific set of patterns. Isn’t that an overhead for the spider as well? Wouldn’t it be more convenient for the search engine to crawl all the URLs with the same sessionId (dotcomsid) for a single request, irrespective of the number of URLs to be crawled?

Comment on How do I access the XML from the Google Mini / GSA? by Sebastian Felix Schwarz
Thu, 30 Aug 2012 14:48:51 +0000
/2006/01/26/how-do-i-access-the-xml-from-the-google-mini-gsa/comment-page-1/#comment-30450

Hi, I have a problem getting the FULL XML from my GSA!
When I put my query in the browser address field, it returns the right XML with all items. Using cURL returns only the META content WITHOUT the RES node.

0.022841
angelschein

What happened? I also tried file_get_contents() … no success.
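
For comparing what the browser and a script actually request, a minimal Python sketch follows. It is a guess at a debugging aid, not a confirmed fix: the comment does not show the cURL command, so the cause is unknown. One common cause of this symptom is an unquoted `&` in a shell command, which cuts the URL short. The host name, the default `site` and `client` values, and both function names are assumptions; `q`, `site`, `client` and `output=xml_no_dtd` are the request parameters as I understand the GSA search protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_search_url(host, query, site="default_collection", client="default_frontend"):
    """Build a fully encoded search URL (host and defaults are assumptions)."""
    params = {"q": query, "site": site, "client": client, "output": "xml_no_dtd"}
    return "http://{}/search?{}".format(host, urlencode(params))


def has_results(xml_text):
    """Return True if the response root has a RES child element."""
    root = ET.fromstring(xml_text)
    return root.find("RES") is not None


url = build_search_url("gsa.example.com", "angelschein")
print(url)

# Hand-written sample responses, not real appliance output.
without_res = "<GSP><TM>0.022841</TM><Q>angelschein</Q></GSP>"
with_res = "<GSP><TM>0.022841</TM><Q>angelschein</Q><RES><R N='1'/></RES></GSP>"
print(has_results(without_res), has_results(with_res))
```

Printing the exact URL makes it easy to check that every parameter the browser sends also reaches the appliance from the script.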

Comment on How do I access the XML from the Google Mini / GSA? by Paul
Tue, 17 Jul 2012 10:07:24 +0000
/2006/01/26/how-do-i-access-the-xml-from-the-google-mini-gsa/comment-page-1/#comment-29221

Hi Peter, I’ve updated the link to the new home for the GSA XML documentation.

I do wish Google would put in redirects for this stuff. You’d think they’d understand the need for that, what with being a search engine and suggesting other people do it.

Comment on How do I access the XML from the Google Mini / GSA? by Peter Knaggs
Mon, 16 Jul 2012 09:09:05 +0000
/2006/01/26/how-do-i-access-the-xml-from-the-google-mini-gsa/comment-page-1/#comment-29180

Both GSA XML reference links are broken.

Comment on Google Mini: Searching Subcollections from the frontend by Steve
Tue, 29 Nov 2011 04:22:11 +0000
/2007/01/08/searching-subcollections-frontend/comment-page-1/#comment-24521

This was exactly what I was looking for. I’m not very tech savvy but I managed this. Thanks.

Comment on When your GSA license runs out by Dave Watts
Wed, 01 Jun 2011 14:42:05 +0000
/2008/03/27/when-your-gsa-license-runs-out/comment-page-1/#comment-21065

@ltman: it costs whatever you paid to purchase the license in the first place, more or less. The Google Search Appliance is licensed, not purchased.

Comment on Can you put new pages or applications on a Google Mini? by Stoett
Tue, 03 May 2011 18:09:46 +0000
/2006/03/09/can-you-put-new-pages-or-applications-on-a-google-mini/comment-page-1/#comment-20474

I am saddened by the “No” answer, but thank you for posting this. At least now I have found the answer to my question.

Comment on Custom meta tags in search results and full stops by Eric
Mon, 02 May 2011 23:51:06 +0000
/2006/05/04/custom-meta-tags-in-search-results-and-full-stops/comment-page-1/#comment-20461

Paul,

I was wondering: do you know if you can specify an inmeta search… something like so:

q=conservation+inmeta:organization=foo

and also specify that all pages that are missing the organization meta tag be included in the search?

Thanks

Comment on Setting a unique user agent to help control spidering by Lalith
Thu, 14 Apr 2011 15:22:45 +0000
/2006/01/23/user-agent-control-spidering/comment-page-1/#comment-20039

Hi,
Thanks for the information!
I am trying to feed PDFs to the GSA, and the error I am getting is “Crawled with empty body: Conversion error.”

Can you help me resolve this issue?

Comment on When your GSA license runs out by Buzz
Tue, 08 Jul 2008 15:15:39 +0000
/2008/03/27/when-your-gsa-license-runs-out/comment-page-1/#comment-11361

Well, the GSA Mini license _does_ run out … just not anytime soon:

“License valid until: March 07, 9009”
“The license will expire in 2556939 days.”

Just in time for the 9010 Apocalypse, all hail our robot overlords … etc.

Comment on When your GSA license runs out by Itman
Tue, 15 Apr 2008 14:53:26 +0000
/2008/03/27/when-your-gsa-license-runs-out/comment-page-1/#comment-10581

Hi,
Does it cost 30 thousand dollars to renew the license?

Comment on You can’t spider XML with a Google Mini (so far) by Jason Grovert
Mon, 04 Feb 2008 18:57:47 +0000
/2007/01/16/spider-xml-google-mini/comment-page-1/#comment-6622

We create a sitemap.xml file which contains all the pages on our site. We have ~7000 pages so far. Is there a way to configure the Google Mini to load a page (which we would create from the sitemap.xml file) that has all 7000 links, so that the Google Mini crawls just those 7000 pages and nothing else that it finds in them?
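
The "page created from the sitemap.xml file" step could be sketched in Python as below. This is only a sketch under assumptions: it expects a standard sitemaps.org `urlset` file, and the function names and page title are hypothetical. It does not address whether the Mini can be restricted to only these URLs, which would depend on the appliance's crawl pattern settings.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by standard sitemaps.org sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Extract every <loc> URL from a sitemap document given as a string."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def links_page(urls, title="Crawl seed list"):
    """Render a plain HTML page with one link per URL for a spider to follow."""
    items = "\n".join(
        '<li><a href="{0}">{0}</a></li>'.format(escape(u, quote=True)) for u in urls
    )
    return "<html><head><title>{}</title></head><body><ul>\n{}\n</ul></body></html>".format(
        escape(title), items
    )


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/a</loc></url>"
    "<url><loc>http://example.com/b?x=1&amp;y=2</loc></url>"
    "</urlset>"
)
page = links_page(sitemap_urls(sample))
print(page)
```

The generated page would then be published somewhere the Mini can reach and used as its start URL.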

Comment on Relevance in Mini and GSA searches by Joel
Thu, 10 Jan 2008 16:25:14 +0000
/2006/03/07/relevance-in-mini-and-gsa-searches/comment-page-1/#comment-5969

The open source nutch-IICE project is currently similar to the Google GSA. You can take a look at it.

http://nutch-iice.sourceforge.net/

Comment on How do I access the XML from the Google Mini / GSA? by hazarth
Fri, 21 Dec 2007 10:42:00 +0000
/2006/01/26/how-do-i-access-the-xml-from-the-google-mini-gsa/comment-page-1/#comment-5483

I implemented GSA search functionality in my applications.
When I search for “testing”, no output is returned.
