You can’t spider XML with a Google Mini (so far)

Posted on January 16th, 2007 in GSA, Google Mini by Paul

A question I’ve seen come up a lot, and one my earlier post doesn’t answer directly, is whether the Google Mini or Search Appliance can spider raw XML. Unfortunately, no, it cannot.

The Mini / Search Appliance can read the XML, but it treats it as plain text, so any search you run will match node names and attributes as well as content, rather than content alone.
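
To make the difference concrete, here is a rough Python sketch. It is not anything the appliance itself runs; it just compares the words a plain-text indexer would pick up from an XML snippet with the words in the content alone. The sample document and both function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = '<product sku="A100"><name>Garden hose</name><price>12.50</price></product>'

WORD = re.compile(r"[a-z0-9.]+")

def words_as_plain_text(xml_text):
    """Words seen when the XML is treated as straight text, markup included."""
    return set(WORD.findall(xml_text.lower()))

def words_in_content(xml_text):
    """Words found only in the text content of the elements."""
    root = ET.fromstring(xml_text)
    text = " ".join(root.itertext())
    return set(WORD.findall(text.lower()))

if __name__ == "__main__":
    extra = words_as_plain_text(SAMPLE) - words_in_content(SAMPLE)
    # Words that come purely from markup: tag names, attribute names and values.
    print(sorted(extra))  # ['a100', 'name', 'price', 'product', 'sku']
```

With the XML indexed as text, a search for "price" or "name" would match every record, which is rarely what you want.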

The best I can suggest is to have a script run an XSL transform on your XML to turn it into a small (or indeed large) site of web pages, then spider those with the appliance.
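
As a minimal sketch of that idea, the script below turns an XML file into one HTML page per record plus an index page for the crawler to start from. Note that it uses Python's standard xml.etree.ElementTree in place of a real XSL processor, purely to keep the example self-contained; an XSLT stylesheet would do the same job. The <catalog>/<item> structure, the output file names and the build_site function are all assumptions for illustration.

```python
import html
import os
import xml.etree.ElementTree as ET

# Hypothetical input; the element and attribute names are assumptions.
SAMPLE = """<catalog>
  <item id="1"><title>Garden hose</title><body>A 15 m hose.</body></item>
  <item id="2"><title>Rake</title><body>A steel rake.</body></item>
</catalog>"""

PAGE = "<html><head><title>{title}</title></head><body><h1>{title}</h1>{body}</body></html>"

def build_site(xml_text, out_dir):
    """Write one HTML page per <item> plus an index.html linking to them all."""
    os.makedirs(out_dir, exist_ok=True)
    root = ET.fromstring(xml_text)
    links = []
    for item in root.findall("item"):
        name = "item-{}.html".format(item.get("id"))
        # Escape the content so it is safe to embed in HTML.
        title = html.escape(item.findtext("title", default=""))
        body = html.escape(item.findtext("body", default=""))
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as f:
            f.write(PAGE.format(title=title, body="<p>{}</p>".format(body)))
        links.append('<li><a href="{}">{}</a></li>'.format(name, title))
    index = PAGE.format(title="Index", body="<ul>{}</ul>".format("".join(links)))
    with open(os.path.join(out_dir, "index.html"), "w", encoding="utf-8") as f:
        f.write(index)
    return len(links)

if __name__ == "__main__":
    # Writes the pages into a local "site" directory and prints the page count.
    print(build_site(SAMPLE, "site"))
```

Serve the output directory from any web server and point the appliance at index.html; the pages it finds contain only the content, with no node or attribute names to pollute the results.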

One Response to 'You can’t spider XML with a Google Mini (so far)'



  1. on February 4th, 2008 at 6:57 pm

    We create a sitemap.xml file which contains all the pages on our site. We have ~7000 pages so far. Is there a way to configure the Google Mini to load a page (which we would generate from the sitemap.xml file) containing all 7000 links, and have it crawl just those 7000 pages (and nothing else that it finds in them)?
