What the Google Mini will spider
Google have a helpful list of the file formats the Google Mini will spider.
It’s worth checking what you want to spider before you consider the other factors in why you are buying a search appliance. Checking through the various file formats I have, I was surprised to see the Mini supports .wps files written by Microsoft Works for DOS. It’s not a difficult format to read, but it is an old one now: Works v2 was copyrighted in 1988, if my memory serves me correctly. Personally I have a ton of old Works files, and it’s nice to know something will still understand them. I told Google Desktop they were txt files with an odd extension, but that can have dubious results because part of each file is binary.
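To illustrate why treating a partly binary file as plain text gives dubious results, here is a minimal sketch of a safer fallback: pulling out only the runs of printable ASCII, much as the Unix strings tool does. This is not how Google Desktop or the Mini handles .wps files, and it does not parse the Works format itself; the function name printable_runs and the four-character minimum are assumptions chosen for the example.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII that are at least min_len bytes long.

    Hypothetical helper: it knows nothing about the Works file layout,
    it simply skips over bytes that are not printable ASCII.
    """
    # 0x20-0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# A made-up byte string standing in for a file that mixes text and binary.
sample = b"\x00\x01Dear Sir,\x00\xff\x02ab\x00Budget 1988\x03"
runs = printable_runs(sample)
```

Short runs such as the stray "ab" are dropped, which filters out most binary noise at the cost of occasionally losing very short words.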
It’s my understanding that Google licenses a third-party filter for its appliance (as almost all search indexing companies do). I don’t know which one Google licenses, but both Verity KeyView and Stellent Outside In support this format, along with lots of others.
[…] A question I’ve seen come up a lot which isn’t answered directly by my earlier post is whether the Google Mini or Search Appliance can spider raw XML. Unfortunately, no, it cannot. […]