If you have a protected area of your website, but still want to spider it and have the results available to searchers (e.g. if you have an extranet that you want to offer searching on) then you can give a Google Mini a username and password to give access to the area to spider.
For instance, you have used .htaccess and .htpasswd files to protect a directory on your website called ‘clientarea’, and you need to use the username ‘myuser’ and password ‘mypass’ to access it.
In the Mini’s Admin area, go in to ‘Configure Crawl’, if there isn’t a direct link from another area you are spidering, add the URL of the protected area to ‘Start Crawling from the Following URLs.’ Note: when I tried this, I had to add a direct link to a particular file, rather than just to the directory. So in this example we can add in:
http://webpositioningcentre.co.uk/clientarea/index.php
Click ‘Save URLs to Crawl’ and go to the ‘Crawler Access’ area.
In the area labelled ‘Users and Passwords for Crawling’ you need to put the URL of the protected area, and a username and password to access it. If you also need to set a domain for the protected area, fill in that box as well.
For URLs Matching Pattern: http://webpositioningcentre.co.uk/clientarea/
Use this user: myuser
With Password: mypass (same in Confirm Password)
Click on ‘Save Crawler Access Configuration’ and you’re ready to go. It will remove the password stars from the boxes, but will remember the password.
Next time you crawl your sites, it will access the protected area as if it was a user giving over the set username and password. In the search results it will show Titles, URLs and a snippet of the page as usual, but when a searcher clicks on the link they will need a correct username and password to access the area.
Warning: searchers will be able to click on the ‘Cached’ link and view the contents of the page. To stop the Google Mini caching the page, in the
area of the page in question put the following code:
<meta name="robots" content="noarchive" />
This will stop the Mini, and any other search engine you allow access to the pages from storing a cached version of the page. They will still show part of the page in the snippet as part of the results, but searchers won’t receive a ‘cached’ link to click on.