Relevance in Mini and GSA searches

March 7, 2006 in Google Mini, GSA | Comments (8)

A question from Jim Westergren, prompted by an oddity he spotted in Google PR reporting, led me to look at the relevance rating in Google Mini search results.

For each search you do on the Mini/GSA, you get back a variety of information for each page. Not all of this is immediately obvious – there’s a last modified date which is often pretty useless, and also there’s a relevancy rating for each page. If you’re using the XML API, you’ll find the relevancy within <RK>.
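As a rough sketch of pulling the rating out of the XML, the snippet below parses a small hand-written sample. Only the <RK> element comes from the post itself; the surrounding element names (GSP, RES, R, U, T) and the example URLs are assumptions made for illustration, so check them against the XML your own appliance returns.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only. <RK> is the element discussed above; the other
# element names and the URLs are assumptions, not captured appliance output.
SAMPLE_XML = """<GSP>
  <RES>
    <R N="1"><U>http://intranet.example/a.html</U><T>Page A</T><RK>5</RK></R>
    <R N="2"><U>http://intranet.example/b.html</U><T>Page B</T><RK>0</RK></R>
  </RES>
</GSP>"""


def extract_rk(xml_text):
    """Return a list of (url, rk) pairs, one per <R> result element.

    rk is an int, or None when the result has no usable <RK> value.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        rk_text = result.findtext("RK")
        if rk_text is not None and rk_text.strip().isdigit():
            rk = int(rk_text.strip())
        else:
            rk = None
        pairs.append((url, rk))
    return pairs


for url, rk in extract_rk(SAMPLE_XML):
    print(url, rk)
```

Results with a missing or non-numeric rating come back as None rather than 0, so a genuine RK of 0 stays distinguishable from no rating at all.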

Jim asked if the RK rating was always the same, or different for each page. I’ve done some checking, and it is different in the Mini, and the rating for each page depends on what you have searched for. For instance, on one search, a page I was following was rated as ‘5’, in another, it was ‘0’.

This means the RK value could be useful in a set of results. By default, though, the results are already ranked by relevancy, with what the box thinks is the most relevant page at the top, so it’s probably not greatly useful there. When sorting by date it becomes more useful, as you can try to spot the most relevant page in whatever results you happen to be looking at.
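To illustrate the date-sorted case, here is a minimal sketch that scans a list of results in the order they were displayed and picks out the one with the highest RK. The function name, the (url, rk) pair layout and the sample data are all hypothetical; nothing here is part of the appliance itself.

```python
def most_relevant(results):
    """Return the (url, rk) pair with the highest rk, or None if none has a rating.

    results is a list of (url, rk) pairs in displayed order, for example
    newest first. On ties, the entry that appears first in the list wins.
    """
    scored = [pair for pair in results if pair[1] is not None]
    if not scored:
        return None
    # max() returns the first maximal item, so display order breaks ties.
    return max(scored, key=lambda pair: pair[1])


# Hypothetical date-sorted results, newest first.
date_sorted = [
    ("http://intranet.example/new.html", 2),
    ("http://intranet.example/older.html", 6),
    ("http://intranet.example/oldest.html", 6),
]
print(most_relevant(date_sorted))
```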

It should not be seen as a version of PageRank (‘PR’) within the search appliances, because the value is not fixed across all searches.

Comments (8)

  1. Comment by Dayo_UK — March 8, 2006 @ 11:42 am

    Hi, so results are returned in relevancy order by default.

    So if you choose to display the value you will get results like this ?:-

    First Title
    First Desc
    First URL – 5

    Second Title
    Second Desc
    Second Url – 4

    etc

    So the RK seems to directly reflect the relevancy in a relevancy search? Or can an RK of 2 show higher than an RK of 4 in a relevancy-ordered search?

  2. Comment by Dayo_UK — March 8, 2006 @ 11:44 am

    Oops – the comment form does not accept tags – before the numbers in the above post should be the RK tag, e.g. RK of 5, 4, 2 and 4.

    Cheers

    Dayo

  3. Comment by Paul — March 8, 2006 @ 12:01 pm

    Yup, the results are shown in relevancy order by default. You can get several of the same RK rating, so it must have a decimal level internally, or something else it sorts by as well, so you can get…

    First – 6
    Second – 6
    Third – 5
    Fourth – 5
    Fifth – 5
    Sixth – 2
    Seventh – 2
    Eighth – 0
    Ninth – 0
    etc.

    This is consistent in what I’ve seen; it’s not like PageRank in big Google, where a PR2 page can come higher up than a PR5 page in a set of results.

    NB: The top result doesn’t always have an RK of 10, with the rest decreasing from that, so the Mini must have some sort of relevancy algorithm that says “this is the most relevant page for the search, but I still only give it a 5/10 for relevancy for the term.”

  4. Comment by Dayo_UK — March 8, 2006 @ 2:45 pm

    Ok, thanks.

    So there is little doubt that the RK tag is a way of scoring a page – obviously the way of scoring a page in Google Mini is a lot different from big Google.

    Is it all on-page factors ?

    Or would Google Mini recognize a more important document by the number of references to it or where it sits in the directory tree?

  5. Comment by Paul — March 8, 2006 @ 4:17 pm

    There’s no doubt ‘RK’ is a way of scoring a page in the Mini and GSA. What it may be in the main Google API is another matter entirely.

    Working out the ranking system in the Mini is one of the things on my ‘to do’ list, which is unfortunately filled with other stuff as well.

    From what I’ve seen, on-page factors have a much greater effect than they do in big Google, but interlinking does still have an effect. I’m not sure yet about the effect of where a document is in an overall site or directory tree. It can be difficult to assess that without outputting large amounts of test pages, which I haven’t had time for.

    Of course, it might be that interlinking doesn’t have much effect because there isn’t much interlinking in the relatively small datasets that I’m working with. That’s another thing I’m going to have to test. It could be that relatively few links will have a very large effect, because there generally wouldn’t be a lot of linking to the same place on an intranet – which is what the search appliances were generally made to search.

  6. Comment by Jim Westergren — March 24, 2006 @ 8:32 pm

    Hi,

    Google has now put all RK values to zero for all URLs.

    Either some temporary glitch OR Google didn’t like that value to be public …

    Is it the same for the Google mini??

  7. Comment by Paul — March 24, 2006 @ 10:34 pm

    Hi Jim,

    Nope, I’d have to put a software update in to change something like that and there haven’t been any for the Mini recently. Funnily enough, I’ve just done a software update on the GSA I work on and RK is still there and acting like it did before.

    The Mini & GSA don’t use the same ranking algorithm as big Google, and you can use RK to give an indication of how good a result is for a search (i.e. turn it into stars or something), so I doubt they’ll turn it off in the search appliances.

    It’s rather interesting that they have within the main API though. If it wasn’t doing something, you’d have thought they’d leave it on (and indeed if it wasn’t doing something, what’s it doing there in the first place?)

  8. Comment by Joel — January 10, 2008 @ 4:25 pm

    Currently, the nutch-IICE open source project is similar to the Google GSA. You can take a look at it.

    http://nutch-iice.sourceforge.net/
