The irony of Internet search is that there is
never a paucity of information, but an overdose of
it. With databases that keep the entire Web at their
fingertips, search engines can almost always retrieve
relevant pages; the challenge lies in separating the
wheat from the chaff and keeping out the unwanted
material. Most engines find more sites for a typical
search query than you could ever wade through, so
finding the relevant pages in the search results
feels like looking for the proverbial needle in a
haystack.
We discussed Boolean search in the last
issue. It is a great tool in terms of simplicity and
speed, but it cannot differentiate between search
expressions that use the same keywords in a different
order (and hence have a different meaning). So the
search expressions 'Dog bites man' and 'Man bites dog'
retrieve virtually the same results (unless you search
for the exact phrase).
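
The word-order problem can be illustrated with a
small sketch. Everything here is an assumption made
for illustration: the three sample pages, the function
names boolean_and_search and phrase_search, and the
simple whitespace word splitting. Real engines index
and match pages in far more elaborate ways.

```python
# Hypothetical mini-index: page name -> page text (invented for illustration).
PAGES = {
    "news1": "man bites dog in city park",
    "news2": "dog bites man on main street",
    "recipe": "hot dog recipe for one man",
}


def boolean_and_search(query, pages):
    """Return names of pages that contain every query word, in any order."""
    terms = set(query.lower().split())
    return sorted(
        name for name, text in pages.items()
        if terms <= set(text.lower().split())
    )


def phrase_search(query, pages):
    """Return names of pages that contain the query words as an exact phrase."""
    wanted = query.lower().split()
    hits = []
    for name, text in pages.items():
        words = text.lower().split()
        # Slide a window of len(wanted) words across the page text.
        if any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1)):
            hits.append(name)
    return sorted(hits)


# Boolean AND ignores word order, so both queries match the same pages.
print(boolean_and_search("Dog bites man", PAGES))
print(boolean_and_search("Man bites dog", PAGES))
# Exact-phrase matching respects word order, so the results differ.
print(phrase_search("Dog bites man", PAGES))
print(phrase_search("Man bites dog", PAGES))
```

In this sketch the Boolean search treats a query as
an unordered set of words, which is why the two
expressions cannot be told apart; only the phrase
search looks at the order of the words.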
Search engines are aware of this problem
and have tried to solve it in different ways.
Directory-type search engines display search results
in alphabetical order. But they are extremely
selective, so the search results seldom run beyond
one or two pages.
Spider-based search engines have no such
luck, so they employ what is called a 'relevance
score' to sort search results.
The relevance score is a measure used to bring
the most relevant pages to the top of the search
results. Many search engines display the relevance
score of each retrieved page.
Relevance scores reflect the number of
times a search term appears, where it appears (e.g.
in the title, in the meta tags, or towards the
beginning of the document), whether all the search
terms are near each other, and many other relevance
parameters. Each parameter carries a different weight.
The pages are sorted by their final relevance score.
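
A weighted score of this kind might be sketched as
follows. This is not any real engine's formula: the
four parameters chosen, the numbers in WEIGHTS, the
"first 10 words" rule for early placement, the
proximity window size, and the sample pages are all
assumptions made for illustration.

```python
# Assumed weights; real engines use their own parameters and values.
WEIGHTS = {"frequency": 1.0, "title": 3.0, "early": 2.0, "proximity": 2.5}


def relevance_score(query, title, body, weights=WEIGHTS):
    """Combine a few relevance parameters into one weighted score."""
    terms = set(query.lower().split())
    title_words = title.lower().split()
    body_words = body.lower().split()

    # How many times the search terms appear in the body.
    frequency = sum(1 for word in body_words if word in terms)
    # How many distinct search terms appear in the title.
    in_title = sum(1 for term in terms if term in title_words)
    # How many distinct search terms appear in the first 10 body words.
    early = sum(1 for term in terms if term in body_words[:10])
    # Whether all terms occur together inside one short window of words.
    window = len(terms) + 2
    near = bool(terms) and any(
        terms <= set(body_words[i:i + window])
        for i in range(len(body_words))
    )

    return (weights["frequency"] * frequency
            + weights["title"] * in_title
            + weights["early"] * early
            + weights["proximity"] * (1 if near else 0))


def rank(query, pages, weights=WEIGHTS):
    """Return page names sorted by descending relevance score."""
    return sorted(
        pages,
        key=lambda name: relevance_score(query, *pages[name], weights),
        reverse=True,
    )


# Hypothetical pages: name -> (title, body).
SAMPLE_PAGES = {
    "a": ("Dog training tips",
          "Training a dog takes patience and a dog needs daily walks"),
    "b": ("Garden notes",
          "The neighbour has a dog but this page is mostly about training roses"),
}

for name, (title, body) in SAMPLE_PAGES.items():
    print(name, relevance_score("dog training", title, body))
print(rank("dog training", SAMPLE_PAGES))
```

Both sample pages contain both search words, so a
plain Boolean search would treat them alike. The
weighted score puts page "a" first because the words
appear in its title, early in its text, and close
together.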
Since each search engine has its own
system for calculating relevance scores, you get
different search results from different search engines
even when the search expression is the same.
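
A minimal sketch of why the ordering changes from one
engine to another: the same two pages are scored with
two different sets of weights. The page measurements,
the engine names, and the weights are all invented
for illustration.

```python
# Made-up measurements for two pages: occurrences of the search terms
# in the text, and number of search terms found in the title.
FEATURES = {
    "page_x": {"frequency": 8, "title": 0},
    "page_y": {"frequency": 2, "title": 2},
}

# Two hypothetical engines that weight the same parameters differently.
ENGINE_WEIGHTS = {
    "engine_one": {"frequency": 1.0, "title": 1.0},
    "engine_two": {"frequency": 1.0, "title": 5.0},
}


def rank_for(engine):
    """Rank the pages using the weights of the given hypothetical engine."""
    weights = ENGINE_WEIGHTS[engine]

    def score(page):
        return sum(weights[key] * value
                   for key, value in FEATURES[page].items())

    return sorted(FEATURES, key=score, reverse=True)


print(rank_for("engine_one"))
print(rank_for("engine_two"))
```

The first engine favours the page that repeats the
search terms most often, while the second, which
values title matches more heavily, puts the other
page on top.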