[Xapian-discuss] Xapian for Wikia?

Olly Betts olly at survex.com
Tue Feb 13 07:55:35 GMT 2007


On Thu, Feb 08, 2007 at 06:32:38PM +0100, Reini Urban wrote:
> What they want is a new search engine, wiki-like which probably needs
> some fast user feedback à la http://www.wikilens.org to give relevance
> feedback to adapt the ranking algorithms dynamically.

I'd argue that if you want to build a fast search engine with such
features, you'll get there substantially faster by starting from an
existing working codebase which you know performs well.  Xapian even has
relevance feedback features already!

Also, realistically it'll be a long while before you can get a useful
amount of user feedback on more than a small proportion of the web,
so you need good ranking to fall back on - if the underlying ranking
is poor, people won't get interested enough to contribute many user
rankings.

Google had around 8 billion pages when they stopped publicising their
size some time ago - compare that to Wikipedia, which has something like
4.5 million pages (that's from adding together the sizes given for the
major languages on their front page).  Admittedly a Wikipedia page needs
(or should need!) more user input than a search ranking, but probably
not 2000 times more, and Wikipedia has been going just over 6 years now.
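
As a rough check of that "2000 times" figure, here is the arithmetic
using only the two approximate page counts quoted above (the variable
names are just for illustration):

```python
# Both figures are the rough numbers quoted in the text, not exact counts.
google_pages = 8_000_000_000     # ~8 billion pages
wikipedia_pages = 4_500_000      # ~4.5 million pages

# How many times more pages the web index covers.
ratio = google_pages / wikipedia_pages
print(f"roughly {ratio:.0f} times more pages")
```

That comes out a little under 1800, so "2000 times" is a fair round
number for the comparison.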

> So it's a UI thing (fast ajax-like feedback) with an AI (artificial
> intelligence) part: rank adaption.

Rank adaption doesn't really require AI - it can be done very well using
statistical techniques.

(Well, "AI" is a much-abused term because it sounds good in marketing
puff, but I wouldn't think of statistical techniques as AI.  But
Wikipedia's AI description includes "Bayesian networks", which is clearly
statistical...)
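
To illustrate the sort of statistical approach meant here, the following
is a minimal sketch of one possibility: treat the engine's baseline score
as a prior worth a fixed number of pseudo-votes, and let real user votes
gradually outweigh it.  This is an assumption made for illustration, not
something Xapian provides; adapted_score, prior_weight and the vote
counts are all hypothetical names.

```python
def adapted_score(base_score, up_votes, down_votes, prior_weight=10.0):
    """Blend a baseline relevance score in [0, 1] with user feedback.

    Hypothetical helper: the baseline counts as `prior_weight`
    pseudo-votes, so with no feedback the result equals base_score,
    and with many votes it approaches the observed fraction of up-votes.
    """
    total_votes = up_votes + down_votes
    return (prior_weight * base_score + up_votes) / (prior_weight + total_votes)


# A page with no feedback keeps its baseline score.
no_feedback = adapted_score(0.5, 0, 0)

# Heavy positive feedback pulls the score up towards 1.
popular = adapted_score(0.5, 90, 0)

# Negative feedback pulls a well-ranked page down.
disliked = adapted_score(0.8, 0, 10)
```

With little feedback the underlying ranking dominates, which fits the
point above that good ranking is needed to fall back on.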

> categorization clustering would probably help. maybe users will have
> to do that also.
> if the feedback pays back. say leads to better results.

The real challenge I can see is dealing with people trying to spam their
pages to the top of the rankings by automating the feedback process -
a problem which will only grow as popularity increases.  That's the part
of running a search engine I used to dislike most - it just seemed such
an unrewarding use of my time.

Cheers,
    Olly


