[Xapian-discuss] Strange Weighting issue

Kevin Duraj kevin.softdev at gmail.com
Tue Sep 22 00:30:34 BST 2009


I would like to share how I use the weighting scheme with
indexscripts. Last month I started building a social network search
engine, http://Find1Friend.com/, that indexes pages from Facebook,
MySpace, Hi5 and others. My goal is to weight users whose social
network pages were updated recently higher than others. Because I need
to run this against, e.g., 100 million documents and the data is
growing rapidly, I came up with the idea of using 300 indexscripts on
300 indexes, each with weights increased over the previous one based
on the age of the user's page.

tags     : weight=3952 indexnopos field
title    : weight=1310 indexnopos field
body     : weight=434 unhtml index truncate=500 field

tags     : weight=5196 indexnopos field
title    : weight=1660 indexnopos field
body     : weight=530 unhtml index truncate=500 field
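
A minimal sketch of how such per-bucket indexscripts could be
generated, in case it helps anyone trying this. The post does not say
how the weights are derived from page age, so the linear growth rule,
the default base weights, and the names bucket_weights and
render_indexscript are all assumptions made for illustration; only the
field lines themselves follow the scripts shown above.

```python
# Hypothetical sketch: render one indexscript per age bucket.
# The scaling rule and default base weights are assumptions, not the
# values actually used for the scripts quoted in this post.

# Actions per field, copied from the indexscripts shown above.
FIELD_ACTIONS = {
    "tags": "weight={w} indexnopos field",
    "title": "weight={w} indexnopos field",
    "body": "weight={w} unhtml index truncate=500 field",
}


def bucket_weights(bucket, base=None, growth=0.05):
    """Return per-field weights for an age bucket (0 = oldest pages).

    Assumed rule: each newer bucket scales the base weights linearly.
    """
    if base is None:
        base = {"tags": 100, "title": 33, "body": 11}  # assumed defaults
    factor = 1.0 + growth * bucket
    return {field: max(1, round(w * factor)) for field, w in base.items()}


def render_indexscript(weights):
    """Render the indexscript text for one bucket."""
    lines = []
    for field in ("tags", "title", "body"):
        actions = FIELD_ACTIONS[field].format(w=weights[field])
        lines.append(f"{field:<9}: {actions}")
    return "\n".join(lines) + "\n"
```

Calling render_indexscript(bucket_weights(n)) for n from 0 to 299
would give the text of 300 scripts, which could then be written out as
indexscript_000 to indexscript_299 (the numbering is also an
assumption).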

Because I must index millions of documents very quickly, I create 300
indexes based on user page age and assign each one to a particular
indexscript_xxx with its own weighting scheme. This way I still
benefit from the high performance of indexscripts and get documents
weighted by age, a type of technique used in real-time web search
engines.

Then I merge all 300 indexes with xapian-compact and get one index
with the weighting scheme built in, with no performance penalty during
indexing and none of the search-time penalty that would be incurred if
a boost value were used to sort the results. This way I get more
recent pages at the top of the search results. As you can see, my
weights go to 5000 and beyond. Please don't mess up the current Xapian
weighting; it works really well, better than boost. Thank you.
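
The merge step could be scripted roughly as below. This is only a
sketch: the directory names (index_001 ... index_300, merged_index) and
the helper name compact_command are assumptions. The only thing taken
from xapian-compact itself is its basic usage of source databases
followed by the destination database.

```python
# Hypothetical sketch: build the xapian-compact command that merges the
# per-age indexes into one. All directory names here are assumed.

def compact_command(num_indexes, dest="merged_index", prefix="index_"):
    """Return the argument list: xapian-compact SOURCE... DESTINATION."""
    sources = [f"{prefix}{i:03d}" for i in range(1, num_indexes + 1)]
    return ["xapian-compact", *sources, dest]
```

The resulting list can be handed to subprocess.run(cmd, check=True) to
perform the merge once all 300 source indexes exist.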

PS: Try out this technique and let me know how it works for you.

Kevin Duraj

On Tue, Sep 8, 2009 at 8:00 AM, John Wards <jwards at whiteoctober.co.uk> wrote:
> On Tue, Sep 8, 2009 at 3:49 PM, Richard Boulton<richard at tartarus.org> wrote:
>>> I need all types to return, but with those with a higher rank to be
>>> given a boost. The idea is to boost product pages over news pages for
>>> example, as even if the news page is textually more relevant the type
>>> of content is actually what the user is really searching for.
>> If you want to _always_ return one type of document before another type,
>> you'd be best using Enquire::set_sort_by_value_then_relevance() to sort
>> strictly by the type of document first, and then using relevance order
>> within documents of the same type.  Using weight is only appropriate if you
>> want the combination of ranking order to be somewhat fuzzy.
> Sorry, a misunderstanding: I want relevant documents to return
> higher, and I made sure the client knows why we are using Xapian for
> relevancy matching.
> However we do need to massage the relevancy of Xapian to say boost
> something from position 11 to position 3 etc.
> I am about to look at the 1.1 branch as that seems to be what I need.....I hope.
> Cheers
> John