[Xapian-discuss] Index-time weight of a document and weight per document field

Olly Betts olly at survex.com
Thu Feb 26 00:31:51 GMT 2009


On Wed, Feb 25, 2009 at 05:10:56PM +0100, Maciej Zięba wrote:
> Hello :-)
> 
> When indexing documents, I would like to influence future search results order. 
> I've used the verb "influence", because I don't want to change the ordering 
> completely but only to give a "hint" about it.
> 
> There are two ways in which I would like to do that:
> 
> 1. Weight of a document
> I would like to be able to say that some documents are more important than 
> others and should therefore end up higher in the results. An example:
> - Document A has weight of 2
> - Document B has weight of 1
> - Document C has weight of 3
> - We search for "xyz" and find it in all 3 documents
> - The order in which results are given would be: C, A, B

There isn't really a clean way to do this in 1.0.x - the best I can think
of is to add a term (say XBOOST) to all documents you want to give a weight
boost to, with a wdf which models this weight boost, and then combine
this with the parsed query like so:

    Xapian::Query q = queryparser.parse_query(query_string);
    q = Xapian::Query(Xapian::Query::OP_AND_MAYBE, q, Xapian::Query("XBOOST"));
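
To illustrate why OP_AND_MAYBE only gives a "hint" rather than changing
which documents match, here is a toy model in plain C++. It is not Xapian
code: the Scored struct, the and_maybe() function and the plain numeric
weights are all made up for illustration, and real weights come from the
weighting scheme (BM25 by default), so they do not grow linearly with wdf.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a matched document: name plus final score.
struct Scored {
    std::string doc;
    double weight;
};

// Toy model of OP_AND_MAYBE: every document matching the main query is
// returned; a document which also carries the boost term gets that term's
// weight added on top. Documents with only the boost term are not returned.
std::vector<Scored> and_maybe(const std::map<std::string, double>& main_hits,
                              const std::map<std::string, double>& boost_hits) {
    std::vector<Scored> out;
    for (const auto& hit : main_hits) {
        double w = hit.second;
        auto b = boost_hits.find(hit.first);
        if (b != boost_hits.end()) w += b->second;
        out.push_back({hit.first, w});
    }
    // Highest combined weight first; stable so ties keep name order.
    std::stable_sort(out.begin(), out.end(),
                     [](const Scored& x, const Scored& y) {
                         return x.weight > y.weight;
                     });
    return out;
}
```

With equal main-query weights for A, B and C and boost weights of 2, 1 and
3, this model returns them in the order C, A, B, as in the example above.
A document which only has the boost term is left out entirely.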

With SVN trunk, you can use Xapian::PostingSource to do this:

http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst

> 2. Weight of a field (per document, not in general)
> I would like to be able to say that a given field in a particular document is 
> more important than in another. An example:
> - Let's say that we have a "keywords" field
> - Document A has weight of 1 and its keywords field has weight of 3
> - Document B has weight of 1 and its keywords field has weight of 1
> - Document C has weight of 1 and its keywords field has weight of 2
> - We search for "xyz" and find it in "keywords" fields of all 3 documents
> - The order in which results are given would be: A, C, B

http://trac.xapian.org/wiki/FAQ/ExtraWeight

> I guess this can't be done with any existing tool (for example with 
> scriptindex) and I would have to write my own indexer (I will try to use 
> Python bindings).  Am I right?

The "XBOOST" technique could be done by massaging the input file to
scriptindex and using a suitable index script.

The index-time extra weight technique described in the FAQ is supported
by scriptindex directly (weight=FACTOR).
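
As a rough sketch, an index script using this might look something like the
fragment below. The field names "keywords" and "text" and the factor 3 are
only examples, not taken from any real setup:

```
keywords : weight=3 index
text : index
```

Here the factor is set per field name in the index script rather than per
document, so varying it between documents would presumably still need
some massaging of the input file.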

Cheers,
    Olly


