[Xapian-discuss] Document weighting in 1.1

Richard Boulton richard at tartarus.org
Tue Sep 8 21:26:58 BST 2009


2009/9/8 John Wards <jwards at whiteoctober.co.uk>

> Okay, after a bit of googling I have discovered this:
>
> http://trac.xapian.org/browser/trunk/xapian-core/docs/postingsource.rst
>
> Which I think Richard was suggesting.


Great - you're on the right track...


> However I get to the examples
> section and it is just a FIXME.
>

... but that's partly why this isn't released as stable yet :(


> Could someone give an example on how to use a PostingSource as the
> second example given in the introduction as that seems exactly what I
> am trying to do. (To boost weight of documents)
>


> However from a bit more googling I get the impression that
> PostingSource is not useable in the PHP bindings.


There is a restriction, but it should be perfectly usable for what you
want.  Basically, PostingSource is a base class, which can be derived from
to produce arbitrary sources of weights and terms for the match process.
This subclass can be written in C++, or in _some_ of the wrapped languages.

However, there are some built-in subclasses, which can be used in all
wrapped languages.  The one you want is ValueWeightPostingSource, which
reads from a value slot and returns a weight based on the value found in
that slot.  It should be perfectly usable with PHP.


> Is it useable in
> Python bindings?


Yes - in the Python bindings, it's even possible to implement the
PostingSource subclass in Python (though the performance isn't great if you
do this).


> Also where would one start to make PostingSource work
> in PHP? Is this where I have to dust off my C skills from Uni and
> start making PHP extensions?
>

A brief example of what you need to do, written in Python because my PHP is
weak, but using only the bits which should work in PHP too:

Firstly, at index time, you need to put a weight into the slot.  We'll use
slot number 0, but any other slot could be used.  We need to convert the
weight to a string representation, and Xapian provides a function to do
this, sortable_serialise:

>>> import xapian
>>> weight = 2.5  # example value only; use whatever boost suits the document
>>> doc = xapian.Document()
>>> doc.add_value(0, xapian.sortable_serialise(weight))
>>> # add other stuff to the doc, then add it to the database.

Now, at search time, assuming that the main query is in a variable called
"query":

>>> weightsource = xapian.ValueWeightPostingSource(0)
>>> query = xapian.Query(xapian.Query.OP_AND_MAYBE, query,
...                      xapian.Query(weightsource))
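
To make the effect of OP_AND_MAYBE concrete, here is a toy model in plain
Python.  It does not use Xapian at all, and the document ids and weights are
made-up illustrative numbers; it only sketches the rule that the left-hand
query decides which documents match, while the right-hand weight source adds
to the score of those documents where it has a value:

```python
def and_maybe(main_weights, boost_weights):
    """Toy model of OP_AND_MAYBE (not Xapian's implementation).

    Only documents matched by the main query are returned; a document's
    boost weight is added when the weight source has one for it.
    """
    return {
        docid: weight + boost_weights.get(docid, 0.0)
        for docid, weight in main_weights.items()
    }


# Hypothetical weights from the main query, keyed by document id.
main_weights = {1: 2.0, 2: 1.5, 3: 0.5}
# Hypothetical weights stored in the value slot.  Document 4 has a stored
# weight but does not match the main query, so it must not be returned.
boost_weights = {2: 1.0, 3: 3.0, 4: 9.0}

combined = and_maybe(main_weights, boost_weights)
ranked = sorted(combined, key=combined.get, reverse=True)
```

In this sketch document 3 moves from last place to first because of its
stored weight, and document 4 never appears, which is the behaviour you want
from a boost: it reorders the results without adding new ones.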

There is one wrinkle to be aware of (which the Python bindings hide from
you, but I think could be a problem in PHP): you need to ensure that the
"weightsource" object isn't deallocated before the "query" object using it
is run.  The easiest way to do this is just to make sure that "weightsource"
doesn't go out of scope until the "get_mset" call has been made.
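
The lifetime problem can be illustrated without Xapian.  The sketch below
uses hypothetical stand-in classes (FakeSource and FakeQuery are not part of
any binding) and a weak reference to imitate a query that does not keep its
posting source alive.  It relies on CPython releasing objects as soon as the
last reference goes away, and it raises an exception where a real binding
would more likely crash:

```python
import weakref


class FakeSource:
    """Hypothetical stand-in for a posting source reading one value slot."""

    def __init__(self, slot):
        self.slot = slot


class FakeQuery:
    """Hypothetical stand-in for a query with a non-owning source reference."""

    def __init__(self, source):
        # A weak reference does not keep `source` alive.
        self._source = weakref.ref(source)

    def run(self):
        source = self._source()
        if source is None:
            raise RuntimeError("posting source was deallocated before the query ran")
        return "ran against slot %d" % source.slot


def build_unsafe():
    source = FakeSource(0)
    return FakeQuery(source)  # `source` goes out of scope on return


def build_safe():
    source = FakeSource(0)
    return FakeQuery(source), source  # the caller keeps `source` alive


try:
    build_unsafe().run()
    unsafe_result = "ran"
except RuntimeError:
    unsafe_result = "failed"

# Keeping both names in scope until the query has run avoids the problem.
query, weightsource = build_safe()
safe_result = query.run()
```

The safe variant mirrors the advice above: hold on to the source object in
the same scope as the query until the search has been performed.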

-- 
Richard

