[Xapian-discuss] Search::Xapian and term positions

Olly Betts olly at survex.com
Wed Jan 19 21:58:10 GMT 2005


On Wed, Jan 19, 2005 at 10:26:00PM +0100, Arne Georg Gleditsch wrote:
> Has anyone seen this before?  This is Xapian 0.8.5 and Search::Xapian
> 0.8.4.

I've not seen it.  Ideally all the methods should have feature tests,
but I've not managed to add these at quite the same rate I've been
adding wrappers.

> Perhaps, before I walk this line further: is the positionlist going to
> be useful for me?  Not having gotten far enough to see what it looks
> like, I get the impression that it is an index into the sequence of
> tokens that a file is parsed to, is that correct?

The position list holds the position values you passed to add_posting()
for the specified term and document, and iterates over them in ascending
order.

> Can this number be manipulated when a file is indexed, and what would
> be the consequence of doing so?  (I.e. letting it be <line number>*100
> + <token position in current line> or something?)

Sure - you can pass whatever values you like.  Phrase searching relies
on the positions being consecutive for terms which should be matchable
as a phrase, but the technique you suggest would work fine.  In fact,
omindex and scriptindex do something similar to prevent phrases
overlapping between different fields.  E.g. a title of "Hello" and first
paragraph starting "World" shouldn't match a phrase search for
"hello world".
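
To illustrate the effect of such a gap, here is a minimal sketch in plain
Python.  It does not use the Xapian API: the helper names index_fields and
phrase_matches, and the gap value of 100, are assumptions made up for this
example.  It numbers tokens field by field, skips ahead between fields, and
treats a phrase as matching only when its terms sit at consecutive positions.

```python
from collections import defaultdict

FIELD_GAP = 100  # assumed gap between fields; any positive gap breaks adjacency


def index_fields(fields, gap=FIELD_GAP):
    """Return {term: ascending positions}, leaving `gap` unused positions between fields."""
    postings = defaultdict(list)
    pos = 0
    for text in fields:
        for token in text.lower().split():
            pos += 1
            postings[token].append(pos)
        pos += gap  # skip ahead so the next field's first term is not adjacent
    return dict(postings)


def phrase_matches(postings, phrase):
    """True if the phrase's terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in postings for t in terms):
        return False
    rest = [set(postings[t]) for t in terms[1:]]
    return any(
        all(start + i + 1 in s for i, s in enumerate(rest))
        for start in postings[terms[0]]
    )


fields = ["Hello", "World is big"]      # a title and a first paragraph
with_gap = index_fields(fields)         # "hello" at 1, "world" at 102
without_gap = index_fields(fields, gap=0)  # "hello" at 1, "world" at 2

print(phrase_matches(with_gap, "hello world"))     # phrase spans two fields
print(phrase_matches(with_gap, "world is"))        # phrase inside one field
print(phrase_matches(without_gap, "hello world"))  # no gap, so fields run together
```

With the gap in place, only the phrase that lies inside a single field
matches; with a gap of 0 the title and paragraph run together.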

It's better to make the gap size modest to help the compression (100
is reasonable).
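
As a rough illustration of why a modest gap helps, here is a sketch which
assumes positions are stored as differences from the previous position using
a variable-length byte encoding.  That storage scheme is an assumption for
the sake of the example and may not be what Xapian actually does on disk; the
function names and the one-term-per-field layout are likewise hypothetical.

```python
def varint_len(n):
    """Bytes needed to store n using 7 payload bits per byte."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def encoded_size(positions):
    """Total bytes when each position is stored as a delta from the previous one."""
    total, prev = 0, 0
    for p in sorted(positions):
        total += varint_len(p - prev)
        prev = p
    return total


def field_starts(n_fields, gap):
    """Hypothetical layout: one term per field, fields spaced `gap` apart."""
    return [1 + i * gap for i in range(n_fields)]


small = encoded_size(field_starts(10, 100))      # deltas of 100 fit in one byte
large = encoded_size(field_starts(10, 100000))   # deltas of 100000 need three bytes
print(small, large)
```

Under these assumptions the larger gap costs more bytes per field boundary
without making phrase separation any more effective.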

Cheers,
    Olly


