[Xapian-discuss] Search::Xapian and term positions
Olly Betts
olly at survex.com
Wed Jan 19 21:58:10 GMT 2005
On Wed, Jan 19, 2005 at 10:26:00PM +0100, Arne Georg Gleditsch wrote:
> Has anyone seen this before? This is Xapian 0.8.5 and Search::Xapian
> 0.8.4.
I've not seen it. Ideally all the methods should have feature tests,
but I've not managed to add these at quite the same rate I've been
adding wrappers.
> Perhaps, before I walk this line further: is the positionlist going to
> be useful for me? Not having gotten far enough to see what it looks
> like, I get the impression that it is an index into the sequence of
> tokens that a file is parsed to, is that correct?
It returns the position values you passed to add_posting() for the
specified term and document, in ascending order.
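
As an illustration of that behaviour, here is a minimal sketch in Python. It is a toy in-memory model, not the Xapian or Search::Xapian API: the class `ToyIndex` and its method names are hypothetical, chosen only to mirror the description above. Positions are whatever integers the indexer supplies, and reading them back for a given term and document yields them in ascending order.

```python
class ToyIndex:
    """Toy stand-in for a positional index; NOT the real Xapian API."""

    def __init__(self):
        # Maps (docid, term) -> set of positions supplied at indexing time.
        self._positions = {}

    def add_posting(self, docid, term, position):
        # Record that `term` occurs in document `docid` at a caller-chosen position.
        self._positions.setdefault((docid, term), set()).add(position)

    def positionlist(self, docid, term):
        # Return the recorded positions for this term and document, ascending.
        return sorted(self._positions.get((docid, term), ()))


index = ToyIndex()
index.add_posting(1, "hello", 7)
index.add_posting(1, "hello", 2)
index.add_posting(1, "world", 3)
hello_positions = index.positionlist(1, "hello")
```

Here `hello_positions` comes back sorted regardless of the order in which the postings were added, and a term with no postings gives an empty list.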
> Can this number be manipulated when a file is indexed, and what would
> be the consequence of doing so? (I.e. letting it be <line number>*100
> + <token position in current line> or something?)
Sure - you can pass whatever values you like. Phrase searching relies
on terms which should match as a phrase having adjacent positions,
but the technique you suggest would work fine. In fact, omindex and
scriptindex do something similar to prevent phrases overlapping between
different fields. E.g. a title of "Hello" and first paragraph starting
"World" shouldn't match a phrase search for "hello world".
It's better to make the gap size modest to help the compression (100
is reasonable).
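
The numbering scheme discussed above can be sketched in Python as follows. This is a self-contained toy, not how omindex or scriptindex are actually implemented: the helper names (`encode_position`, `index_lines`, `phrase_matches`) and the whitespace tokenisation are assumptions made for illustration. It encodes each position as line number times a gap of 100 plus the token's index in the line, and treats a phrase as matching only when its terms sit at consecutive positions.

```python
GAP = 100  # assumed gap between lines/fields, the value suggested in the thread


def encode_position(line_number, token_index, gap=GAP):
    # Only valid while every line has fewer than `gap` tokens.
    if not 0 <= token_index < gap:
        raise ValueError("token index must be in the range [0, gap)")
    return line_number * gap + token_index


def index_lines(lines):
    """Map each lowercased term to its ascending list of encoded positions."""
    positions = {}
    for line_number, line in enumerate(lines):
        for token_index, token in enumerate(line.lower().split()):
            pos = encode_position(line_number, token_index)
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(positions, phrase):
    """Return True if the phrase's terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms:
        return False
    for start in positions.get(terms[0], []):
        if all(start + i in positions.get(term, [])
               for i, term in enumerate(terms)):
            return True
    return False


# A "title" line followed by a "first paragraph" line.
doc = index_lines(["Hello", "World peace"])
across_fields = phrase_matches(doc, "hello world")  # positions 0 and 100
within_field = phrase_matches(doc, "world peace")   # positions 100 and 101
```

Because "hello" and "world" end up 100 apart rather than adjacent, the phrase does not match across the line boundary, while "world peace" still matches within its own line.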
Cheers,
Olly