[Xapian-discuss] Match positions of a queryresult

Olly Betts olly at survex.com
Wed May 22 04:28:57 BST 2013


On Wed, May 15, 2013 at 04:11:07AM +0200, Cséri Tamás wrote:
> I've indexed many text files (using a TermGenerator from std::string), each
> document in my database is a single file on the disk.
> The search works pretty well and finds the files that match the query
> string, but I can't figure out how I can determine the location of the
> actual matched terms. I want to show the user the row and column number of
> the match (to somehow highlight the match).
>
> So far I haven't found a solution. The closest I've got is the
> Enquire::get_matching_terms_* functions but this does not really work for
> phrases and I'm still far from character positions.

Xapian doesn't store character positions (only word positions), so it
isn't able to tell you character positions for matches.

We could potentially record the word position at which we found a phrase
match, but we stop once we find the first instance of a phrase match, so
you'd only get one matching instance of the phrase per document, or else
the matcher would have to keep looking for matching phrases, even for
documents which don't ultimately make the top N, which would mean slower
searches.

> I hope someone can give me some hints where to look to begin with.

Generally people just reparse the document to highlight matches - this
has the benefit that you don't need to worry about the offsets being
wrong if the document has been updated on disk since it was last
indexed.
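
A minimal sketch of that reparsing approach, assuming the document text has
been read back into a std::string and the query words are available
lowercased and unstemmed. The names MatchPos, to_lower_ascii and
find_matches are made up for illustration and are not part of the Xapian
API. The sketch only handles ASCII words: columns are counted in bytes and
non-ASCII bytes are treated as separators. If stemming was used at index
time, the same stemmer would need applying to each word before comparing.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// One occurrence to highlight: 1-based line and column (in bytes) of the
// first character of the matching word. Hypothetical type, not from Xapian.
struct MatchPos {
    std::size_t line;
    std::size_t column;
    std::string word;
};

// Lowercase an ASCII word so the comparison is case-insensitive.
static std::string to_lower_ascii(std::string s) {
    for (char& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return s;
}

// Rescan `text` and report every alphanumeric word whose lowercased form
// is in `terms`. Assumption: `terms` holds lowercased, unstemmed words.
std::vector<MatchPos> find_matches(const std::string& text,
                                   const std::set<std::string>& terms) {
    std::vector<MatchPos> out;
    std::size_t line = 1, col = 1, i = 0;
    while (i < text.size()) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isalnum(ch)) {
            // Consume one whole word, remembering where it started.
            std::size_t start = i, start_col = col;
            while (i < text.size() &&
                   std::isalnum(static_cast<unsigned char>(text[i]))) {
                ++i;
                ++col;
            }
            std::string word = to_lower_ascii(text.substr(start, i - start));
            if (terms.count(word)) {
                out.push_back({line, start_col, word});
            }
        } else {
            // Separator: a newline starts a new line, anything else
            // advances the column by one byte.
            if (ch == '\n') {
                ++line;
                col = 1;
            } else {
                ++col;
            }
            ++i;
        }
    }
    return out;
}
```

Calling find_matches(file_contents, {"world"}) on the current contents of
the file returns one entry per occurrence, so the positions always reflect
what is on disk now rather than what was there at index time.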

You might find the resources linked to from here useful if you're
wanting to highlight matches:

http://trac.xapian.org/wiki/FAQ/Snippets

Cheers,
    Olly
