[Xapian-discuss] Get term from document by position

Sun Aug 2 16:44:08 BST 2015

On 26 Jul 2015, at 19:39, john.alveris at Safe-mail.net wrote:

>> Actually, when I run it I get 0 matches, which would explain why you’re just getting the start of the document. However if I adjust things (match the stemming strategy for TermGenerator to that for QueryParser), it still gives me the opening rather than a useful snippet.
> 
> Sorry, my mistake. The modified test.cpp file should [have]
>    indexer.set_stemming_strategy(Xapian::TermGenerator::STEM_ALL_Z)

John, that gave a single match, as expected. I played around with the Snipper (under Python, but that won’t make a difference), indexing each page as a separate document, and it does give query-aware snippets. However:

1. It only provides one, where your approach can provide “first instance…second instance…” kinds of snippet (which in some circumstances is considerably more useful).

2. It didn’t reliably find what I’d consider the “best” single snippet.
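
To make point 1 concrete, here is a rough Python sketch of the “first instance…second instance…” style of snippet. It is only an illustration under stated assumptions: it is not the Snipper’s algorithm and not the code in your test.cpp, it works on a plain list of words rather than on positions read from a Xapian index, it does no stemming, and the function name multi_snippet and its parameters are made up for the example.

```python
def multi_snippet(words, query_terms, context=2, sep=" ... "):
    """Join a window of words around every query-term hit.

    Hypothetical helper for illustration only; not part of Xapian.
    """
    wanted = {t.lower() for t in query_terms}
    # Positions of every word matching a query term (exact match, no stemming).
    hits = [i for i, w in enumerate(words)
            if w.lower().strip(".,;:!?") in wanted]
    # Turn each hit into a [start, end) window, merging windows that
    # overlap or touch so the same text is never repeated.
    windows = []
    for pos in hits:
        start = max(0, pos - context)
        end = min(len(words), pos + context + 1)
        if windows and start <= windows[-1][1]:
            windows[-1][1] = max(windows[-1][1], end)
        else:
            windows.append([start, end])
    return sep.join(" ".join(words[s:e]) for s, e in windows)


text = "the quick brown fox jumps over the lazy dog while another fox sleeps"
print(multi_snippet(text.split(), ["fox"], context=1))
```

With real data the hit positions would presumably come from the index’s term positions rather than from rescanning the text, and choosing which windows to keep when there are too many is the hard part that this sketch ignores.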

I don’t understand the approach that’s being used in Snipper, so I don’t know whether it’s a question of tuning that approach, of making some algorithmic part of it more flexible or swappable, or whether we need multiple ways of attacking the problem depending on the details of the data and queries. From looking at the code, though, it does share some of the things that you’re doing. If you haven’t looked at the source, it’s probably worth doing so to see how it works with term positions (even though it may turn out to be no more efficient than what you’re doing).

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org