help improving relevance of snippets displayed by Omega
Olly Betts
olly at survex.com
Sat Sep 19 06:31:37 BST 2020
On Fri, Sep 18, 2020 at 08:33:49PM -0400, Michael Decerbo wrote:
> But expanding the sample seems like the wrong solution. Is there a way to
> instead pass a hit or hits from the document to snippet generation?
I'm not sure what you have in mind, but the only way I can see that
working is if it read all the positional data for all the terms in
the document, and then sorted it to essentially reconstruct the
document text. However (a) that gives you the text without
capitalisation and without punctuation which doesn't look very good
and (b) it tends to be rather slow because the positional data is
primarily ordered by term for efficient searching, so there's
poor locality of reference for this use (and large documents would
make that worse).
The "xapian-pos" debug tool effectively does this text reconstruction
to help visualise the positional data, so you can see what the
reconstructed text would look like using that - e.g.:
Gap of 1 unused positions
1 Sbath
2 Ssomerset
3 bath
4 somerset
5 coordinates
6 51
7 23
8 n
9 2
10 22
11 w
12 51.38
13 n
14 2.36
15 w
16 51.38
17 2.36
18 bath
19 ˈbɑːθ
20 or
21 ˈbæθ
22 latin
23 aquae
24 sulis
25 welsh
26 caerfaddon
27 is
28 a
29 city
...
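To make the idea concrete, here is a minimal sketch of that reconstruction in Python. It does not use the Xapian API at all: the in-memory `postings` dict and the `index_positions()` and `reconstruct()` helpers are hypothetical stand-ins for a real positional index, and the sample sentence is made up for illustration. It only shows the "gather every term's positions, then sort" step, and why the result comes out without capitalisation or punctuation.

```python
import re
from collections import defaultdict


def index_positions(text):
    """Toy indexer (hypothetical): map each lowercased term to its word positions."""
    postings = defaultdict(list)
    words = re.findall(r"\w+", text.lower())
    for pos, word in enumerate(words, start=1):
        postings[word].append(pos)
    return dict(postings)


def reconstruct(postings):
    """Collect every (position, term) pair, sort by position, and join the terms."""
    pairs = [
        (pos, term)
        for term, positions in postings.items()
        for pos in positions
    ]
    pairs.sort()
    return " ".join(term for _, term in pairs)


# Made-up sample text, not taken from any real database.
original = "Bath (Latin: Aquae Sulis) is a city in Somerset, England."
postings = index_positions(original)
rebuilt = reconstruct(postings)
print(rebuilt)
```

Note that `reconstruct()` has to visit the position list of every term in the document before it can emit anything, which is the access pattern that performs poorly against a real positional table.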
I've tried this approach on a project, but it didn't work out. Storing
a larger sample is definitely what I'd recommend (or if you have the
text stored in another system, you could pass that to the
MSet::snippet() method, but there isn't a way to do that with omega
unless you modify the code).
Cheers,
Olly