help improving relevance of snippets displayed by Omega

Olly Betts olly at survex.com
Sat Sep 19 06:31:37 BST 2020


On Fri, Sep 18, 2020 at 08:33:49PM -0400, Michael Decerbo wrote:
> But expanding the sample seems like the wrong solution. Is there a way to
> instead pass a hit or hits from the document to snippet generation?

I'm not sure what you have in mind, but the only way I can see that
working is to read all the positional data for all the terms in
the document and then sort it to essentially reconstruct the
document text.  However, (a) that gives you the text without
capitalisation and without punctuation, which doesn't look very good,
and (b) it tends to be rather slow because the positional data is
primarily ordered by term for efficient searching, so there's
poor locality of reference for this use (and large documents would
make that worse).
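
To make (a) and (b) concrete, here is a minimal Python sketch of that
reconstruction.  It is not Xapian code: POSITION_TABLE and TERMLIST are
hypothetical in-memory stand-ins, and the table is assumed to be keyed
by (term, docid).  The reconstructed text has lost its capitals and its
full stop, and the entries read for document 1 are interleaved with
document 2's (indexes 1 and 3 are skipped).  In a real database with
many documents those reads end up scattered across the whole table.

```python
from bisect import bisect_left

# Toy position table. ASSUMPTION: keyed by (term, docid), so all the
# entries for one term sit together, as a search-friendly layout would.
# Document 1 is "Bath is a city in Somerset."; document 2 is "A bath".
POSITION_TABLE = {
    ("a", 1): [3],
    ("a", 2): [1],
    ("bath", 1): [1],
    ("bath", 2): [2],
    ("city", 1): [4],
    ("in", 1): [5],
    ("is", 1): [2],
    ("somerset", 1): [6],
}

# Hypothetical termlist: the terms indexed for each document.
TERMLIST = {1: ["a", "bath", "city", "in", "is", "somerset"], 2: ["a", "bath"]}


def reconstruct(docid):
    """Rebuild a document's word sequence from its positional data.

    Returns the reconstructed text plus the index (in key order) of each
    table entry read, to show that the reads are not contiguous.
    """
    keys = sorted(POSITION_TABLE)
    postings = []
    touched = []
    for term in TERMLIST[docid]:
        key = (term, docid)
        touched.append(bisect_left(keys, key))  # where this entry lives in the table
        postings.extend((pos, term) for pos in POSITION_TABLE[key])
    postings.sort()  # order by position to recover the word order
    return " ".join(term for _, term in postings), touched


text, touched = reconstruct(1)
print(text)     # bath is a city in somerset
print(touched)  # [0, 2, 4, 5, 6, 7]
```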

The "xapian-pos" debug tool effectively does this text reconstruction
to help visualise the positional data, so you can use it to see what
the reconstructed text would look like, e.g.:

Gap of 1 unused positions
1       Sbath
2       Ssomerset
3       bath
4       somerset
5       coordinates
6       51
7       23
8       n
9       2
10      22
11      w
12      51.38
13      n
14      2.36
15      w
16      51.38
17      2.36
18      bath
19      ˈbɑːθ
20      or
21      ˈbæθ
22      latin
23      aquae
24      sulis
25      welsh
26      caerfaddon
27      is
28      a
29      city
...
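
The listing above is essentially the positional data sorted by
position, with a note wherever positions are skipped.  A rough Python
sketch of producing output in that shape follows; format_positions is a
hypothetical helper, not part of xapian-pos, and it assumes numbering
starts at 0, which is consistent with the "Gap of 1" line at the top.

```python
def format_positions(postings):
    """Render (position, term) pairs as a numbered listing, noting runs
    of positions which have no term, in the shape of the output above.

    Assumes numbering starts at 0, so a first term at position 1 gives
    "Gap of 1 unused positions".
    """
    lines = []
    expected = 0  # next position we'd see if there were no gaps
    for pos, term in sorted(postings):
        if pos > expected:
            lines.append(f"Gap of {pos - expected} unused positions")
        lines.append(f"{pos}\t{term}")
        expected = pos + 1
    return "\n".join(lines)


print(format_positions([(2, "somerset"), (1, "bath"), (5, "city")]))
```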

I've tried this approach on a project, but it didn't work out.  Storing
a larger sample is definitely what I'd recommend.  Alternatively, if you
have the text stored in another system, you could pass that to the
MSet::snippet() method, but there isn't a way to do that with Omega
unless you modify the code.
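
Why a larger sample helps: snippet generation can only work with the
text it's given, so a match beyond the end of the stored sample can't
show up in the snippet.  The toy Python illustration below makes that
point.  best_window is a hypothetical helper that just picks the
fixed-width word window containing the most query terms, a much cruder
stand-in for what MSet::snippet() actually does.

```python
import re


def best_window(text, query_terms, width=8):
    """Toy snippet picker (not Xapian's algorithm): return the `width`-word
    window of `text` with the most query-term matches, and the match count.
    Matching ignores case and punctuation attached to words.
    """
    words = text.split()
    wanted = {t.lower() for t in query_terms}

    def norm(word):
        return re.sub(r"\W+", "", word).lower()

    best_start, best_hits = 0, -1
    for start in range(max(1, len(words) - width + 1)):
        hits = sum(norm(w) in wanted for w in words[start:start + width])
        if hits > best_hits:  # keep the earliest window with the most hits
            best_start, best_hits = start, hits
    return " ".join(words[best_start:best_start + width]), best_hits


full = ("Bath is a city in Somerset, England. It is known for its "
        "Roman-built baths and Georgian architecture.")
query = ["georgian", "architecture"]
print(best_window(full[:40], query))  # ('Bath is a city in Somerset, England. It', 0)
print(best_window(full, query))       # ('known for its Roman-built baths and Georgian architecture.', 2)
```

With only the first 40 characters stored, neither query term is
available, so the best the toy picker can do is the opening words.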

Cheers,
    Olly



More information about the Xapian-discuss mailing list