help improving relevance of snippets displayed by Omega

Michael Decerbo michaeldecerbo at gmail.com
Sun Sep 20 03:56:30 BST 2020


Olly,

Thanks again very much for helping me improve my understanding of Xapian
and Omega. Thanks especially for pointing out that my idea of trying to
generate a snippet from stemmed text lacking capitalization and punctuation
would probably not produce a user-friendly result.

But I'm still doubtful that expanding the sample size could be the right
way to obtain excerpts from the document that are relevant to the query.
Suppose that the sample size were even as big as 10% of the average
document size, queries contained only a single term, and a typical query
term appeared on average only once per document. In that case, it seems to
me that nine out of ten samples would not contain the single query term, so
that nine times out of ten the snippet generated from the sample would not
contain the query term. Is my thinking accurate about this, or am I again
missing something?
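
To make the arithmetic explicit, here is a minimal sketch in Python. It
assumes that each occurrence of the term is equally likely to fall anywhere
in the document, independently of the others; that is my simplifying
assumption, not something Omega guarantees. The name miss_probability is
just illustrative.

```python
def miss_probability(sample_fraction, occurrences=1):
    """Chance that a fixed sample contains none of the term's occurrences.

    Assumption: each occurrence lands at a uniformly random position in the
    document, independently of the others.
    """
    return (1.0 - sample_fraction) ** occurrences


if __name__ == "__main__":
    # Sample covering 10% of the document, term occurring 1, 2 or 5 times.
    for k in (1, 2, 5):
        print(k, round(miss_probability(0.10, k), 3))
```

With a single occurrence and a 10% sample, this gives the nine-in-ten miss
rate described above; more occurrences per document lower it, but only
gradually.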

In general, I'm wondering how best to use Xapian so that, at query time, my
application can display an excerpt that is relevant to the query, rather
than a sample that was chosen at indexing time without regard to the query
and so may or may not contain the query term(s). For example,
TheyWorkForYou.com is listed on xapian.org as a site using Xapian, and when
I enter a single-term query on that site, the document excerpts shown in the
search results invariably include highlighted words, possibly stemmed forms,
that match the query. That's the effect I would like to achieve.
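
To illustrate the behaviour I mean, here is a rough sketch in plain Python.
It does not use the Xapian API at all; the function name excerpt, its
parameters, and the prefix match used as a crude stand-in for stemming are
all hypothetical choices for illustration only.

```python
import re


def excerpt(text, term, width=60, hi_start="<b>", hi_end="</b>"):
    """Return a window of text around the first match of term, highlighted.

    Assumption: a case-insensitive prefix match at a word boundary stands in
    for real stemming, so "fisher" would also match "fisheries".
    """
    match = re.search(r"\b" + re.escape(term) + r"\w*", text, re.IGNORECASE)
    if match is None:
        # No match: fall back to the start of the text, like a fixed sample.
        return text[:width]
    # Take roughly width/2 characters of context on each side of the match.
    start = max(0, match.start() - width // 2)
    end = min(len(text), match.end() + width // 2)
    before = text[start:match.start()]
    after = text[match.end():end]
    return before + hi_start + match.group(0) + hi_end + after


if __name__ == "__main__":
    doc = "Opening remarks. " * 5 + "The committee debated fisheries policy."
    print(excerpt(doc, "fisher"))
```

The point is that the window is chosen at query time from the full document
text, so it contains the query term whenever the document does, which a
sample fixed at indexing time cannot promise.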

If you can think of any sample code that I should refer to, or even if you
could just suggest the broad outlines of a solution, I would be very
grateful.

Thanks again!


Michael




More information about the Xapian-discuss mailing list