help improving relevance of snippets displayed by Omega

Matthew Somerville matthew at mysociety.org
Mon Sep 21 09:30:52 BST 2020


Whoops, forgot to say, as Olly said, much of that could probably now be
simplified with Xapian's snippet() function, which I can only assume did
not exist back when all this was written! :)

ATB,
Matthew

On Mon, 21 Sep 2020 at 09:28, Matthew Somerville <matthew at mysociety.org>
wrote:

> Hi,
>
> Ha, I was reading this thread thinking TheyWorkForYou (which I help
> maintain) does highlight terms wherever they are, and then you mentioned it
> :)
> The code is open source; it is quite old and probably hair-raising, but as
> you say it does basically do what you want.
> Our Xapian database stores terms, boolean terms and values for the text;
> for the document itself it stores only an identifier.
> It works by doing the Xapian search, then fetching the results for those
> IDs from the database, and then it boils down to calling
> prepare_search_result_for_display on each result:
>
> https://github.com/mysociety/theyworkforyou/blob/master/www/includes/easyparliament/hansardlist.php#L1279-L1306
> Which uses two functions to then work out the extract,
> position_of_first_word:
>
> https://github.com/mysociety/theyworkforyou/blob/master/www/includes/easyparliament/searchengine.php#L562
> and highlight:
>
> https://github.com/mysociety/theyworkforyou/blob/master/www/includes/easyparliament/searchengine.php#L475
> They stem the entered words and loop through the speeches to find the
> match/thing to highlight.
>
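
To make the "stem the entered words and loop through the text" idea concrete,
here is a rough, self-contained Python sketch. It is not the TheyWorkForYou
code (that is PHP, at the links above) and it does not use Xapian at all:
toy_stem is a crude stand-in for a real stemmer, and the names
first_match_offset, highlight_terms and make_extract are made up purely for
illustration.

```python
import re

WORD_RE = re.compile(r"[A-Za-z0-9']+")

def toy_stem(word):
    """Crude suffix stripper; an assumption standing in for a real stemmer."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def query_stems(query):
    """Stem every word the user entered."""
    return {toy_stem(w) for w in WORD_RE.findall(query)}

def first_match_offset(text, query):
    """Character offset of the first word whose stem matches, or None."""
    stems = query_stems(query)
    for m in WORD_RE.finditer(text):
        if toy_stem(m.group()) in stems:
            return m.start()
    return None

def highlight_terms(text, query, start="<b>", end="</b>"):
    """Wrap every word whose stem matches a query stem in start/end markers."""
    stems = query_stems(query)
    def mark(m):
        word = m.group()
        return f"{start}{word}{end}" if toy_stem(word) in stems else word
    return WORD_RE.sub(mark, text)

def make_extract(text, query, width=80):
    """Cut a window of about `width` characters around the first match."""
    pos = first_match_offset(text, query)
    begin = 0 if pos is None else max(0, pos - width // 2)
    if begin > 0:
        # Snap forward to a word boundary, but never past the match itself.
        space = text.find(" ", begin)
        if space != -1 and space < pos:
            begin = space + 1
    finish = begin + width
    prefix = "..." if begin > 0 else ""
    suffix = "..." if finish < len(text) else ""
    return prefix + highlight_terms(text[begin:finish], query) + suffix
```

For example, make_extract(speech_text, "debating", width=40) would centre the
window on the first word that stems the same way as "debating" and wrap it in
<b>...</b>. A real version would want a proper stemmer matching the one used
at index time, and HTML escaping of the text before adding markup.
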
> ATB,
> Matthew
>
> On Sun, 20 Sep 2020 at 03:57, Michael Decerbo <michaeldecerbo at gmail.com>
> wrote:
>
>> In general, I'm wondering how best to use Xapian so that, at query time, my
>> application can display an excerpt that is relevant to the query, not a
>> sample chosen at indexing time without regard to the query that may or may
>> not contain the query term(s). For example, TheyWorkForYou.com is listed on
>> xapian.org as a site using Xapian, and when I enter a single-term query on
>> that site the document excerpts provided as part of the search results
>> invariably include highlighted words, possibly stemmed, responsive to the
>> query. That's the effect I would like to achieve.
>>
>> If you can think of any sample code that I should refer to, or even if you
>> could just suggest the broad outlines of a solution, I would be very
>> grateful.
>>
>

