[Xapian-tickets] [Xapian] #211: Dynamic summaries / snippets
Xapian
nobody at xapian.org
Sat Dec 26 01:30:09 GMT 2015
#211: Dynamic summaries / snippets
-------------------------+------------------------------
Reporter:  olly          | Owner:            olly
Type:      enhancement   | Status:           assigned
Priority:  normal        | Milestone:        1.3.4
Component: Library API   | Version:          SVN trunk
Severity:  minor         | Resolution:
Keywords:                | Blocked By:
Blocking:                | Operating System: All
-------------------------+------------------------------
Comment (by olly):
Current status:
Mihai's approach, taken from the paper, doesn't directly consider the query, but
instead looks at the top few documents matched and builds a document
language model. In theory this is a nice approach - it has a sound
theoretical basis, and it will consider interesting terms outside of the
query. In practice, this turns out to have a serious drawback - it
sometimes selects a snippet which doesn't contain any of the query terms,
and users find that surprising (quite reasonably I think). It's also
slower than is ideal.
We also had a patch for generating snippets from fastmail, but that has
different drawbacks - for example: its segmenting of text doesn't exactly
match what `TermGenerator` produces, so it fails to highlight in some
cases; also it considers each term in turn, so doesn't prefer a snippet
containing more terms from the query.
So I've taken the best ideas from each, and implemented a new snippet
generating algorithm. A key design choice is that it makes a single pass
over the text we're generating the snippet from (with scope to terminate
early). It prefers occurrences of the query terms in contexts containing
"interesting" non-query terms. And it also handles exact phrases and
wildcards (both selecting snippets based on them, and highlighting them).
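To make the single-pass idea concrete, here is a rough sketch in Python.
It is not the code described above, and it is not Xapian's API: the
function name `best_snippet`, the `interesting` weight table, the
fixed-width token window and the scoring formula are all assumptions made
for illustration. It slides a window over the text once, only accepts
windows containing at least one query term, adds a bonus for "interesting"
non-query terms, and stops early once no later window could score higher.
Stemming, exact phrases and wildcards are left out.

```python
import re
from collections import Counter, deque


def best_snippet(text, query_terms, interesting=None, width=8,
                 hi_start="<b>", hi_end="</b>"):
    """Illustrative only: pick a `width`-token window in one pass.

    `interesting` is a hypothetical {term: weight} table standing in for
    whatever marks non-query terms as interesting.
    """
    query = {t.lower() for t in query_terms}
    bonus = {t.lower(): w for t, w in (interesting or {}).items()
             if t.lower() not in query}
    # Highest score any window can reach; used to terminate early.
    max_score = len(query) + sum(bonus.values())

    tokens = re.findall(r"\w+", text)
    window = deque()    # tokens currently inside the window
    counts = Counter()  # lowercased token -> occurrences in the window
    best_score, best_start = 0.0, None

    for i, tok in enumerate(tokens):
        window.append(tok)
        counts[tok.lower()] += 1
        if len(window) > width:
            old = window.popleft().lower()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]

        matched = sum(1 for q in query if q in counts)
        if matched == 0:
            continue  # never select a window without a query term
        score = matched + sum(w for t, w in bonus.items() if t in counts)
        if score > best_score:
            best_score = score
            best_start = i - len(window) + 1
            if score >= max_score:
                break  # nothing later in the text can beat this window

    if best_start is None:
        return ""
    # A window found near the start of the text may have been scored
    # before it was full; the output is still cut to `width` tokens.
    chosen = tokens[best_start:best_start + width]
    return " ".join(hi_start + t + hi_end if t.lower() in query else t
                    for t in chosen)
```

For example, with the query terms `cat` and `dog` and `width=4`, the text
"The cat sat quietly. A dog barked loudly at the mailman. The cat chased
the dog around the garden." yields `<b>cat</b> chased the <b>dog</b>`,
since that is the first window containing both terms. Joining the tokens
with spaces drops the original punctuation; a real implementation would
keep track of offsets into the source text instead.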
This code is in production with one of my clients and seems to be working
well, so I think we should merge this for 1.4.0. Leaving pegged on 1.3.4
for now, but I'm happy to slip it to a later 1.3.x if it's holding up
1.3.4.
--
Ticket URL: <http://trac.xapian.org/ticket/211#comment:12>
Xapian <http://xapian.org/>