[Xapian-tickets] [Xapian] #211: Dynamic summaries / snippets

Xapian nobody at xapian.org
Sat Dec 26 01:30:09 GMT 2015


#211: Dynamic summaries / snippets
-------------------------+------------------------------
 Reporter:  olly         |             Owner:  olly
     Type:  enhancement  |            Status:  assigned
 Priority:  normal       |         Milestone:  1.3.4
Component:  Library API  |           Version:  SVN trunk
 Severity:  minor        |        Resolution:
 Keywords:               |        Blocked By:
 Blocking:               |  Operating System:  All
-------------------------+------------------------------

Comment (by olly):

 Current status:

 The approach from the paper Mihai implemented doesn't directly consider
 the query, but instead looks at the top few documents matched and builds
 a document language model.  In theory this is a nice approach - it has a
 sound theoretical basis, and it will consider interesting terms outside
 of the query.  In practice, this turns out to have a serious drawback -
 it sometimes selects a snippet which doesn't contain any of the query
 terms, and users find that surprising (quite reasonably I think).  It's
 also slower than is ideal.

 We also had a patch for generating snippets from FastMail, but that has
 different drawbacks.  For example, its segmenting of text doesn't exactly
 match what `TermGenerator` produces, so it fails to highlight in some
 cases.  It also considers each term in turn, so it doesn't prefer a
 snippet containing more terms from the query.
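
 To illustrate the second drawback in general terms, here is a toy
 comparison.  It is not a reconstruction of the patch: both function names
 are hypothetical, and splitting on whitespace is an assumption made only
 to keep the example short.

```python
def snippet_per_term(sentences, query_terms):
    """Take each query term in turn; return the first sentence containing it."""
    for term in query_terms:
        for sentence in sentences:
            if term in sentence.lower().split():
                return sentence
    return None


def snippet_by_coverage(sentences, query_terms):
    """Return the sentence containing the most distinct query terms."""
    terms = set(query_terms)
    best, best_hits = None, 0
    for sentence in sentences:
        hits = len(terms & set(sentence.lower().split()))
        if hits > best_hits:
            best, best_hits = sentence, hits
    return best


sentences = [
    "xapian is a library",
    "xapian supports snippet generation",
]
query = ["xapian", "snippet"]
per_term = snippet_per_term(sentences, query)
by_coverage = snippet_by_coverage(sentences, query)
```

 The per-term version stops at the first sentence mentioning the first
 term, even though the second sentence covers both terms.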

 So I've taken the best ideas from each, and implemented a new snippet
 generating algorithm.  A key design choice is that it makes a single pass
 over the text we're generating the snippet from (with scope to terminate
 early).  It prefers occurrences of the query terms in contexts containing
 "interesting" non-query terms.  And it also handles exact phrases and
 wildcards (both selecting snippets based on them, and highlighting them).
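
 As a rough sketch of the single-pass idea (not the code that was merged):
 slide a fixed-width window over the words, score each window by how many
 distinct query terms it contains plus a smaller bonus for "interesting"
 non-query terms, and stop early once no better score is possible.  The
 names `best_snippet` and `highlight`, the weights, the `<b>` markers and
 the caller-supplied `interesting` set are all assumptions for
 illustration; phrases and wildcards are left out.

```python
from collections import Counter, deque

PUNCT = ".,;:!?\"'()"


def normalise(word):
    """Lower-case a word and strip surrounding punctuation (assumed tokeniser)."""
    return word.lower().strip(PUNCT)


def best_snippet(words, query_terms, interesting, width=8):
    """Return (start, end, score) for the best window, or None if no query term occurs."""
    query = {t.lower() for t in query_terms}
    bonus = {t.lower() for t in interesting} - query
    max_score = 10 * len(query) + len(bonus)  # hypothetical weights
    window, counts = deque(), Counter()
    best, best_score = None, 0
    for i, word in enumerate(words):
        term = normalise(word)
        window.append(term)
        counts[term] += 1
        if len(window) > width:
            old = window.popleft()
            counts[old] -= 1
            if counts[old] == 0:
                del counts[old]
        hits = sum(1 for t in query if t in counts)
        if hits == 0:
            continue  # never pick a window without a query term
        score = 10 * hits + sum(1 for t in bonus if t in counts)
        if score > best_score:
            best_score = score
            best = (i - len(window) + 1, i + 1, score)
        if score == max_score:
            break  # early termination: nothing can beat this window
    return best


def highlight(words, start, end, query_terms):
    """Join the chosen window, wrapping query terms in <b>...</b>."""
    query = {t.lower() for t in query_terms}
    return " ".join(
        f"<b>{w}</b>" if normalise(w) in query else w for w in words[start:end]
    )


text = ("The cat sat on the mat. Later a fast indexer built the search "
        "index for every query with probabilistic ranking")
words = text.split()
query = ["search", "index"]
interesting = ["indexer", "probabilistic", "ranking"]
result = best_snippet(words, query, interesting, width=6)
snippet = highlight(words, result[0], result[1], query)
```

 Requiring at least one query term per candidate window is what avoids the
 "snippet with no query terms" surprise described earlier.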

 This code is in production with one of my clients and seems to be working
 well, so I think we should merge this for 1.4.0.  Leaving pegged on 1.3.4
 for now, but I'm happy to slip it to a later 1.3.x if it's holding up
 1.3.4.

--
Ticket URL: <http://trac.xapian.org/ticket/211#comment:12>
Xapian <http://xapian.org/>