[Xapian-discuss] GSoC Reunion + Documentation Sprint

Olly Betts olly at survex.com
Sat Oct 25 09:20:00 BST 2014


On Thu, Oct 23, 2014 at 04:58:49PM -0400, Bron Gondwana wrote:
> Since I'm in town for once in forever, I'd love to meet up just to say
> hello - and also to see if we can get the FastMail patches for snippet
> support either into Xapian or an alternative implementation that does
> what's needed.

So the current state is that Mihai's GSoC snippet project is merged to
trunk.  This builds a language model from the top ranked documents and
then uses that to select segments of text to form a snippet.

Gaurav's been doing some testing of this on a real system, and we found
that it can select text containing none of the query terms, and this
seems to happen a bit too often.  So he tried adding some code to select
based on both the language model and query terms, which worked quite
nicely, but the speed wasn't so great.  And while profiling, he found
that dropping the language model completely didn't affect the snippets
much, but was significantly faster.
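To make the "query terms only" approach concrete, here is a minimal sketch in Python. It is not the code on trunk, nor Mihai's or Greg's patches. The function names best_window and tokenize and the fixed window size are hypothetical. The sketch slides a window of consecutive words over the text and keeps the window that covers the most distinct query terms. No language model is involved, so a window containing at least one query term is always preferred over one containing none, whenever such a window exists.

```python
import re


def tokenize(text):
    # Lowercase word tokens; a crude stand-in for a real term generator
    # (hypothetical: no stemming, no stopword handling).
    return re.findall(r"\w+", text.lower())


def best_window(text, query_terms, window=8):
    """Return the run of `window` consecutive words that covers the most
    distinct query terms. Ties go to the earliest such run (hypothetical
    helper, illustration only)."""
    words = text.split()
    terms = {t.lower() for t in query_terms}
    if len(words) <= window:
        return " ".join(words)
    best_start, best_score = 0, -1
    for start in range(len(words) - window + 1):
        hits = set()
        for w in words[start:start + window]:
            hits.update(tokenize(w))  # strips punctuation such as "mat."
        score = len(hits & terms)
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])
```

For example, best_window("the quick brown fox jumps over the lazy dog and then the cat sat on the mat near the dog", ["cat", "mat"], window=5) picks "cat sat on the mat". Each window is rescanned from scratch, so the cost is proportional to the text length times the window size. That is cheap next to building a language model from the top-ranked documents, which is consistent with the speed difference Gaurav saw.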

We haven't quite decided where this leaves us (we only got to this
point about a week ago) - the language model is conceptually a nice
idea, and I'm generally a fan of approaches with a sound theoretical
basis, but if it takes significant time without giving the sort of
snippets we want, it's not a good solution.

I've not looked at Greg's snippet patches for a while, but perhaps
they're actually a better starting point after all.

> Neil and I from FastMail are around for the Inbox Love conference on
> Wednesday.

OK, I'll talk to you off list to see if we can arrange something.

Cheers,
    Olly


