[Xapian-discuss] GSoC Reunion + Documentation Sprint

Bron Gondwana brong at fastmail.fm
Sat Oct 25 17:04:48 BST 2014


On Sat, Oct 25, 2014, at 04:20 AM, Olly Betts wrote:
> On Thu, Oct 23, 2014 at 04:58:49PM -0400, Bron Gondwana wrote:
> > Since I'm in town for once in forever, I'd love to meet up just to say
> > hello - and also to see if we can get the FastMail patches for snippet
> > support either into Xapian or an alternative implementation that does
> > what's needed.
> 
> So the current state is that Mihai's GSoC snippet project is merged to
> trunk.  This builds a language model from the top ranked documents and
> then uses that to select segments of text to form a snippet.
> 
> Gaurav's been doing some testing of this on a real system, and we found
> that it can select text containing none of the query terms, and this
> seems to happen a bit too often.  So he tried adding some code to select
> based on both the language model and query terms, which worked quite
> nicely, but the speed wasn't so great.  And while profiling, he found
> that dropping the language model completely didn't affect the snippets
> much, but was significantly faster.
> 
> We haven't quite decided where this leaves us (we only got to this
> point about a week ago) - the language model is conceptually a nice
> idea, and I'm generally a fan of approaches with a sound theoretical
> basis, but if it takes significant time without giving the sort of
> snippets we want, it's not a good solution.
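
For anyone following along, the query-terms-only selection described above could look roughly like the sketch below. This is only an illustration and not the code in Xapian trunk or in Greg's patches; the function name `best_snippet`, the fixed word window, and the crude punctuation stripping are all assumptions made for the example.

```python
def best_snippet(text, query_terms, window=8):
    """Return the run of `window` words containing the most distinct query terms.

    Hypothetical helper for illustration; no language model is involved,
    only a count of query terms per candidate window.
    """
    words = text.split()
    terms = {t.lower() for t in query_terms}
    best_start, best_score = 0, -1
    # Slide a fixed-size window across the text, one word at a time.
    for start in range(max(1, len(words) - window + 1)):
        chunk = words[start:start + window]
        # Normalise each word crudely: strip punctuation and lowercase it.
        seen = {w.strip('.,;:!?()"\'').lower() for w in chunk}
        score = len(seen & terms)
        # Strict '>' keeps the earliest window when scores tie.
        if score > best_score:
            best_start, best_score = start, score
    return " ".join(words[best_start:best_start + window])
```

A real implementation would presumably match stemmed terms rather than raw words, and would pick sentence-like boundaries instead of a fixed window.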

I absolutely agree about sound theoretical models.  They make reasoning about everything so much easier.

> I've not looked at Greg's snippet patches for a while, but perhaps
> they're actually a better starting point after all.

I'm honestly not sure - but I know something is often better than nothing.  I'm quite embarrassed on my own behalf at letting the Cyrus IMAPd project go 4 years without having a major stable release, because I wanted everything to be perfect first.  I had to take a pretty sharp axe to the list of features I wanted to add, and work out what was stable enough for inclusion.  Nearly there now.

But I would love to have the Xapian-based search engine available for everyone, particularly since iOS 7+ devices just do a BODY search on every mailbox to IMAP servers.  That's kinda abusive, but we work around it at FastMail by converting the BODY searches into FUZZY BODY if we detect iOS clients, and then the Xapian database gets used, and all is good.
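
To make the workaround concrete, here is a minimal sketch of the rewrite, assuming the SEARCH keys have already been tokenised into a list with quoted strings kept in their quotes. It is not the Cyrus code; `rewrite_body_to_fuzzy` and the `client_is_ios` flag are hypothetical names, and how the client gets detected is left out entirely.

```python
def rewrite_body_to_fuzzy(tokens, client_is_ios):
    """Prefix each BODY search key with FUZZY for iOS clients.

    `tokens` is an already-tokenised list of IMAP SEARCH keys, with quoted
    strings still carrying their quotes (an assumption of this sketch).
    Simplification: an unquoted atom argument spelled BODY would be
    mistaken for a search key.
    """
    if not client_is_ios:
        return list(tokens)
    out = []
    for i, tok in enumerate(tokens):
        is_body_key = tok.upper() == "BODY"
        already_fuzzy = i > 0 and tokens[i - 1].upper() == "FUZZY"
        # Add the FUZZY modifier only where it is not already present.
        if is_body_key and not already_fuzzy:
            out.append("FUZZY")
        out.append(tok)
    return out
```

With this, the tokens for `BODY "invoice"` become those for `FUZZY BODY "invoice"` when the client is flagged as iOS, and pass through unchanged otherwise.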

So if it's possible to get Mihai's work to a stage where it's production-ready soon, I'd be happy to switch - but if not, I'd love you to include Greg's work, so we can make the next major Cyrus release have good search, and build against Xapian upstream rather than needing our own patched version.

Cheers,

Bron.

-- 
  Bron Gondwana
  brong at fastmail.fm


