[Xapian-discuss] [ANN] mu-0.2, maildir indexer/searcher with xapian support

Olly Betts olly at survex.com
Sat Sep 13 05:33:26 BST 2008


On Thu, Sep 11, 2008 at 08:17:22PM +0300, djcb wrote:
> b) In the not-too-distant future I'd like to be able to generate some
>    aggregate information about queries; so after you search for all
>    mails containing 'wombat OR unicorn', you could get information like:
>    - the oldest/newest mail that matched; the average size
>    - number of messages per sender;
>    - number of messages per thread;
>    - average number of To:, Cc: recipients

You can do these with a MatchSpy, but that work hasn't been released yet.

One advantage of this approach is that you don't need to actually form the
full result set: each match is examined as it is found, and if it doesn't
rank highly enough, it can simply be discarded.
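To illustrate the idea, here's a plain-Python simulation. It is not Xapian's
API; Message, StatsSpy and run_match are hypothetical names. Every match is
shown to an aggregator (the "spy"), which builds the kinds of statistics
listed above, while only the top-k matches by weight are retained in a
min-heap and everything else is dropped straight away:

```python
import heapq
from collections import Counter
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical stand-in for a matched mail document.
    docid: int
    date: int      # e.g. seconds since the epoch
    size: int      # bytes
    sender: str


class StatsSpy:
    """Accumulates aggregate statistics over every match it is shown."""

    def __init__(self):
        self.count = 0
        self.total_size = 0
        self.oldest = None
        self.newest = None
        self.per_sender = Counter()

    def __call__(self, msg, weight):
        # weight is unused here, but a spy could aggregate it too.
        self.count += 1
        self.total_size += msg.size
        if self.oldest is None or msg.date < self.oldest:
            self.oldest = msg.date
        if self.newest is None or msg.date > self.newest:
            self.newest = msg.date
        self.per_sender[msg.sender] += 1

    def average_size(self):
        return self.total_size / self.count if self.count else 0.0


def run_match(matches, k, spy):
    """Show every (msg, weight) to the spy, but keep only the top-k by weight."""
    top = []  # min-heap of (weight, docid, msg); docids are unique
    for msg, weight in matches:
        spy(msg, weight)
        entry = (weight, msg.docid, msg)
        if len(top) < k:
            heapq.heappush(top, entry)
        elif weight > top[0][0]:
            heapq.heapreplace(top, entry)
        # otherwise the match is simply discarded
    # Best first; ties broken by docid.
    return [m for _, _, m in sorted(top, key=lambda e: (-e[0], e[1]))]


matches = [
    (Message(1, 100, 2000, "alice"), 0.9),
    (Message(2, 50, 1000, "bob"), 0.2),
    (Message(3, 300, 3000, "alice"), 0.5),
]
spy = StatsSpy()
best = run_match(matches, k=2, spy=spy)
```

Here the spy sees all three matches (oldest date 50, newest 300, average
size 2000.0, two messages from "alice"), but only docids 1 and 3 survive as
results; docid 2 is discarded as soon as something better arrives.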

>    Not sure if all of these are so useful, but in general SQL seems a
>    bit better at expressing non-literal search criteria, i.e. searches
>    that depend on search results -- joins and so on.

Again unreleased, but PostingSource allows a sort of join-like operation
with an external source.
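As a rough plain-Python sketch of that join-like idea (again, not Xapian's
API; and_join and external_postings are hypothetical helpers), assume the
external source can produce a docid-sorted list of (docid, weight) pairs,
for instance from rows in an SQL table. Combining it with the text matches
is then a sorted-merge intersection; this sketch simply sums the weights of
documents present in both lists:

```python
def external_postings(scores, min_score):
    """Hypothetical external source: docids whose external score passes a threshold."""
    return sorted((d, float(s)) for d, s in scores.items() if s >= min_score)


def and_join(text_postings, ext_postings):
    """Intersect two docid-sorted posting lists, summing their weights."""
    out = []
    i = j = 0
    while i < len(text_postings) and j < len(ext_postings):
        d1, w1 = text_postings[i]
        d2, w2 = ext_postings[j]
        if d1 == d2:
            out.append((d1, w1 + w2))
            i += 1
            j += 1
        elif d1 < d2:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return out


text = [(1, 1.5), (3, 0.75), (4, 2.0), (9, 0.5)]   # from the text query
ext = external_postings({3: 5, 4: 1, 9: 12, 10: 7}, min_score=5)
joined = and_join(text, ext)
```

Only docids 3 and 9 appear in both lists, so joined is
[(3, 5.75), (9, 12.5)]. Because both inputs are sorted by docid, the merge
never needs to materialise either side as a whole set.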

> (BTW: post-processing is pretty easy; I store the SQLite database IDs in
> the Xapian DB, and simply add the IDs of matching docs in a 
>     'WHERE message.id IN (....)')

That's going to suck when you have millions of matching documents
though, since the IN (....) list grows with the size of the match set.
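One common mitigation (not a cure, since every matching ID still has to be
collected) is to load the IDs into a temporary table and JOIN against it,
rather than building an IN (...) list whose SQL text grows with the match
count or, with bound parameters, runs into SQLite's per-statement limit on
host parameters. A minimal sketch with Python's sqlite3 module; the table
name message and column id follow the snippet quoted above, while subject,
match_ids and fetch_matches are hypothetical:

```python
import sqlite3


def fetch_matches(conn, match_ids):
    """Fetch rows for matched IDs via a temp-table join instead of IN (...)."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE IF NOT EXISTS match_ids (id INTEGER PRIMARY KEY)")
    cur.execute("DELETE FROM match_ids")  # clear IDs from any previous query
    # OR IGNORE skips duplicate IDs instead of raising an integrity error.
    cur.executemany("INSERT OR IGNORE INTO match_ids (id) VALUES (?)",
                    ((i,) for i in match_ids))
    cur.execute(
        "SELECT message.id, message.subject FROM message "
        "JOIN match_ids ON match_ids.id = message.id "
        "ORDER BY message.id")
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, subject TEXT)")
conn.executemany("INSERT INTO message VALUES (?, ?)",
                 [(i, f"subject {i}") for i in range(1, 6)])
rows = fetch_matches(conn, [4, 2, 2, 99])
```

Duplicate and unknown IDs drop out naturally, so rows is
[(2, 'subject 2'), (4, 'subject 4')].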

Cheers,
    Olly
