[Xapian-discuss] faceted searches

Olly Betts olly at survex.com
Mon Sep 3 02:02:59 BST 2007


On Tue, Aug 28, 2007 at 09:47:26AM +0100, Richard Boulton wrote:
> Alexander Lind wrote:
> >Will it also be possible to facet on indexed keywords (COLORblue eg.)?  
> 
> Currently, I've added a matchspy which does this.  However, Olly has 
> expressed doubts about the wisdom of including it, I assume because it 
> is likely to be very inefficient.

Yes - it requires reading the whole termlist for every document
considered by the matchspy, which is potentially a lot of data.
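
To make the cost concrete, here is a rough sketch in plain Python.  It is
not Xapian's API: the function name, the in-memory lists standing in for
termlists, and the sample terms are all assumptions for illustration.
The point is only that every term of every matched document gets read,
even though only the prefixed ones are counted.

```python
from collections import Counter


def facet_counts_from_termlists(termlists, prefix):
    """Count facet values by scanning every term of every matched document.

    `termlists` is a hypothetical stand-in for the termlists a matchspy
    would have to read; it is just a list of lists of term strings.
    """
    counts = Counter()
    terms_read = 0
    for terms in termlists:
        for term in terms:
            terms_read += 1  # every term is read, whether or not it is a facet
            if term.startswith(prefix):
                counts[term[len(prefix):]] += 1
    return counts, terms_read


# Three made-up documents, each with one COLOR term among other terms.
matched = [
    ["COLORblue", "Tshirt", "cotton", "shirt"],
    ["COLORred", "Tshirt", "shirt", "wool"],
    ["COLORblue", "Tjeans", "denim", "jeans"],
]
counts, terms_read = facet_counts_from_termlists(matched, "COLOR")
```

Here only 3 of the 12 terms read contribute to the counts; with real
documents carrying hundreds of terms each, the ratio is much worse.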

> On the other hand, I've found that it works fairly well in practice for 
> a smallish (ie, 1 million document) database, and is very convenient, so 
> he and I will have to look into whether it should be included in the 
> release.

The problem is that people will inevitably try to use it on large
databases and get poor performance and denounce Xapian as slow.

Also I've been wondering about allowing the termlist to be stored
in the same place as the document data.  Currently they're in
separate tables, both zlib compressed.  If they were stored together
then zlib should be able to compress the combination significantly
better, plus you avoid the "wood" from having two Btrees, and probably
other efficiency gains from having one fewer Btree to deal with.  We
often want the termlist data for showing which terms match a document.
The main downside I see is that you'd then need to read a document's
data to delete it.  So possibly this should be an option - some
profiling is needed I think.
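
A quick way to get a feel for the compression argument is to compare
zlib output sizes for the two layouts.  This is only a toy sketch: the
sample document data and the NUL-separated termlist encoding are made up
for illustration and are not Xapian's on-disk formats, so real numbers
would need profiling against real databases.

```python
import zlib

# Hypothetical document data and a made-up termlist encoding.
doc_data = b"A blue cotton shirt with a blue collar and blue cotton cuffs."
termlist = b"COLORblue\0blue\0collar\0cotton\0cuffs\0shirt"

# Two separate compressed entries, as with two tables.
separate = len(zlib.compress(doc_data)) + len(zlib.compress(termlist))

# One compressed entry holding both, so the terms can be encoded as
# back-references to words already seen in the document data.
together = len(zlib.compress(doc_data + termlist))

print(separate, together)
```

The combined stream pays the zlib header and checksum once rather than
twice, and the termlist largely repeats strings from the document data.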

Anyway, adding a matchspy which used terms would work even worse if we
made this change!

> >Any idea of when we might see Xapian 1.0.3 in the wild? :)
> 
> Not really - Olly is away again this week I believe, and it may take a 
> while to settle exactly what these new APIs should look like, so it 
> could be a couple of months - hopefully sooner than that, though.

I'd hope it won't be months!  I'm quite keen to try to get a new point
release out every one to two months as the wiki RoadMap suggests:

http://wiki.xapian.org/RoadMap

It's natural that we'll get longer gaps when people tend to take
holidays (so over the Summer and New Year), but the last release will be
two months old in two days' time.

The RoadMap said "early September" for 1.0.3, though I wrote that back
in July and there's more mail and bugs to look at than I'd expected, so
I've now removed the "early".

Cheers,
    Olly


