[Xapian-discuss] Tag-based filesystem with xapian, advice?

Olly Betts olly at survex.com
Fri Mar 6 02:00:31 GMT 2009


On Wed, Mar 04, 2009 at 11:05:13PM +0100, Karel Marissens wrote:
> 1) I read there's a limit of 240 characters on a term. In theory, a  
> path might be longer. Any ideas how to deal with that?

It's currently 245 for flint, but that doesn't really help if paths can
be of potentially any length.

What you can do is split long paths over several terms:

* if the path length is < 240, index and search using: "P" + path

* if the path length is >= 240 but < 480, index:
  "XA" + path.substr(0, 240)
  "XB" + path.substr(240)
  and search for those same terms ANDed together.

* if the path length is >= 480, do a similar thing but split over more
  terms...
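
As a rough sketch of the scheme above in C++ (not tested against a real
database): path_terms() is a hypothetical helper, and carrying on the
prefixes past "XB" as "XC", "XD", ... up to "XZ" is an assumption, since
the longer case is only described as "a similar thing". Lengths are in
bytes, as reported by std::string::size().

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Chunk size taken from the scheme described above.
const std::size_t CHUNK = 240;

// Hypothetical helper: returns the term(s) to index for a path, which are
// also the terms to AND together when searching for that exact path.
std::vector<std::string> path_terms(const std::string& path) {
    std::vector<std::string> terms;
    if (path.size() < CHUNK) {
        // Short path: a single "P"-prefixed term.
        terms.push_back("P" + path);
        return terms;
    }
    // Long path: 240-byte chunks under "XA", "XB", ... The final chunk is
    // always shorter than 240 bytes (possibly empty), matching the
    // ">= 240 but < 480" case above, so one long path's terms are never
    // a subset of another's.
    const std::size_t n = path.size() / CHUNK + 1;
    if (n > 26) {
        // Assumption: one letter per chunk, so at most 26 chunks.
        throw std::length_error("path too long for prefixes XA..XZ");
    }
    for (std::size_t i = 0; i < n; ++i) {
        std::string prefix = "X";
        prefix += static_cast<char>('A' + i);
        terms.push_back(prefix + path.substr(i * CHUNK, CHUNK));
    }
    return terms;
}
```

At index time each returned string would be added to the document as a
term. At search time the same strings can be combined with
Xapian::Query(Xapian::Query::OP_AND, terms.begin(), terms.end()).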

> 2) Is there a built-in method somewhere to find the total amount of  
> terms in the database? For Enquire::get_eset() I just want ALL terms.

There isn't ("delve -v DB" will report it, but it iterates over all of
them to calculate it which isn't a good approach in general).

For calling get_eset() you can just pass 0xffffffff as that's the max
value of Xapian::termcount unless you've recompiled with larger types.
I suspect you'll run into problems elsewhere if you actually get back 4
billion terms anyway!

Cheers,
    Olly


