[Xapian-discuss] Limitation of the terms size

Karel Marissens karel.marissens at gmail.com
Thu Mar 26 18:28:24 GMT 2009


On 25 Mar 2009, at 04:18, Olly Betts wrote:

> <snip>
>
>> One solution here is to do
>> what we do in omega with URIs, and use a reduced version (including a
>> hash of the complete one or the redacted information) for the term if
>> it's going over the length limit.
>
> The approach omindex takes is to not care if two long URLs with the same
> first N characters have a hash collision, which beats refusing to index
> them, but isn't ideal (if two documents collide, we only index one of
> them).
>
> You can also handle really long paths by splitting them over multiple
> terms, as I described here recently:
>
> http://article.gmane.org/gmane.comp.search.xapian.general/7126

I ended up splitting my paths on the '/' and saving each folder in a
separate term with an index number.
So "/photo's/2008/christmas" gets translated to "/0/photo's",
"/1/2008" and "/2/christmas". The term size thus only limits the folder
name's length, not the total path.
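
A rough sketch of that translation in Python, for illustration only. The
function name path_to_terms is made up for this example, and the default of
245 bytes is an assumption about the backend's maximum term length, so check
it against your own Xapian version.

```python
def path_to_terms(path, max_term_bytes=245):
    """Translate "/a/b/c" into one term per folder: "/0/a", "/1/b", "/2/c".

    max_term_bytes is an assumed limit, not a value taken from this thread.
    """
    # Drop empty components produced by leading, trailing or doubled slashes.
    folders = [part for part in path.split("/") if part]
    terms = []
    for depth, folder in enumerate(folders):
        term = "/%d/%s" % (depth, folder)
        # Only a single folder name can now exceed the limit, not the path.
        if len(term.encode("utf-8")) > max_term_bytes:
            raise ValueError("folder name too long for one term: %r" % folder)
        terms.append(term)
    return terms


if __name__ == "__main__":
    for term in path_to_terms("/photo's/2008/christmas"):
        print(term)
```

Each returned string would then be added to the document as its own term
when indexing.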

Since I wanted to be able to list all files in a directory or any of
its subdirectories, this scheme makes that easy: I just split the path
of the base directory, apply the same "syntax" and combine the
resulting terms with an AND. So if I want all photos from 2008, I can
search for "/0/photo's AND /1/2008".
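
The subtree lookup could be sketched like this in Python. The helper names
(folder_terms, subtree_query_string, in_subtree) are hypothetical, and the
query string only mirrors the notation used in this message; with the real
bindings you would more likely build an AND query from the list of terms
than parse a string.

```python
def folder_terms(path):
    """One term per folder, prefixed with its depth: "/0/a", "/1/b", ..."""
    folders = [part for part in path.split("/") if part]
    return ["/%d/%s" % (depth, folder) for depth, folder in enumerate(folders)]


def subtree_query_string(base_dir):
    """Write the AND of the base directory's terms in the notation above."""
    return " AND ".join(folder_terms(base_dir))


def in_subtree(doc_path, base_dir):
    """Emulate the AND match: every base term must be among the doc's terms."""
    return set(folder_terms(base_dir)) <= set(folder_terms(doc_path))


if __name__ == "__main__":
    print(subtree_query_string("/photo's/2008"))
    print(in_subtree("/photo's/2008/christmas/tree.jpg", "/photo's/2008"))
    # The depth prefix keeps "2008" at a different level from matching.
    print(in_subtree("/backup/photo's/2008/tree.jpg", "/photo's/2008"))
```

The depth number is what stops a folder called "2008" elsewhere in the tree
from matching by accident.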

If you only need to match a whole path, this might not be the perfect  
solution, but I thought I'd share this as it might give you some  
ideas...

Karel


