[Xapian-discuss] Is there a 64 character term size limit? In Ruby bindings?

Olly Betts olly at survex.com
Tue Jun 8 02:32:41 BST 2010


On Mon, Jun 07, 2010 at 07:38:08PM +0100, Francis Irving wrote:
> I've just found some items in my Xapian database which aren't being
> indexed, when the terms are quite long. 
> 
> Example term:
> Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> 
> It represents that the Freedom of Information request was made to a
> particular public body. It results in pages like this not correctly
> showing results:
> 
> http://www.whatdotheyknow.com/body/rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> 
> As far as I can tell the terms aren't being indexed when they are
> longer than 64 characters. They don't get put in the Xapian database
> at all.

TermGenerator ignores terms longer than that to avoid indexing a lot of junk
terms if it gets fed things like base64 or uuencoded data.

This term looks like a filtering term, in which case it would make more
sense to add it with Document::add_term().  That doesn't have a limit
on term size itself, though the backends have a limit of around 245
bytes.
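
A minimal sketch of that approach in Ruby, for illustration only.  The
helper name add_filter_term and the constant MAX_TERM_BYTES are
hypothetical, and 245 is just the approximate figure mentioned above
rather than an exact limit for any particular backend.  The helper only
assumes that the object passed in responds to add_term, as a
Xapian::Document does.

```ruby
# Approximate backend limit on term length, in bytes (assumed from the
# "around 245 bytes" figure above; the exact value depends on the backend).
MAX_TERM_BYTES = 245

# Hypothetical helper: add a filtering term directly to the document,
# bypassing TermGenerator and its 64-character cut-off.
# `doc` must respond to add_term (e.g. a Xapian::Document).
def add_filter_term(doc, term)
  if term.bytesize > MAX_TERM_BYTES
    raise ArgumentError,
          "term is #{term.bytesize} bytes; backend limit is about #{MAX_TERM_BYTES}"
  end
  doc.add_term(term)
  term
end

# Usage with the real bindings (not run here, needs the xapian gem):
#   require 'xapian'
#   doc = Xapian::Document.new
#   add_filter_term(doc,
#     "Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust")
```

The length check is there so that an over-long term fails early with a
clear message instead of surfacing as a backend error at commit time.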

Cheers,
    Olly


