[Xapian-discuss] Is there a 64 character term size limit? In Ruby bindings?

Francis Irving francis at flourish.org
Wed Jun 16 19:23:48 BST 2010


On Tue, Jun 08, 2010 at 02:32:41AM +0100, Olly Betts wrote:
> On Mon, Jun 07, 2010 at 07:38:08PM +0100, Francis Irving wrote:
> > I've just found some items in my Xapian database which aren't being
> > indexed when the terms are quite long.
> > 
> > Example term:
> > Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> > 
> > It represents that the Freedom of Information request was made to a
> > particular public body. It results in pages like this not correctly
> > showing results:
> > 
> > http://www.whatdotheyknow.com/body/rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> > 
> > As far as I can tell the terms aren't being indexed when they are
> > longer than 64 characters. They don't get put in the Xapian database
> > at all.
> 
> TermGenerator ignores terms over that size to avoid indexing a lot of junk
> terms if it gets fed things like base64 or uuencoded data.
> 
> This term looks like a filtering term, in which case it would make more
> sense to add it with Document::add_term(). That doesn't have a limit
> on term size itself, though the backends have a limit of around 245
> bytes.
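The sketch below is a rough model of the two limits described above, written as plain Ruby so that it runs without the Xapian gem installed. The helper names (`dropped_by_term_generator?`, `filter_term`) and both constants are hypothetical and made up for illustration. The values 64 and 245 come from the thread. TermGenerator's real check happens inside Xapian's C++ code. This model measures bytes, which is the same as characters for an ASCII slug like this one. The "F" prefix matches the example term in the first message.

```ruby
# Assumed limits, taken from the thread (not read from Xapian itself):
# TermGenerator skips words longer than 64; backends cap terms at ~245 bytes.
TERM_GENERATOR_MAX_WORD = 64
BACKEND_MAX_TERM_BYTES = 245

# Hypothetical helper: a simplified model of TermGenerator's length filter.
# Returns true if a word this long would be silently skipped.
def dropped_by_term_generator?(word)
  word.bytesize > TERM_GENERATOR_MAX_WORD
end

# Hypothetical helper: builds a prefixed boolean filter term and raises
# if it would exceed the (approximate) backend term-size limit.
def filter_term(prefix, value)
  term = prefix + value
  if term.bytesize > BACKEND_MAX_TERM_BYTES
    raise ArgumentError,
          "term is #{term.bytesize} bytes; backend limit is about #{BACKEND_MAX_TERM_BYTES}"
  end
  term
end

slug = "rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust"
term = filter_term("F", slug)

puts slug.length                       # prints 71
puts dropped_by_term_generator?(slug)  # prints true
puts term

# With the real Ruby bindings the term would then be added directly,
# bypassing TermGenerator, roughly like:
#   doc = Xapian::Document.new
#   doc.add_term(term)
```

The slug is 71 characters long, so a 64-character word filter drops it. As a single filter term it is still far below the backend's roughly 245-byte cap. That is why switching to `add_term` for this kind of term is expected to work.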

Just to say: thanks, Olly, for working that out. I've changed to use
Document::add_term and it works perfectly now.

Francis
