[Xapian-discuss] Is there a 64 character term size limit? In Ruby bindings?
Francis Irving
francis at flourish.org
Wed Jun 16 19:23:48 BST 2010
On Tue, Jun 08, 2010 at 02:32:41AM +0100, Olly Betts wrote:
> On Mon, Jun 07, 2010 at 07:38:08PM +0100, Francis Irving wrote:
> > I've just found some items in my Xapian database which aren't being
> > indexed, when the terms are quite long.
> >
> > Example term:
> > Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> >
> > It represents that the Freedom of Information request was made to a
> > particular public body. It results in pages like this not correctly
> > showing results:
> >
> > http://www.whatdotheyknow.com/body/rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
> >
> > As far as I can tell the terms aren't being indexed when they are
> > longer than 64 characters. They don't get put in the Xapian database
> > at all.
>
> TermGenerator ignores terms over that size to avoid indexing a lot of junk
> terms if it gets fed things like base64 data or uuencode.
>
> This term looks like a filtering term, in which case it would make more
> sense to add it with Document::add_term(). That doesn't have a limit
> on term size itself, though the backends have a limit of around 245
> bytes.
Just to say - thanks Olly for working that out. I've changed to use
Document::add_term and it works perfectly now.
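
For anyone finding this in the archive, here is a minimal sketch of that
kind of change in Ruby. It is not the actual WhatDoTheyKnow code: the
helper name add_filter_term and the constant MAX_TERM_BYTES are made up
for illustration, and 245 is only the approximate backend limit Olly
mentions above. The helper expects a Xapian::Document (or anything that
responds to add_term) and adds the term directly, so TermGenerator's
length cutoff never comes into play.

```ruby
# Approximate backend limit quoted in this thread ("around 245 bytes").
# Treat it as an assumption, not an exact figure for every backend.
MAX_TERM_BYTES = 245

# Hypothetical helper: add a filtering term straight to the document
# instead of passing it through TermGenerator, which ignores long terms.
# `doc` is assumed to be a Xapian::Document, or any object with #add_term.
def add_filter_term(doc, prefix, value)
  term = "#{prefix}#{value}"
  if term.bytesize > MAX_TERM_BYTES
    raise ArgumentError,
          "term is #{term.bytesize} bytes, over the assumed limit of #{MAX_TERM_BYTES}"
  end
  doc.add_term(term)
  term
end

# Possible usage with the real bindings (needs the xapian gem/bindings):
#   require 'xapian'
#   doc = Xapian::Document.new
#   add_filter_term(doc, "F",
#     "rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust")
```

Raising on over-long terms is just one choice; truncating or hashing the
value would also work, as long as indexing and searching agree on it.
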
Francis