[Xapian-discuss] Is there a 64 character term size limit? In Ruby bindings?
Olly Betts
olly at survex.com
Tue Jun 8 02:32:41 BST 2010
On Mon, Jun 07, 2010 at 07:38:08PM +0100, Francis Irving wrote:
> I've just found some items in my Xapian database which aren't being
> indexed, when the terms are quite long.
>
> Example term:
> Frotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
>
> It represents that the Freedom of Information request was made to a
> particular public body. It results in pages like this not correctly
> showing results:
>
> http://www.whatdotheyknow.com/body/rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust
>
> As far as I can tell the terms aren't being indexed when they are
> longer than 64 characters. They don't get put in the Xapian database
> at all.
TermGenerator ignores terms over that size to avoid indexing a lot of junk
terms when it is fed things like base64 or uuencoded data.
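As a rough illustration of the length filter described above, here is a pure-Ruby sketch. It is not Xapian's code. The whitespace tokenizer and the `index_words` helper are hypothetical. Only the 64-character cutoff comes from this thread, and the real check in TermGenerator may count differently.

```ruby
# Hypothetical approximation of TermGenerator's long-word filter.
# Not Xapian's implementation: real tokenisation is Unicode-aware.
# The 64-character cutoff is taken from the thread.
MAX_WORD_LENGTH = 64

def index_words(text)
  text.split(/\s+/)
      .reject(&:empty?)
      # Drop over-long tokens, which are likely base64/uuencoded junk.
      .reject { |word| word.length > MAX_WORD_LENGTH }
      .map(&:downcase)
end

body = "rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust " \
       "Foundation Trust"
p index_words(body)  # => ["foundation", "trust"]
```

The 71-character body name is silently dropped, while the ordinary words survive. This matches the symptom in the question: no error, the term just never reaches the database.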
This term looks like a filtering term, in which case it would make more
sense to add it with Document::add_term(). That doesn't have a limit
on term size itself, though the backends have a limit of around 245
bytes.
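Here is a minimal sketch of the suggested alternative. It assumes the filter term is built as a one-letter prefix plus the body's URL name, as in the `F...` term quoted above. The `filter_term` helper and the `MAX_TERM_BYTES = 245` constant are assumptions for illustration. The check uses `bytesize` because the backend limit is on bytes, not characters.

```ruby
# Assumed backend limit, from "around 245 bytes" above; the exact
# figure depends on the backend, so treat this as approximate.
MAX_TERM_BYTES = 245

# Hypothetical helper: build a boolean filter term (prefix + value)
# and reject it up front if it would exceed the backend's byte limit.
def filter_term(prefix, value)
  term = prefix + value
  if term.bytesize > MAX_TERM_BYTES
    raise ArgumentError,
          "term is #{term.bytesize} bytes; limit is about #{MAX_TERM_BYTES}"
  end
  term
end

term = filter_term("F", "rotherham_doncaster_and_south_humber_mental_health_nhs_foundation_trust")
puts term.bytesize  # prints 72
# With the Xapian Ruby bindings the term would then be added directly,
# bypassing TermGenerator's 64-character filter:
#   doc.add_term(term)
```

Checking up front means an over-long term is reported at the point it is built. For names containing non-ASCII characters, the byte count will be larger than the character count.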
Cheers,
Olly