[Xapian-discuss] Not separating words when parsing HTML in Omega

Crowell, Brian BCrowell at barbnet.com
Thu Feb 10 16:43:25 GMT 2011


> On Wed, Feb 09, 2011 at 03:11:18PM -0600, Crowell, Brian wrote:
> > We noticed, when indexing a Word 2007 document, that two words in
> > adjacent paragraphs got melded together in the Xapian database. For
> > example:
> 
> What version of Omega is this with?  I have a feeling I fixed something
> to do with running words together fairly recently, but I'm not seeing
> it in the ChangeLog.

It was 1.2.4.


> > I could send a sample document that produces the error, if that
> > helps.
> 
> That would be useful if you have something you don't mind making
> public.
> Bonus points if you're happy to license it for use in a testsuite!

Feel free, it's just a two-liner. In our database, "invspread" wasn't
indexed, but "searchinginvspread" was. (See attached.)

--Brian Crowell
  Developer, Barbnet Investments
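
For illustration only, here is a minimal sketch of how this kind of
run-together can arise, assuming a text extractor that discards
paragraph tags without emitting any whitespace in their place. It is
not Omega's actual code: the TextExtractor class, the BLOCK_TAGS set
and the sample markup are all hypothetical, and the sample is not the
attached document.

```python
from html.parser import HTMLParser

# Hypothetical set of block-level tags that should act as word breaks.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collect text from HTML, optionally treating block tags as spaces."""

    def __init__(self, break_on_blocks):
        super().__init__()
        self.break_on_blocks = break_on_blocks
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Emit a space where a block element opens, if enabled.
        if self.break_on_blocks and tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_endtag(self, tag):
        # Emit a space where a block element closes, if enabled.
        if self.break_on_blocks and tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def extract_words(html, break_on_blocks):
    """Return the whitespace-separated words an indexer would see."""
    parser = TextExtractor(break_on_blocks)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).split()


# Made-up two-paragraph sample, not the attached document.
SAMPLE = "<p>We are searching</p><p>invspread today</p>"

merged = extract_words(SAMPLE, break_on_blocks=False)
separated = extract_words(SAMPLE, break_on_blocks=True)
```

With break_on_blocks=False the last word of the first paragraph and the
first word of the second come out as the single term
"searchinginvspread"; with break_on_blocks=True they stay separate, so
"invspread" would be indexed on its own.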


