[Xapian-discuss] Not separating words when parsing HTML in Omega

Crowell, Brian BCrowell at barbnet.com
Thu Feb 10 17:08:48 GMT 2011


I don't suppose that attachment went through....?

--Brian


> -----Original Message-----
> From: xapian-discuss-bounces at lists.xapian.org [mailto:xapian-discuss-
> bounces at lists.xapian.org] On Behalf Of Crowell, Brian
> Sent: Thursday, February 10, 2011 10:43 AM
> To: Xapian Discussion
> Subject: Re: [Xapian-discuss] Not separating words when parsing HTML
> in Omega
> 
> > On Wed, Feb 09, 2011 at 03:11:18PM -0600, Crowell, Brian wrote:
> > > We noticed, when indexing a Word 2007 document, that two words in
> > > adjacent paragraphs got melded together in the Xapian database. For
> > > example:
> >
> > What version of Omega is this with?  I have a feeling I fixed something
> > to do with running words together fairly recently, but I'm not seeing
> > it in the ChangeLog.
> 
> It was 1.2.4.
> 
> 
> > > I could send a sample document that produces the error, if that
> > helps.
> >
> > That would be useful if you have something you don't mind making
> > public.
> > Bonus points if you're happy to license it for use in a testsuite!
> 
> Feel free, it's just a two-liner. On ours, "invspread" wasn't indexed,
> but "searchinginvspread" was. (see attached)
> 
> --Brian Crowell
>   Developer, Barbnet Investments
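
For anyone following along without the attachment, here is a minimal sketch of how this kind of symptom can arise. It is not Omega's code: the sample markup, the BLOCK_TAGS set, and the TextExtractor and extract_terms names are all hypothetical. It only shows that if a text extractor drops block-level tags without emitting whitespace, words from adjacent paragraphs run together into a single term.

```python
from html.parser import HTMLParser

# Hypothetical set of block-level tags; not taken from Omega's source.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "tr", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collects text content, optionally treating block tags as word breaks."""

    def __init__(self, break_on_block_tags):
        super().__init__()
        self.break_on_block_tags = break_on_block_tags
        self.parts = []

    def _maybe_break(self, tag):
        # Emit a space at block boundaries so neighbouring words stay separate.
        if self.break_on_block_tags and tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_starttag(self, tag, attrs):
        self._maybe_break(tag)

    def handle_endtag(self, tag):
        self._maybe_break(tag)

    def handle_data(self, data):
        self.parts.append(data)


def extract_terms(html, break_on_block_tags):
    """Return lowercased whitespace-separated terms from the HTML text."""
    parser = TextExtractor(break_on_block_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).lower().split()


# Assumed stand-in for the two-line sample; the real attachment is not shown here.
sample = "<p>Searching</p><p>invspread</p>"

print(extract_terms(sample, break_on_block_tags=False))  # ['searchinginvspread']
print(extract_terms(sample, break_on_block_tags=True))   # ['searching', 'invspread']
```

With break_on_block_tags=False the only term produced is "searchinginvspread", which matches what we saw in our database; with it set to True, "invspread" comes out as its own term.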



