[Xapian-devel] Proposed changes to omindex

Sun Aug 27 19:12:45 BST 2006

On Sun, Aug 27, 2006 at 03:27:08PM +0100, James Aylett wrote:
> Most filters would accept a patch to work from stdin if they don't
> already, and it wouldn't be too difficult to do. That would benefit
> everyone, if we run into some common ones.

Not all file formats can be sanely decoded without seeking though (and
some are more efficient to decode if you can seek).
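
To illustrate the seeking point, here is a minimal sketch in Python. It uses ZIP purely as an assumed example of a format whose index sits at the end of the file; the thread does not name any particular format, and the helper names `make_zip` and `read_member_names` are hypothetical. It shows that a pipe (what a filter sees on stdin) reports itself as non-seekable, and that the usual workaround is to buffer the whole stream first, which gives up any streaming benefit.

```python
import io
import os
import zipfile


def make_zip() -> bytes:
    """Build a tiny ZIP archive in memory (illustrative data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("doc.txt", "hello from inside the archive")
    return buf.getvalue()


def read_member_names(stream):
    """List archive members; ZipFile seeks to the end to find the directory."""
    with zipfile.ZipFile(stream) as zf:
        return zf.namelist()


data = make_zip()

# Case 1: a seekable source (stands in for a regular file on disk).
names_from_file = read_member_names(io.BytesIO(data))

# Case 2: the same bytes arriving through a pipe, as they would on stdin.
r_fd, w_fd = os.pipe()
with os.fdopen(w_fd, "wb") as writer:
    writer.write(data)  # small enough to fit in the pipe buffer

with os.fdopen(r_fd, "rb") as reader:
    pipe_is_seekable = reader.seekable()
    # Workaround: slurp the whole stream into memory, then decode from that.
    names_from_pipe = read_member_names(io.BytesIO(reader.read()))

print(names_from_file, pipe_is_seekable, names_from_pipe)
```

The buffering step is the cost being discussed: a filter reading such a format from stdin has to hold the entire input before it can start decoding.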

> I've no idea whether it actually will help, in practice. I suspect
> that in most cases, it's not actually going to win you much because
> the file buffering will do the right thing already.

Indeed.

> If we retain omindex's approach for HTML (which it understands
> natively) and anything that filters to plain text, and just allow
> people to write filters that generate scriptindex input files (with
> the filter being associated with an index script), then we get more
> flexibility in omindex without having to sacrifice efficiency of
> indexing in the common case.

I can't really visualise how a merged indexer would look right now,
but I don't think this is something for the short term anyway -
sorting out UTF-8 and flint is more important currently.

> > > I'd certainly favour having a way of running the query parser that
> > > didn't need R-terms, [...]
> > 
> > There already is: QueryParser::set_stemming_strategy() can be called
> > with STEM_NONE or STEM_ALL (the default is STEM_SOME).
> 
> Ah, excellent. Is this documented anywhere? Can't remember seeing it...

Hmm, only rather tersely:

http://www.xapian.org/docs/apidoc/html/classXapian_1_1QueryParser.html#c7dc3b55b6083bd3ff98fc8b2726c8fd

I'll try to flesh that out.

Cheers,
    Olly