[Xapian-discuss] Spelling and term prefixes

Olly Betts olly at survex.com
Fri Jun 12 02:08:44 BST 2009


On Thu, Jun 11, 2009 at 10:01:52PM +0100, Simon Roe wrote:
> According to http://xapian.org/docs/spelling.html, "Currently spelling
> correction ignores prefixed terms.".
> 
> I'd like to know why this is, and if there is a workaround for it.

It wasn't totally obvious how it should operate, so we punted on it at
the time.  I guess someone needs to write a coherent spec for how this
should work and then implement it.

> For example, if I index 2 distinct fields, with two prefixes ('title'
> and 'body'), I'd like to be able to do (in python):
> 
> query = qp.parse_query(query_string, DEFAULT_SEARCH_FLAGS, 'title')
> 
> And have spelling work on it.  From the end user's point of view, this
> makes sense, but I understand it might be tricky tracking the
> spellings for each prefix.

That's not hard, actually, but you seem to be assuming that the spellings
should just be tracked separately per prefix.

In some cases, that doesn't make much sense - you'll probably get better
results by using the same dictionary for two free-text fields in the
same language.  Certainly if you're loading in a static dictionary, it's
really unhelpful if you are forced to load it once for each prefix you
want to use it for.

Conversely, for a field in a different language, or one with a
distinctly different vocabulary (e.g. "author name"), you would want a
per-prefix dictionary, whether you track spellings dynamically or load
them from a static dictionary.

Also, some fields may not be appropriate for spelling correction.
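The trade-offs above point towards a configurable mapping from prefix to
dictionary rather than a fixed one-dictionary-per-prefix rule: several
prefixes can share a dictionary, some get their own, and some get none.
Here's a minimal pure-Python sketch of that idea.  It is NOT Xapian's API;
the class PrefixSpeller, its methods, the prefix names, and the use of
Levenshtein distance (ties broken by word frequency) to rank suggestions
are all hypothetical illustrations.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    # Levenshtein distance, computed one DP row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


class PrefixSpeller:
    """Hypothetical per-prefix spelling store (not part of Xapian)."""

    def __init__(self, prefix_to_dict):
        # Maps a term prefix to a dictionary name.  Several prefixes may
        # share one dictionary; prefixes missing from the map are excluded
        # from spelling correction entirely.
        self.prefix_to_dict = dict(prefix_to_dict)
        self.dicts = {name: Counter()
                      for name in set(self.prefix_to_dict.values())}

    def add_spelling(self, prefix, word, freq=1):
        name = self.prefix_to_dict.get(prefix)
        if name is not None:  # silently ignore excluded fields
            self.dicts[name][word] += freq

    def suggest(self, prefix, word, max_distance=2):
        name = self.prefix_to_dict.get(prefix)
        if name is None:
            return None  # field excluded from spelling correction
        words = self.dicts[name]
        if word in words:
            return None  # already a known spelling
        # Rank by edit distance, then by higher frequency, then alphabetically.
        candidates = [(edit_distance(word, w), -freq, w)
                      for w, freq in words.items()]
        candidates = [c for c in candidates if c[0] <= max_distance]
        return min(candidates)[2] if candidates else None


# "title" and "body" share an English dictionary, "author" has its own,
# and "id" is deliberately left out.
speller = PrefixSpeller({"title": "en", "body": "en", "author": "authors"})
for w in ["search", "engine", "spelling"]:
    speller.add_spelling("body", w)
speller.add_spelling("author", "betts")

print(speller.suggest("title", "serch"))   # search (learned via "body")
print(speller.suggest("author", "serch"))  # None (authors dict differs)
print(speller.suggest("id", "serch"))      # None (field excluded)
```

Sharing the "en" dictionary means words seen only in body text can still
correct a title query, which is the behaviour argued for above for two
free-text fields in the same language.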

Cheers,
    Olly
