[Xapian-discuss] Getting spelling to work
James Aylett
james-xapian at tartarus.org
Tue Jan 8 21:15:05 GMT 2008
On Tue, Jan 08, 2008 at 03:55:07PM -0500, Deron Meranda wrote:
> "The suggestions are generated dynamically from the
> data that has been indexed,"
If you generate terms using the TermGenerator, it can add them to the
spelling dictionary automatically.
> This seems to imply that the term/postings are used as the
> basis for spelling, but in reality it looks like the spelling "index"
> is actually quite separate from the term/posting index.
> Is that true? And why?
Yes, it's separate; you might not want it to be automatically filled
with every word generated from your corpus (for instance, if your
corpus has lots of spelling mistakes in it).
> So assume I want the spelling dictionary to be based upon all the
> terms in the documents (and not some predefined dictionary).
That will depend on your application, but that's a reasonable approach
to take.
> How does the spelling word "frequency" affect things? I would
> assume that if there are multiple spelling suggestions, that the
> one with the highest frequency would be returned (as the most
> likely spelling). This is sort of implied but not actually stated
> anyplace I can find.
Pass. Richard?
> Then, most importantly, how does one then populate the spelling
> dictionary when indexing documents? Since every time you do
> add_spelling() the frequency is incremented; what happens if I
> want to re-index some document (or remove a document)? For
> the terms and postings, this is a valid thing to do. Re-indexing
> a document as many times as you want doesn't change things.
> But if you're also adding its terms to the spellings, then re-indexing
> can seriously skew the frequencies it would seem.
Umm, no idea. Richard?
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org