[Xapian-discuss] Index indexed words

Michel Pelletier pelletier.michel at gmail.com
Tue Jan 19 17:44:45 GMT 2010


If you just want to match a word from its first few characters (the
end of the word is still missing), then FLAG_PARTIAL works very well;
it treats the last word of the query as a prefix, and we use it to
great effect for much the same thing.  However, if you want to match
substrings within a term, where 'abc' matches against 'fooabcbar',
then you will need to take an approach similar to what double
suggested.  The book "Managing Gigabytes" has a good description of a
solution where you split terms into 'bigrams' so that, for example,
the term 'fooabcbar' becomes [$f fo oo oa ab bc cb ba ar r$] where $
indicates the beginning or end of a word.  Then you can split your
search term similarly, either with or without the $ marker depending
on your needs.  For pure substring searching, 'abc' becomes
['ab' 'bc'], which matches two of the bigrams in the original term.
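
To make the bigram idea concrete, here is a rough sketch in plain
Python, not Xapian code.  The names bigrams, build_index and
substring_search are made up for illustration, and it assumes '$'
never occurs inside a real term.  Note the final check: sharing all
the bigrams does not guarantee they are adjacent, so candidates still
have to be verified against the actual string.

```python
from collections import defaultdict

def bigrams(term, anchored=True):
    """Split a term into overlapping two-character grams.

    With anchored=True the term is wrapped in '$' markers so the first
    and last grams record the word boundaries.
    """
    s = f"${term}$" if anchored else term
    return [s[i:i + 2] for i in range(len(s) - 1)]

def build_index(terms):
    """Map each bigram to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in bigrams(term):
            index[gram].add(term)
    return index

def substring_search(index, fragment):
    """Return the indexed terms that contain fragment as a substring."""
    grams = bigrams(fragment, anchored=False)
    if not grams:
        # Fewer than two characters: no bigram to look up.
        return set()
    # Candidates must contain every bigram of the fragment...
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # ...but the bigrams may not be adjacent, so verify each candidate.
    return {t for t in candidates if fragment in t}

index = build_index(["fooabcbar", "cabbage", "bcab"])
print(bigrams("fooabcbar"))
print(substring_search(index, "abc"))
```

Here 'bcab' contains both 'ab' and 'bc' but not 'abc', which is
exactly the kind of false positive the last step throws away.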

MG also goes into some detail on how you can use this method to do
pretty nice general wildcard searching like *abc*, f?o*, *abcba?, etc.
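
One way the wildcard case might look, again as a plain Python sketch
rather than MG's exact algorithm or anything built into Xapian: anchor
the pattern with '$', cut it at each wildcard, take the bigrams of the
literal pieces, intersect their term sets, then confirm each candidate
with a real pattern match.  The function names are hypothetical, '$'
is assumed not to occur in terms, and fnmatchcase is used for the final
check (it also treats [...] as a character class, so this assumes
patterns only use * and ?).

```python
import re
from fnmatch import fnmatchcase

def term_bigrams(term):
    """Bigrams of a term, with '$' marking both word boundaries."""
    s = f"${term}$"
    return {s[i:i + 2] for i in range(len(s) - 1)}

def pattern_bigrams(pattern):
    """Bigrams that every match of the wildcard pattern must contain."""
    grams = set()
    # Cut the anchored pattern at each wildcard; only the literal
    # pieces in between contribute bigrams.
    for chunk in re.split(r"[*?]", f"${pattern}$"):
        grams.update(chunk[i:i + 2] for i in range(len(chunk) - 1))
    return grams

def wildcard_search(terms, pattern):
    """Return the terms matching a pattern that uses * and ?."""
    index = {}
    for term in terms:
        for gram in term_bigrams(term):
            index.setdefault(gram, set()).add(term)
    grams = pattern_bigrams(pattern)
    if grams:
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    else:
        # No literal bigrams (e.g. '*'): every term is a candidate.
        candidates = set(terms)
    # The bigram filter only narrows the field; verify the survivors.
    return {t for t in candidates if fnmatchcase(t, pattern)}

terms = ["fooabcbar", "foo", "fro", "abcba", "xabcbaz"]
for pattern in ("*abc*", "f?o*", "*abcba?"):
    print(pattern, sorted(wildcard_search(terms, pattern)))
```

For f?o* the only usable bigram is '$f', so the index just narrows
things down to words starting with 'f' and the final match does the
rest.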

-Mike

On Mon, Jan 18, 2010 at 1:04 PM, double <ninive at gmx.at> wrote:
> Hello,
>
> We would like to create Google or Firefox like "search hints".
> If someone types "abc", the search system should name
> some possible hints.
>
> I think, Firefox does it by indexing 3-characters of the domain
> name. If you enter parts, you get some hints.
>
> Thank you very much
> Marcus