[Xapian-discuss] Autocomplete for phrases

Nick Griffiths nick at playlouder.com
Mon Dec 7 17:18:41 GMT 2009


Hi all.

We are trying to use Xapian to index the titles of around ten million
documents (music catalogue) and then do auto-complete style search on
this index.  After poking around we can't quite get the combination of
QueryParser flags and search query that will give back the results we
want.  The closest we have got can take a very long time (because it is
searching for, and building a list of, possible completions?) and gives
back unnecessary results.

We want to allow searching via an incomplete chunk of the title,
starting from the beginning of any word in the title and ending
anywhere. For example, say we have a set of document titles, like so:

[1] - "welcome to the jungle",
[2] - "welcome to thesaurus",
[3] - "welcome all to the jungle",
[4] - "all jungle"
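
To pin down the matching rule described above, here is a minimal
reference sketch in plain Python (no Xapian calls).  The names matches,
search and TITLES are made up for illustration, and lower-casing plus
whitespace splitting is an assumption about how titles are tokenised.

```python
def matches(title, query):
    """Return True if the query words occur consecutively in the title,
    where every query word must match a title word exactly except the
    last one, which only needs to be a prefix of its title word."""
    title_words = title.lower().split()
    query_words = query.lower().split()
    if not query_words:
        return False
    *exact, partial = query_words
    n = len(query_words)
    # Slide a window of n words over the title, so a match may start
    # at the beginning of any word.
    for start in range(len(title_words) - n + 1):
        window = title_words[start:start + n]
        if window[:-1] == exact and window[-1].startswith(partial):
            return True
    return False


# The four example titles, keyed by document number.
TITLES = {
    1: "welcome to the jungle",
    2: "welcome to thesaurus",
    3: "welcome all to the jungle",
    4: "all jungle",
}


def search(query):
    """Return the sorted document numbers whose title matches the query."""
    return sorted(doc for doc, title in TITLES.items() if matches(title, query))
```

This only defines which documents should match; it says nothing about
ranking or about how to get the same behaviour out of an index.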

When passing the following queries to parse_query with FLAG_PARTIAL, we
are seeing...

w                     -> 1, 2, 3
welcome to the        -> 1, 2, *3
jung                  -> 4, 1, 3
all jungle            -> 4, *3, *1

For our purposes, the results marked with an asterisk are non-matching
and should not be returned.  If we switch to the FLAG_PHRASE flag, we
avoid the extraneous results, but we lose the partial matches.  Also,
with a large database (4 million documents) the first query (w) takes an
incredibly long time.

One idea is to add all possible incomplete versions of each term to the
document's term list.  For the term 'welcome' we'd also add 'Sw',
'Swe', 'Swel' ... 'Swelcome'.  That way we can adjust the user's query
to insert the prefix on the final word and then search via a standard
phrase search.  Not sure what the performance or index size implications
of this might be, but we'd be willing to have a shot if this is a good
approach.
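
A rough sketch of that idea in plain Python, with no Xapian calls.  The
function names are hypothetical, and the tiny in-memory phrase matcher
only stands in for a real phrase search.  It assumes the partial terms
are stored at the same position as the full word, which is what lets a
phrase query line them up; in Xapian that would presumably be done with
Document.add_posting.

```python
TITLE_PREFIX = "S"  # prefix used for the partial terms, as in 'Sw', 'Swe', ...


def partial_terms(word, prefix=TITLE_PREFIX):
    """Return every leading substring of word, each with the prefix added."""
    return [prefix + word[:i] for i in range(1, len(word) + 1)]


def postings_for_title(title):
    """Return (term, position) pairs for a title: the full word plus all
    of its prefixed partial forms, sharing one position per word."""
    postings = []
    for pos, word in enumerate(title.lower().split(), start=1):
        postings.append((word, pos))
        postings.extend((term, pos) for term in partial_terms(word))
    return postings


def rewrite_query(query):
    """Put the prefix on the final word only; earlier words stay exact."""
    words = query.lower().split()
    if not words:
        return []
    return words[:-1] + [TITLE_PREFIX + words[-1]]


def phrase_match(postings, terms):
    """Stand-in for a phrase search: True if the terms occur at
    consecutive positions in the given postings."""
    if not terms:
        return False
    positions = {}
    for term, pos in postings:
        positions.setdefault(term, set()).add(pos)
    return any(
        all(start + i in positions.get(term, ()) for i, term in enumerate(terms))
        for start in positions.get(terms[0], ())
    )
```

With this scheme a word of n characters adds n extra postings, so the
extra index size grows with the total number of characters in the
titles rather than with the number of words.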


Any advice would be appreciated,
Nick



