[Xapian-discuss] Underscores and colons

James Aylett james-xapian at tartarus.org
Fri Oct 28 09:45:42 BST 2005


On Thu, Oct 27, 2005 at 04:50:12PM -0700, John Wang wrote:

> Is there a way to make underscores and colons in terms behave like
> letters?  It would be nice for query terms like doc_id and
> Search::Xapian to be treated as one term, not two. The results would
> be a lot more relevant for some queries.

Hi, John. Xapian itself doesn't care what terms look like. However,
the commonly used QueryParser that ships with Xapian, and the Omega
indexers (scriptindex and omindex), generate their terms in a particular
way. If you want terms that include underscores and colons, you'll need
to write your own word generator (something that goes through the text
and works out what the words are, then optionally passes them to a
stemmer to make terms), and use it both while indexing and while
building queries.
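
To make the "word generator" idea concrete, here is a minimal sketch in
Python. It is not how QueryParser or the Omega indexers work; the names
generate_terms and WORD_RE and the optional stem callable are made up
for this illustration, and the rule of trimming stray leading or trailing
colons is just one plausible choice.

```python
import re

# Hypothetical word pattern: letters, digits, underscores and colons all
# count as word characters. Stray leading/trailing colons are trimmed below.
WORD_RE = re.compile(r"[A-Za-z0-9_:]+")


def generate_terms(text, stem=None):
    """Split text into terms, keeping '_' and ':' inside words.

    `stem` is an optional callable (an assumption of this sketch) applied
    to each lowercased word to produce the final term.
    """
    terms = []
    for match in WORD_RE.finditer(text):
        word = match.group(0).strip(":").lower()
        if not word:
            continue
        terms.append(stem(word) if stem else word)
    return terms


# Use the same function for documents and for queries so the terms agree.
doc_terms = generate_terms("Set doc_id via Search::Xapian: see below.")
query_terms = generate_terms("doc_id Search::Xapian")
```

The important design point is the last two lines: the same function
produces terms on both sides, so doc_id in a query matches doc_id in a
document. The resulting terms would then be added to your documents at
index time and used to build your queries at search time.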

However, it's worth pointing out that the query parser will often turn
queries into PHRASE queries in these cases, which is actually more
helpful: it means you can search for the fragments that make up the
larger unit, as well as for the entire unit. I can't remember in
detail how this works, though (and I don't read Lemon grammars, so I
can't figure it out from the source), so someone else will have to fill
you in here.
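
As a rough illustration of why the phrase approach helps, here is a toy
positional index in Python. It is only a sketch of the general idea and
does not reproduce what QueryParser actually does; fragments,
build_positions and matches_phrase are hypothetical helpers invented for
this example.

```python
import re


def fragments(text):
    """Split on anything that is not a letter or digit, lowercasing each piece."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]


def build_positions(text):
    """Map each fragment to the list of positions where it occurs."""
    positions = {}
    for pos, term in enumerate(fragments(text)):
        positions.setdefault(term, []).append(pos)
    return positions


def matches_phrase(positions, query):
    """True if the query's fragments occur consecutively and in order."""
    terms = fragments(query)
    if not terms:
        return False
    return any(
        all(start + i in positions.get(t, []) for i, t in enumerate(terms))
        for start in positions.get(terms[0], [])
    )


index = build_positions("The doc_id field stores an id for each doc")
whole_unit = matches_phrase(index, "doc_id")   # "doc" followed by "id"
fragment = matches_phrase(index, "id")         # a single fragment
reversed_unit = matches_phrase(index, "id_doc")
```

Here both the whole unit and a single fragment match, while the
fragments in the wrong order do not, which is the behaviour described
above: the pieces stay searchable on their own, and the unit is found by
requiring them to be adjacent.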

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


