[Xapian-discuss] Adding terms of more than one word with PHP bindings

James Aylett james-xapian at tartarus.org
Sun Sep 14 19:14:12 BST 2008


On Sun, Sep 14, 2008 at 10:00:12AM -0500, Yannick Warnier wrote:

[Multi-word terms]
> My use case is that I offer a set of scripts whereby the users can add
> "tags" to the documents they index. These tags are then kept in the
> Xapian database using the terms feature.
> Some of these tags consist of multiple words (let's say "summer
> holiday"). I then offer a search interface which allows for a search
> based on a combination of tags (boolean search) and normal (statistical)
> search.
> The tags are stored correctly using the XapianDocument::add_term()
> method. They are retrieved correctly using the
> XapianDatabase::allterms_begin() method.
>
> However, when I query the Xapian database with a search string (the
> first parameter of my xapian_query function; see code appended below)
> such as
> 
>   sea sex sun T:summer holiday T:beach
> 
> doesn't get the tag "summer holiday".

The query parser isn't really going to help you here, because of word
splitting. The trouble is, there's no difference in syntax between the
desired semantic 'tag "summer holiday"' and the desired semantic 'tag
"summer" AND holiday'. Or rather, you want the syntax above to mean
the former, but it actually means the latter.
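
To make the ambiguity concrete, here is a toy model in Python of a
word-splitting parser. It is not Xapian's QueryParser; the function name
toy_parse and its "T:" handling are assumptions for illustration only. Like
the behaviour described above, it attaches the prefix to the single word
touching the colon and treats the next word as ordinary free text:

```python
def toy_parse(query: str, prefix: str = "T:") -> list[tuple[str, str]]:
    """Hypothetical toy parser: split on whitespace, as a word-splitting
    query parser would. Only the word glued to the prefix becomes a tag;
    anything after the next space is a separate free-text word."""
    parsed = []
    for token in query.split():
        if token.startswith(prefix) and len(token) > len(prefix):
            parsed.append(("tag", token[len(prefix):]))
        else:
            parsed.append(("word", token))
    return parsed


print(toy_parse("sea sex sun T:summer holiday T:beach"))
# [('word', 'sea'), ('word', 'sex'), ('word', 'sun'), ('tag', 'summer'),
#  ('word', 'holiday'), ('tag', 'beach')]
```

There is simply no token that could carry the tag "summer holiday", so no
amount of tweaking the prefix will make the exact multi-word term match.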

You may be better off word splitting your tag field and treating it as
probabilistic rather than boolean, i.e. making it a freetext metadata
field rather than a tag. (At least from the point of view of search.)
You can do that using the TermGenerator, if that's what you're using
for your regular text.
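
A rough sketch of the word-split approach, again as a self-contained toy in
Python rather than real Xapian code. The prefix "XT" and the helpers
tag_terms, index_doc and score are hypothetical. With the actual bindings,
I believe the equivalent is passing a prefix to TermGenerator's index_text()
for the tag text, and registering that prefix with QueryParser's add_prefix()
rather than add_boolean_prefix(); check the API docs for your version.

```python
import re
from collections import Counter

TAG_PREFIX = "XT"  # hypothetical prefix chosen for the tag field


def tag_terms(tag: str, prefix: str = TAG_PREFIX) -> list[str]:
    """Word-split a tag and prefix each word, instead of storing the
    whole tag as one boolean term like "XTsummer holiday"."""
    return [prefix + word for word in re.findall(r"\w+", tag.lower())]


def index_doc(tags: list[str]) -> Counter:
    """Collect the prefixed word terms for all of a document's tags."""
    terms: Counter = Counter()
    for tag in tags:
        terms.update(tag_terms(tag))
    return terms


def score(doc_terms: Counter, query_tags: str) -> int:
    """Toy probabilistic-style match: count how many of the query's tag
    words appear in the document (real Xapian would weight with BM25)."""
    return sum(1 for term in tag_terms(query_tags) if term in doc_terms)


docs = {
    "a": index_doc(["summer holiday", "beach"]),
    "b": index_doc(["winter holiday"]),
    "c": index_doc(["summer"]),
}
ranked = sorted(docs, key=lambda d: score(docs[d], "summer holiday"),
                reverse=True)
print(ranked)
```

A document tagged "summer holiday" then matches both words and ranks above
one tagged only "summer" or "winter holiday", rather than requiring an exact
match on the whole tag. The trade-off is that a tag search becomes a ranked
match instead of a strict filter.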

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
