[Xapian-discuss] Always returning ALL the documents matching a query

James Aylett james-xapian at tartarus.org
Mon Dec 29 16:19:08 GMT 2008

On Mon, Dec 29, 2008 at 10:39:23AM -0500, tata 668 wrote:

> I'm still a little bit confused. What is the difference between a
> "value", a "posting" and a "term"?

See <http://www.xapian.org/docs/glossary.html> for these (and more).

> - Being able to restrict the search on multiple criteria?

Restrictions in searches are (usually) done using terms.

> I have to prefix all the terms in the documents (all the words?) with
> a prefix and then specify this prefix to the queryparser before
> searching?

You don't have to use a prefix in all cases; it is common to have
"general" text turned into unprefixed terms when it isn't
stemmed. The QueryParser can both be given a default prefix to apply to the
query it's building, and given a list of prefixes that can be used
explicitly in the query (so you can do things like author:Orwell but
store the term in the database as Aorwell, for instance).

See the QueryParser documentation for more information.
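
As a rough illustration of the prefix mapping described above, the
following plain-Python sketch mimics the idea of turning a user-facing
field such as author: into a database term prefix such as A. It does not
call Xapian at all: the PREFIXES table and the to_term() function are
hypothetical names made up for this sketch, and the real QueryParser
does a good deal more (stemming, phrases, boolean operators and so on).

```python
# Hypothetical mapping from user-facing field names to term prefixes.
# "A" for author and "D" for date follow the convention used in this thread.
PREFIXES = {"author": "A", "date": "D"}


def to_term(word, default_prefix=""):
    """Turn one query word into the term string stored in the database."""
    field, sep, text = word.partition(":")
    if sep and field in PREFIXES:
        # Explicit prefix in the query: author:Orwell -> Aorwell
        return PREFIXES[field] + text.lower()
    # No recognised field: apply the default prefix (empty unless given).
    return default_prefix + word.lower()


print(to_term("author:Orwell"))   # Aorwell
print(to_term("Hello"))           # hello
```

The point is only that the string the user types and the string stored
in the database need not be the same; the prefix table is what connects
them.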

> - Being able to sort the result?
> I have to add "values" to the documents and then use a "sorter" to
> sort the documents by specifying which "value" to use for the
> sorting?


> I'm really not sure. I would like to see an example of this kind of
> use case in the "quickstart" guide! :-)

The quickstart guide isn't really the place for this, because it
should be short and get you up and running with basic Xapian usage
quickly. However it would be great to have example code showing how to
use the various more powerful features of Xapian.

You can add notes about documentation that doesn't exist but you feel
should do at <http://trac.xapian.org/wiki/MissingDocumentation>; for
sample code, anyone can link in examples at
<http://trac.xapian.org/wiki/SampleCode> -- we're aware that it's not
as easy to find as it could be.

There's actually a ticket for more sample code:
<http://trac.xapian.org/ticket/281>, so noting specific things you'd
like to see sample code for there would be helpful.

> The following example would be really appreciated. For a forum
> search page, how to:
> - Index 3 forum posts (in a way that the following search is possible)
> - Find which post(s) contain the phrase "hello word", have been posted
>   by user "john doe" and were created on February 12th, 2008.
> - Return them sorted by their last modification date (may be different
>   than the creation date) then by their id.

It could be a bit simpler, by not thinking in terms of 'forum
posts'. Possibly the easiest way of doing that would be to provide a
scriptindex index script alongside the search code, or to extend
simpleindex to store the needed values and terms.

One of the difficulties here is that there are different ways of
tackling the specific problem above. If you're only ever searching for
posts on a single date, you might tackle that part differently to if
you also need to search across a date range. Similarly, user "john
doe" can be represented in different ways depending on how your system
works (is the user name invariant? the displayed name? just some
opaque user identifier?). For this reason, it might be better to break
that example into several different pieces: ordering by modification
date, searching by date, searching by date range, and restricting to
specific creating users in this case.

(To get you started, for those four cases you may want to use: sorting
by date/id using a Sorter object; D-prefixed terms; values and a
DateValueRangeProcessor; A-prefixed terms, although you'll need to
decide what you put into the terms, and you may need to build a
more complex Query object than the QueryParser will do for you. I
might knock out a couple of examples if I can reduce them to simple
enough problems, but I'm in work-avoidance mode so I really should get
on with something else :-)
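
To make those four pieces a little more concrete, here is a minimal
sketch in plain Python that models a post as a set of terms plus
numbered value slots. It deliberately avoids the Xapian bindings, so
index_post(), search(), the slot numbers, the way the user name is
turned into a term and the sample posts are all hypothetical. In real
code the terms and values would be attached to a Xapian Document, and
the sorting and range handling would be left to Xapian.

```python
from datetime import date

# Hypothetical value slot numbers chosen for this sketch.
SLOT_MODIFIED, SLOT_ID, SLOT_CREATED = 0, 1, 2


def index_post(post_id, author, created, modified, text):
    """Model one post as filter terms plus sortable string values."""
    terms = {"A" + author.lower().replace(" ", "_"),   # creating user
             "D" + created.strftime("%Y%m%d")}          # creation date
    terms.update(text.lower().split())                  # unprefixed text terms
    values = {SLOT_MODIFIED: modified.strftime("%Y%m%d"),
              SLOT_ID: "%010d" % post_id,               # zero-padded so strings sort numerically
              SLOT_CREATED: created.strftime("%Y%m%d")}
    return {"terms": terms, "values": values}


def search(posts, required_terms, created_range=None):
    """Return posts holding every required term, sorted by modified date then id."""
    hits = [p for p in posts if required_terms <= p["terms"]]
    if created_range is not None:
        lo, hi = (d.strftime("%Y%m%d") for d in created_range)
        # YYYYMMDD strings compare in date order, so a range is a string comparison.
        hits = [p for p in hits if lo <= p["values"][SLOT_CREATED] <= hi]
    return sorted(hits, key=lambda p: (p["values"][SLOT_MODIFIED],
                                       p["values"][SLOT_ID]))


posts = [
    index_post(3, "john doe", date(2008, 2, 12), date(2008, 3, 1), "hello world again"),
    index_post(1, "john doe", date(2008, 2, 12), date(2008, 2, 20), "hello world"),
    index_post(2, "jane roe", date(2008, 2, 12), date(2008, 2, 13), "hello world"),
]

hits = search(posts, {"Ajohn_doe", "D20080212", "hello", "world"})
print([int(p["values"][SLOT_ID]) for p in hits])   # [1, 3]
```

Note that this treats the search text as two independent terms; a true
phrase match needs positional information, which the sketch does not
model. It also bakes in one answer to the "what goes into the A term"
question (a lowercased user name), which is exactly the sort of
decision that depends on your system.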

I'm aware this sounds slightly negative, but this is part of the
problem with providing sample code that is simple enough to understand
quickly, when people generally have more complex problems they are
actually trying to solve. Feel free to keep bugging me about it though.


  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org
