[Xapian-discuss] omega number range searches - query

Olly Betts olly at survex.com
Tue Sep 4 02:17:41 BST 2007


On Tue, Jul 24, 2007 at 04:57:14AM +0100, Richard Boulton wrote:
> Eike wrote:
> >These text files contain various numbers of dates in the form of YYYY, 
> >mostly in free-text rather than formatted fields. I was hoping to be
> >able to use the cgi to limit searches of these documents by
> >date/number ranges e.g. "catalogue 1890..1910", but I'm not getting
> >what I expected.

We don't attempt to pull dates out of free text.  It might be possible
for full dates, but I suspect it wouldn't work well for just years
- there are many other reasons why a 4 digit number might appear in a
document.
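
To illustrate the problem with bare years, here is a minimal Python
sketch (not anything Omega does; the pattern, the function name and
the sample sentence are made up for illustration).  A naive matcher
can't tell a year from a catalogue number, a print run or a price:

```python
import re

# Naive pattern: any standalone 4-digit number from 1000 to 2099.
YEAR_RE = re.compile(r"\b(1\d{3}|20\d{2})\b")

def candidate_years(text):
    """Return every 4-digit number in text that could be read as a year."""
    return [int(m) for m in YEAR_RE.findall(text)]

sample = "Catalogue no. 1905, printed 1897, 1200 copies, price 1500 marks."
print(candidate_years(sample))  # [1905, 1897, 1200, 1500]
```

Only one of the four matches in the sample is actually a date.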

> >Can you help me with 2 questions:
> >1) Should the omega cgi interface support number range searches 
> >with/without additional configuration?
> 
> If you use omindex, only the last modified date is stored.  It _should_ 
> be possible to do a date range search using this value, by setting the 
> START and END cgi parameters.

Currently Omega defaults to implementing date range filtering using
terms rather than a value.  If you want to use a value, you also need
to set the CGI parameter DATEVALUE to the value number of the date
slot.
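
As a rough sketch of what such a request might look like, the Python
below builds an Omega URL with START, END and optionally DATEVALUE.
The base URL, the function name, the YYYYMMDD date format and the use
of slot 0 are assumptions for illustration; check them against your
own installation and the Omega documentation:

```python
from urllib.parse import urlencode

def omega_date_range_url(base_url, query, start, end, date_slot=None):
    """Build an Omega CGI URL restricted to a date range.

    start and end are assumed to be YYYYMMDD strings.  If date_slot is
    given, DATEVALUE is added so the filter uses that value slot
    instead of the default term-based filtering.
    """
    params = {"P": query, "START": start, "END": end}
    if date_slot is not None:
        params["DATEVALUE"] = str(date_slot)
    return base_url + "?" + urlencode(params)

# Hypothetical CGI location; slot 0 is an assumption.
print(omega_date_range_url("http://localhost/cgi-bin/omega",
                           "catalogue", "18900101", "19101231",
                           date_slot=0))
```

Note that with omindex this filters on the last modified date of the
file, not on any year mentioned in its text.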

Using a value is nicer because you can set different dates in different
value slots, and specify date+time to allow filtering with a finer
granularity than a day.  Also, we want to store the date as a value
anyway for "sort by date".
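
To see why a value can give finer granularity, here is a small Python
sketch.  It assumes the date is stored as a fixed-width
YYYYMMDDHHMMSS string (an assumption for illustration, not
necessarily the format omindex writes); such strings compare in
chronological order, so a range check is a plain string comparison
and the same value also works as a sort key:

```python
from datetime import datetime

def date_value(dt):
    """Encode a datetime as a fixed-width string that sorts chronologically."""
    return dt.strftime("%Y%m%d%H%M%S")

def in_range(value, start, end):
    """True if the encoded value lies within [start, end] inclusive."""
    return start <= value <= end

morning = date_value(datetime(2007, 9, 4, 2, 17, 41))
evening = date_value(datetime(2007, 9, 4, 20, 0, 0))

# Two documents from the same day can be told apart by time of day.
window_start = date_value(datetime(2007, 9, 4, 0, 0, 0))
window_end = date_value(datetime(2007, 9, 4, 12, 0, 0))
print(in_range(morning, window_start, window_end))  # True
print(in_range(evening, window_start, window_end))  # False
```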

I've not profiled using terms vs. using a value, but assuming that
using a value isn't slower, that will become the default at some
point, and the term method will be deprecated and later removed.
If it is slower, we should look for ways to optimise it.

> I don't believe sorting or range restriction is possible currently
> with any other numeric value.

Just to be clear, sorting on a value is supported - set the CGI
parameter SORT to the value number to sort on.  But omindex only sets
the date value and the MD5 value, and sorting by MD5 gives an
essentially random order so isn't useful.  If you have more structured
data to index, you might find scriptindex useful.
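
For example, a sorted search request could be built like this in
Python.  The base URL, the function name and the slot number are
hypothetical; which slot holds the date depends on how the database
was indexed:

```python
from urllib.parse import urlencode

def omega_sorted_url(base_url, query, sort_slot):
    """Build an Omega CGI URL that sorts results on the given value slot."""
    return base_url + "?" + urlencode({"P": query, "SORT": str(sort_slot)})

# Hypothetical CGI location; slot 0 is an assumption.
print(omega_sorted_url("http://localhost/cgi-bin/omega", "catalogue", 0))
```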

Other range restrictions aren't currently supported (and that includes
using a date range restriction in the query string, such as
`1/1/1900..31/12/1999').

Cheers,
    Olly
