[Xapian-discuss] Searching date range on a custom field

Olly Betts olly at survex.com
Fri Jun 1 01:13:41 BST 2007


On Thu, May 31, 2007 at 03:12:18PM -0700, Matt Barnicle wrote:
> <meta name="dateBegin" content="20070210" />
> <meta name="dateEnd" content="20070217" />
> 
> These correspond to the start and end dates of an event.  I also have a 
> tag for when the rendered page is an event (we have many types of pages 
> on the site):
> 
> <meta name="pageType" content="event" />
> 
> So, I need to search on events given a date.  Say the date is 20070213, 
> how do I search for event pages where the supplied date is within the 
> dateBegin .. dateEnd range?

This is backward to how Omega's date range feature works - that expects
that each document has a date and the user wants to restrict their
search to documents within a specified date range.

> I'm using htdig to crawl the site, and htdig2omega to create the index.  
> The index creation and field mapping works just fine, and so does 
> searching on the boolean page type.  Here is my htdig2omega.script file:
> 
> url : field=url hash boolean=Q unique=Q
> title : weight=3 index truncate=80 field=title
> lastMod : field=lastmod
> size : field=size
> sample : index truncate=300 field=sample
> metaDesc : field=metadesc index
> pageType : field=pageType boolean=XPT
> eventName : field=eventName weight=3 index
> dateBegin : field=dateBegin date=yyyymmdd
> dateEnd : field=dateEnd date=yyyymmdd

The scriptindex "date" action is designed to allow you to do date range
filtering when each document has a single date, so this won't really
work.

You could make it work if you ran the date action on every date in the
range, but if your ranges are long, that's going to generate a lot of
terms.
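
To get a feel for how many terms that means, here is a rough sketch which
lists every day in a yyyymmdd range.  The helper names are made up for
illustration (they aren't part of scriptindex or Xapian), and it assumes
both dates are valid 8-digit yyyymmdd strings:

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Days in the given month, allowing for leap years.
static int days_in_month(int y, int m) {
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && leap) ? 29 : days[m - 1];
}

// Hypothetical helper: every yyyymmdd date from first to last inclusive.
std::vector<std::string> expand_date_range(const std::string & first,
                                           const std::string & last) {
    std::vector<std::string> out;
    int y = std::stoi(first.substr(0, 4));
    int m = std::stoi(first.substr(4, 2));
    int d = std::stoi(first.substr(6, 2));
    std::string cur = first;
    // yyyymmdd strings sort chronologically, so string comparison is enough.
    while (cur <= last) {
        out.push_back(cur);
        // Step to the next calendar day.
        if (++d > days_in_month(y, m)) {
            d = 1;
            if (++m > 12) {
                m = 1;
                ++y;
            }
        }
        char buf[40];
        std::snprintf(buf, sizeof buf, "%04d%02d%02d", y, m, d);
        cur = buf;
    }
    return out;
}
```

The example event (20070210 to 20070217) expands to 8 dates, so each
document would need at least that many date terms, and a year-long event
would need several hundred.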

> I found some posts from the list archives that discuss date ranges, but 
> I can't figure out if they will help me in this situation or not..  I 
> think they're talking about searching on date ranges on indexed 
> documents, that is, the date the document was indexed.

Yes, they are.

What I'd suggest you do is to put the dateBegin and dateEnd into
document values, so you can access them quickly during the match
process.  For example:

  dateBegin : field=dateBegin value=0
  dateEnd : field=dateEnd value=1

And then write a little MatchDecider subclass which takes a date and
checks if a document's date range includes it.  Something like this
totally untested code:

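#include <string>
#include <xapian.h>

// Accepts a document only if the given date lies within its date range.
class DateRangeMatchDecider : public Xapian::MatchDecider {
    std::string date;

  public:
    DateRangeMatchDecider(const std::string & date_)
	: date(date_) { }

    // Values 0 and 1 hold dateBegin and dateEnd as yyyymmdd strings, so
    // plain string comparison orders them chronologically.
    bool operator()(const Xapian::Document &doc) const {
	return doc.get_value(0) <= date && date <= doc.get_value(1);
    }
};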

(You might want to swap the order of the checks, depending on whether
you expect user dates to be more likely to fall before or after events
in the database.)

Then you can instantiate this class with the date the user wants to
search for and pass it to Enquire::get_mset().  You'll also want to
OP_FILTER with XPTevent to only consider events.

If you want the user to be able to search for any event happening within
a range of dates, you can easily extend the above class to take a pair
of dates and check if it overlaps with the document's range.
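
The overlap check itself might look something like the sketch below.  The
class and method names are invented for illustration, and it's kept free
of Xapian so the comparison can be tried out on its own; in a real
MatchDecider subclass, operator() would call it with doc.get_value(0) and
doc.get_value(1).  It assumes all dates are yyyymmdd strings and that
each range has its start no later than its end:

```cpp
#include <string>

// Hypothetical helper holding the user's date range as yyyymmdd strings.
class DateRangeOverlap {
    std::string from, to;

  public:
    DateRangeOverlap(const std::string & from_, const std::string & to_)
        : from(from_), to(to_) { }

    // True if the event range [begin, end] shares at least one day with
    // the user's range [from, to].  Two ranges overlap exactly when each
    // one starts no later than the other one ends.
    bool overlaps(const std::string & begin, const std::string & end) const {
        return begin <= to && from <= end;
    }
};
```

For example, a user range of 20070213 to 20070220 overlaps the event
running from 20070210 to 20070217, but not one running from 20070221 to
20070225.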

Cheers,
    Olly


