[Xapian-discuss] a strange type of alias/expanded term
Andreas Marienborg
andreas at startsiden.no
Thu Oct 16 09:58:56 BST 2008
On Oct 16, 2008, at 5:32 AM, Olly Betts wrote:
> On Mon, Oct 13, 2008 at 03:56:23PM +0200, Andreas Marienborg wrote:
>> I was wondering if there is any way I can coach queryparser into
>> something like this, so I don't have to pre-parse the query myself:
>> (pseudo code)
>>
>> my $query_string = 'jazz oslo today';
>>
>> $qp->add_alias('today' => 'D20081013');
>>
>> my $q = $qp->parse($query_string);
>>
>> is($q->get_description, '(jazz AND oslo AND D20081013)');
>
> This version is arguably slightly better since the date should act as a
> boolean filter term:
>
> ((jazz AND oslo) FILTER D20081013)
>
True, the D term will be a boolean filter, so that would most likely be the
end result?
> Both will match the same documents, but the weightings will be slightly
> different.
>
> Not sure about the FILTER version, but the AND version can probably be
> achieved using synonyms:
>
> http://xapian.org/docs/synonyms.html
>
> Untested, but try something like:
>
> # Only need to do this once per day...
> $db->clear_synonyms("today");
> $db->add_synonym("today", "D20081013");
>
> $qp->set_database($db);
> my $q = $qp->parse_query($query_string,
> FLAG_PHRASE|FLAG_BOOLEAN|FLAG_LOVEHATE|FLAG_AUTO_SYNONYMS);
>
Yes, this hit me last night as well: I can just keep changing the
synonyms each day. It's nice to have your input that this might indeed
be the best way; I'll definitely try that route now.
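
As a rough illustration of the daily refresh discussed above, here is a minimal sketch. It is written in Python rather than Perl so that it runs with the standard library alone. The helper names `today_term` and `refresh_today_synonym` are hypothetical, and `db` is assumed to be any object exposing `clear_synonyms()` and `add_synonym()` with the same arguments as in the quoted snippet (for example a writable Xapian database); like the quoted code, it is untested against a real index.

```python
from datetime import date


def today_term(day: date) -> str:
    """Build the D-prefixed date term for `day`, e.g. D20081013."""
    return "D" + day.strftime("%Y%m%d")


def refresh_today_synonym(db, day: date) -> str:
    """Point the synonym 'today' at the date term for `day`.

    Assumption: `db` provides clear_synonyms(term) and
    add_synonym(term, synonym), as in the quoted snippet.
    """
    term = today_term(day)
    db.clear_synonyms("today")     # drop the previous day's mapping
    db.add_synonym("today", term)  # map 'today' to the new date term
    return term
```

Running this once per day (from a cron job, say) before any queries are parsed would keep the synonym current, so the query string itself never needs rewriting.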
>> Basically I want to somehow expand today to today's date, this week to a
>> range, tomorrow to something etc., but I'm not sure how I might best do
>> it?
>
> If you define multiple synonyms for the same word (by calling
> add_synonym() multiple times with the same first argument), they're
> ORed, and multi-word synonyms are supported (with
> FLAG_AUTO_MULTIWORD_SYNONYMS), so `this week' is doable by defining it
> as a synonym for 7 D-prefix terms. For `this year' you probably want to
> add Y-prefix terms with just the year to avoid an OR of 365 or 366 date
> terms...
>
Yeah, I usually add Y M D on all documents, so that wouldn't be too
hard. I guess I could also add W for instance, but it might be harder
conceptually, so seven D might be just as good.
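
The term lists for `this week' and `this year' could be generated along these lines. This is only a sketch under stated assumptions: the week is taken to run Monday to Sunday, the year term is assumed to look like Y2008, and the helper names (`week_terms`, `year_term`, `define_synonyms`) are hypothetical. As before, `db` is assumed to expose `clear_synonyms()` and `add_synonym()`, and the multi-word key is assumed to be the words joined by a single space.

```python
from datetime import date, timedelta
from typing import List


def week_terms(day: date) -> List[str]:
    """Seven D-prefixed terms for the week containing `day`.

    Assumption: weeks run Monday to Sunday.
    """
    monday = day - timedelta(days=day.weekday())
    return ["D" + (monday + timedelta(days=i)).strftime("%Y%m%d")
            for i in range(7)]


def year_term(day: date) -> str:
    """Single Y-prefixed term for the year, avoiding an OR of 365+ D terms."""
    return "Y" + day.strftime("%Y")


def define_synonyms(db, phrase: str, terms: List[str]) -> None:
    """Replace all synonyms of `phrase` with `terms` (which get ORed)."""
    db.clear_synonyms(phrase)
    for term in terms:
        db.add_synonym(phrase, term)
```

With these helpers, `define_synonyms(db, "this week", week_terms(date.today()))` would register the seven day terms, and `define_synonyms(db, "this year", [year_term(date.today())])` the single year term.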
>> the other option, to pre-process, is doable I guess, but it might be
>> more error-prone?
>
> Yes, preprocessing input to the QueryParser like that is best avoided.
>
Good, then I will strive to avoid that :)
Thanks for your help!
- andreas