[Xapian-discuss] Question: Query weights, Rset usage, lowercase

Andrey Kong alpha04 at netvigator.com
Sat Dec 9 04:23:44 GMT 2006


Thank you very much for the quick reply.

It helps, and I appreciate it.

(Forgive my English.)

Andrey K.


----- Original Message ----- 
From: "Olly Betts" <olly at survex.com>
To: "Andrey Kong" <alpha04 at netvigator.com>
Cc: <xapian-discuss at lists.xapian.org>
Sent: Saturday, December 09, 2006 11:54 AM
Subject: Re: [Xapian-discuss] Question: Query weights, Rset usage, lowercase


> On Sat, Dec 09, 2006 at 09:55:11AM +0800, Andrey Kong wrote:
>> 1) How much does it cost to put the Descriptions inside the
>> Xapian.document.data field? (Assume the Descriptions are the contents
>> of web pages with the HTML stripped.) Will the Xapian DB become very
>> big and affect performance? (I have 1M docs when testing.)
>
> Assuming the usual pattern of searching for 10 or so matches, this
> shouldn't be a problem at all.
>
> The document data is stored in a separate file, so there should be
> no effect on matching, aside from competing for disk cache.  You'll have
> similar competition for disk cache anyway if you're pulling the same
> data from an SQL database hosted on the same machine.
>
>> 2) Since I am now able to search the Title (prefix PT, weight=20) and
>> Descriptions (no prefix, weight=1) of the database, I am wondering
>> how to assign different weights to the terms of a Query. How can I
>> achieve:
>>
>> Query using "OR" (Microsoft, Keyboard, Mouse)
>>
>> in which the term "Microsoft" = weight 5 | "Keyboard" = weight 1 |
>> "Mouse" = weight 1
>
> Just set the within query frequency (wqf) - e.g. Query("microsoft", 5).
>
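A minimal sketch of the wqf suggestion, assuming the Xapian Python bindings (the reply does not name a language). The helper names `weighted_terms` and `build_or_query` and the boost table are hypothetical, chosen for illustration. Terms are lowercased here on the assumption that the index was built with lowercase terms.

```python
def weighted_terms(words, boosts):
    """Pair each lowercased query word with its wqf (default 1)."""
    pairs = []
    for word in words:
        term = word.lower()
        pairs.append((term, boosts.get(term, 1)))
    return pairs


def build_or_query(pairs):
    """Combine (term, wqf) pairs with OR; needs the xapian module."""
    import xapian  # imported lazily so weighted_terms works without it
    subqueries = [xapian.Query(term, wqf) for term, wqf in pairs]
    return xapian.Query(xapian.Query.OP_OR, subqueries)


# "Microsoft" gets wqf 5; the other terms keep the default of 1.
pairs = weighted_terms(["Microsoft", "Keyboard", "Mouse"], {"microsoft": 5})
```

Passing `pairs` to `build_or_query` would give a query that can be handed to `Enquire.set_query()`.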
>> Because it's normal that people will type in the most important terms
>> first and the less important terms later, I want to weight the
>> query in the same way.
>
> I have my doubts about this idea.  The risk is that you'll improve
> results for some queries while making others worse.
>
> I think people tend to enter queries with the natural word order.
> Sometimes the more important terms will be first, and sometimes they
> won't.
>
> In this case, "Microsoft" is performing an adjectival role by defining
> a narrower scope for the words which follow, which is why it's perhaps
> more important.  But this varies between languages - in Spanish it would
> probably be "mouse de Microsoft" not "Microsoft mouse".
>
>> 3) Since I add my own prefixes manually, I wonder: does Xapian change
>> all terms to lowercase automatically, or do I need to do it manually?
>
> Xapian treats terms as opaque pieces of data, so you'll need to
> lowercase them yourself if that's what you want.  Otherwise it wouldn't
> be possible to implement a case-sensitive search.
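A minimal sketch of lowercasing terms yourself before adding a prefix, in plain Python. The helper name `make_term` is hypothetical; the `PT` prefix is the one mentioned in question 2.

```python
def make_term(word, prefix=""):
    """Lowercase the word and prepend the prefix unchanged."""
    return prefix + word.lower()


title_term = make_term("Microsoft", "PT")  # prefixed title term
body_term = make_term("Keyboard")          # unprefixed description term
```

The same normalisation would need to be applied both when indexing and when building queries, since Xapian compares the term bytes exactly.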
>
>> 4) When I query ("search engine"), if I add 3 docs to the RSet, does
>> this "RSet related to -search engine-" remain in the database, so that
>> the next time I run the same query "search engine", the 3 docs in the
>> RSet can be retrieved from the database? How do I do that?
>
> The RSet isn't stored in the database.  The RSet represents a set of
> relevance judgements which a user has made pertaining to a particular
> query (or more generally to a particular "information need").  If you
> want to store it, it almost certainly needs to be per user and probably
> per query too.  In a web application, I'd suggest storing it in a cookie.
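A rough sketch of the cookie idea, assuming the Xapian Python bindings and the standard-library `http.cookies` module. The function names, the cookie name `rset`, and the dot-separated encoding are all assumptions made for illustration, not anything Xapian prescribes.

```python
from http.cookies import SimpleCookie


def rset_to_cookie(docids, name="rset"):
    """Serialise document ids as a dot-separated cookie value."""
    cookie = SimpleCookie()
    cookie[name] = ".".join(str(d) for d in sorted(set(docids)))
    return cookie[name].OutputString()


def cookie_to_docids(header, name="rset"):
    """Parse a Cookie header back into a list of document ids."""
    cookie = SimpleCookie()
    cookie.load(header)
    if name not in cookie:
        return []
    parts = cookie[name].value.split(".")
    return [int(part) for part in parts if part.isdecimal()]


def build_rset(docids):
    """Rebuild a Xapian RSet from stored ids; needs the xapian module."""
    import xapian  # imported lazily so the cookie helpers work without it
    rset = xapian.RSet()
    for docid in docids:
        rset.add_document(docid)
    return rset


header = rset_to_cookie([42, 7, 19])
docids = cookie_to_docids(header)
```

To keep the judgements per query as well as per user, the cookie name could be derived from the query string.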
>
>> I think it would be even better if there were 2-5 lines of example
>> usage in the API documentation.
>
> Yes, that would be good (though I think many would need a larger
> example to be useful).  However, it would be a substantial amount of
> work and we're all already very busy.  Patches are welcome of course
> (if anyone wants to work on this, please add examples to the doxygen
> comments in the headers, not the HTML documentation which is
> automatically generated from them!)
>
>> If every function had 3-5 lines of example code showing its usage, we
>> could understand the function and its usage in 5 seconds. Without an
>> example, I spent 3-5 hours testing it out myself, and some just gave up...
>
> I'd suggest you simply search the examples (or failing that, Omega) for
> the particular method you want to see in context.  For most methods,
> that will find you an actual working piece of code using the method.
>
> Cheers,
>    Olly 



