[Xapian-discuss] Counting and statistics

Andreas Marienborg andreas at startsiden.no
Thu Mar 29 08:31:04 BST 2007


On Mar 28, 2007, at 8:30 PM, Olly Betts wrote:

> On Wed, Mar 28, 2007 at 01:00:27PM +0200, Andreas Marienborg wrote:
>> I have managed to get some sort of result, by adding every doc in the
>> RSet, then using that to build an ESet.
>
> If you mean you're adding every document in the database to the RSet,
> that doesn't really achieve anything - the terms are generated by
> looking at differences between "RSet" documents and the collection as
> a whole, so if the RSet and the collection are the same, you won't get
> good results!
>

Well, I am not adding everything from the database, just everything
from the last month (or whatever period I choose to look at).

I am trying to figure out terms that are "popular" within a given set  
of documents.

>> Is there any way to "skip" some terms when building the ESet? I tried
>> with:
>>
>> 	my $eset = $enquire->get_eset(10, $rset,
>> 		sub { my $term = shift; warn "in decider!"; return 1; });
>>
>> but that just gives me the following error upon execution:
>>
>> 	Usage: Search::Xapian::Enquire::get_eset(THIS, maxitems, rset) at
>> 	./script/nyheter_search_word_count.pl line 74.
>
> ExpandDecider isn't wrapped by Search::Xapian yet.  The wrapper should
> be very similar to that for MatchDecider, which was wrapped as of
> 0.9.10.0, so if you know any XS you could probably add a wrapper easily
> enough.  Otherwise, feel free to file a bug and I'll take a look once
> Xapian 1.0 is taken care of.
>

I will see what I can do. I haven't done any XS, but I suppose it's
never too late to learn :)

>> Also, on an ESetIterator, is it not possible to get the number of
>> occurrences, or the number of documents containing the term, just the
>> weight?
>
> You can call Database::get_collection_freq() and Database::get_termfreq()
> with the termname to find these out.  They aren't stored in the ESet
> though.
>

But will these work on a set, or only on the complete database? I want
to know how many times a term occurred within a given search result.
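
For illustration, the per-result-set counts asked about here could be
sketched as below. This is plain Python over in-memory term lists, not the
Search::Xapian API: the function name count_terms and the sample data are
hypothetical, and it assumes the terms of each matching document have
already been collected into a list.

```python
from collections import Counter


def count_terms(docs):
    """Count terms within a set of documents.

    docs is an iterable of term lists, one list per matching document.
    Returns (occurrences, doc_counts): how often each term appears in the
    set, and how many documents in the set contain it.
    """
    occurrences = Counter()  # total occurrences of each term in the set
    doc_counts = Counter()   # number of documents in the set with the term
    for terms in docs:
        occurrences.update(terms)
        doc_counts.update(set(terms))  # count each term once per document
    return occurrences, doc_counts


# Hypothetical term lists for three documents returned by a search.
results = [
    ["election", "oslo", "election"],
    ["oslo", "weather"],
    ["election", "budget"],
]
occurrences, doc_counts = count_terms(results)
print(occurrences.most_common(2))
```

With Xapian, the term lists would presumably come from walking the term
list of each document in the result set, so the cost grows with the number
of matching documents.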

>> Where can I read about how this weight is calculated?
>
> http://www.xapian.org/docs/intro_ir.html
>
> especially the section:
>
>     Using the weights: the E set
>

Ok, will look into it, thanks :)



- andreas


