[Xapian-devel] GSoC 2011 Weighting Schemes

wuwenjin kevin.wu86 at gmail.com
Wed Mar 30 13:35:14 BST 2011


I am reading the source code while trying to implement DPH, and I have come
across some questions. Could anyone give me some help? Thanks.

Question about the "Weighting Schemes" source code.
The following code is from "\xapian-core-1.2.4\include\xapian\weight.h":

    /// A lower bound on the minimum length of any document in the database.
    Xapian::termcount doclength_lower_bound_;

    /// An upper bound on the maximum length of any document in the database.
    Xapian::termcount doclength_upper_bound_;

    /// An upper bound on the wdf of this term.
    Xapian::termcount wdf_upper_bound_;


.........................
..........................

    /** Allow the subclass to perform any initialisation it needs to.
     *
     *  @param factor  Any scaling factor (e.g. from OP_SCALE_WEIGHT).
     */
    virtual void init(double factor) = 0;

    /** Calculate the weight contribution for this object's term to a
     *  document.
     *
     *  The parameters give information about the document which may be used
     *  in the calculations:
     *
     *  @param wdf    The within document frequency of the term in the
     *                document.
     *  @param doclen The document's length (unnormalised).
     */
    virtual Xapian::weight get_sumpart(Xapian::termcount wdf,
       Xapian::termcount doclen) const = 0;

    /** Return an upper bound on what get_sumpart() can return for any
     *  document.
     *
     *  This information is used by the matcher to perform various
     *  optimisations, so strive to make the bound as tight as possible.
     */
    virtual Xapian::weight get_maxpart() const = 0;

    /** Calculate the term-independent weight component for a document.
     *
     *  The parameter gives information about the document which may be used
     *  in the calculations:
     *
     *  @param doclen The document's length (unnormalised).
     */
    virtual Xapian::weight get_sumextra(Xapian::termcount doclen) const = 0;

    /** Return an upper bound on what get_sumextra() can return for any
     *  document.
     *
     *  This information is used by the matcher to perform various
     *  optimisations, so strive to make the bound as tight as possible.
     */
    virtual Xapian::weight get_maxextra() const = 0;


*Q1:* What is the purpose of
"virtual Xapian::weight get_maxpart() const = 0;" and
"virtual Xapian::weight get_maxextra() const = 0;"? When are these methods
called?

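To make Q1 concrete, here is a minimal self-contained sketch of how I guess
an upper bound like get_maxpart() could let a matcher skip documents. It is
only my assumption based on the header comments above, not Xapian's actual
matcher: ToyWeight, Doc, best_document() and the wdf / (wdf + doclen) formula
are all hypothetical names and choices made up for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Weight subclass; this is NOT the Xapian API.
struct ToyWeight {
    double wdf_upper_bound;        // largest wdf of this term in any document
    double doclength_lower_bound;  // length of the shortest document

    // Weight contribution of this term to one document: grows with wdf,
    // shrinks as the document gets longer (made-up formula).
    double get_sumpart(double wdf, double doclen) const {
        return wdf / (wdf + doclen);
    }

    // Upper bound on get_sumpart(): evaluate it at the most favourable
    // statistics (largest wdf, shortest document).
    double get_maxpart() const {
        return get_sumpart(wdf_upper_bound, doclength_lower_bound);
    }
};

// One candidate document: wdf of term A, wdf of term B, and its length.
struct Doc {
    double wdf_a;
    double wdf_b;
    double doclen;
};

// Toy "A OR B" matcher returning the index of the best document (-1 if none).
// *scored counts how many documents needed a full get_sumpart() evaluation.
int best_document(const std::vector<Doc>& docs, const ToyWeight& a,
                  const ToyWeight& b, int* scored) {
    int best = -1;
    double best_score = 0.0;
    *scored = 0;
    for (std::size_t i = 0; i < docs.size(); ++i) {
        const Doc& d = docs[i];
        // Cheap bound: only terms that occur in the document can contribute.
        double bound = (d.wdf_a > 0 ? a.get_maxpart() : 0.0) +
                       (d.wdf_b > 0 ? b.get_maxpart() : 0.0);
        if (bound <= best_score) continue;  // cannot beat the current best
        ++*scored;
        double score = 0.0;
        if (d.wdf_a > 0) score += a.get_sumpart(d.wdf_a, d.doclen);
        if (d.wdf_b > 0) score += b.get_sumpart(d.wdf_b, d.doclen);
        if (score > best_score) {
            best_score = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

In this toy, with documents {3, 2, 10}, {0, 1, 20}, {1, 0, 50} and bounds
a = {3, 10}, b = {2, 10}, only the first document is fully scored; the other
two are skipped because their bound cannot beat the best score so far. Is
this roughly what the real matcher uses get_maxpart() and get_maxextra() for?
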
*Q2:* In Xapian, BM25Weight is the default weighting scheme. I want to know
when, where and how BM25Weight is used in Xapian's source code. Maybe this
question involves a lot of code. I think that weighting happens after the
query terms are submitted, during the match, for example in
"void MultiMatch::get_mset(...)" in "multimatch.cc", but this method is quite
complex and I am not sure about it.

Wenjin Wu




2011/3/29 wuwenjin <kevin.wu86 at gmail.com>

> Hi Olly,
> I have submitted my proposal for "Weighting Schemes". If you have some time
> to read it, I would appreciate your suggestions.
>
> http://socghop.appspot.com/gsoc/proposal/review/google/gsoc2011/kevinking/1001#
>
> Regards
> Wenjin Wu
>