[Xapian-devel] Patch for Initial Prototype implementation of Unigram Language Modelling in xapian-core.
Olly Betts
olly at survex.com
Tue Apr 17 03:36:48 BST 2012
On Sun, Apr 15, 2012 at 06:39:33AM +0530, Gaurav Arora wrote:
> I have implemented an initial prototype of the Xapian::Weight subclass for
> Unigram Language Modelling to support UnigramLM weighting in Xapian. Other
> changes include adding collection_frequency to the TermFreqs struct to store
> the collection frequency of terms, some changes to support it in the Xapian
> framework, and changing simplesearch.cc to search using the UnigramLMWeight class.
>
> The following issues have not been addressed in this patch (I am working on
> them):
>
> 1. The log trick for handling multiplication for LM needs to be made more robust
> than just adding some random number to avoid rejecting a document due to
> a negative value returned by log.
BTW, log() in C/C++ is natural logarithm (so base e), so 10 seems
particularly arbitrary to add. Log to base 10 is log10().
I'm not sure what the best answer is here though.
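To make the base issue concrete, here is a minimal sketch. The function names (shifted_log, shifted_log10) and the offset of 10 are only illustrative and are not taken from the patch. It shows that with the natural log, adding 10 only keeps the result non-negative for probabilities down to about e^-10 (roughly 4.5e-5), not 1e-10 as it would with log10():

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: natural log of a probability plus a fixed offset.
// std::log() is base e, so the result is >= 0 only when p >= exp(-offset).
double shifted_log(double p, double offset) {
    return std::log(p) + offset;
}

// Same idea using base 10: the result is >= 0 only when p >= 10^(-offset).
double shifted_log10(double p, double offset) {
    return std::log10(p) + offset;
}
```

So a term probability of 1e-6 still gives a negative value from shifted_log(1e-6, 10.0), while shifted_log10(1e-6, 10.0) is positive. This only illustrates the problem with the constant, it isn't a suggestion for what the fix should be.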
> PFA 5 patches for the initial prototype implementation of Unigram Language
> Model in Xapian.
Thanks for the patches. They look good, though I didn't try them out
yet. Three minor things:
You shouldn't commit the .Plo files - they're generated during the
build.
It's only really meaningful to mark a constructor as "explicit" if it
takes (or has optional parameters such that it can take) a single
argument. The "explicit" marking means it won't be used to implicitly
convert a value. So if you had an array class that could be initialised
with a size:
Array::Array(size_t size);
If you don't mark that as explicit, then the user could pass an integer
where an Array was expected, and the compiler would create a temporary
array and pass it in, which isn't something you want to happen for this
sort of case.
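As a rough sketch of the difference (LooseArray, StrictArray and the two helper functions are made-up names for illustration, not anything in xapian-core):

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Constructor NOT marked explicit: a size can silently become an array.
struct LooseArray {
    LooseArray(std::size_t size) : size_(size) {}
    std::size_t size_;
};

// Constructor marked explicit: the caller must construct the array by name.
struct StrictArray {
    explicit StrictArray(std::size_t size) : size_(size) {}
    std::size_t size_;
};

std::size_t loose_len(const LooseArray& a) { return a.size_; }
std::size_t strict_len(const StrictArray& a) { return a.size_; }

// The compiler will implicitly convert a size_t to LooseArray, but not
// to StrictArray.
static_assert(std::is_convertible<std::size_t, LooseArray>::value,
              "non-explicit constructor allows implicit conversion");
static_assert(!std::is_convertible<std::size_t, StrictArray>::value,
              "explicit constructor blocks implicit conversion");
```

With these definitions loose_len(5) compiles and quietly builds a temporary LooseArray, whereas strict_len(5) is a compile error and you have to write strict_len(StrictArray(5)).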
And in the final patch some of the comments aren't actually multi-line
but instead are really one long line which looks like a multi-line
comment if viewed wrapped at 80 columns. If you look at the diff itself
you will probably see what I mean.
Cheers,
Olly