[Xapian-devel] Regarding the Proposal

Shaohuan Li shaohuan.li at gmail.com
Wed Apr 4 22:29:01 BST 2012


Dear Olly,

Thank you for your reply. By tf-idf weighting, I meant the traditional
tf-idf scheme, which multiplies the term frequency by the term rareness
(inverse document frequency). I plan to try it with both intersection
queries and multiword queries.
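
A minimal sketch of the traditional tf-idf scoring described above, in
Python rather than Xapian's C++ API. The function name, the
`require_all` flag, and the reading of "intersection query" as "every
query term must occur in the document" are assumptions made for
illustration, not something taken from the proposal.

```python
import math
from collections import Counter


def tf_idf_scores(query_terms, docs, require_all=False):
    """Score each document (a list of tokens) against the query terms.

    require_all=True is a hypothetical stand-in for an intersection
    (AND) query: a document missing any query term scores 0.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        if require_all and any(tf[term] == 0 for term in query_terms):
            scores.append(0.0)
            continue
        score = 0.0
        for term in query_terms:
            if df[term] == 0:
                continue  # term occurs nowhere, so it contributes nothing
            idf = math.log(n_docs / df[term])  # term rareness
            score += tf[term] * idf            # frequency times rareness
        scores.append(score)
    return scores


docs = [["apple", "pie", "apple"], ["pie", "crust"], ["crust"]]
any_term = tf_idf_scores(["apple", "pie"], docs)
all_terms = tf_idf_scores(["apple", "pie"], docs, require_all=True)
```

Here the second document still gets a small score in `any_term` because
it contains "pie", but scores 0 in `all_terms` because it lacks "apple".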

This shouldn't take long. However, since I will have exams during the
first two weeks, I have allowed a fairly generous two weeks for building
and testing the scheme.

For the DFR schemes, I was thinking that instead of simply assuming the
type of randomness to be a binomial or geometric distribution, I could
use the Monte Carlo idea to approximate the probabilities. Many different
approximations are possible. I will probably try Monte Carlo complete
path (simulate N = mn runs of the random walk, initiated at each page
exactly m times) and Monte Carlo complete path stopping at dangling nodes
(simulate N = mn runs of the random walk, initiated at each page exactly
m times and stopping when it reaches a dangling node), which I expect to
be faster than the MC end-point with random start approach (simulate N
runs of the random walk initiated at a randomly chosen page).
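
A rough sketch of the "complete path stopping at dangling nodes"
simulation mentioned above, written in Python for brevity. It assumes n
is the number of pages (so N = mn walks in total), that the graph is a
dict mapping each page to a list of pages it links to, and that each walk
also ends with probability 1 - damping at every step. The damping value
and all names are assumptions for illustration; how the resulting visit
frequencies would feed into a DFR model is not shown.

```python
import random


def mc_complete_path(graph, m, damping=0.85, seed=0):
    """Estimate visit frequencies from m random walks started at each page.

    graph: dict mapping page -> list of linked pages (every linked page
    must also be a key). A page with an empty list is a dangling node.
    """
    rng = random.Random(seed)
    visits = {page: 0 for page in graph}

    for start in graph:          # every page is a starting point ...
        for _ in range(m):       # ... exactly m times, so N = m * n walks
            page = start
            while True:
                visits[page] += 1          # count every page on the path
                out_links = graph[page]
                if not out_links:          # dangling node: stop the walk
                    break
                if rng.random() >= damping:  # assumed random termination
                    break
                page = rng.choice(out_links)

    total = sum(visits.values())
    return {page: count / total for page, count in visits.items()}


graph = {"a": ["b", "c"], "b": ["a"], "c": []}
estimate = mc_complete_path(graph, m=200)
```

The returned values are the fraction of all recorded visits that landed
on each page, so they sum to 1.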

For the relevance feedback techniques, I will probably focus more on
explicit feedback, which takes users' judgements of a document's or
query's relevance into account. I'll try implicit feedback, too: for
instance, the time a user spends on a page before moving to another.

There are quite a number of parameters I will need to fine-tune. It is
possible that more than one weighting scheme will be used in the search
engine, so I'll need parameters that set how much weight to give each
scheme. Say I used both tf-idf weighting and explicit feedback weighting:
I would need to add the two weights, not exactly 1:1 but in some
better-tuned ratio. Simply multiplying the two weights won't work, as
different weighting schemes should carry different importance according
to the results we get.
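
One simple way to add two weights in a tunable ratio is a linear
combination with a single mixing parameter. The sketch below assumes both
score lists are already on comparable scales; the name `alpha` and its
default value are placeholders for illustration, not tuned values.

```python
def combine_scores(tfidf_scores, feedback_scores, alpha=0.7):
    """Mix two per-document score lists as alpha * tfidf + (1 - alpha) * feedback.

    alpha is the tunable parameter: 1.0 uses only tf-idf, 0.0 uses only
    the feedback weight, and 0.5 corresponds to a 1:1 mix.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if len(tfidf_scores) != len(feedback_scores):
        raise ValueError("score lists must have the same length")
    return [
        alpha * t + (1.0 - alpha) * f
        for t, f in zip(tfidf_scores, feedback_scores)
    ]


combined = combine_scores([1.0, 0.0], [0.0, 1.0], alpha=0.75)
```

A suitable value of `alpha` would have to come from evaluating retrieval
results on test queries.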

Shall I put these in the proposal, too?

I am sorry that I am currently traveling overseas and not very
responsive, but I'll definitely improve the proposal based on your
feedback within the next two days.


Best Regards,

Shaohuan

On Wed, Apr 4, 2012 at 10:37 AM, Olly Betts <olly at survex.com> wrote:

> On Wed, Apr 04, 2012 at 10:12:53AM +0200, Shaohuan Li wrote:
> > I am trying to apply for the weighting scheme project and I've
> > submitted a proposal according to the template few days ago. I think
> > the proposal is still not good enough, can you help give some
> > suggestions on what else shall I do more research on & include in
> > the proposal?
>
> Check your proposal - I have already made some comments on it.
>
> Cheers,
>    Olly
>



-- 
Best Regards,
Shaohuan Li