[Xapian-devel] Project Discussion for GSOC
prateek.football at gmail.com
Wed Apr 6 14:35:59 BST 2011
I have proposed an idea to work on ranking/weighting schemes in Xapian,
based on Xapian's suggested ideas list plus a bit of my own input. I am
pasting my project proposal here because I wanted to discuss it.
Project Discussion: When results are indexed into the search index, they
must be ranked beforehand for faster information retrieval. This is where
I plan to work: on ranking/weighting schemes. I picked up this idea mainly
from Xapian's suggested ideas and combined it with my own. If we have n
search results and want to rank them, we should start with the most
efficient ranking algorithm to determine their ranks. Two or more
documents may end up with the same weight or rank, and this is where
layering comes into the picture. For the documents that clash, we can
apply another ranking algorithm, the next most efficient one, as a
so-called 2nd level of ranking. This can be repeated for as many levels
as the user wants, or until all the algorithms have been used up.
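
To make the layering concrete, here is a minimal sketch in Python. The
names layered_rank and scorers, and the sample fields tfidf and links, are
hypothetical and chosen only for illustration; none of this is Xapian's
API. It assumes each ranking algorithm can be expressed as a function that
maps a document to a number, with higher meaning better.

```python
from itertools import groupby


def layered_rank(docs, scorers):
    """Rank docs with scorers[0]; break ties with scorers[1], and so on."""
    # Nothing left to decide: no more algorithms, or at most one document.
    if not scorers or len(docs) <= 1:
        return list(docs)

    score = scorers[0]
    # Score each document once at this level, then sort highest first.
    scored = [(score(d), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    ranked = []
    # Documents with an identical score form a tie group; only those
    # groups are handed down to the next level of ranking.
    for _, group in groupby(scored, key=lambda pair: pair[0]):
        tied = [d for _, d in group]
        ranked.extend(layered_rank(tied, scorers[1:]))
    return ranked


# Hypothetical documents: "a" and "c" clash at the first level.
docs = [
    {"id": "a", "tfidf": 2.0, "links": 1},
    {"id": "b", "tfidf": 3.0, "links": 0},
    {"id": "c", "tfidf": 2.0, "links": 5},
]
scorers = [lambda d: d["tfidf"], lambda d: d["links"]]
ranking = [d["id"] for d in layered_rank(docs, scorers)]
```

In this sketch a lower-level algorithm only runs on documents that are
still tied, so an expensive algorithm placed late in the hierarchy is
applied to few documents. Documents that remain tied after every level
keep their input order.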
I have studied these algorithms, but I have no idea whether this scheme has
been suggested or used before, since most search engines never reveal how
they work. We can let the user choose a custom hierarchy of algorithms if
they know about them, or the choice can be automated when the user doesn't
know about these algorithms.
Vector space models and TF-IDF schemes can be implemented, and for
hyperlinked documents, link analysis algorithms like PageRank, HITS,
DistanceRank, Snorm and TrustRank could be brought into the picture.
I would love to discuss the project and would welcome any advice on how
to break down the timeline. Please let me know if I need to add any more
information as well.