[Xapian-devel] [GSOC 2014] Indexing INEX dataset

Parth Gupta pargup8 at gmail.com
Tue Mar 11 11:02:15 GMT 2014


When indexing with omindex, the only thing you need to make sure of is that
the title is indexed with the prefix 'S', as explained in the Letor
documentation: xapian-letor/docs/letor.rst

Previously, when I edited omindex.cc, the modifications were the ones that
can be seen here
<http://trac.xapian.org/browser/svn/branches/gsoc2011-parth/xapian-applications/omega/omindex.cc>
on line 838 and in the block at lines 1532-1559.

But now the same functionality is available in
xapian-letor/bin/xapian-letor-update.cc, so before starting with
questletor.cc you need to run it once for each db. In this case, all you
need to make sure of is that the line below is in omindex.cc while indexing:

indexer.index_text(title, 1, "S");

You can also check whether the index is correct by inspecting it with
xapian-delve (xapian-core/bin/xapian-delve.cc).

To investigate this, create 5 XML/HTML documents by hand in INEX format,
each with a title and one line of content, index them, and inspect the index
using xapian-delve.

Also, while working with INEX, you can build a small index during
development by indexing only 2 or 3 of the 4 parts, and switch to the bigger
index once everything behaves normally.

Cheers,
Parth.


On Tue, Mar 11, 2014 at 8:11 AM, Jiarong Wei <vcamx3 at gmail.com> wrote:

> Hi Parth,
>
> I've implemented the SVMRanker class and also sorted out most of the
> current Letor APIs.
> Now I'm trying to use the INEX dataset to verify my implementation, but I'm
> stuck on the indexing part. You said in the documentation that we have to
> add a prefix when indexing. I also noticed that you set some metadata in
> your version of omindex.cc, but omindex.cc has changed since 2011. I think
> that's why my results are always weird. Could you give me some suggestions
> on how to index the INEX dataset properly?
>
> Thank you!
>
> Jiarong Wei