[Xapian-devel] Need Beginner Guide for Matcher Optimisations Project

aarsh shah aarshkshah1992 at gmail.com
Sun Mar 10 13:42:01 GMT 2013


Hi Lokesh :) This is a very good place to start if you want to understand
the basics of Xapian. It's quite informative and has some good examples:
http://getting-started-with-xapian.readthedocs.org/en/latest/

Also, if you're interested in IR theory, I personally think this is one of the
best books out there. It's detailed and starts from the basics. A couple of
friends of mine who have taken IR courses at their university have also
recommended it:

http://nlp.stanford.edu/IR-book/

On Tue, Mar 5, 2013 at 5:30 PM, <xapian-devel-request at lists.xapian.org> wrote:

>
> Today's Topics:
>
>    1. Need Beginner Guide for Matcher Optimisations Project
>       (Lokesh Basu)
>    2. Corrected errors in TradWeight test as per feedback . (aarsh shah)
>    3. Re: Reading a password-protected PDF (Olly Betts)
>    4. Remote database & local database, and adding new weight found
>       vtable error (Ronghua Lin)
>    5. Please take a look at the TfIdf patch (aarsh shah)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 4 Mar 2013 18:07:20 +0530
> From: Lokesh Basu <lokesh.basu at gmail.com>
> Subject: [Xapian-devel] Need Beginner Guide for Matcher Optimisations
>         Project
> To: xapian-devel at lists.xapian.org
> Message-ID:
>         <
> CAF8SOyuYVSsQKTPdHjLWzHTBwxV64t3pU2Mj+m7JQFjRD49_nQ at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> While searching for a project which matches my interests and skill level, I
> found this project named Matcher Optimization. This project is really
> challenging and exciting from my viewpoint and I would like to be a part of
> this project.
>
> The optimization techniques mentioned in the reference links provided will
> take some time for me to understand well, but I am trying
> to get my head around them.
>
> I am a Computer Science undergraduate, so I have a good knowledge of
> programming languages, algorithms, compilers, logic and data structures, but
> I have not yet done any real-world development.
>
> So it was a bit difficult for me to get my hands on the real code, as I
> couldn't find a section completely dedicated to new developers
> just trying to start with Xapian.
>
> Also, since my timezone is GMT+5:30, it's not very easy to get a quick
> response on IRC. Hence, it's really difficult for a student like me, who
> needs help with almost everything.
>
> So I would be thankful for any kind of help with my
> first steps towards learning more about developing for Xapian.
>
>
> Some ideas for the contents of the section that I need:
>
>
>
> For Beginners:
>
> Prerequisites for developing for Xapian.
> Step-by-step installation of tools for development, including the source code.
> Detailed build instructions.
> How to start with code evaluation?
> How to fix a bug [possibly a starter bug like "Hello Xapian" :-) ]?
> How to submit a bug?
> Possible reading material for reference regarding development and for
> increasing your knowledge about searching.
> etc.
> [These are just what I could think of; there could be many other things as
> well.]
>
>
> *Lokesh Chandra Basu*
> B. Tech
> Computer Science and Engineering
> Indian Institute of Technology, Roorkee
> India (GMT +5hr 30min)
>
> ------------------------------
>
> Message: 2
> Date: Mon, 4 Mar 2013 19:13:38 +0530
> From: aarsh shah <aarshkshah1992 at gmail.com>
> Subject: [Xapian-devel] Corrected errors in TradWeight test as per
>         feedback .
> To: Xapian Development <xapian-devel at lists.xapian.org>
> Message-ID:
>         <
> CABz8NmRkUK+R2nfg6nSF8pNUs3DNSzfiTnAc8P-UnkiwVYMpNg at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi guys :) I've corrected the mistakes in the TradWeight test commits
> that Olly pointed out and have added them to the pull request. Thank you
> for the feedback.
>
> -Regards
> -Aarsh
>
> ------------------------------
>
> Message: 3
> Date: Tue, 5 Mar 2013 06:23:38 +0000
> From: Olly Betts <olly at survex.com>
> Subject: Re: [Xapian-devel] Reading a password-protected PDF
> To: Zaim Zuhuri <mzaimz at gmail.com>
> Cc: xapian-devel at lists.xapian.org
> Message-ID: <20130305062338.GG27289 at survex.com>
> Content-Type: text/plain; charset=us-ascii
>
> On Wed, Feb 27, 2013 at 03:06:29PM +0800, Zaim Zuhuri wrote:
> > I was wondering if it is possible for xapian to read a password-protected
> > PDF.
> [...]
> > 2. all PDF is set with the same password.
> > 3. only the content of the PDF is encrypted, not the metadata.
> >
> > If it is possible could you guys point me in the right direction.
>
> Xapian runs pdftotext to extract text from PDF files, so the question
> really is "can pdftotext read a password-protected PDF?"
>
> Looking at pdftotext --help, I see:
>
>   -opw <string>     : owner password (for encrypted files)
>   -upw <string>     : user password (for encrypted files)
>
> Not sure what the difference is, but I'd try both and see which works.
>
> So I'd try creating a simple wrapper script so when omindex runs
> pdftotext it runs your wrapper instead, which runs pdftotext with
> extra command line arguments:
>
> #!/bin/sh
> exec /usr/bin/pdftotext -upw 'secret-password' "$@"
>
> Save that as (say) /home/zaim/pdftotext-wrapper/pdftotext, then make it
> executable and add that directory to PATH before you run omindex:
>
> chmod a+x /home/zaim/pdftotext-wrapper/pdftotext
>
> env PATH="/home/zaim/pdftotext-wrapper:$PATH" omindex [...]
>
> Cheers,
>     Olly
>
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 5 Mar 2013 15:54:59 +0800
> From: Ronghua Lin <leedeetiger at gmail.com>
> Subject: [Xapian-devel] Remote database & local database, and adding
>         new weight found vtable error
> To: Xapian Development <xapian-devel at lists.xapian.org>
> Message-ID:
>         <CALXXG0Ar=
> TBktqRd03Dkm1FACSHou55JzFXmGedtk55xLp1KHw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello, guys.
> Q1.
> I now load all the docids and their document data into a dictionary for
> faster access, instead of calling
> Xapian::MSetIterator i;
> i.get_document().get_data();
>
> but I happened to discover that the dictionaries built by the following two
> methods are different:
>
> both methods use DB1 and DB2
>
> method 1:
>
> Xapian::Database db = Xapian::Database(the path of DB1);
> Xapian::Database db2 = Xapian::Database(the path of DB2);
>
> db.add_database(db2);
>
> I pre-load the docids and their document data into dictionary DT1;
>
> method 2:
>
> DB1 and DB2 are served by xapian-tcpsrv for remote access.
>
> Xapian::Database dbr = Xapian::Remote::open(host of DB1, port of DB1, 0,
> 0);
> Xapian::Database dbr2 = Xapian::Remote::open(host of DB2, port of DB2, 0,
> 0);
>
>  db.add_database(dbr2);
>
> Here dictionary DT2 holds the docids and their document data.
>
>
> =============
>
> Q2.
> I want to add a weighting scheme to Xapian 1.2.12.
> I have implemented it by adding Myweight.cc in ./weight/ alongside weight.cc,
> etc., and I also added the necessary declarations for Myweight in
> ./include/xapian/weight.h.
>
> After adding Myweight.cc to ./weight/Makefile.am, I successfully compiled
> the source code and got the dynamic library.
>
> But when I use the Myweight scheme in my own C++ program, the linker reports
> "undefined reference to 'vtable for Xapian::Myweight'".
>
> I have checked all the virtual functions, including the destructor, and all of
> them are re-implemented in Myweight.cc.
>
> Then I replaced all the code for TradWeight with that of Myweight,
> and, unexpectedly, everything works well when calling "TradWeight", which
> actually implements the Myweight scheme.
>
> I know it may not be a problem with Xapian but rather my lack of C++ skills.
> Any answers would be appreciated.
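A note on the vtable error in Q2. With GCC-style toolchains, "undefined reference to vtable for X" usually means that the object code which should carry the vtable for X was never linked in. A common cause is that the first non-inline virtual function of the class was declared but not defined. Since the TradWeight replacement links fine, it may also be worth checking that the new class declaration carries the same XAPIAN_VISIBILITY_DEFAULT marker as the existing weight classes, and that the program links against the rebuilt library rather than an older installed copy. The sketch below only illustrates the first point, under assumptions: `Weight` is a small stand-in base class, not Xapian's real Xapian::Weight, and `MyWeight`, `name()`, `get_sumpart()` and `self_check()` are hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <string>

// Stand-in base class for illustration only; NOT Xapian's real Xapian::Weight.
class Weight {
  public:
    virtual ~Weight() {}
    virtual std::string name() const = 0;
    virtual double get_sumpart(unsigned wdf, unsigned doclen) const = 0;
};

// Hypothetical scheme. These declarations would normally sit in a header.
class MyWeight : public Weight {
  public:
    ~MyWeight();
    std::string name() const;
    double get_sumpart(unsigned wdf, unsigned doclen) const;
};

// Out-of-line definitions, normally in a .cc file that is compiled and linked.
// Leaving the first of these declared but undefined (or not linking this
// file) is a typical way to get the "undefined reference to vtable" error
// on GCC-style toolchains.
MyWeight::~MyWeight() {}

std::string MyWeight::name() const { return "MyWeight"; }

double MyWeight::get_sumpart(unsigned wdf, unsigned doclen) const {
    // Toy formula: within-document frequency normalised by document length.
    if (doclen == 0) return 0.0;
    return static_cast<double>(wdf) / doclen;
}

// Exercises the subclass through the base interface, which needs the vtable.
void self_check() {
    MyWeight w;
    const Weight& base = w;
    assert(base.name() == "MyWeight");
    assert(base.get_sumpart(3, 12) == 0.25);
}
```

Calling `self_check()` from a `main()` is enough to confirm that the class links and dispatches virtually; removing the definition of `MyWeight::~MyWeight()` should reproduce a vtable link error similar to the one reported.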
>
>
> --
> Ronghua Lin
> College of Computer Science and Technology, Zhejiang University
> Hangzhou, China, 310027
>
> ------------------------------
>
> Message: 5
> Date: Tue, 5 Mar 2013 17:08:54 +0530
> From: aarsh shah <aarshkshah1992 at gmail.com>
> Subject: [Xapian-devel] Please take a look at the TfIdf patch
> To: Xapian Development <xapian-devel at lists.xapian.org>
> Message-ID:
>         <CABz8NmSP132wn5M7mZKhqmh8z=+
> zFCKhLwxoA8016oskjLFf7A at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hello guys :) Please do take a look at the pull request for the TfIdf
> patch I've sent, because I want to start working on writing DFR schemes for
> us and want to incorporate the feedback into making a good hack for the DFR
> schemes. The patch incorporates all normalizations possible with our current
> statistics and passes all the tests I wrote for it. I have also attached the
> tests to the pull request.
>
> -Regards
> -Aarsh
>
> ------------------------------
>
>
> End of Xapian-devel Digest, Vol 95, Issue 6
> *******************************************
>