[Xapian-discuss] How to speed up indexing?

cel tix44 celtix44 at gmail.com
Mon Aug 25 00:21:36 BST 2008


There was a logic bug in my indexing function (below): I was reusing
the same document instance without clearing its terms, so each new
document also carried every term from all the documents before it.
After adding doc.clear_terms() before indexer.index_text(), I got the
expected throughput of ~4000 docs/sec.
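A toy model makes the cost of the missing clear_terms() visible. The sketch below does not use Xapian at all: `TermList`, `toy_index_text()` and `total_terms_written()` are hypothetical stand-ins, assuming each record's terms are just its whitespace-separated words. With a reused, never-cleared term list, the k-th document carries the terms of all k records so far, so the total work grows quadratically instead of linearly.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Toy stand-in for a document's term list: term -> within-document frequency.
// This is NOT Xapian's API; it only mimics the accumulation effect.
using TermList = std::map<std::string, unsigned>;

// Add every whitespace-separated word of `text` to `terms`.
void toy_index_text(TermList& terms, const std::string& text) {
    std::istringstream in(text);
    std::string word;
    while (in >> word) ++terms[word];
}

// Index `records` through one reused TermList and return the total number
// of distinct terms handed to the "database" across all documents.
std::size_t total_terms_written(const std::vector<std::string>& records,
                                bool clear_between) {
    TermList doc;                 // reused, like the global `doc` in the mail
    std::size_t total = 0;
    for (const auto& text : records) {
        if (clear_between) doc.clear();   // plays the role of doc.clear_terms()
        toy_index_text(doc, text);
        total += doc.size();              // what add_document(doc) would store
    }
    return total;
}
```

For example, 1000 records of three distinct words each give 3000 terms written with clearing, but 3 × 1000 × 1001 / 2 = 1,501,500 without it. An alternative to clearing a shared document is the pattern used in Xapian's example programs: construct a fresh Xapian::Document for each record and call indexer.set_document(doc) before index_text().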

Thanks everyone for your time & advice.

Regards
Celto


/////////////////////////////////////////////
void XXIndexRecord(const char* text)
/////////////////////////////////////////////
{
    // doc (Xapian::Document), indexer (Xapian::TermGenerator), xdb
    // (Xapian::WritableDatabase*) and xrn (a counter) are globals set up
    // elsewhere; presumably indexer.set_document(doc) was called once.

    // Clear the previous record's terms; without this, every new
    // document also carries all the terms of the documents before it.
    doc.clear_terms();

    indexer.index_text(text);   // index text into doc
    xdb->add_document(doc);     // add the document to the database
    xrn++;

    // Every ~200000 documents, commit and start a new transaction.
    if (xrn > 200000) {
        xrn = 0;
        xdb->commit_transaction();
        xdb->begin_transaction(true);
    }
}
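The batching logic in XXIndexRecord() can be factored out so it is easier to get right. The sketch below is a hypothetical helper, not a Xapian class; its `commit` callback stands in for the commit_transaction() / begin_transaction(true) pair. It makes two details explicit: the counter is compared with == after incrementing, so every batch holds exactly batch_size documents (the `xrn > 200000` test above lets 200001 through per transaction), and finish() commits the last partial batch. As far as I know, Xapian discards an uncommitted transaction when the database is closed, so without a final commit those trailing documents may be lost.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper (not part of Xapian): counts added documents and
// invokes `commit` once every `batch_size` documents, plus once more in
// finish() for any partial batch left over. batch_size must be positive.
class BatchCommitter {
public:
    BatchCommitter(std::size_t batch_size,
                   std::function<void(std::size_t)> commit)
        : batch_size_(batch_size), commit_(std::move(commit)) {}

    // Call after each add_document().
    void added_one() {
        if (++pending_ == batch_size_) flush();
    }

    // Call once when indexing ends so the last partial batch is committed.
    void finish() {
        if (pending_ > 0) flush();
    }

private:
    void flush() {
        commit_(pending_);   // e.g. commit_transaction(); begin_transaction(true);
        pending_ = 0;
    }

    std::size_t batch_size_;
    std::size_t pending_ = 0;
    std::function<void(std::size_t)> commit_;
};
```

For example, with a batch size of 3 and seven documents, the callback sees batches of 3, 3 and 1.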


On Sun, Aug 24, 2008 at 11:24 PM, Olly Betts <olly at survex.com> wrote:
> On Fri, Aug 22, 2008 at 07:16:57PM -0700, mark wrote:
>> I have the exact same problem on x86_64 Fedora Core 9 Linux (16GB
>> RAM, dual quad-core), using the Python xappy library.
>
> There's a separate mailing list for xappy, which is a better place to
> bring up issues you have when using xappy unless you can also reproduce
> them directly with Xapian.
>
> Cheers,
>    Olly
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>


