[Xapian-discuss] How to Retrieve content of the document?

Thu Apr 21 17:27:57 BST 2011

Rohit,

I currently run a single 600GB index on MyHealthcare.com and am
working towards a 700GB index. The scary part is that I do not see
search performance deteriorate between a 6GB index and a 600GB index,
and you can try it yourself. On the other hand, with Lucene, Solr
and other implementations, searching gets so slow that developers must
move the index from hard disk to an expensive solid-state drive. Of
course a 600GB Xapian index takes longer to build, but once it is
done, it runs nearly as fast as the small index.

Thanks,
Kevin Duraj
http://myhealthcare.com

On Thu, Apr 21, 2011 at 2:45 AM, Rohit <76.rohit at gmail.com> wrote:
> Hi,
> Another question: although I have already read it somewhere, I just need
> clarification. Xapian is able to handle data of about 6 GB in size,
> right?
>
> Rohit.
>
> On Thu, Apr 21, 2011 at 5:43 AM, Rohit <76.rohit at gmail.com> wrote:
>
>> Oops my bad.. Noobie mistake indeed.. Thanks for the prompt reply much
>> appreciated..
>>
>> Cheers,
>> Rohit.
>>
>>
>> On Thu, Apr 21, 2011 at 5:39 AM, Sym Roe <sym.roe at talusdesign.co.uk> wrote:
>>
>>> On Thu, Apr 21, 2011 at 10:24 AM, Rohit <76.rohit at gmail.com> wrote:
>>> > This returns 8 documents, which I know is the correct answer because I
>>> > have made a search engine which gives me the same results. The problem
>>> > is I only get the document numbers (ids) but not the content.
>>> > $doc->get_data(); is supposed to give me the content, if I am not
>>> > mistaken. It isn't doing so. Any help would be appreciated.
>>>
>>> I don't know perl, so forgive me if I make an obvious mistake here, but
>>> this:
>>>
>>> > if ($doc->set_data("$File::Find::name")){
>>>
>>> Looks like it's setting the file name as the document data, and then
>>> $doc->get_data() is correctly returning the file name you set.
>>>
>>> So everything is working fine, you're just not actually setting the
>>> data to what you want.
>>>
>>> Am I missing something here?  You'll need to read the file content and
>>> store that, or, when the results are used you could open the file
>>> based on the file name you're storing (this would save index size).
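
A minimal Perl sketch of the two options described above, in case it
helps. The helper name slurp_file is hypothetical (it is not part of
Xapian or of the original script); only set_data() and get_data() come
from the code quoted in this thread, and they appear in comments so the
sketch runs without an index.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Xapian): return the whole content
# of a file as a single string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    local $/;                 # no record separator: read the file in one go
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Option 1, at index time: store the content itself as the document data.
# Assumes $doc is the document object from the original script.
#   $doc->set_data(slurp_file($File::Find::name));

# Option 2, at search time: keep storing only the file name, and read the
# file when a result is displayed. This keeps the index smaller.
#   my $content = slurp_file($doc->get_data());
```

With option 1 the index grows with the size of the files. With option 2
the files must still exist at the stored paths when results are shown.
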
>>>
>>>
>>> --
>>> E: sym.roe at talusdesign.co.uk
>>> M: 07742079314
>>>
>>
>>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>