[Xapian-discuss] How to Retrieve content of the document?

Sym Roe sym.roe at talusdesign.co.uk
Thu Apr 21 12:19:28 BST 2011


On Thu, Apr 21, 2011 at 12:07 PM, Rohit <76.rohit at gmail.com> wrote:
> When you say the terms and values are already stored, can you tell me how I
> can retrieve the words of a particular document (assuming I don't store
> anything in the document's data)?

You're adding every word in a document to the term list (in your case,
with positional information, by calling add_posting).  You may want to
look at http://xapian.org/docs/apidoc/html/classXapian_1_1TermGenerator.html
for stemming and automatic term creation, but that's not essential to
answering your question.
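
As a rough illustration of reading those terms back, the Perl sketch
below collects each term of one document together with its positions and
then lists the terms in position order.  It assumes the Search::Xapian
bindings, a $db that the caller has already opened as a
Search::Xapian::Database, and terms that were indexed with add_posting.
postings_for_doc and words_in_order are hypothetical helper names, not
part of Xapian.  What comes back is the indexed terms (possibly
lowercased or stemmed), not the original text.

```perl
use strict;
use warnings;

# Collect term => [positions] for one document.
# Assumption: $db is a Search::Xapian::Database opened by the caller.
# Terms added without positions simply contribute no entries.
sub postings_for_doc {
    my ($db, $docid) = @_;
    my %postings;
    my $term     = $db->termlist_begin($docid);
    my $term_end = $db->termlist_end($docid);
    while ($term != $term_end) {
        my $name    = $term->get_termname();
        my $pos     = $db->positionlist_begin($docid, $name);
        my $pos_end = $db->positionlist_end($docid, $name);
        while ($pos != $pos_end) {
            push @{ $postings{$name} }, $pos->get_termpos();
            $pos++;
        }
        $term++;
    }
    return \%postings;
}

# Turn term => [positions] into the list of terms in position order.
sub words_in_order {
    my ($postings) = @_;
    my @pairs;
    for my $term (keys %$postings) {
        push @pairs, [ $_, $term ] for @{ $postings->{$term} };
    }
    # Sort by position; break ties by term so the output is stable.
    return map { $_->[1] }
           sort { $a->[0] <=> $b->[0] || $a->[1] cmp $b->[1] } @pairs;
}

# Demo with hand-built postings, so no database is needed to try the ordering.
my @words = words_in_order({ the => [1, 4], quick => [2], fox => [3], dog => [5] });
print "@words\n";    # prints "the quick fox the dog"
```

With a real index, something like
words_in_order(postings_for_doc($db, $docid)) would give the word list
for one document.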

Once you have a list of documents (returned from a query, as above)
you can just open each file (based on the file name you're storing in
the document data) and display what you want from there.  If you want
to display the terms searched for in context (like Google results),
then I expect you could do something with the term positions you're
storing, but Xapian won't do this for you, even if you store the entire
text in the document data.
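
A minimal sketch of the "open each file" approach, in Perl: it assumes
the document data holds nothing but the file path (as set with
set_data in the indexer) and that $enquire is a Search::Xapian::Enquire
object the caller has already built.  preview_file and show_matches are
hypothetical helper names used only for illustration.

```perl
use strict;
use warnings;

# Return up to $max_lines lines from the start of the file at $path,
# or undef if the file cannot be opened.
sub preview_file {
    my ($path, $max_lines) = @_;
    open(my $fh, '<', $path) or return;
    my @lines;
    while (@lines < $max_lines && defined(my $line = <$fh>)) {
        push @lines, $line;
    }
    close($fh);
    return join('', @lines);
}

# Print a short preview for the first ten matches.
# Assumption: each document's data is the path of the indexed file.
sub show_matches {
    my ($enquire) = @_;
    for my $match ($enquire->matches(0, 10)) {
        my $path = $match->get_document()->get_data();
        my $text = preview_file($path, 5);
        $text = "(could not open file)\n" unless defined $text;
        printf "%d: %s\n%s\n", $match->get_docid(), $path, $text;
    }
}
```

Keeping only the path in the document data keeps the index small, at
the cost of one file read per displayed result.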

> On Thu, Apr 21, 2011 at 6:12 AM, Sym Roe <sym.roe at talusdesign.co.uk> wrote:
>>
>> On Thu, Apr 21, 2011 at 10:45 AM, Rohit <76.rohit at gmail.com> wrote:
>> > Hi,
>> > Another question, although I have already read it somewhere, I just need
>> > clarification. Xapian is able to handle data of about 6 GB in size,
>> > right?
>>
>> Yes, I expect there are much bigger indexes out there.  Having said
>> that, your (total) index size will be much bigger if you store every
>> document you are searching through in the document data.  You don't
>> actually need to store *anything* in the document's data, as the terms
>> and values are already stored for each document.
>>
>>
>> > On Thu, Apr 21, 2011 at 5:43 AM, Rohit <76.rohit at gmail.com> wrote:
>> >>
>> >> Oops my bad.. Noobie mistake indeed.. Thanks for the prompt reply much
>> >> appreciated..
>> >>
>> >> Cheers,
>> >> Rohit.
>> >>
>> >> On Thu, Apr 21, 2011 at 5:39 AM, Sym Roe <sym.roe at talusdesign.co.uk>
>> >> wrote:
>> >>>
>> >>> On Thu, Apr 21, 2011 at 10:24 AM, Rohit <76.rohit at gmail.com> wrote:
>> >>> > This returns 8 documents, which I know is the correct answer
>> >>> > because I have made a search engine which gives me the same
>> >>> > results. The problem is I only get the document numbers (IDs) but
>> >>> > not the content. $doc->get_data(); is supposed to give me the
>> >>> > content, if I am not mistaken. It isn't doing so. Any help would be
>> >>> > appreciated.
>> >>>
>> >>> I don't know Perl, so forgive me if I make an obvious mistake here,
>> >>> but this:
>> >>>
>> >>> > if ($doc->set_data("$File::Find::name")){
>> >>>
>> >>> Looks like it's setting the file name as the document data, and then
>> >>> $doc->get_data() is correctly returning the file name you set.
>> >>>
>> >>> So everything is working fine; you're just not actually setting the
>> >>> data to what you want.
>> >>>
>> >>> Am I missing something here?  You'll need to read the file content and
>> >>> store that, or, when the results are used you could open the file
>> >>> based on the file name you're storing (this would save index size).
>> >>>
>> >>>
>> >>> --
>> >>> E: sym.roe at talusdesign.co.uk
>> >>> M: 07742079314



-- 
E: sym.roe at talusdesign.co.uk
M: 07742079314


