[Xapian-discuss] get the title from the document

jack young young.2004 at yahoo.com
Mon Nov 5 03:07:06 GMT 2012


Hi James,

Thank you for your quick reply.
I now understand that I need to create a JSON structure to store what I want to display.
I have also put that information into the document *data*, not into *values*.

That immediately raises another question: which part is used to generate the terms?

For instance, my JSON data stores two parts:

1. filename
2. content (from file)
Given a specific keyword, the program should look for that keyword ONLY in the content, *NOT* in the filename. In other words, how can I build my database so that searches match only against the content?


This is the typical code for building the index:
******************************
import json
import os
import xapian

# Load content
content = open(filePath).read()
# Get the file name
fileName = os.path.basename(filePath)
# Save as JSON in the document data
json_data = json.dumps({"filename": fileName, "content": content})
document = xapian.Document()
document.set_data(json_data)

# Index document
indexer.set_document(document)
indexer.index_text(content)
# Store indexed content in database
database.add_document(document)

******************************

What else do I need to do?
Do I need to change

indexer.index_text(json_data)

to:
indexer.index_text(content)

OR:
doc.add_term(content)

Which one is correct? Any thoughts?
I have looked for possible solutions in the online documentation, but found nothing.

Jack






________________________________
From: James Aylett <james-xapian at tartarus.org>
To: jack young <young.2004 at yahoo.com>
Cc: "xapian-discuss at lists.xapian.org" <xapian-discuss at lists.xapian.org>
Sent: 2012/11/4 (Sun) 1:11 AM
Subject: Re: [Xapian-discuss] get the title from the document
 
On 3 Nov 2012, at 04:36, jack young <young.2004 at yahoo.com> wrote:

> I am working on a very simple project, in which I wanna get the title from the document.

Jack – generally, everything you want to use for displaying results (or whatever you do with them at search time) should be stored in document data, *not* in values (which are for other purposes). See <http://getting-started-with-xapian.readthedocs.org/en/latest/concepts/indexing/documents.html>; you'll need to use some sort of structured format (eg: JSON, YAML or similar) to store multiple pieces of information.
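As a rough illustration of the above, here is a minimal sketch that keeps the display fields as JSON in the document data and feeds only the file content to the term generator. The helper names (pack_data, unpack_data, index_file, search) and the choice of an English stemmer are assumptions made for the example, not part of Xapian; the xapian import sits inside the functions only so that the JSON helpers can be tried without the bindings installed.

```python
import json


def pack_data(filename, content):
    """Serialise the fields needed for display into one JSON string."""
    return json.dumps({"filename": filename, "content": content})


def unpack_data(data):
    """Decode document data (str or bytes) back into a dict."""
    return json.loads(data)


def index_file(database, filename, content):
    """Add one document: terms come from content only, data holds both fields."""
    import xapian  # assumes the Xapian Python bindings are installed

    document = xapian.Document()
    document.set_data(pack_data(filename, content))

    indexer = xapian.TermGenerator()
    indexer.set_stemmer(xapian.Stem("english"))  # stemmer choice is an assumption
    indexer.set_document(document)
    indexer.index_text(content)  # the filename is never passed to the indexer
    return database.add_document(document)


def search(database, query_string, limit=10):
    """Return the stored filenames of the documents matching query_string."""
    import xapian

    parser = xapian.QueryParser()
    parser.set_stemmer(xapian.Stem("english"))
    parser.set_stemming_strategy(xapian.QueryParser.STEM_SOME)
    parser.set_database(database)

    enquire = xapian.Enquire(database)
    enquire.set_query(parser.parse_query(query_string))
    return [
        unpack_data(match.document.get_data())["filename"]
        for match in enquire.get_mset(0, limit)
    ]
```

With a xapian.WritableDatabase opened elsewhere, index_file(database, fileName, content) followed by search(database, "keyword") should return filenames only for documents whose content contains the keyword, because the filename is stored for display but never turned into terms.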

J

-- 
James Aylett, occasional trouble-maker
xapian.org

