[Xapian-discuss] get the title from the document
jack young
young.2004 at yahoo.com
Mon Nov 5 03:07:06 GMT 2012
Hi James,
Thank you for your quick reply.
I have now figured out that I need to create a JSON structure to store what I want to display.
Furthermore, I put the information into the document *data*, not in *values*.
Then another question turns up immediately: which part is going to be used for terms?
For instance, in my JSON data, I store two parts:
1. filename
2. content (from file)
Then, given a specific keyword, the program is supposed to look for this keyword ONLY in the content, *NOT* in the filename. In other words, how can I build my database so that searches match the content only?
This is the typical code for building the index:
******************************
import json
import os
import xapian

# Load content
content = open(filePath).read()
# Get the file name
fileName = os.path.basename(filePath)
# Save as JSON in the document data
json_data = json.dumps({"filename": fileName, "content": content})
document = xapian.Document()
document.set_data(json_data)
# Index document
indexer.set_document(document)
indexer.index_text(content)
# Store indexed content in database
database.add_document(document)
******************************
What else do I need to process?
Do I need to change
indexer.index_text(json_data)
to:
indexer.index_text(content)
OR:
doc.add_term(content)
Which one is correct? Any thoughts?
I have looked for possible solutions in the online documentation, but found nothing.
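To make the three candidates above concrete, here is a minimal sketch, assuming the Xapian Python bindings are installed. The helper name make_document and the sample path are hypothetical and not from this thread. It relies on two things: document data is stored but never searched, and TermGenerator.index_text() builds terms only from the text it is given. By contrast, add_term(content) would add the whole text as one single term without splitting it into words.

```python
import json
import os

try:
    import xapian  # Xapian Python bindings; may not be installed everywhere
except ImportError:
    xapian = None


def make_document(file_path, content):
    """Build a document that is searchable by content only.

    Hypothetical helper: the name and signature are not from the thread.
    """
    file_name = os.path.basename(file_path)

    doc = xapian.Document()
    # Display data: stored with the document, never searched.
    doc.set_data(json.dumps({"filename": file_name, "content": content}))

    indexer = xapian.TermGenerator()
    indexer.set_stemmer(xapian.Stem("english"))
    indexer.set_document(doc)
    # Only the text passed here is split into terms, so the file name
    # does not become searchable.
    indexer.index_text(content)
    return doc


if xapian is not None:
    doc = make_document("/tmp/report.txt", "quarterly sales figures")
    # List the terms that were actually generated for this document.
    terms = [item.term for item in doc.termlist()]
    print(terms)
```

Under these assumptions, index_text(json_data) would also turn the file name into terms, which is the behaviour to avoid here.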
Jack
________________________________
From: James Aylett <james-xapian at tartarus.org>
To: jack young <young.2004 at yahoo.com>
Cc: "xapian-discuss at lists.xapian.org" <xapian-discuss at lists.xapian.org>
Sent: 2012/11/4 (Sun) 1:11 AM
Subject: Re: [Xapian-discuss] get the title from the document
On 3 Nov 2012, at 04:36, jack young <young.2004 at yahoo.com> wrote:
> I am working on a very simple project, in which I wanna get the title from the document.
Jack – generally, everything you want to use for displaying results (or whatever you do with them at search time) should be stored in document data, *not* in values (which are for other purposes). See <http://getting-started-with-xapian.readthedocs.org/en/latest/concepts/indexing/documents.html>; you'll need to use some sort of structured format (eg: JSON, YAML or similar) to store multiple pieces of information.
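As a rough illustration of the advice above, the sketch below packs the display fields into JSON for the document data and unpacks them again at search time. The function names and the two field names (filename, content) are assumptions for the example, and result_filenames expects a xapian.Enquire object that already has a query set.

```python
import json


def pack_display_data(filename, content):
    """Serialise the fields needed at display time into one JSON string."""
    return json.dumps({"filename": filename, "content": content})


def unpack_display_data(data):
    """Decode document data back into a dict.

    The Python 3 bindings return bytes from get_data(), so decode first.
    """
    if isinstance(data, bytes):
        data = data.decode("utf-8")
    return json.loads(data)


def result_filenames(enquire, limit=10):
    """Return the stored file name for each of the top `limit` matches.

    `enquire` is expected to be a xapian.Enquire with a query already set.
    """
    return [
        unpack_display_data(match.document.get_data())["filename"]
        for match in enquire.get_mset(0, limit)
    ]


# Round trip without touching a database:
packed = pack_display_data("notes.txt", "hello xapian")
print(unpack_display_data(packed)["filename"])  # notes.txt
```

At index time the packed string would be passed to document.set_data(); at search time each match's document.get_data() gives it back.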
J
--
James Aylett, occasional trouble-maker
xapian.org