[Xapian-discuss] get the title from the document

James Aylett james-xapian at tartarus.org
Mon Nov 12 10:05:22 GMT 2012


On 5 Nov 2012, at 03:07, jack young <young.2004 at yahoo.com> wrote:

> Then another question turns up immediately: which part is going to be used for terms?
> For instance, in my JSON data, I store two parts:
> 1. filename
> 2. content (from file)
> Then, given a specific keyword, the program is supposed to ONLY look for this keyword in the content, *NOT* in the filename. In other words, how can I build my database so that searches use only the content?

Searches work using terms, so just don't put the filename in as a term or terms. Anything you put in document data is *not* used in searching.

> This is the typical code for building the index:
> ******************************
> # Load content
> content = open(filePath).read()
> # Get the file name
> fileName = os.path.basename(filePath)
> # save in json and document
> json_data = content + fileName 

`content + fileName` just glues two strings together; that isn't JSON. You seem to be using Python, so check out the `json` module.
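A minimal sketch of storing both fields as real JSON with the standard-library `json` module. The helper names and the `filename`/`content` keys are hypothetical choices for illustration; Xapian treats document data as an opaque string, so it doesn't care what format you use. (If you read the data back with `get_data()`, the Python 3 bindings hand you bytes, which `json.loads` also accepts.)

```python
import json
import os


def make_document_data(file_path, content):
    """Hypothetical helper: serialise filename and content as a JSON object."""
    return json.dumps({
        "filename": os.path.basename(file_path),
        "content": content,
    })


def read_document_data(data):
    """Hypothetical helper: decode what make_document_data() produced."""
    return json.loads(data)


json_data = make_document_data("/tmp/notes/report.txt", "quarterly sales figures")
record = read_document_data(json_data)
print(record["filename"])
```

Unlike plain concatenation, this lets you recover each field separately when displaying results.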

> document = xapian.Document()
> document.set_data(json_data)
> 
> # Index document
> indexer.set_document(document)
> indexer.index_text(content)
> # Store indexed content in database
> database.add_document(document)
> 
> ******************************
> 
> What else do I need to do?

Nothing. You've indexed the content (using a `TermGenerator`, I'm assuming). All should be well.

> Do I need to change
> indexer.index_text(json_data)
> 
> to:
> indexer.index_text(content)

You want to index the content, not whatever you've put in `json_data`. So your earlier code is correct.

> OR:
> doc.add_term(content)

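This would (try to) add a single term containing the entire content. That generally won't work, as there is a limit on the length of a term. It also doesn't make much sense for text documents: even if it fitted, the document would only match a query for its entire content, verbatim. Using `TermGenerator`, which splits the text into individual word terms, is the correct approach here.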

J

-- 
 James Aylett, occasional trouble-maker
 xapian.org



