Fwd: R bindings for Xapian: API modifications

James Aylett james-xapian at tartarus.org
Tue May 3 15:01:57 BST 2016


On Tue, May 03, 2016 at 06:13:44PM +0530, Amanda Jayanetti wrote:

> > >but it looked like
> > >you were suggesting that (for instance) the ID column in the data
> > >frame would only be specified by numeric index.
> 
> The parameter idField is only used to allow the user to specify a column
> whose row values will be used as unique identifiers. If it's required to
> index the idField then it should be separately included in the indexFields
> list as shown below.
> 
> indexFields <-list( list(0,"S","id_NUMBER"), list(2,"S","Title"),
> list(8,"XD","Description"))

We have a miscommunication somewhere, and examples without a clear
explanation leave me guessing at what each piece is doing. I assume
that the following columns in the data table are indexed in the
following way:

id_NUMBER    indexed with a prefix of 'S' and wdf=0 (so terms, not
             posting)
Title        indexed with a prefix of 'S' and wdf=2
Description  indexed with a prefix of 'XD' and wdf=8
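
To make that reading concrete, here is a minimal sketch in Python
rather than R. It assumes each triple is (wdf increment, prefix, column
name), which is only my guess; the first number could equally be a
column index. The `index_row` helper and the sample row are made up for
illustration, and the lowercase-and-split step is a crude stand-in for
what Xapian::TermGenerator actually does.

```python
# Assumed reading of indexFields: (wdf increment, prefix, column name).
INDEX_FIELDS = [
    (0, "S", "id_NUMBER"),
    (2, "S", "Title"),
    (8, "XD", "Description"),
]


def index_row(row, index_fields):
    """Return {term: wdf} for one row of the data table.

    Lowercasing and splitting on whitespace is a crude stand-in for
    Xapian::TermGenerator, which also handles stemming and positions.
    """
    terms = {}
    for wdf_inc, prefix, column in index_fields:
        for word in str(row[column]).lower().split():
            term = prefix + word
            terms[term] = terms.get(term, 0) + wdf_inc
    return terms


# Hypothetical row, invented for this example.
row = {"id_NUMBER": 42, "Title": "Pocket watch", "Description": "Silver watch"}
terms = index_row(row, INDEX_FIELDS)
for term in sorted(terms):
    print(term, terms[term])
```

Under this reading the ID ends up as the term `S42` with a wdf of 0,
sharing its prefix with the title terms.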

It's possible that isn't what you mean; having looked back over your
proposal, I see that at indexing time you were setting up maps between
field names and prefixes:

f1 <- c("Title", "S")

but you don't have to do that while indexing, only while searching.
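
The field name only matters when a query is parsed. In Xapian that
mapping is registered on the QueryParser with add_prefix(); the sketch
below only imitates the idea in plain Python, with a made-up `expand`
helper and no real query parsing or stemming.

```python
# Search-time map from user-visible field name to term prefix.
# The names here are illustrative, not part of any proposed API.
PREFIXES = {"title": "S", "description": "XD"}


def expand(query_word, prefixes):
    """Turn 'title:watch' into 'Swatch'; lowercase anything else as-is."""
    field, sep, word = query_word.partition(":")
    if sep and field in prefixes:
        return prefixes[field] + word.lower()
    return query_word.lower()


print(expand("title:Watch", PREFIXES))
print(expand("watch", PREFIXES))
```

At indexing time only the prefix itself ('S', 'XD') is needed; the
name "title" never has to be stored in the database.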

If that is what you meant though, then it's an unusual example for a
couple of reasons:

1. The id_NUMBER would be more likely to be indexed with a prefix of
   'Q' (this is common for Xapian applications, although not required)

2. The title is usually given more emphasis than the description, not
   the other way round.

I assume that the `indexFields` you're setting up are intended to
drive a Xapian::TermGenerator. However, it's common to add some boolean
terms as well (indeed, Q terms are generally boolean terms, as at the
bottom of the sample code at
https://getting-started-with-xapian.readthedocs.io/en/latest/practical_example/indexing/writing_the_code.html).
These won't be put through the TermGenerator; they're added directly
to the document.
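
As a rough sketch of that last step in Python: the ID term is built by
hand and added with add_boolean_term(), which is a real method on
Xapian's Document class. The `RecordingDocument` class below is only a
stand-in so the snippet runs without the bindings installed, and
`add_id_term` is a name I've made up.

```python
class RecordingDocument:
    """Stand-in for xapian.Document that just records boolean terms."""

    def __init__(self):
        self.boolean_terms = []

    def add_boolean_term(self, term):
        self.boolean_terms.append(term)


def add_id_term(doc, identifier):
    """Add a 'Q'-prefixed unique ID term directly to the document.

    The identifier is used verbatim: it is not tokenised or stemmed,
    because it never goes through the TermGenerator.
    """
    idterm = "Q" + str(identifier)
    doc.add_boolean_term(idterm)
    return idterm


doc = RecordingDocument()
idterm = add_id_term(doc, "1974-100")
print(idterm)
```

With the real bindings, `doc` would be a xapian.Document and the
returned term would be passed to WritableDatabase.replace_document(),
so that re-indexing the same row updates the existing document rather
than adding a duplicate.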

If you're working on this somewhere online (Google doc, gist, github
repo, scratchpad or anything), can you link it to your wiki project
page? It feels like there are details that I'm either not
understanding or just haven't read.

J

-- 
  James Aylett, occasional trouble-maker
  xapian.org


