Fwd: R bindings for Xapian: API modifications
James Aylett
james-xapian at tartarus.org
Tue May 3 15:01:57 BST 2016
On Tue, May 03, 2016 at 06:13:44PM +0530, Amanda Jayanetti wrote:
> > >but it looked like
> > >you were suggesting that (for instance) the ID column in the data
> > >frame would only be specified by numeric index.
>
> The parameter idField is only used to allow the user to specify a column
> whose row values will be used as unique identifiers. If it's required to
> index the idField then it should be separately included in the indexFields
> list as shown below.
>
> indexFields <- list(list(0, "S", "id_NUMBER"), list(2, "S", "Title"),
>                     list(8, "XD", "Description"))
We have some miscommunication somewhere, and it doesn't help that
examples without a clear explanation leave me guessing at what all the
pieces are doing. I assume that the columns in the data frame are
being indexed as follows:
  id_NUMBER    indexed with a prefix of 'S' and wdf=0 (so terms, not postings)
  Title        indexed with a prefix of 'S' and wdf=2
  Description  indexed with a prefix of 'XD' and wdf=8
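To make that reading concrete, here is a toy model in Python (the
language of the getting-started sample code) of what each triple would
do if passed to TermGenerator::index_text(text, wdf_inc, prefix). It
assumes the first element of each triple is the wdf increment; the row
values and the `index_text` helper are made up for illustration, and
real Xapian also stems words and records positions.

```python
import re
from collections import Counter


def index_text(terms, text, wdf_inc=1, prefix=""):
    """Toy stand-in for TermGenerator::index_text: lowercase the text,
    split it into words, and add wdf_inc to each prefixed term's wdf.
    No stemming, stopping or positional information."""
    for word in re.findall(r"\w+", text.lower()):
        terms[prefix + word] += wdf_inc


# Made-up row from the data frame.
row = {"id_NUMBER": "1234", "Title": "Pocket watch",
       "Description": "A brass pocket watch"}

# The example from the thread, read as (wdf_inc, prefix, column) triples.
index_fields = [(0, "S", "id_NUMBER"), (2, "S", "Title"),
                (8, "XD", "Description")]

terms = Counter()
for wdf_inc, prefix, column in index_fields:
    index_text(terms, row[column], wdf_inc, prefix)

for term, wdf in sorted(terms.items()):
    print(term, wdf)
```

Under this reading the S1234 term is present in the document but with
a wdf of 0, while each Description word gets four times the wdf of a
Title word.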
It's possible that isn't what you mean. Looking back over your
proposal, I see that when indexing you were setting up field-name-to-
prefix maps:

  f1 <- c("Title", "S")

but you don't need those while indexing, only while searching.
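For comparison, here is where a field-name-to-prefix map actually gets
used: at query time. In the real API that is
QueryParser::add_prefix("title", "S"); the `parse_query` function below
is a hypothetical toy, not Xapian's parser, and only shows the mapping
being applied to a query string.

```python
# Hypothetical field-name -> prefix map; only the query side needs it.
FIELD_PREFIXES = {"title": "S", "description": "XD"}


def parse_query(query, field_prefixes=FIELD_PREFIXES):
    """Toy query parser: 'field:word' becomes prefix + word, while bare
    words (and unknown fields) are left unprefixed."""
    terms = []
    for token in query.split():
        field, sep, word = token.partition(":")
        if sep and field.lower() in field_prefixes:
            terms.append(field_prefixes[field.lower()] + word.lower())
        else:
            terms.append(token.lower())
    return terms


print(parse_query("title:Watch brass"))
```

Nothing on the indexing side needs to know that "S" is called "title";
it just has to use the same prefix.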
If that is what you meant, though, it's an unusual example for a
couple of reasons:
1. id_NUMBER would more likely be indexed with a prefix of 'Q' (this
is common in Xapian applications, although not required).
2. The title is usually given more emphasis than the description, not
the other way round.
I assume the `indexFields` list you're setting up is intended to
drive a Xapian::TermGenerator. However, it's common to add some
boolean terms as well (indeed, Q terms are generally boolean terms, as
at the bottom of the sample code at
https://getting-started-with-xapian.readthedocs.io/en/latest/practical_example/indexing/writing_the_code.html).
These don't go through the TermGenerator; they're added directly to
the document.
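A toy sketch of that split, again in Python: text columns go through a
TermGenerator-like helper, while the ID becomes a single 'Q' boolean
term added straight to the document (Document::add_boolean_term in the
real API). The column names, the wdf values (title weighted above
description) and the helper names are all hypothetical.

```python
import re
from collections import Counter


def index_text(terms, text, wdf_inc=1, prefix=""):
    """Toy stand-in for TermGenerator::index_text (no stemming or positions)."""
    for word in re.findall(r"\w+", text.lower()):
        terms[prefix + word] += wdf_inc


def index_row(row, text_fields, id_column):
    """Return (text terms with their wdf, set of boolean terms) for one row.

    text_fields holds (column, prefix, wdf_inc) triples that would drive
    the TermGenerator. The ID column is deliberately not among them: it
    becomes one boolean 'Q' term, added directly to the document.
    """
    terms = Counter()
    for column, prefix, wdf_inc in text_fields:
        index_text(terms, row[column], wdf_inc, prefix)
    boolean_terms = {"Q" + str(row[id_column])}
    return terms, boolean_terms


row = {"id_NUMBER": 1234, "Title": "Pocket watch",
       "Description": "A brass pocket watch"}
# Title gets more emphasis (a higher wdf_inc) than Description.
text_fields = [("Title", "S", 5), ("Description", "XD", 1)]

terms, boolean_terms = index_row(row, text_fields, "id_NUMBER")
print(sorted(terms.items()))
print(boolean_terms)
```

Keeping the ID out of the TermGenerator input means it is never split
into words or weighted; it is just a unique tag on the document.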
If you're working on this somewhere online (Google doc, gist, GitHub
repo, scratchpad or anything else), can you link to it from your wiki
project page? It feels like there are details I'm either not
understanding or just haven't read.
J
--
James Aylett, occasional trouble-maker
xapian.org