R bindings for Xapian: API modifications

Amanda Jayanetti amandajayanetti at gmail.com
Sat Apr 30 16:02:54 BST 2016


I am currently reviewing the API design I originally proposed, and I have
added two new parameters (idField, stemmer) to the xapian_index() function.
My next task is to determine the output data structure and format of the
xapian_search() function. After that I will return to xapian_index() and
review the format of the valueSlots parameter.

An outline of 'simple indexing' functionality:

xapian_index(dbpath="", datapath="", idField=c(0), indexFields=NULL,
stemmer="", valueSlots=NULL, …)

dbpath: Path to a Xapian database
datapath: Path to a data source
idField: Number of the column in the data frame whose value in each row
will be used as a unique identifier
indexFields: A list of character vectors each containing a field name and a
stemmer: language stemmer

The xapian_index() function can be used to index the content of a data frame.

Convert the data frame (df) to a CSV file (skip this step if the data source
is already a CSV file):

>>  write.csv(df, "location/of/data.csv")

>>  f1 <- c("Title", "S")

>>  f2 <- c("Description", "XD")

>>  fields <- list(f1, f2)

>>  idField <- c(0)

>>  xapian_index("path/to/database", "location/of/data.csv", idField=c(0),
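The CSV preparation step can be tried on its own with base R. The following is a minimal sketch, not part of the proposed API: the data frame contents and the temporary file path are made up for illustration. It also shows that write.csv() by default writes the row names as an unnamed first column; whether idField=c(0) is meant to refer to that column is an assumption here, not something the proposal states.

```r
# Build a small example data frame (contents are illustrative only).
df <- data.frame(
  Title = c("First document", "Second document"),
  Description = c("Text of the first document", "Text of the second document"),
  stringsAsFactors = FALSE
)

# Write it to a temporary CSV file (hypothetical location, not a real data path).
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path)

# Look at the header line: with the default row.names = TRUE, write.csv()
# emits an empty name for the leading row-name column.
header_line <- readLines(csv_path, n = 1)
print(header_line)

# Read the file back to confirm its shape before handing the path to an indexer.
check <- read.csv(csv_path)
print(dim(check))
```

Passing row.names = FALSE to write.csv() drops the leading row-name column if the data frame already carries its own identifier column.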

To index multiple data frames of the same format:

>>  dataLoc <- c("path1", "path2", "path3", …)

>>  for (dataSource in dataLoc) {
           xapian_index("path/to/database", dataSource, idField=c(0),
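One way the dataLoc vector might be built when the data starts out as several data frames in memory is sketched below, using base R only. The frame names, their contents, and the temporary directory are hypothetical, and the loop body only checks that each file exists; the call to the proposed xapian_index() is left as a comment because its final signature is still under review.

```r
# Several data frames of the same format (contents are illustrative only).
frames <- list(
  a = data.frame(Title = "Doc A", Description = "Text A", stringsAsFactors = FALSE),
  b = data.frame(Title = "Doc B", Description = "Text B", stringsAsFactors = FALSE)
)

# Write each frame to its own CSV file in a temporary directory
# (hypothetical location) and collect the resulting paths.
out_dir <- tempfile("xapian_data_")
dir.create(out_dir)
dataLoc <- vapply(names(frames), function(nm) {
  path <- file.path(out_dir, paste0(nm, ".csv"))
  write.csv(frames[[nm]], path)
  path
}, character(1))

# Iterate over the data sources, as in the outline above.
for (dataSource in dataLoc) {
  # The proposed xapian_index() call would go here; it is omitted because
  # the function is only a design proposal at this point.
  stopifnot(file.exists(dataSource))
}
```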

Best regards,

