Formulating Advanced Queries with Xapian-Omega

Giulio Teslano giulio.teslano at virgilio.it
Thu Dec 22 10:51:35 GMT 2016


Hello,

 

We have Xapian-Omega installed (on Linux) and working in its default
configuration, and we have browsed several interesting pages on the
main site, on the wiki at trac.xapian.org and in the mailing list.
Having tested various search options (so far only with Omega), we
would like to ask a couple of questions.

 

1. Is it possible to search for patterns of text with Omega and/or
Xapian queries?

 

For example: given a file archive containing a series of documents
(business office documents in various formats), one would like to find
all documents which contain text of a certain 'format', such as some
type of ID code. Examples might be ISBNs for libraries, or some other
custom contact ID.
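
To make the kind of pattern we mean concrete, here is a minimal sketch
in plain Python, entirely outside Xapian. The names find_isbn13 and
isbn13_checksum_ok and the regular expression itself are our own
illustrative assumptions, not anything provided by Omega or Xapian. It
only shows the sort of matching (a fixed shape plus a check digit) that
we would like a query to express.

```python
import re

# Illustrative pattern only: "978" or "979" followed by ten more digits,
# with optional hyphens or spaces between the digits.
ISBN13_PATTERN = re.compile(r"\b97[89][- ]?(?:\d[- ]?){9}\d\b")


def isbn13_checksum_ok(candidate):
    """Return True if the 13 digits pass the ISBN-13 check-digit rule."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) != 13:
        return False
    # Weights alternate 1, 3, 1, 3, ... and the total must divide by 10.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def find_isbn13(text):
    """Return all substrings of text that look like a valid ISBN-13."""
    return [m.group(0) for m in ISBN13_PATTERN.finditer(text)
            if isbn13_checksum_ok(m.group(0))]


print(find_isbn13("Ref 978-0-306-40615-7, contact ID 12345"))
# prints ['978-0-306-40615-7']
```

A custom contact ID would need a different expression, but the idea is
the same: the match depends on the shape of the text rather than on a
known word or prefix.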

 

Can, or could, one construct a query so that Omega (Xapian) can handle
this, perhaps with some type of regex?

 

It would seem that wildcards fall short here. If it is possible but
not immediately available, what would one have to do to enable this
option? Are there any working examples, HOWTOs or FAQs?

 

(We read about a couple of Omega commands hinting at this:

$match{REGEX,STRING[,OPTIONS]}
$transform{REGEXP,SUBST,STRING[,OPTIONS]}

but it is not immediately clear, to us at least, how to use them, and
we have not seen any examples from which to learn.)

 

Is $transform{} only a post-query option acting on the result set?

 

If none of the above is possible with Omega, can one manage this, or
something similar, with Xapian directly? Again, any links to working
examples etc. would be most appreciated.

 

2. Suggestions for Indexing files of Miscellaneous Types

On the Xapian site several pages emphasise the importance of the way
in which the database and index are created with custom fields, in
particular using (semi-)structured files (files with a regular,
recognizable common format, such as .csv), which lend themselves to
this type of field indexing.

 

Could someone offer any comments regarding the best way to prepare
Omega and/or Xapian for file archives where the files are of
miscellaneous types, where internal fields are not always obvious, and
where metadata, tags etc. often lack homogeneity, if indeed they are
present at all?

 

Thank you very much for any feedback.

Best wishes,

Giulio

 


