[Xapian-discuss] Searching with the Omega

Olly Betts olly at survex.com
Fri May 4 03:30:01 BST 2007


On Thu, May 03, 2007 at 07:56:21AM -0700, Tong Sun wrote:
> I successfully followed the OmegaExample from
> http://wiki.xapian.org/OmegaExample
> 
> and have a working Xapian & Omega. However, I have several questions on
> indexing/searching with the Omega.
> 
> . The syntax for omindex is:
> 
>  omindex [OPTIONS] --db DATABASE --url BASEURL [BASEDIR] DIRECTORY
> 
>  - what if I need to search more than one directory?

Then either index a directory which contains them all (perhaps via
symlinks), or run the indexer once for each with the
--preserve-nonduplicates option.  The --preserve-nonduplicates option
is a bit of a kludge - Omega ought to support a configuration file to
allow this situation to be supported more cleanly.
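As a rough sketch of the second approach, the loop below prints one omindex command per directory, each with --preserve-nonduplicates so that a later run doesn't remove the documents indexed by an earlier one. The database path, the URLs and the directories are made-up examples rather than anything from your setup, and the leading "echo" keeps this a dry run:

```shell
# Dry run: print one omindex invocation per directory to be indexed.
# The database path and the URL:directory pairs are hypothetical; adjust them.
index_cmds() {
  db=/var/lib/omega/data/default
  for pair in "/book:/var/www/book" "/notes:/var/www/notes"; do
    url=${pair%%:*}   # text before the first colon: the base URL
    dir=${pair#*:}    # text after the first colon: the directory on disk
    # Remove "echo" to actually run the indexer.
    echo omindex --db "$db" --url "$url" --preserve-nonduplicates "$dir"
  done
}

index_cmds
```

Each directory gets its own --url, so the links in the search results should point at the right place for each tree.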

>  - what if I do another omindex on another directory, without specifying
>    the --overwrite parameter?

It runs in update mode instead, so new documents are added, existing
ones updated, and (unless you specify --preserve-nonduplicates)
documents which are no longer on disk are removed from the index.

> . I searched the word "tance", which was randomly picked from
>   /book/ci_09.htm,
>   
>   - the result was, 
> 
>   /book/ci_27.htm is 100% relevant, /book/ci_09.htm is only 79% relevant
> 
>   - but the fact is that /book/ci_27.htm doesn't have the word "tance",
    
I just checked, and it does here on page 255:

    want to be sponsored. In the end, he accepted some financial assis-
    tance and all of the rousing support of the Portsmouth people.

(so it's actually the hyphenated word "assistance" being split into two).
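To illustrate the effect, here is a small sketch which assumes that any run of non-alphanumeric characters (including a hyphen followed by a line break) ends a term. It uses tr as a stand-in, not Xapian's actual term splitting, and split_terms is just a name made up for this example:

```shell
# Stand-in tokeniser: every run of non-alphanumeric characters becomes a
# single newline, so each output line is one "term".
split_terms() {
  tr -cs '[:alnum:]' '\n'
}

# The line-broken text from page 255, with the hyphen at the end of the line.
printf 'financial assis-\ntance and all\n' | split_terms
```

Under that assumption "assis" and "tance" come out as separate terms, which is why a search for "tance" matches this page.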

>     /book/ci_09.htm does. 
> 
>   - same thing happens when searching for "Kermath", which was also
>     randomly picked from /book/ci_09.htm

You don't say which document is reported as matching "Kermath" without
apparently containing it, so that's hard for me to check without going
through the whole example myself...

> . Even if the word "tance" is in /book/ci_09.htm, it is not returned in
>   the search result synopsis, nor is it highlighted.

The synopsis is statically generated from the start of the document at
present (generating a dynamic synopsis is a feature yet to be
implemented).

And if a word isn't in the synopsis, we obviously can't highlight it!

> . The search result ranking seems odd to me. 
> 
>   - I first searched for the word "test"; the first several hits don't
>     even have the word "test" highlighted in the search result synopsis.
>     The page that actually got the word "test" highlighted was returned
>     around the 10th hit.

This doesn't seem to be a useful criterion for judging the results.  The
question you should ask yourself is "which of the documents were most
relevant to my query".

> . Just a quick question, is there any way to choose "Matching all words"
>   as default when first launching http://localhost/cgi-bin/omega.cgi?

Use the URL:

http://localhost/cgi-bin/omega.cgi?DEFAULTOP=and

Or if you're using a static search form, include this in your form:

<input type="hidden" name="DEFAULTOP" value="and">

Cheers,
    Olly


