[Xapian-discuss] Searching with the Omega
Olly Betts
olly at survex.com
Fri May 4 03:30:01 BST 2007
On Thu, May 03, 2007 at 07:56:21AM -0700, Tong Sun wrote:
> I successfully followed the OmegaExample from
> http://wiki.xapian.org/OmegaExample
>
> and have a working Xapian & Omega. However, I have
> several questions on
> indexing/searching with the Omega.
>
> . The syntax for omindex is:
>
> omindex [OPTIONS] --db DATABASE --url BASEURL
> [BASEDIR] DIRECTORY
>
> - what if I need to search more than one directory?
Then either index a directory which contains them all (perhaps via
symlinks), or run the indexer once for each with the
--preserve-nonduplicates option. The --preserve-nonduplicates option
is a bit of a kludge - Omega ought to support a configuration file to
allow this situation to be supported more cleanly.
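
The "run the indexer once for each" approach could be scripted roughly as follows. This is only a sketch: the helper name omindex_commands, the database path, and the directory names are made up for illustration, and it uses only the options mentioned in this thread (--db, --url, --preserve-nonduplicates). It builds the argument lists without running anything, so you can inspect them first.

```python
def omindex_commands(db, sources):
    """Return one omindex argument list per (base_url, directory) pair.

    Every run carries --preserve-nonduplicates so that indexing one
    directory does not drop the documents indexed from the others.
    """
    commands = []
    for base_url, directory in sources:
        commands.append([
            "omindex",
            "--preserve-nonduplicates",
            "--db", db,
            "--url", base_url,
            directory,
        ])
    return commands


# Hypothetical database path and directories, for illustration only.
cmds = omindex_commands(
    "/var/lib/omega/data/default",
    [("/book", "/srv/www/book"), ("/docs", "/srv/www/docs")],
)
for cmd in cmds:
    print(" ".join(cmd))
```

Each list can then be handed to subprocess.run(cmd, check=True) once you are happy with it.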
> - what if I run omindex again on another directory, without specifying
> the --overwrite parameter?
It runs in update mode instead, so new documents are added, existing
ones updated, and (unless you specify --preserve-nonduplicates)
documents which are no longer on disk are removed from the index.
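
As a toy model of that behaviour (not omindex's actual code; the function name update_index and the dict-based "index" are invented for illustration):

```python
def update_index(index, on_disk, preserve_nonduplicates=False):
    """Model an update run over dicts mapping path -> content.

    Documents on disk are added or updated. Documents no longer on disk
    are dropped unless preserve_nonduplicates is set.
    """
    result = dict(on_disk)  # new documents added, existing ones updated
    if preserve_nonduplicates:
        for path, content in index.items():
            # Keep entries which this run did not see on disk.
            result.setdefault(path, content)
    return result


old = {"/book/a.htm": "old text", "/book/b.htm": "deleted from disk"}
disk = {"/book/a.htm": "new text", "/book/c.htm": "brand new"}

print(sorted(update_index(old, disk)))
print(sorted(update_index(old, disk, preserve_nonduplicates=True)))
```

This also shows why running omindex on a second directory without --preserve-nonduplicates loses the first directory's documents: from the second run's point of view they are "no longer on disk".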
> . I searched the word "tance", which was randomly
> picked from /book/ci_09.htm,
>
> - the result was,
>
> /book/ci_27.htm is 100% relevant, /book/ci_09.htm is
> only 79% relevant
>
> - but the fact is that /book/ci_27.htm doesn't have
> the word "tance",
I just checked, and it does here on page 255:
want to be sponsored. In the end, he accepted some financial assis-
tance and all of the rousing support of the Portsmouth people.
(so it's actually the hyphenated word "assistance" being split into two).
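
To see why "tance" ends up as a term of its own, here is a rough sketch using a naive tokenizer which splits on anything non-alphanumeric. This is an assumption made for illustration, not Xapian's real term generation, and naive_terms is a made-up helper:

```python
import re

# The two lines quoted above, with the line-break hyphenation intact.
text = (
    "want to be sponsored. In the end, he accepted some financial assis-\n"
    "tance and all of the rousing support of the Portsmouth people."
)


def naive_terms(s):
    """Lowercase the runs of letters and digits found in s."""
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", s)]


terms = naive_terms(text)
print("tance" in terms)       # the second half is indexed as its own word
print("assistance" in terms)  # the joined word never appears in the text
```

So a search for "tance" legitimately matches this page, while a search for "assistance" would not match this particular occurrence.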
> /book/ci_09.htm does.
>
> - same thing happens when searching for "Kermath",
> which was also randomly
> picked from /book/ci_09.htm
You don't say which document "Kermath" is reported for but apparently
doesn't appear in, so that's hard for me to check without going through
the whole example myself...
> . Even if the word "tance" is in /book/ci_09.htm, it is not returned in
> the search result synopsis, nor is it highlighted.
The synopsis is statically generated from the start of the document at
present (generating a dynamic synopsis is a feature yet to be
implemented).
And if a word isn't in the synopsis, we obviously can't highlight it!
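
A small sketch of why that follows. The synopsis length, the <b> markup, and the function names are assumptions chosen for illustration, not what Omega actually does internally:

```python
import re


def static_synopsis(text, length=60):
    """Take the synopsis from the start of the document only."""
    return text[:length]


def highlight(synopsis, word):
    """Wrap whole-word, case-insensitive matches of word in <b> tags."""
    pattern = re.compile(r"\b(" + re.escape(word) + r")\b", re.IGNORECASE)
    return pattern.sub(r"<b>\1</b>", synopsis)


# Made-up document in which the search word appears late in the text.
doc = "Chapter nine opens with the voyage south. " * 3 + "assis- tance arrived."
synopsis = static_synopsis(doc)

print(highlight(synopsis, "tance") == synopsis)  # nothing to highlight
print(highlight(synopsis, "voyage"))
```

The document still matches the query; the match just falls outside the part of the text which is displayed.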
> . The search result ranking seems odd to me.
>
> - I first searched for the word "test"; the first several hits don't
> even have the word "test" highlighted in the search result synopsis.
> The page that actually gets the word "test" highlighted was returned
> around the 10th hit.
This doesn't seem to be a useful criterion for judging the results. The
question you should ask yourself is "which of the documents were most
relevant to my query".
> . Just a quick question, is there any way to choose
> "Matching all words" as
> default when first launching
> http://localhost/cgi-bin/omega.cgi?
Use the URL:
http://localhost/cgi-bin/omega.cgi?DEFAULTOP=and
Or if you're using a static search form, include this in your form:
<input type="hidden" name="DEFAULTOP" value="and">
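
If you generate such links from a script, the query string can be built rather than typed by hand. A minimal sketch, assuming the same localhost URL as above; only the DEFAULTOP parameter from this message is used:

```python
from urllib.parse import urlencode

base = "http://localhost/cgi-bin/omega.cgi"

# DEFAULTOP=and makes "matching all words" the default operator.
url = base + "?" + urlencode({"DEFAULTOP": "and"})
print(url)
```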
Cheers,
Olly