[Xapian-discuss] A few questions wrt Xapian

Felix Antonius Wilhelm Ostmann ostmann at websuche.de
Mon Nov 3 15:53:21 GMT 2008


That sounds exactly like our first Xapian project (also in Perl) :)
We didn't have the cluster part, but everything else works very well for us:

- merging many indexes
- unique results (with a MatchDecider; I don't know how that works with a cluster).
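The "one result per site" collapsing, plus a count of hidden hits for a "More from this site..." link, can be sketched in plain Python on an already ranked result list. This is only an illustration of the idea, not Xapian code: the Hit type, its site field and collapse_by_site are hypothetical. Xapian's own Enquire::set_collapse_key() does the equivalent inside the matcher, keyed on a document value slot (for example one holding the host name).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    docid: int
    site: str     # hypothetical per-document key, e.g. the host name
    score: float


def collapse_by_site(ranked_hits):
    """Keep only the best-ranked hit per site.

    `ranked_hits` must already be sorted by descending relevance.
    Returns (kept, hidden): `kept` holds the first hit seen for each site,
    `hidden[site]` counts the further hits from that site that were dropped.
    """
    kept = []
    hidden = {}
    for hit in ranked_hits:
        if hit.site in hidden:
            hidden[hit.site] += 1      # a lower-ranked duplicate: just count it
        else:
            hidden[hit.site] = 0       # first (best) hit for this site
            kept.append(hit)
    return kept, hidden


hits = [
    Hit(1, "example.org", 9.1),
    Hit(2, "example.org", 8.7),
    Hit(3, "example.com", 7.5),
    Hit(4, "example.org", 6.0),
]
kept, hidden = collapse_by_site(hits)
for hit in kept:
    print(hit.docid, hit.site, f"(+{hidden[hit.site]} more from this site)")
```

Because the input is already in rank order, the first hit seen per site is automatically its highest-ranked one, so a single pass is enough.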

You should have a look at Omega.

I realize my answers are not very useful; I hope someone else will answer
all the questions in detail :-/


Henka wrote:
> Greetings all,
>
> I'm about to evaluate Xapian for a future project and would appreciate  
> a few comments from those in the know:
>
> Indexing
>
> 1.  Is Xapian similar to Lucene in the sense that you can define as  
> many fields as you want, and assign various weights (which influence  
> search result sorting) to these fields?  I gather from the docs that  
> you can, but I just need confirmation.
>
> 2.  Let's say you're indexing websites; can you then merge/combine  
> many smaller indexes into larger ones for later searching?
>
>
> Searching
>
> 1.  I gather from the docs that you can sort results according to your  
> own field/s, followed by the default document scoring (think  
> "page-rank").  Correct?
>
> 2.  ~/docs/remote.htm mentions distributed searching - we want to  
> spread the search load around our cluster by splitting the index into  
> many manageable-sized indexes (to ensure sub-second performance), with  
> a "master" node which combines the search results that end users see.
> Is my understanding correct, and are there any pitfalls/bottlenecks?
>
> 3.  Removing duplicates:  this can be done programmatically I know  
> (but is slow on our chosen platform - Perl), but does Xapian provide  
> this mechanism built-in?  For example:  a search result might return  
> several pages from a web site, but we want to remove these dups and  
> only provide a single result (highest ranking) per website (e.g., with a
> link for "More from this site...", à la Google, which will be a
> separate search displaying all the site-duplicates).
>
> 4.  If the mechanism to remove duplicates exists, will this still work  
> cluster-wide in distributed searching?
>
> 5.  Does Xapian provide a mechanism for identifying the actual field  
> in a search result which triggered the hit?  E.g., let's say you have
> TITLE, BODY, OTHER as fields in your index.  If a search found your  
> term in the BODY field, does Xapian provide this as feedback?
>
> 6.  This is difficult, I know: how does Xapian compare
> performance-wise?  Has anyone done any basic benchmarking?
>
>
>
> Thanks for any information you can provide.
>
> Regards
> Henry
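For the distributed setup in the question, the "master" node essentially has to merge several already ranked lists into one global top-k. A minimal sketch in plain Python, assuming each shard returns (score, docid) pairs sorted by descending score and that the scores are comparable across shards (i.e. computed with shared collection statistics); merge_shard_results and the "shard:docid" labels are hypothetical, not Xapian API. As far as I know, Xapian does this merging itself when several (remote) databases are combined into one Database and searched with a single Enquire.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, k):
    """Merge per-shard result lists into one global top-k list.

    Each element of `shard_results` is a list of (score, docid) pairs
    already sorted by descending score. Assumes scores are comparable
    across shards.
    """
    # heapq.merge lazily interleaves the sorted inputs; reverse=True
    # because every input list is in descending score order.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


shard_a = [(9.0, "a:17"), (4.2, "a:3")]
shard_b = [(7.5, "b:8"), (6.1, "b:2"), (1.0, "b:40")]
top = merge_shard_results([shard_a, shard_b], k=3)
print(top)
```

Each shard only needs to send back its own top k, since the global top k must be drawn from the union of the per-shard top-k lists; this keeps the traffic to the master node bounded regardless of index size.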


-- 
Best regards

Felix Antonius Wilhelm Ostmann
--------------------------------------------------
Websuche   Search   Technology   GmbH   &   Co. KG
Martinistraße 3  -  D-49080  Osnabrück  -  Germany
Tel.:   +49 541 40666-0 - Fax:    +49 541 40666-22
Email: info at websuche.de - Website: www.websuche.de
--------------------------------------------------
AG Osnabrück - HRA 200252 - VAT ID: DE814737310
General partner:    Websuche   Search   Technology
Verwaltungs GmbH   -  AG Osnabrück  -   HRB 200359
Managing director: Diplom-Kaufmann Martin Steinkamp
--------------------------------------------------



