[Xapian-devel] Opensource Websearch Engine Project

Charlie Hull charlie at juggler.net
Wed Oct 27 09:15:24 BST 2010


On 26/10/2010 17:55, Pierre-Louis Dehapiot wrote:
>
> Hi,
>
> I'm Pierre-Louis Dehapiot from Paris, France. I am studying computer programming at the ECE (a French school) and this year, the topic of my project is "google and indexing".
> To summarize, it deals with creating my own google in only one year :p !
> I saw that you made an open source web search engine written in C++ (Xapian).
> I have already made the PHP/CSS interface for my own project, only in French for the moment but in English soon! (You can have a look here: http://pti.pl4tipus.com)
> As you can see, it's very "google-like" : this is what the topic deals with.
> If you have a few minutes to answer me, I think I need some tips about "how to make an indexing engine".
> I know approximately how it works but I need more details about the difficulties of the project. Any tips you can give me would be very useful.
> Can you help me?
> Thank you in advance for your support.
>
> Pierre-Louis Dehapiot


Hi Pierre,

(Apologies, I posted this to xapian-discuss by mistake)

You may be interested to know that Xapian was originally created to 
power a web search engine (half a billion web pages or thereabouts).

You've got a pretty steep learning curve to be honest: you're first 
going to need to learn about web crawling (note that Xapian does not 
include a web crawler, although there are plenty of open source ones out 
there - Heritrix is a good example), and how to keep your index clean 
and current. Indexing webpages into Xapian isn't that hard - Xapian's 
Omega application will do that for you if you don't want to control 
Xapian directly. You can then use Xapian's PHP bindings to hook up to 
your existing front end. Depending on how many pages you want to index, 
you may also have to learn how to spread your index across multiple 
machines.
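
To give a feel for the crawling part, here is a minimal sketch in Python 
using only the standard library. It is not part of Xapian, and the names 
extract_links, crawl and the caller-supplied fetch function are made up 
for illustration. It does a breadth-first walk of one host and leaves 
out things a real crawler needs (robots.txt, rate limiting, retries, 
re-crawling changed pages), so treat it as a starting point only.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag seen in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links found in html, fragments removed."""
    parser = LinkExtractor()
    parser.feed(html)
    found = []
    for href in parser.links:
        # Resolve relative links and drop any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https") and absolute not in found:
            found.append(absolute)
    return found


def crawl(seed, fetch, max_pages=100):
    """Breadth-first crawl restricted to the seed's host.

    fetch(url) is supplied by the caller (hypothetical helper) and returns
    the page's HTML as a string, or None if the page could not be fetched.
    Returns a dict mapping each fetched URL to its HTML.
    """
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        for link in extract_links(url, html):
            # Stay on the seed's host and never queue a URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

Passing fetch in as a parameter means you can try the crawl logic 
against a dictionary of canned pages before pointing it at a real site. 
The pages it returns are what you would then hand to your indexer.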

I wish you luck with your project - I would start by reading about how 
to build and use web crawlers, then try creating a small searchable 
index using Xapian. I'm sure others on this list will help with any 
questions, but you should do some research first.

Cheers

Charlie