[Xapian-discuss] [Xapian-devel] Opensource Websearch Engine Project

Charlie Hull charlie at juggler.net
Tue Oct 26 22:18:46 BST 2010


On 26 October 2010 17:55, Pierre-Louis Dehapiot <dehapiot at ece.fr> wrote:

>
> Hi,
>
> I'm Pierre-Louis Dehapiot from Paris, France. I am studying computer
> programming at ECE (a French school) and this year, the topic of my
> project is "google and indexing".
> To summarize, it deals with creating my own google in only one year :p !
> I saw that you made yourself an opensource websearch engine written in C++
> (Xapian).
> I already made the php/CSS interface for my own project only in French for
> the moment but in English soon ! (you can have a look here :
> http://pti.pl4tipus.com)
> As you can see, it's very "google-like" : this is what the topic deals
> with.
> If you have a few minutes to answer me, I think I need some tips about "how
> to make an indexing engine".
> I know approximately how it works, but I need more details about the
> difficulties of the project. Any tips you can give me would be very
> useful.
> Can you help me?
> Thank you in advance for your support.
>
> Pierre-Louis Dehapiot
>

Hi Pierre,

You may be interested to know that Xapian was originally created to power a
web search engine (half a billion web pages or thereabouts).

You've got a pretty steep learning curve to be honest: you're first going to
need to learn about web crawling (note that Xapian does not include a web
crawler, although there are plenty of open source ones out there - Heritrix
is a good example), and how to keep your index clean and current. Indexing
webpages into Xapian isn't that hard - Xapian's Omega application will do
that for you if you don't want to control Xapian directly. You can then use
Xapian's PHP bindings to hook up to your existing front end. Depending on
how many pages you want to index, you may also have to learn how to spread
your index across multiple machines.
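
As a rough illustration of the crawling step, here is a minimal sketch of
link extraction, the part of a crawler that decides which pages to visit
next. It uses only the Python standard library and is not part of Xapian;
the names LinkExtractor and extract_links are made up for this example, and
a real crawler would also need fetching, robots.txt handling, politeness
delays and duplicate detection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse


class LinkExtractor(HTMLParser):
    """Collect absolute http(s) links found in one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL and
                # drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                # Skip mailto:, javascript: and other non-web schemes.
                if urlparse(absolute).scheme in ("http", "https"):
                    self.links.append(absolute)


def extract_links(base_url, html):
    """Return the links in `html`, in document order (hypothetical helper)."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links
```

The returned URLs would typically go into a queue of pages still to fetch,
after filtering out any that have already been seen.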

I wish you luck with your project - I would start by reading about how to
build and use web crawlers, then try creating a small searchable index using
Xapian. I'm sure others on this list will help with any questions, but you
should do some research first.
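
To get a feel for what a searchable index does before trying Xapian itself,
a toy inverted index can be sketched in a few lines of plain Python. This is
only an illustration under simple assumptions (lowercased alphanumeric
terms, every query term must match, no ranking); it is not Xapian's API, and
TinyIndex and tokenize are names invented for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy inverted index: maps each term to the set of URLs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, url, text):
        for term in tokenize(text):
            self.postings[term].add(url)

    def search(self, query):
        """Return URLs containing every query term, sorted alphabetically."""
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(
            *(self.postings.get(term, set()) for term in terms)
        )
        return sorted(matches)
```

A real engine such as Xapian adds stemming, relevance ranking and on-disk
storage on top of this basic idea.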

Cheers

Charlie




