[Xapian-discuss] hi all

syed ahmed syed_ahmed_uos at hotmail.com
Mon Jan 29 13:49:58 GMT 2007


  hi

Does anyone have an idea of how to program a web crawler with the following
functions?

1) Extract all the links from a web page, then extract the links from each
extracted link in turn, down to depth level 3, and store them in a
database.
2) Sort the results by file type: .html, .pdf, .ps, .txt, etc.
3) Extract meta information such as title, abstract and URLs for at least
two file types, such as .pdf and .html.
4) Calculate the MD5 or SHA-1 hash for every distinct entry in the database.
5) The database system used shall be MySQL.
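
For illustration, requirements 1, 2 and 4 (plus the title part of 3) could be sketched roughly as below, using only the Python standard library. This is a minimal sketch under several assumptions, not a finished crawler: the names LinkParser, parse_page, file_type and crawl are made up for this example, the page download is passed in as a hypothetical fetch(url) callable, the SHA-1 is computed over the URL string (the post does not say what should be hashed), and URLs without an extension are treated as HTML. PDF metadata extraction and the MySQL storage step are left out; the returned records would be what gets written to the database.

```python
import hashlib
import posixpath
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the page title and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def parse_page(html, base_url):
    """Return (title, absolute links without fragments) for one HTML page."""
    parser = LinkParser()
    parser.feed(html)
    links = [urldefrag(urljoin(base_url, h)).url for h in parser.links]
    return parser.title.strip(), links


def file_type(url):
    """Classify a URL by its path extension (assumption: none means html)."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    return ext.lstrip(".") or "html"


def crawl(start_url, fetch, max_depth=3):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    seen = {start_url}
    queue = deque([(start_url, 0)])
    records = []
    while queue:
        url, depth = queue.popleft()
        kind = file_type(url)
        title, links = "", []
        if kind in ("html", "htm"):
            html = fetch(url)
            if html is not None:
                title, links = parse_page(html, url)
        records.append({
            "url": url, "depth": depth, "type": kind, "title": title,
            # Assumption: the hash identifies the entry by its URL.
            "sha1": hashlib.sha1(url.encode("utf-8")).hexdigest(),
        })
        # Only follow links while still above the depth limit.
        if depth < max_depth:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return records
```

Passing fetch in as a parameter keeps the traversal logic separate from the network code, so the depth limit and duplicate handling can be tried against a small dictionary of pages before any real downloading is added. Sorting by file type is then a matter of ordering or grouping the records on their "type" field.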




