IR on full web

Sagar Acharya sagaracharya at tutanota.com
Sat Jul 8 15:19:57 BST 2023


I need to index as much of the web as I can while staying within a fixed limit on database size. How do I achieve that?

I want to crawl HTML web pages and, when a page's markup is reasonably clean (as with my webpage below), parse it and store the result in a Xapian glass database.

How do I do such a thing? I just want text, no images or videos.
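
A minimal sketch of the text-only extraction and storage steps described above, in Python. It assumes the standard-library html.parser module for stripping markup and the Xapian Python bindings for storage. The names TextExtractor, extract_text and index_page are hypothetical, and crawling and the database size limit are not handled here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text only; script and style content is skipped."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Images and videos carry no text data, so they drop out naturally.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def index_page(db_path, url, text):
    """Store one page's text in a Xapian database (hypothetical helper)."""
    import xapian  # assumption: the Xapian Python bindings are installed

    # Glass is the default backend in Xapian 1.4.x, so no backend flag is given.
    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    doc = xapian.Document()
    doc.set_data(url)  # keep only the URL as document data to save space
    termgen = xapian.TermGenerator()
    termgen.set_stemmer(xapian.Stem("en"))
    termgen.set_document(doc)
    termgen.index_text(text)
    db.add_document(doc)
    db.commit()
    db.close()
```

For example, extract_text("<h1>Title</h1><p>Hello <b>world</b></p>") yields the string "Title Hello world", which could then be passed to index_page together with the page URL.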
Thank you,
Sagar Acharya
https://humaaraartha.in


