[Xapian-discuss] scriptindex memory usage
Kevin Duraj
kevin.softdev at gmail.com
Wed Nov 21 03:50:03 GMT 2007
Dear Jim,
My scriptindex uses 7.2 GB of memory when indexing 56 million
documents. Xapian's memory usage during indexing depends on the
XAPIAN_FLUSH_THRESHOLD environment variable. The default is 10K
documents; mine is 1 million. I have switched all memory slots to 2GB
memory modules and have been throwing 500MB memory modules in the
garbage. If you send me a self-addressed envelope with postage, I will
send you back a couple of 500MB memory modules. That will be more than
double what you need.
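
For illustration, here is a rough Python sketch of one way to set the
threshold for a single scriptindex run without touching the rest of
your environment. The helper names and the example paths are
hypothetical; only the XAPIAN_FLUSH_THRESHOLD variable and the
scriptindex argument order (database, index script, input files) come
from Xapian itself.

```python
import os
import subprocess


def build_scriptindex_run(database, index_script, input_files, flush_threshold):
    """Return (command, env) for a scriptindex run with a custom flush threshold.

    Hypothetical helper: it only prepares the call, it does not run anything.
    """
    if flush_threshold <= 0:
        raise ValueError("flush_threshold must be a positive number of documents")
    # Copy the current environment so the parent process is left unchanged.
    env = dict(os.environ)
    # The threshold is the number of documents buffered in memory between flushes.
    env["XAPIAN_FLUSH_THRESHOLD"] = str(flush_threshold)
    command = ["scriptindex", database, index_script, *input_files]
    return command, env


def run_scriptindex(database, index_script, input_files, flush_threshold):
    """Run scriptindex with the prepared command and environment."""
    command, env = build_scriptindex_run(
        database, index_script, input_files, flush_threshold
    )
    # check=True raises CalledProcessError if scriptindex exits with an error.
    return subprocess.run(command, env=env, check=True)
```

Calling run_scriptindex("quiz.db", "indexer_script", ["quizzes.dump"],
1000000) would then index with my setting. A smaller value should keep
memory use down at the cost of more frequent flushes.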
Cheers
Kevin Duraj
http://UncensoredWebSearch.com
On Nov 19, 2007 2:39 PM, Jim Spath <jspath at pangeamedia.com> wrote:
> Jim Spath wrote:
> > quiz_id : field=quiz_id unique=Q boolean=Q
> > quiz_title : field=title weight=4 index index=XTITLE
> > quiz_path : field=path
> > tags : weight=3 index index=XTAGS
> > questions : weight=2 index index=XQUESTIONS
> > answers : weight=1 index index=XANSWERS
> > adult : field=adult index boolean=XADULT
> > type : field=type boolean=XTYPE
> > create_date : value=0
> > language_string : field=language_string boolean=L
>
> Looking over my indexer_script, I saw some optimizations I could make
> and have lowered the amount of memory scriptindex is using by over 100MB:
>
>             VIRT  RES   SHR
> previously: 236m  227m  1504
> currently:  138m  129m  1508
>
> My indexer_script now looks like:
>
> quiz_id : field=quiz_id unique=Q boolean=Q
> quiz_title : field=title weight=4 index=XTITLE
> quiz_path : field=path
> tags : weight=3 index=XTAGS
> questions : weight=2 index=XQUESTIONS
> answers : weight=1 index=XANSWERS
> adult : boolean=XADULT
> type : boolean=XTYPE
> create_date : value=0
> language_string : boolean=L
>
> The resulting database files are much smaller now too:
>
> position: 59M vs 148M
> postlist: 51M vs 89M
> record: 4.4M vs 7.7M
> termlist: 50M vs 67M
> value: 1.3M vs 1.5M
>
> I'm still worried about resource use as the amount of data grows, but I
> guess I'm somewhat better off now.
>
> Are there some generally accepted "best practices" for indexing large
> datasets?
>
>
> - Jim