[Xapian-discuss] scriptindex memory usage

Jim Spath jspath at pangeamedia.com
Mon Nov 19 22:39:06 GMT 2007


Jim Spath wrote:
> quiz_id : field=quiz_id unique=Q boolean=Q
> quiz_title : field=title weight=4 index index=XTITLE
> quiz_path : field=path
> tags : weight=3 index index=XTAGS
> questions : weight=2 index index=XQUESTIONS
> answers : weight=1 index index=XANSWERS
> adult : field=adult index boolean=XADULT
> type : field=type boolean=XTYPE
> create_date : value=0
> language_string : field=language_string boolean=L

Looking my indexer_script over, I saw some optimizations I could make
and have lowered the amount of memory scriptindex is using by over 100MB:

             VIRT  RES  SHR
previously: 236m 227m 1504
currently:  138m 129m 1508

My indexer_script now looks like:

quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index=XTITLE
quiz_path : field=path
tags : weight=3 index=XTAGS
questions : weight=2 index=XQUESTIONS
answers : weight=1 index=XANSWERS
adult : boolean=XADULT
type : boolean=XTYPE
create_date : value=0
language_string : boolean=L
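
To make the difference between the two scripts easier to see, here is a rough Python sketch that lists which actions were dropped from each field. It is not part of scriptindex; the helper names parse_script and dropped_actions are made up for illustration, and the parsing assumes the simple "name : action action ..." layout used in this post.

```python
# Hypothetical helper, not part of Xapian: compares the two index scripts
# quoted in this post and reports the actions removed from each field.

OLD_SCRIPT = """
quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index index=XTITLE
quiz_path : field=path
tags : weight=3 index index=XTAGS
questions : weight=2 index index=XQUESTIONS
answers : weight=1 index index=XANSWERS
adult : field=adult index boolean=XADULT
type : field=type boolean=XTYPE
create_date : value=0
language_string : field=language_string boolean=L
"""

NEW_SCRIPT = """
quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index=XTITLE
quiz_path : field=path
tags : weight=3 index=XTAGS
questions : weight=2 index=XQUESTIONS
answers : weight=1 index=XANSWERS
adult : boolean=XADULT
type : boolean=XTYPE
create_date : value=0
language_string : boolean=L
"""


def parse_script(text):
    """Map each input field name to its list of actions."""
    rules = {}
    for line in text.strip().splitlines():
        name, _, actions = line.partition(" : ")
        rules[name.strip()] = actions.split()
    return rules


def dropped_actions(old, new):
    """Return {field: [actions present in old but missing from new]}."""
    result = {}
    for name, actions in old.items():
        remaining = list(new.get(name, []))
        gone = []
        for action in actions:
            if action in remaining:
                remaining.remove(action)
            else:
                gone.append(action)
        if gone:
            result[name] = gone
    return result


changes = dropped_actions(parse_script(OLD_SCRIPT), parse_script(NEW_SCRIPT))
for name, gone in changes.items():
    print(f"{name}: dropped {' '.join(gone)}")
```

The output shows two kinds of change: the bare "index" action is gone from the text fields (so each is indexed only under its prefix rather than both with and without one), and "field=" is gone from adult, type and language_string (so those values are no longer stored in the record). That reading of the actions is my understanding of scriptindex, so check it against the docs for your version.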

The resulting database files are much smaller now too (new size vs old):

  position: 59M  vs 148M
  postlist: 51M  vs 89M
  record:   4.4M vs 7.7M
  termlist: 50M  vs 67M
  value:    1.3M vs 1.5M
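
For a sense of the overall saving, the per-table figures above can be totalled with a short Python sketch. The numbers are copied from this post; the variable names are only illustrative, and the sizes are treated as plain megabytes.

```python
# Table sizes in MB as (new, old), copied from the figures in this post.
sizes_mb = {
    "position": (59.0, 148.0),
    "postlist": (51.0, 89.0),
    "record": (4.4, 7.7),
    "termlist": (50.0, 67.0),
    "value": (1.3, 1.5),
}

total_new = sum(new for new, _ in sizes_mb.values())
total_old = sum(old for _, old in sizes_mb.values())
reduction = 1.0 - total_new / total_old  # fraction of the old size saved

# Per-table saving, largest first.
savings = sorted(
    ((name, old - new) for name, (new, old) in sizes_mb.items()),
    key=lambda item: item[1],
    reverse=True,
)

print(f"total: {total_new:.1f}M vs {total_old:.1f}M ({reduction:.0%} smaller)")
for name, saved in savings:
    print(f"{name}: {saved:.1f}M saved")
```

By this arithmetic the position table accounts for the largest share of the saving, followed by postlist.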

I'm still worried about resource use as the amount of data grows, but I 
guess I'm somewhat better off now.

Are there some generally accepted "best practices" for indexing large 
datasets?

- Jim


