[Xapian-discuss] scriptindex memory usage
Jim Spath
jspath at pangeamedia.com
Mon Nov 19 22:39:06 GMT 2007
Jim Spath wrote:
> quiz_id : field=quiz_id unique=Q boolean=Q
> quiz_title : field=title weight=4 index index=XTITLE
> quiz_path : field=path
> tags : weight=3 index index=XTAGS
> questions : weight=2 index index=XQUESTIONS
> answers : weight=1 index index=XANSWERS
> adult : field=adult index boolean=XADULT
> type : field=type boolean=XTYPE
> create_date : value=0
> language_string : field=language_string boolean=L
Looking over my indexer_script, I saw some optimizations I could make,
and I have lowered the amount of memory scriptindex is using by almost 100MB:
             VIRT   RES    SHR
previously:  236m   227m   1504
currently:   138m   129m   1508
My indexer_script now looks like:
quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index=XTITLE
quiz_path : field=path
tags : weight=3 index=XTAGS
questions : weight=2 index=XQUESTIONS
answers : weight=1 index=XANSWERS
adult : boolean=XADULT
type : boolean=XTYPE
create_date : value=0
language_string : boolean=L
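
To make the difference between the two scripts easier to see, here is a minimal Python sketch that lists which actions were dropped from each field. It only compares the text of the two scripts as posted in this thread; the helper names (parse, removed_actions) are made up for illustration and are not part of scriptindex or Xapian.

```python
# Hypothetical helper for comparing two scriptindex index scripts.
# It does a plain text comparison and does not call Xapian at all.

OLD_SCRIPT = """
quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index index=XTITLE
quiz_path : field=path
tags : weight=3 index index=XTAGS
questions : weight=2 index index=XQUESTIONS
answers : weight=1 index index=XANSWERS
adult : field=adult index boolean=XADULT
type : field=type boolean=XTYPE
create_date : value=0
language_string : field=language_string boolean=L
"""

NEW_SCRIPT = """
quiz_id : field=quiz_id unique=Q boolean=Q
quiz_title : field=title weight=4 index=XTITLE
quiz_path : field=path
tags : weight=3 index=XTAGS
questions : weight=2 index=XQUESTIONS
answers : weight=1 index=XANSWERS
adult : boolean=XADULT
type : boolean=XTYPE
create_date : value=0
language_string : boolean=L
"""


def parse(script):
    """Map each field name to its list of whitespace-separated actions."""
    rules = {}
    for line in script.strip().splitlines():
        name, _, actions = line.partition(":")
        rules[name.strip()] = actions.split()
    return rules


def removed_actions(old, new):
    """Return {field: [actions present in old but missing from new]}."""
    new_rules = parse(new)
    removed = {}
    for name, actions in parse(old).items():
        kept = new_rules.get(name, [])
        gone = [a for a in actions if a not in kept]
        if gone:
            removed[name] = gone
    return removed


if __name__ == "__main__":
    for name, gone in removed_actions(OLD_SCRIPT, NEW_SCRIPT).items():
        print(f"{name}: removed {' '.join(gone)}")
```

The dropped actions are the bare index (text indexed without a prefix in addition to the prefixed copy) and the field= entries on the boolean-only fields, so each piece of text is now indexed once rather than twice.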
The resulting database files are much smaller now too:
            now     before
position:   59M     148M
postlist:   51M     89M
record:     4.4M    7.7M
termlist:   50M     67M
value:      1.3M    1.5M
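
As a rough check on these figures, the per-table savings can be worked out with a short Python sketch. The sizes are copied from the list above, reading the first number of each pair as the current size and the second as the previous one; the variable and function names are only illustrative.

```python
# Sizes in megabytes as (now, before), taken from the figures in this post.
sizes_mb = {
    "position": (59.0, 148.0),
    "postlist": (51.0, 89.0),
    "record": (4.4, 7.7),
    "termlist": (50.0, 67.0),
    "value": (1.3, 1.5),
}


def reduction(now, before):
    """Return (megabytes saved, percentage saved) for one table."""
    saved = before - now
    return saved, 100.0 * saved / before


# Totals across all five tables.
total_now = sum(now for now, _ in sizes_mb.values())
total_before = sum(before for _, before in sizes_mb.values())

if __name__ == "__main__":
    for table, (now, before) in sizes_mb.items():
        saved, pct = reduction(now, before)
        print(f"{table:<9} {before:6.1f}M -> {now:6.1f}M  saved {saved:5.1f}M ({pct:4.1f}%)")
    saved, pct = reduction(total_now, total_before)
    print(f"{'total':<9} {total_before:6.1f}M -> {total_now:6.1f}M  saved {saved:5.1f}M ({pct:4.1f}%)")
```

On these numbers the position table shrinks the most, and the five tables together go from roughly 313M to roughly 166M, a bit under half.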
I'm still worried about resource use as the amount of data grows, but I
guess I'm somewhat better off now.
Are there some generally accepted "best practices" for indexing large
datasets?
- Jim