[Xapian-discuss] Feature request: Lighten pressure on backup

Jesper Krogh jesper at krogh.cc
Mon Mar 24 06:07:38 GMT 2008


Hi.

This is a small feature request for Xapian. I currently have a Xapian
database with more than 5 million records; its files take up around
124GB in the Xapian database directory, with a few "quite large" files:

# du -sh *
0       flintlock
4.0K    iamflint
1000K   position.baseA
63G     position.DB
716K    postlist.baseA
624K    postlist.baseB
45G     postlist.DB
8.0K    record.baseA
385M    record.DB
240K    termlist.baseA
15G     termlist.DB
12K     value.baseB
696M    value.DB

(And my impression is that my record.DB file is fairly small.)

The idea behind the suggestion below comes from PostgreSQL's on-disk
layout, which splits each table into segment files of at most 1GB
(probably for historic reasons); that helps backups significantly.

This layout is a challenge for backup systems. A file-level incremental
backup copies every file that has changed, so the daily incremental run
now essentially has to back up the complete set (124GB), even if only a
single new document has been merged in.

My suggestion would be to split the files into several smaller files. I
know that the algorithms for searching the B-trees would probably become
a bit more complex, but a change would then touch only a subset of the
files, making backups much easier.
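
A minimal sketch of why splitting helps, assuming a hypothetical layout in
which each table is stored as fixed-size segment files. It compares
segment hashes as a stand-in for the per-file modification check a backup
tool would do. The segment size is a toy value, and the sketch does not
model where the B-tree actually writes modified blocks, which is what
decides how many segments a real commit would touch.

```python
import hashlib
from typing import List

# Toy segment size so the example stays readable; real segments would be
# far larger. Hypothetical layout: one logical table stored as many segments.
SEGMENT_SIZE = 4

def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> List[bytes]:
    """Split a table's contents into consecutive fixed-size segments."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def segment_digests(data: bytes) -> List[str]:
    """Hash each segment so it can be compared with the previous backup."""
    return [hashlib.sha256(seg).hexdigest() for seg in split_segments(data)]

def segments_to_back_up(old: bytes, new: bytes) -> List[int]:
    """Return indices of segments that changed or are new since `old`."""
    old_digests = segment_digests(old)
    return [i for i, digest in enumerate(segment_digests(new))
            if i >= len(old_digests) or old_digests[i] != digest]

before = b"AAAABBBBCCCCDDDD"
after_update = b"AAAABBBBCXCCDDDD"  # one block rewritten in place
after_append = before + b"EE"       # new data appended at the end
print(segments_to_back_up(before, after_update))  # [2]
print(segments_to_back_up(before, after_append))  # [4]
```

Only one of the four or five segments needs backing up in each case,
whereas a single monolithic file would be copied whole.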

Another solution could be to let Xapian query several databases and
"merge" the results. Then I could create a new database each day and
merge them once a week (or on whatever schedule fits the purpose).
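
A minimal sketch of the "merge" step, assuming each (hypothetical) daily
or weekly database returns its hits as `(score, doc_id, db_name)` tuples
sorted by descending score. This only gives a sensible ranking if scores
from different databases are comparable; with separately built indexes
the term statistics differ, so ideally the search engine would weight
terms using statistics combined across all the databases.

```python
import heapq
from itertools import islice
from typing import Iterable, List, Tuple

# A hit is (score, doc_id, db_name). Hypothetical structure for illustration;
# each input list must already be sorted by descending score.
Hit = Tuple[float, int, str]

def merge_results(result_lists: Iterable[List[Hit]], limit: int) -> List[Hit]:
    """Merge per-database rankings into one, keeping the top `limit` hits."""
    merged = heapq.merge(*result_lists, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, limit))

# Usage with hypothetical databases: one weekly merge plus two daily ones.
weekly = [(7.5, 101, "weekly"), (3.2, 57, "weekly")]
monday = [(6.1, 3, "daily-mon")]
tuesday = [(9.0, 12, "daily-tue"), (1.4, 8, "daily-tue")]
print(merge_results([weekly, monday, tuesday], limit=3))
# [(9.0, 12, 'daily-tue'), (7.5, 101, 'weekly'), (6.1, 3, 'daily-mon')]
```

With a weekly merge, a query would only ever span a handful of databases,
so merging the result lists is cheap compared to the searches themselves.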

Other suggestions are welcome.

Thanks.

Jesper