Saturday, November 03, 2007

block header

My objective now is to pull out some of the concepts from AgileWiki to create a very fast and reliable database.

I find myself inspired by the work done at Sun Microsystems on a reliable file system, in particular the idea of doing copy-on-write and having a checksum in each block. I also like the ability to update a backup file quickly.

The database would use fixed-size blocks which are moderately large. When I tuned AgileWiki, I found that a block size of 32K seemed to work best with the algorithms I was using, so let's use that number as the default block size, at least for now.

Here then is a proposal for the block header:
  • Checksum of the Block
  • Timestamp
  • Data Length
  • Data

The checksum would cover the timestamp, the data length and the (variable-length) data. Let's use something like java.util.zip.Adler32, which is a reasonably fast checksum. The timestamp would record the last time the block was updated. This layout should make it reasonably easy to update a backup copy of the database by copying only the blocks whose timestamp is later than the previous update. We can also detect corrupted data when we read a block, and it becomes easy to scan a database for bad blocks.
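
To make the layout concrete, here is a minimal Java sketch of how such a block might be written and checked. The field widths (a 4-byte checksum, an 8-byte timestamp and a 4-byte data length), the class name BlockCodec and all of its method names are assumptions made for illustration; only the 32K block size, the field order and the use of java.util.zip.Adler32 come from the proposal above.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

/** Sketch of the proposed block layout: checksum | timestamp | data length | data. */
final class BlockCodec {
    static final int BLOCK_SIZE = 32 * 1024;                 // default block size from the post
    static final int CHECKSUM_BYTES = 4;                     // assumption: Adler-32 stored as an int
    static final int HEADER_BYTES = CHECKSUM_BYTES + 8 + 4;  // plus long timestamp, int length
    static final int MAX_DATA = BLOCK_SIZE - HEADER_BYTES;

    /** Builds a full fixed-size block holding the given data; unused space stays zeroed. */
    static byte[] encode(long timestamp, byte[] data) {
        if (data.length > MAX_DATA) {
            throw new IllegalArgumentException("data does not fit in one block");
        }
        ByteBuffer buf = ByteBuffer.allocate(BLOCK_SIZE);
        buf.position(CHECKSUM_BYTES);                        // leave room for the checksum
        buf.putLong(timestamp);
        buf.putInt(data.length);
        buf.put(data);
        buf.putInt(0, (int) checksum(buf.array(), data.length));
        return buf.array();
    }

    /** Adler-32 over the timestamp, the data length and the data (not the padding). */
    static long checksum(byte[] block, int dataLength) {
        Adler32 adler = new Adler32();
        adler.update(block, CHECKSUM_BYTES, 8 + 4 + dataLength);
        return adler.getValue();
    }

    /** Returns true if the stored checksum matches the block contents. */
    static boolean isValid(byte[] block) {
        if (block.length != BLOCK_SIZE) {
            return false;
        }
        ByteBuffer buf = ByteBuffer.wrap(block);
        long stored = buf.getInt(0) & 0xFFFFFFFFL;           // read the int as unsigned
        int dataLength = buf.getInt(CHECKSUM_BYTES + 8);
        if (dataLength < 0 || dataLength > MAX_DATA) {
            return false;                                    // a corrupted length field
        }
        return stored == checksum(block, dataLength);
    }

    /** Reads the last-updated timestamp from the header. */
    static long timestamp(byte[] block) {
        return ByteBuffer.wrap(block).getLong(CHECKSUM_BYTES);
    }

    /** Backup rule from the post: copy only blocks updated after the previous backup. */
    static boolean needsBackup(byte[] block, long previousBackupTime) {
        return timestamp(block) > previousBackupTime;
    }
}
```

With something like this, a backup pass would read each block, skip those for which needsBackup returns false, and could call isValid along the way to report bad blocks. Checksumming only the used part of the block keeps the cost proportional to the data length, at the price of not noticing damage in the unused padding.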

More about copy-on-write next time.
