Saturday, May 24, 2008

corruption, fragmentation and scalability

Disk space management is the foundation on which AgileWiki is being built. And there are three which need to be addressed: corruption, fragmentation and scalability. Until these are addressed there is little point in continuing.

The form of corruption which concerns me most is pending errors. This occurs when one leg of a
fork frees a block which is still in use by the other and then a system-level free of the block occurs. Unfortunately the API lends itself to this error. But it is likely quite repairable--the block is no longer in use and only we have not removed the block from the pending list. On the other hand, since it is an easy error to make, we need to be able to catch this error prior to completing the transaction which does the system-level free of the block. This can be done by maintaining a database-level list of all pending blocks and checking this list prior to doing a free. Other forms of database corruption being experienced just need to be shook out of the existing code base through the usual test/fix process.

Fragmentation is another concern, especially as the list of all free blocks is maintained in memory. Currently this is being addressed by forcing all disk space allocations to be a multiple of 64. Increasing this number has a significant impact on fragmentation and the number of blocks in the free block list, but is expensive in terms of small blocks. But there are other approaches which can be used. Remember the old buddy allocation scheme where blocks were allocated with a length which was always a power of 2? Fragmentation is avoided, but at a cost increasing disk space requirements by 100%. I am thinking that there is a less costly approach. We only allocate blocks whose size is a Fibonacci Number. This should also avoid fragmentation while increasing disk space requirements by only 60%.

Finally, there is a scalability issue in the pending lists held by the fork blocks. This list can get very large. We need to move these lists out of the fork block and into another database-level table.

0 Comments:

Post a Comment

<< Home