FennelPageBasedDataStructureHowto
From Eigenpedia
This page walks through an example of how to create a new persistent page-based data structure built on Fennel.
Contents |
Premise
For conceptual background on how Fennel stores pages in segments, see the segment design docs.
As an example, we're going to create a sparse bitmap class. The idea is that an application can store a bitmap of arbitrary size as long as it is mostly "holes", i.e. pages which have all zero bits set.
To keep the example simple, we'll structure it with a single directory page which indexes a number of leaf pages (each with a capacity of 8 bits), with no intermediate levels. In the example below, the directory has entries for two non-empty leaf pages (indexed by their bit start offsets, with pointers to the corresponding leaf pages); the rest of the leaves are holes (i.e. no pages have been allocated for them yet, so they are implicitly full of zero bits). So a total of three pages are allocated.
Although the logical bitmap size is arbitrary, physically it cannot have more non-empty leaf pages than can be indexed by the single directory page. There is no correlation between the page locations on disk and the bitmap offsets; in the example above, if bit 33 was set first, and then bit 9, the second leaf page might be allocated with a lower page ID than the first leaf page.
A few other simplifying assumptions:
- bits are numbered starting from 0
- each leaf page spans the same fixed number of bits, which must be a multiple of 8
- pages are never deleted, even if all of the bits are reset to 0
- the only operations are getting and setting individual bits
The complete code for this example is in fennel/test/SparseBitmapTest.cpp. We'll walk through snippets of it in the sections below.
Note: if you need to modify this page, please update the code in Perforce as well.
Directory Node Structure
Directory pages will be stored based on the following header structure. It inherits fennel::StoredNode, which contains a generic field for tracking the page type via a magic number. This is important for us, since we have two different node types (directory and leaf), and we wouldn't want to accidentally mix them up. Fennel will automatically set and check the magic numbers as pages are allocated and accessed, and raise an assertion failure if we make a mistake.
/**
* Page header data structure for a sparse bitmap directory node.
*/
struct SparseBitmapDirectory : public fennel::StoredNode
{
static const fennel::MagicNumber MAGIC_NUMBER = 0x82009900461412f0LL;
/**
* Number of SparseBitmapDirEntry items currently filled on this page.
*/
uint nEntries;
/**
* @return read-only reference to array of directory entries
* on this page
*/
SparseBitmapDirEntry const *getEntriesForRead() const
{
return reinterpret_cast<SparseBitmapDirEntry const *>(
this + 1);
}
/**
* @return read/write reference to array of directory entries
* on this page
*/
SparseBitmapDirEntry *getEntriesForWrite()
{
return reinterpret_cast<SparseBitmapDirEntry *>(
this + 1);
}
};
The magic number is generated as part of defining a new page type by running uuidgen and combining the last two portions of the result:
$ uuidgen
359342bc-b69c-4eba-8200-9900461412f0
[-----------------]
Be careful not to copy and paste magic numbers from existing data structures when creating new ones. Fennel currently has no way to catch this mistake for you.
Beyond the fixed-size header, the rest of a directory page will be used as an array of entries describing allocated leaves. The nEntries field tracks the size of this array; the rest of the page contents beyond the array are undefined. The two helper methods getEntriesForRead and getEntriesForWrite encapsulate pointer arithmetic for locating the array start just past the header. It's important for both versions to be available since Fennel uses const-correctness as a way of helping programmers avoid accidentally writing to pages which haven't been exclusively locked and marked dirty in the cache.
The array entries are defined as follows:
/**
* 0-based offset (bit address) within a sparse bitmap.
*/
typedef uint64_t SparseBitmapOffset;
/**
* Entry in a sparse bitmap directory.
*/
struct SparseBitmapDirEntry
{
/**
* Start offsets of bits contained by the corresponding leaf node.
*/
SparseBitmapOffset iLeafStartOffset;
/**
* Reference to leaf node.
*/
fennel::PageId leafId;
};
Note that only the start offset of each non-empty leaf is indexed; we know which other bit addresses are spanned by each leaf because the leaves are fixed-size.
Out of programmer laziness, we won't bother keeping the array sorted; we'll just use linear search and insert new entries at the end.
Finally, we define a corresponding guard object for directory page locks:
/** * Page lock guard for sparse bitmap directory nodes. */ typedef fennel::SegNodeLock<SparseBitmapDirectory> SparseBitmapDirLock;
This is based on the generic fennel::SegNodeLock; we'll see how it is used later on.
Leaf Node Structure
The leaf page header definition follows a similar pattern:
/**
* Page header data structure for a sparse bitmap leaf node.
*/
struct SparseBitmapLeaf : public fennel::StoredNode
{
static const fennel::MagicNumber MAGIC_NUMBER = 0xba107d175b3338dcLL;
/**
* Start offset of bits contained by this leaf node. This is redundant
* (for sanity-checking and self-identification purposes); it should
* match the corresponding directory entry referencing this leaf.
*/
SparseBitmapOffset iStartOffset;
/**
* @return read-only reference to bytes containing bit array on
* this page
*/
fennel::PConstBuffer getBytesForRead() const
{
return reinterpret_cast<fennel::PConstBuffer>(this + 1);
}
/**
* @return read/write reference to bytes containing bit array on
* this page
*/
fennel::PBuffer getBytesForWrite()
{
return reinterpret_cast<fennel::PBuffer>(this + 1);
}
};
iStartOffset is redundant, but it's a good idea to build in a limited amount of redundancy for use by physical integrity-checking utilities (and also for when you are unlucky enough to be tasked with digging through a corrupted database to diagnose or repair it!)
Again, we define a companion guard object for leaf page locks:
/** * Page lock guard for sparse bitmap leaf nodes. */ typedef fennel::SegNodeLock<SparseBitmapLeaf> SparseBitmapLeafLock;
Bitmap Access Class
Now that we've defined the persistent structures, we can develop the class which will be used to instantiate and access them:
/**
* SparseBitmap is an example of how to create a page-based
* persistent data structure using Fennel. It is only intended
* for educational purposes, not for real use.
*/
class SparseBitmap
{
/**
* Accessor for the segment storing the bitmap node pages.
*/
fennel::SegmentAccessor segmentAccessor;
/**
* Location of bitmap directory node.
*/
fennel::PageId dirPageId;
/**
* Number of bits contained by each leaf node; this is fixed
* based on segment page size after headers and footers have been
* subtracted off.
*/
size_t nBitsPerLeaf;
/**
* Maximum number of entries which can be contained by
* the directory node; this is fixed based on segment page size
* after headers and footers have been subtracted off, and determines
* the total number of leaf pages which can be allocated.
*/
size_t nDirEntriesMax;
/**
* Looks up a leaf node (based on its start offset) in the directory.
*
* @param dirLock page guard for directory node, which must already
* be locked in the desired mode
*
* @param iLeafStartOffset leaf node start offset to search for
*
* @return location of leaf node, or NULL_PAGE_ID if not found
*/
fennel::PageId searchDirectory(
SparseBitmapDirLock &dirLock,
SparseBitmapOffset iLeafStartOffset);
public:
/**
* Creates a new empty sparse bitmap (or loads an existing one).
*
* @param segmentAccessor accessor for the segment storing the bitmap node
* pages
*
* @param dirPageId location of existing directory node to load,
* or NULL_PAGE_ID to create a new bitmap
*/
explicit SparseBitmap(
fennel::SegmentAccessor segmentAccessor,
fennel::PageId dirPageId = fennel::NULL_PAGE_ID);
/**
* Sparse bitmap destructor.
*/
virtual ~SparseBitmap();
/**
* Reads a bit from the bitmap.
*
* @param iOffset address of bit to read
*
* @return true iff bit is set
*/
bool getBit(SparseBitmapOffset iOffset);
/**
* Writes a bit in the bitmap.
*
* @param iOffset address of bit to write
*
* @param value true to set bit; false to clear bit
*/
void setBit(SparseBitmapOffset iOffset, bool value);
/**
* @return the location of the directory node for this bitmap
*/
fennel::PageId getDirPageId() const;
/**
* @return number of bits contained by each leaf node
*/
size_t getBitsPerLeaf() const;
/**
* @return maximum number of entries which can be contained
* by the directory node
*/
size_t getMaxDirectoryEntries() const;
};
The segmentAccessor data member is what we need for allocating and accessing persistent pages in the segment which will contain the bitmap storage. The dirPageId data member is our starting point for finding the directory node.
We'll walk through the method implementations below.
Constructor
The SparseBitmap constructor takes care of precomputing some parameters based on page size, and also allocates the directory page if none was supplied. This is our first example of how a page lock guard object (dirLock) is used. The guard is associated with the segment accessor, and then allocatePage is called, resulting in a new page being allocated in the segment, with a corresponding cache buffer mapped to it and implicitly pinned with an exclusive lock via dirLock.
We pass ANON_PAGE_OWNER_ID for anonymous allocation; real applications can define their own object ID's so that all pages of a particular object (e.g. one bitmap) can be marked with a unique owner ID in the segment extent metadata.
The cache buffer is accessed in a strongly-typed fashion via dirLock.getNodeForWrite; since dirLock is based on a template, it knows how to automatically cast the cache memory reference to the correct type (SparseBitmapDirectory &).
SparseBitmap::SparseBitmap(
fennel::SegmentAccessor segmentAccessorInit,
fennel::PageId dirPageIdInit)
{
segmentAccessor = segmentAccessorInit;
dirPageId = dirPageIdInit;
if (dirPageId == fennel::NULL_PAGE_ID) {
// create new empty directory
SparseBitmapDirLock dirLock;
dirLock.accessSegment(segmentAccessor);
dirPageId = dirLock.allocatePage(fennel::ANON_PAGE_OWNER_ID);
SparseBitmapDirectory &dir = dirLock.getNodeForWrite();
dir.nEntries = 0;
}
// precompute some limits based on page and header sizes
size_t cbPage = segmentAccessor.pSegment->getUsablePageSize();
size_t nBytesPerLeaf = cbPage - sizeof(SparseBitmapLeaf);
nBitsPerLeaf = nBytesPerLeaf * 8;
assert(nBitsPerLeaf > 0);
nDirEntriesMax =
(cbPage - sizeof(SparseBitmapDirectory)) / sizeof(SparseBitmapDirEntry);
assert(nDirEntriesMax > 0);
}
The guard object takes care of releasing the cache lock as soon as the guard goes out of scope for any reason (including unexpected reasons such as exceptions being thrown). This is very important for making sure that page locks don't linger. However, it also means that the programmer has to be careful, since once the guard goes out of scope, the corresponding node reference becomes invalid. For example, in the code above, it would be a big mistake to keep using the dir reference after the if block!
Method getBit
The 'getBit method is an example of using shared locks to read existing data without modifying it. Again using guard objects we start with a shared lock on the directory, and try to find an entry covering the offset of the bit we're looking for. If we can't find it, the bit is implicitly clear, so we can return immediately without looking at any leaves; dirLock will be unlocked automatically. If we do find an entry, then we'll take a shared-lock on the corresponding leaf node and access the bit. To minimize the impact on concurrent writers, we can explicitly unlock dirLock as soon as we're done accessing the directory.
bool SparseBitmap::getBit(SparseBitmapOffset iOffset)
{
// Lock directory page in shared mode
SparseBitmapDirLock dirLock;
dirLock.accessSegment(segmentAccessor);
dirLock.lockShared(dirPageId);
// Compute start offset of leaf page containing iOffset
SparseBitmapOffset iLeafStartOffset =
(iOffset / nBitsPerLeaf) * nBitsPerLeaf;
// Look for it in the directory
fennel::PageId leafId = searchDirectory(dirLock, iLeafStartOffset);
if (leafId == fennel::NULL_PAGE_ID) {
// Not in directory, so the bit is not set
return false;
}
// Unlock directory node early since we don't need it any more
dirLock.unlock();
// Lock leaf page and perform sanity check
SparseBitmapLeafLock leafLock;
leafLock.accessSegment(segmentAccessor);
leafLock.lockShared(leafId);
SparseBitmapLeaf const &leaf = leafLock.getNodeForRead();
assert(leaf.iStartOffset == iLeafStartOffset);
// Read bit value from leaf
fennel::PConstBuffer pBytes = leaf.getBytesForRead();
size_t iBit = iOffset - iLeafStartOffset;
size_t iByte = iBit / 8;
size_t iBitInByte = iBit - 8 * iByte;
if (pBytes[iByte] & (1 << iBitInByte)) {
return true;
} else {
return false;
}
}
The searchDirectory helper method performs the brain-dead linear search mentioned earlier, and illustrates how Boost and STLport algorithms may even be applicable to the contents of cache pages:
fennel::PageId SparseBitmap::searchDirectory(
SparseBitmapDirLock &dirLock,
SparseBitmapOffset iLeafStartOffset)
{
SparseBitmapDirectory const &dir = dirLock.getNodeForRead();
SparseBitmapDirEntry const *pFirst = dir.getEntriesForRead();
SparseBitmapDirEntry const *pLast = pFirst + dir.nEntries;
SparseBitmapDirEntry const *pFound =
std::find_if(
pFirst,
pLast,
boost::bind(&SparseBitmapDirEntry::iLeafStartOffset, _1)
== iLeafStartOffset);
if (pFound == pLast) {
return fennel::NULL_PAGE_ID;
}
return pFound->leafId;
}
Important: note that dir is declared as a reference (SparseBitmapDirectory const &). Declaring it as a real object would not work, since it would slice off the page header from the cache page and read garbage from the stack, instead of accessing the cache page contents by reference, which is what we want. There is currently no compile-time protection available to prevent this mistake.
Method setBit
The setBit method below is similar to getBit, but there are a few differences to note. We need to take an exclusive lock on the directory, since there's a chance we may need to modify it. However, we do not call getNodeForWrite on it until we are sure that we need to modify it; this avoids unnecessarily dirtying the directory page in the case where a leaf has already been allocated for the bit we want. We are able to reuse the searchDirectory helper method here since it uses getNodeForRead. Note that exclusive-locking a page doesn't mark it as dirty; that is deferred until getNodeForWrite is called.
Also note that we have to zero-out the contents of new bitmap leaf pages explicitly; Fennel does not guarantee anything about the contents of a newly allocated page.
The exception thrown when the directory is full is a good illustration of how page guards take care of unlocking automatically, avoiding the need to clean anything up explicitly during unwind.
void SparseBitmap::setBit(SparseBitmapOffset iOffset, bool value)
{
// Lock directory page in exclusive mode
SparseBitmapDirLock dirLock;
dirLock.accessSegment(segmentAccessor);
dirLock.lockExclusive(dirPageId);
// Search directory; if it already has a corresponding leaf entry,
// we won't need to modify the directory
SparseBitmapOffset iLeafStartOffset =
(iOffset / nBitsPerLeaf) * nBitsPerLeaf;
fennel::PageId leafId = searchDirectory(dirLock, iLeafStartOffset);
SparseBitmapLeafLock leafLock;
leafLock.accessSegment(segmentAccessor);
bool clearLeaf = false;
if (leafId == fennel::NULL_PAGE_ID) {
// Leaf doesn't exist yet: we'll need a new one
SparseBitmapDirectory &dir = dirLock.getNodeForWrite();
if (dir.nEntries >= nDirEntriesMax) {
// Oops, directory is full and we have no provisions
// for splitting it; we haven't modified the directory yet,
// so the bitmap remains intact
throw std::runtime_error("SparseBitmap directory full");
}
// Allocate new leaf and add it to the directory
SparseBitmapDirEntry *pLast =
dir.getEntriesForWrite() + dir.nEntries;
leafId = leafLock.allocatePage(fennel::ANON_PAGE_OWNER_ID);
pLast->iLeafStartOffset = iLeafStartOffset;
pLast->leafId = leafId;
dir.nEntries++;
clearLeaf = true;
dirLock.unlock();
} else {
// Leaf already exists, so no need to modify directory;
// we can unlock the directory early
dirLock.unlock();
leafLock.lockExclusive(leafId);
}
// Write bit value to leaf
SparseBitmapLeaf &leaf = leafLock.getNodeForWrite();
fennel::PBuffer pBytes = leaf.getBytesForWrite();
if (clearLeaf) {
// We're initializing a new leaf, so clear all the bits first
leaf.iStartOffset = iLeafStartOffset;
memset(pBytes, 0, nBitsPerLeaf / 8);
}
size_t iBit = iOffset - iLeafStartOffset;
size_t iByte = iBit / 8;
size_t iBitInByte = iBit - 8 * iByte;
if (value) {
pBytes[iByte] |= (1 << iBitInByte);
} else {
pBytes[iByte] &= ~(1 << iBitInByte);
}
}
There are a few subtle concurrency control points to note here:
- In the case where we need to allocate a new leaf, it's OK to release the directory lock once we are done updating the directory (and before initializing the leaf). The reason is that the exclusive lock taken by leaf allocation will prevent any other threads from seeing the uninitialized state of the leaf.
- It's easy to show that the getBit and setBit operations are deadlock-free since they always proceed in directory->leaf order.
- Starting the setBit operation by taking an exclusive lock on the directory is actually overly pessimistic, and can cause the directory to become a source of contention. An optimized approach is to start by taking a shared lock on the directory node. If no new leaf needs to be allocated, then the optimism was a win. In the case where a new leaf needs to be allocated, then we can try an instantaneous upgrade to an exclusive lock on the directory. If that fails (due to the presence of other threads with shared locks), then we would unlock the directory and restart the operation, this time starting pessimistically with an exclusive lock as before. This approach takes only a little more code, and works well, with no possibility of deadlock (it has been implemented for the Fennel btree library, where root access is similar).
Unit Tests
The unit tests for SparseBitmap demonstrate application-level usage.
Although a real application would be free standing, the SparseBitmap unit tests are based on the Fennel/Boost framework, keeping a number of data members for resources such as segment/device/cache objects:
/**
* Unit tests for SparseBitmap.
*/
class SparseBitmapTest : virtual public TestBase
{
SharedRandomAccessDevice pDevice;
SharedCache pCache;
SharedSegmentFactory pSegmentFactory;
SharedSegment pSegment;
SegmentAccessor segmentAccessor;
static const DeviceId BITMAP_DEVICE_ID;
void openStorage(DeviceMode deviceMode);
void closeStorage();
public:
explicit SparseBitmapTest()
{
FENNEL_UNIT_TEST_CASE(SparseBitmapTest,testBasic);
FENNEL_UNIT_TEST_CASE(SparseBitmapTest,testSpread);
FENNEL_UNIT_TEST_CASE(SparseBitmapTest,testSizes);
FENNEL_UNIT_TEST_CASE(SparseBitmapTest,testFullDirectory);
}
virtual void testCaseTearDown();
void testBasic();
void testSpread();
void testSizes();
void testFullDirectory();
};
The testBasic case demonstrates creating a new bitmap, setting a bit, closing the bitmap, and then reloading it and verifying that the bit is still set:
void SparseBitmapTest::testBasic()
{
// Create a new bitmap and set a single bit at offset 0
PageId dirPageId;
openStorage(DeviceMode::createNew);
{
SparseBitmap bitmap(segmentAccessor);
dirPageId = bitmap.getDirPageId();
bitmap.setBit(0, 1);
// Make sure we can read the bit back
bool x = bitmap.getBit(0);
BOOST_CHECK_EQUAL(1, x);
}
// Now close and re-open storage
closeStorage();
openStorage(DeviceMode::load);
{
// Verify that we can still read the bit back
SparseBitmap bitmap(segmentAccessor, dirPageId);
bool x = bitmap.getBit(0);
BOOST_CHECK_EQUAL(1, x);
}
}
The directory location is remembered by the dirPageId stack variable across the close/reopen.
openStorage is implemented as follows. We use a simple LinearDeviceSegment since we don't need random allocation (the bitmap only allocates new pages, never freeing existing ones).
void SparseBitmapTest::openStorage(DeviceMode deviceMode)
{
// Create or load a file device
pDevice.reset(
new RandomAccessFileDevice(
"bitmap.dat",
deviceMode,
0));
// Set up the cache
CacheParams cacheParams;
cacheParams.readConfig(configMap);
pCache = Cache::newCache(cacheParams);
pCache->registerDevice(BITMAP_DEVICE_ID, pDevice);
// Map a segment onto the file
pSegmentFactory =
SegmentFactory::newSegmentFactory(configMap, shared_from_this());
LinearDeviceSegmentParams segParams;
CompoundId::setDeviceId(segParams.firstBlockId, BITMAP_DEVICE_ID);
CompoundId::setBlockNum(segParams.firstBlockId, 0);
if (!deviceMode.create) {
segParams.nPagesAllocated = MAXU;
}
pSegment = pSegmentFactory->newLinearDeviceSegment(
pCache,
segParams);
segmentAccessor.pSegment = pSegment;
segmentAccessor.pCacheAccessor = pCache;
}
closeStorage is also called from testCaseTearDown to make sure each test case always cleans up after itself:
void SparseBitmapTest::closeStorage()
{
segmentAccessor.reset();
if (pSegment) {
assert(pSegment.unique());
pSegment.reset();
}
if (pSegmentFactory) {
assert(pSegmentFactory.unique());
pSegmentFactory.reset();
}
if (pCache) {
pCache->unregisterDevice(BITMAP_DEVICE_ID);
assert(pCache.unique());
pCache.reset();
}
if (pDevice) {
assert(pDevice.unique());
pDevice.reset();
}
}
The unique assertions verify that no lingering references remain to objects tracked via shared pointers; this is important for catching leaks and cyclic references.
One other unit test of interest is testSizes:
void SparseBitmapTest::testSizes()
{
openStorage(DeviceMode::createNew);
if (pCache->getPageSize() != 4096) {
// Expected values below are only valid for 4K page size
return;
}
SparseBitmap bitmap(segmentAccessor);
BOOST_CHECK_EQUAL(32640, bitmap.getBitsPerLeaf());
BOOST_CHECK_EQUAL(255, bitmap.getMaxDirectoryEntries());
}
See if you can figure out where those numbers came from based on the default 4K page size.


