FarragoTableRebuild

From Eigenpedia

Jump to: navigation, search

Contents

Overview

This document describes the Farrago command to rebuild tables stored on disk. Like many database systems, Farrago (and LucidDb) store data onto disk using the B-tree data structure. The structure is designed to store data compactly. However, as rows are deleted, "holes" appear. Over time, data becomes fragmented. The rebuild table command provides a way of repacking a table's data. It is especially useful for systems in which update is implemented with delete and insert.

Syntax

ALTER TABLE <qualified-table-name> REBUILD

General approach

Farrago (and LucidDb) tables are stored as a collection of indexes. For a given table, the catalog keeps a "table definition" and a collection of "index definitions". Rebuilding a table, means rebuilding the indexes.

Image:FarragoTableRebuildOverview.gif

At a high level, the three steps include

  1. creating new btrees
  2. copying data from the old btrees to the new ones
  3. dropping the old btrees, and updating references in the catalog

Care should be taken to ensure that all the steps take place within the same transaction. If there is an error, new indexes should be cleaned up, and the catalog should be reverted.

Implementation

For simplicity, step (2) will be implemented as a SQL statement. To avoid modifying the SQL parser, a simple statement will be used:

insert into T select * from T;

The second "T", used in the select, reflect the old btrees. The first "T", the insert target, refers to the new btrees.

JVS: this is a first cut; at a minimum, we need to make sure this doesn't result in extra buffering (since the optimizer will think source and target structures are the same when they're not). Nice-to-have would be for the reentrant SQL statement itself to reflect the two roles, e.g. using the SQL:2003 trigger transition table syntax: insert into NEW TABLE select * from OLD TABLE (we wouldn't mention T at all, and the reentrant validator would be set up to map those). But that should probably wait until we actually implement triggers.

To make this work, the FarragoSessionIndexMap interface will be updated so that it can return index root pages based on the usage pattern (read or write).

 /**
 * Gets the root PageId of an index.
 *
 * @param index the index of interest
 * @param write whether to get a page root to be used for reading or
 * for writing. A page root to be used for reading may reflect the state
 * of the index before modification. A page root to be used for writing
 * reflects the state of the index during or after modification.
 *
 * @return root PageId as a long
 */
 public long getIndexRoot(FemLocalIndex index, boolean write);

The base implementation, FarragoDbSessionIndexMap, returns the same value for either usage pattern. A new implementation, FarragoRebuildIndexMap, wraps a basic session map and returns special values for the write pattern.

For the unusual insert statement, a rebuild command allows "T" to refer to two sets of indexes by overriding the default index map with a rebuild-specific map.

Personal tools