FarragoSessionFramework

From Eigenpedia


under construction


Introduction

This page documents the session framework in Farrago. This topic sprawls over a number of related areas such as session management, threading, transactions, locking, and statement lifecycle. It is intended to provide conceptual help to anyone attempting to debug or enhance these areas.

Java Packages

The diagram below shows the Farrago Java packages directly involved in session management:

Image:SessionPackages.png

  • Package net.sf.farrago.session defines all session framework interfaces (both API and SPI).
  • Package net.sf.farrago.db defines default implementations for most of the session framework interfaces.
  • Package net.sf.farrago.ddl defines default implementations for session interfaces related to DDL.
  • Package net.sf.farrago.query defines default implementations for session interfaces related to query processing.
  • Package net.sf.farrago.runtime defines default implementations for session interfaces related to statement runtime support.
  • Package net.sf.farrago.jdbc.engine implements the Farrago JDBC driver by invoking session API methods.

Session Containment

The next diagram illustrates how sessions are managed within the scope of a database instance; it also shows some of the important objects which exist within the scope of a session.

Image:SessionAssociations.png

  • A session is owned by the singleton FarragoDatabase object.
  • A session may own zero or more statement contexts. A statement context is a context in which statements (typically SQL) can be prepared and executed. In JDBC terms, a statement context corresponds to either a PreparedStatement (usable for only a single SQL statement which may be executed over and over) or a Statement (reusable for a series of different SQL statements).
  • While a non-DDL statement is being prepared (whether explicitly via creation of a PreparedStatement, or implicitly via a Statement execution), the process is managed by a preparing statement object. This object only exists for the duration of the preparation process, after which it is discarded.
  • While a non-DDL statement is being executed, the execution state is tracked by a runtime context object. This object is discarded once execution completes. If another execution later takes place in the same statement context (whether of the same statement or of a different statement), a new runtime context is allocated.
  • Execution of a non-DDL statement also brings into existence a JDBC ResultSet. For a DML statement, this only exists for the lifetime of the execute call (the rowcount is read from it); for a query, it stays in existence across the entire execute/fetch sequence until closed.
  • Finally, execution of a non-DDL statement also associates an executable statement with the statement context. Unlike the other objects, the executable statement is not owned by the statement context; it is a reusable object pinned from the global code cache, and may be shared by other concurrently executing statement contexts. The association with the executable statement ends once execution completes. Note that the preparation process results in a new executable statement being created and added to the global code cache (unless a matching executable statement was already available in the cache, in which case most of preparation can be skipped).
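
The cache behavior described in the last bullet can be pictured with a minimal Java sketch. It is only an illustration: the names CodeCacheSketch, ExecutablePlan, and lookupOrPrepare are hypothetical and are not Farrago's code cache API, and keying the cache by SQL text is an assumption made to keep the example short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a compiled, shareable executable statement.
final class ExecutablePlan {
    final String sql;

    ExecutablePlan(String sql) {
        this.sql = sql;
    }
}

// Hypothetical global cache shared by all statement contexts.
final class CodeCacheSketch {
    private final ConcurrentHashMap<String, ExecutablePlan> cache =
        new ConcurrentHashMap<>();

    // Counts how many times the expensive preparation path actually ran.
    final AtomicInteger preparations = new AtomicInteger();

    // Returns the cached plan if one matches; otherwise prepares a new plan
    // and adds it to the cache. Keying by SQL text is an assumption.
    ExecutablePlan lookupOrPrepare(String sql) {
        return cache.computeIfAbsent(sql, key -> {
            preparations.incrementAndGet();
            return new ExecutablePlan(key);
        });
    }
}
```

In this sketch, two statement contexts that ask for the same SQL text receive the same plan object, and the preparation counter increases only on the first request.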

Later on in this document, we will also discuss reentrant sessions, which are associated with other sessions rather than directly with the database.

JDBC Associations

The next diagram elaborates on the associations between session objects and their JDBC counterparts:

Image:SessionJdbcAssociations.png

  • The JDBC counterpart to a session is a FarragoJdbcEngineConnection. Note that the association is a one-way layering; the connection knows about the session, but not vice versa.
  • The JDBC counterpart to a statement context is a FarragoJdbcEngineStatement. Again, the association is a one-way layering; the statement knows about the statement context, but not vice versa.
  • Each statement knows which connection it is associated with, but connections do not track ownership of their statements (even though the session/context ownership is tracked bidirectionally at the layer below).
  • Prepared statements are subclassed from FarragoJdbcEngineStatement, with specializations for DDL and non-DDL.

Threading Model

The session framework threading is passive: no threads are created for session management purposes. Instead, externally created threads call in through JDBC or other APIs. (Threads may be created below the session layer as part of statement execution, e.g. for a UDX invocation or for a parallel query executor.)

The source of external threads depends on how Farrago is deployed. When running as an embedded engine (similar to hsqldb or Derby), threads are part of the calling application. When running as a network server, threads are typically managed in a pool by the network listener (e.g. by RMI).

--Jvs 19:29, 18 November 2008 (EST): Does VJDBC follow the rules below, or does it depend on how the calling application uses the JDBC API?

Regardless of how calling threads originate, they must follow these rules:

  • Two threads T1 and T2 are allowed to call Farrago via different sessions (e.g. via different JDBC connections) simultaneously, and they will be serviced concurrently.
  • Two threads T1 and T2 are not allowed to call Farrago on the same session simultaneously. Farrago does nothing to prevent this, so making this mistake will lead to (very bad) undefined behavior. For threading purposes, all sub-objects created directly or indirectly via a session (see diagram above) should be considered to be part of that session. For example, at the JDBC level, calling on a ResultSet and the Connection it came from at the same time is incorrect.
  • There is one exception to the previous rule, which is asynchronous cancel. A cancel call may be made from any thread at any time.
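
Since Farrago does not enforce the second rule, an application that shares one session among several threads has to serialize its own calls. The following is a minimal caller-side sketch of one way to do that; SessionGuard is a hypothetical helper written for this page, not part of Farrago or JDBC.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical caller-side helper; it is not part of Farrago or JDBC.
final class SessionGuard {
    private final ReentrantLock lock = new ReentrantLock();

    // Runs one unit of work against the session and any object created from it.
    // A second thread calling in at the same time blocks until the first returns.
    <T> T call(Supplier<T> work) {
        lock.lock();
        try {
            return work.get();
        } finally {
            lock.unlock();
        }
    }

    // True while some thread is inside call(); intended for diagnostics only.
    boolean isBusy() {
        return lock.isLocked();
    }
}
```

An application would create one guard per JDBC connection and route every call on that connection, its statements, and its result sets through it. A cancel request is the exception noted above, so it would be issued directly rather than through the guard.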

Synchronization

Synchronization is required in order to support the following use cases:

  • concurrent creation/destruction of sessions within the same database
  • management view enumeration of existing sessions and statements and their activities
  • safe kill of an executing statement
  • safe kill of a session with executing statements
  • safe shutdown of a database with active sessions

Accordingly, all session framework code adheres to the following rules (most of the relevant code is in package net.sf.farrago.db):

  • Currently, at most one database instance is supported per JVM. This is a singleton instance of net.sf.farrago.db.FarragoDatabase, managed via net.sf.farrago.db.FarragoDbSingleton. Database level synchronization is done via the mutex at the FarragoDbSingleton class level (not on the singleton object instance). This level of synchronization prevents conflicts across
    • database startup
    • database shutdown
    • database backup (at most one of these can be taking place at a time)
    • session creation
    • session destruction
    • session enumeration
  • All synchronization at the session level takes place on the session object itself (not on statement contexts or any of their component objects). Most public methods of FarragoDbSession are marked as synchronized, whereas most public methods of FarragoDbStmtContextBase and FarragoDbStmtContext wrap a synchronized(session) { ... } block around their bodies. This level of synchronization prevents conflicts between session activity and the various kill/shutdown scenarios.
    • Note that preventing concurrent API calls on the same session from the application level is not the purpose of this synchronization. As noted above, such calls are likely to lead to undefined behavior.
  • Whenever an operation needs to acquire both the database-level mutex and the mutex for a particular session, it must take the database-level mutex first and the session mutex second. Because every thread acquires the two mutexes in the same order, no deadlock can occur between them.
  • Any globally shared resources (e.g. the code cache) must be protected by their own synchronization so that threads from different sessions can access them concurrently.

Note that there is no synchronization at all at the JDBC level; this layer is designed to be thin enough that it can rely on the underlying session synchronization for all operations.
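
The locking rules above can be sketched in a few lines of Java. This is a simplified illustration under stated assumptions, not the actual code: DbSingletonSketch, SessionSketch, and StmtContextSketch are hypothetical stand-ins for FarragoDbSingleton, FarragoDbSession, and FarragoDbStmtContext, and the kill logic is reduced to setting a flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the database-level singleton manager.
final class DbSingletonSketch {
    private static final List<SessionSketch> sessions = new ArrayList<>();

    // Database-level operations lock the class itself, not an instance.
    static void addSession(SessionSketch session) {
        synchronized (DbSingletonSketch.class) {
            sessions.add(session);
        }
    }

    // Needs both mutexes: database level first, then the session.
    static void killSession(SessionSketch session) {
        synchronized (DbSingletonSketch.class) {
            synchronized (session) {
                session.markClosed();
                sessions.remove(session);
            }
        }
    }

    static int sessionCount() {
        synchronized (DbSingletonSketch.class) {
            return sessions.size();
        }
    }
}

// Hypothetical stand-in for a session; its methods lock the session object.
final class SessionSketch {
    private boolean closed;

    synchronized void markClosed() {
        closed = true;
    }

    synchronized boolean isClosed() {
        return closed;
    }
}

// Hypothetical stand-in for a statement context owned by a session.
final class StmtContextSketch {
    private final SessionSketch session;

    StmtContextSketch(SessionSketch session) {
        this.session = session;
    }

    // Statement context methods lock the owning session, not themselves.
    void execute() {
        synchronized (session) {
            if (session.isClosed()) {
                throw new IllegalStateException("session closed");
            }
            // Statement work would run here, protected from a concurrent kill.
        }
    }
}
```

Because execute never takes the database-level mutex while holding the session mutex, it cannot form a lock cycle with killSession.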

Personalities

As explained in PersonalityFeatureFramework, Farrago session behavior is influenced by the personality set for the session (an instance of FarragoSessionPersonality). Each session actually has two associated personalities (which may be the same): a default personality (set by the session factory when the session is initialized) and a current personality (changed via ALTER SESSION IMPLEMENTATION SET JAR ...). The ALTER SESSION IMPLEMENTATION SET DEFAULT statement reverts the current personality to the default personality. When a new personality is created, it is given a reference to the default personality so that it can delegate aspects of behavior which it does not need to override.
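
The delegation arrangement can be illustrated with a small Java sketch. All names here are hypothetical: PersonalitySketch is a two-method reduction invented for this example and does not reflect the real FarragoSessionPersonality interface.

```java
// Hypothetical, heavily reduced personality interface (not the real one).
interface PersonalitySketch {
    boolean supportsSavepoints();

    String sqlDialectName();
}

// Stand-in for the personality installed by the session factory.
final class DefaultPersonalitySketch implements PersonalitySketch {
    public boolean supportsSavepoints() {
        return true;
    }

    public String sqlDialectName() {
        return "default";
    }
}

// Overrides one aspect of behavior and delegates the rest to the default.
final class CustomPersonalitySketch implements PersonalitySketch {
    private final PersonalitySketch defaultPersonality;

    CustomPersonalitySketch(PersonalitySketch defaultPersonality) {
        this.defaultPersonality = defaultPersonality;
    }

    public boolean supportsSavepoints() {
        return false; // overridden aspect
    }

    public String sqlDialectName() {
        return defaultPersonality.sqlDialectName(); // delegated aspect
    }
}

// Holds the default and current personalities for one session.
final class SessionPersonalityHolder {
    private final PersonalitySketch defaultPersonality;
    private PersonalitySketch current;

    SessionPersonalityHolder(PersonalitySketch defaultPersonality) {
        this.defaultPersonality = defaultPersonality;
        this.current = defaultPersonality;
    }

    PersonalitySketch current() {
        return current;
    }

    // Models the effect of ALTER SESSION IMPLEMENTATION SET JAR ...
    void setCurrent(PersonalitySketch personality) {
        current = personality;
    }

    // Models the effect of ALTER SESSION IMPLEMENTATION SET DEFAULT
    void resetToDefault() {
        current = defaultPersonality;
    }
}
```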

Transactions

In Farrago, a transaction is associated with a session. Transactions cannot be shared across sessions, and a single session can have at most one transaction active at a time.

There is no association between transactions and threads: a transaction may be started on one thread, execute statements on a different thread, and commit on yet another thread. As long as all of this takes place on the same session, and multiple threads are not calling onto the session concurrently, all is well. This is important since network listener thread pools typically do not guarantee any particular binding between threads and call sequences.

A Farrago transaction spans two separate persistent resource managers:

  • Fennel data store: this is tracked via an instance of FennelTxnContext, kept as a data member in FarragoDbSession.
  • Catalog repository: this is tracked via instances of FarragoReposTxnContext; these are typically managed on the stack of particular operations (rather than as a data member in FarragoDbSession).

There are currently some open issues in this area:

  • There is no distributed transaction coordination across the two resource managers (no two-phase commit). This means, for example, that for a DDL statement such as CREATE TABLE, a crash may leave the catalog out of sync with the Fennel data store after recovery (e.g. a table definition referencing an unallocated index root page).
  • Managing the two transaction contexts independently makes some code difficult to follow; even without distributed transactions, it would help if everything were managed from a single transaction context object.
  • Eventually, distributed transaction coordination may also need to span SQL/MED foreign servers. (With Enki enabled, this is already applicable to catalog views, since these are implemented in terms of SQL/MED queries against the DBMS underlying Hibernate.)
  • The effect on transaction state is not currently well defined if an exception is encountered during commit or rollback.
  • --Jvs 20:53, 18 November 2008 (EST): TBD: issues with reentrant sessions, plus need to unify FennelTxnContext with TxnIdRef

Savepoints

For personalities which enable them, Farrago transactions support savepoints. At the Farrago level, a session maintains a list of FarragoDbSavepoint objects, each with

  • an internal savepoint ID unique within the session
  • savepoint name
  • handle to a Fennel-level savepoint (FennelSvptHandle), which keeps track of the necessary log state

Savepoint control is exposed through JDBC APIs as well as via SQL statements.
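
As a rough picture of the per-session savepoint list, here is a minimal Java sketch. The classes are hypothetical and do not reproduce FarragoDbSavepoint; the long field stands in for a FennelSvptHandle, and the rule that rolling back to a savepoint discards later savepoints is assumed from standard SQL semantics rather than taken from the Farrago code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mirror of the three attributes listed above.
final class SavepointSketch {
    final int id;            // unique within the session
    final String name;       // savepoint name
    final long fennelHandle; // placeholder for a FennelSvptHandle

    SavepointSketch(int id, String name, long fennelHandle) {
        this.id = id;
        this.name = name;
        this.fennelHandle = fennelHandle;
    }
}

// Hypothetical per-session list of savepoints, oldest first.
final class SavepointListSketch {
    private final List<SavepointSketch> savepoints = new ArrayList<>();
    private int nextId = 0;

    SavepointSketch create(String name) {
        // The handle value is a placeholder; a real handle would come from Fennel.
        SavepointSketch savepoint = new SavepointSketch(nextId++, name, 0L);
        savepoints.add(savepoint);
        return savepoint;
    }

    // Keeps the named savepoint and drops every savepoint created after it
    // (assumed standard SQL behavior).
    void rollbackTo(String name) {
        for (int i = savepoints.size() - 1; i >= 0; i--) {
            if (savepoints.get(i).name.equals(name)) {
                savepoints.subList(i + 1, savepoints.size()).clear();
                return;
            }
        }
        throw new IllegalArgumentException("no such savepoint: " + name);
    }

    List<String> names() {
        List<String> result = new ArrayList<>();
        for (SavepointSketch savepoint : savepoints) {
            result.add(savepoint.name);
        }
        return result;
    }
}
```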

Session Variables

Each session has associated with it a set of state variables, represented as an instance of FarragoSessionVariables. Some of these variables (such as the current user name) are standard as defined by SQL:2003, while others are Farrago-specific, and still others are personality-specific. Variables common across all personalities are usually modeled as data members of FarragoSessionVariables, whereas personality-specific variables are managed in a name/value pair map for extensibility.

Reentrant Sessions

The framework supports a concept known as reentrant sessions. A reentrant session may be created as part of the implementation of processing some SQL statement being executed on a user-level session. For example, the CREATE VIEW AS SELECT ... statement needs to prepare the SELECT statement in order to validate the view definition; this is done using a reentrant session, which has two benefits:

  • any resources allocated are associated with the reentrant session (rather than the user-level session), so they can be freed together at the end of statement processing
  • any changes to session state (e.g. changing the current schema) take place on the reentrant session, so they do not affect the state of the user-level session

When a reentrant session is created, all of the existing state of the user-level session is cloned, so the reentrant session inherits variable values such as the current schema.
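
The cloning step can be sketched as follows. SessionVarsSketch is a hypothetical reduction of FarragoSessionVariables: the two named fields and the copy method are assumptions chosen for illustration, while the split between data members and a name/value map follows the description in the Session Variables section.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical reduction of the session variable set.
final class SessionVarsSketch {
    // Variables common to all personalities are plain data members.
    String currentUserName;
    String schemaName;

    // Personality-specific variables live in a name/value map.
    final Map<String, String> personalityVars = new HashMap<>();

    // Copies every variable so that the reentrant session starts from the
    // user-level session's state but can change it independently.
    SessionVarsSketch copy() {
        SessionVarsSketch clone = new SessionVarsSketch();
        clone.currentUserName = currentUserName;
        clone.schemaName = schemaName;
        clone.personalityVars.putAll(personalityVars);
        return clone;
    }
}
```

In this sketch, changing schemaName on the copy leaves the original untouched, which is the isolation property that reentrant sessions rely on.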

Reentrant sessions are not added to the global session list; it is the responsibility of the statement which creates the reentrant session to close it explicitly as part of the statement's own cleanup. This also means that reentrant sessions are not visible via management views.

When a transaction is already in progress on a session, a new reentrant session will participate in the same transaction (regardless of whether the user-level session is in autocommit mode). Otherwise, the reentrant session inherits the autocommit setting of the user-level session.

--Jvs 01:57, 16 January 2009 (EST): need to explain how abort requests get propagated to reentrant sessions

Reentrant sessions are currently used for the following purposes:

  • queries and DML executed internally by statements such as ALTER SYSTEM, ALTER TABLE, ANALYZE TABLE, and CREATE INDEX (when rows already exist)
  • loopback JDBC connections created by user-defined routine code (via URL jdbc:default:connection); see eigenjira:FRG-249 for a related threading issue
  • validation of default value definitions
  • validation of view definitions
  • validation of routine definitions
  • reentrant statements used by the optimizer, e.g. for reduction of constant expressions and uncorrelated subqueries; see FarragoReentrantStmt

Statement Lifecycle

A statement context has the following lifecycle:

Image:SessionStatementLifecycle.png

  • After a new context is constructed, it is initially in the unprepared state (with no associated SQL statement).
  • Before a SQL statement can be executed, it must first be prepared. This is true regardless of whether an explicit prepare was requested via JDBC or an implicit prepare is performed for direct one-shot execution.
    • A special case is made for executing a DDL statement; in this case, the preparation request also executes the DDL statement and then immediately returns the context to the unprepared state. The reason is that DDL statements are interpreted (they do not have compiled execution plans associated with them).
  • Once prepared, a statement can be executed.
    • If it is a DML statement, then execution takes place entirely within the execute method; a cursor (ResultSet instance) is used internally for fetching the rowcount, but this cursor is not visible to the caller.
    • For a query, execution is split across the initial execute method and each next call on the cursor.
  • During execution, a cancel request cannot be processed immediately if a fetch is taking place; in that case, the context enters the cancel-pending state until the executing code detects the cancel request and aborts what it is doing. A cancel on an executing DML statement works the same way.
  • Before a context can be closed, it must first be returned to the unprepared state, canceling any in-progress execution and releasing all associated resources.

Note that the lifecycle above is from the point of view of the statement context; the JDBC statement which wraps it may cycle through several states in a single method. For example, FarragoJdbcEngineStatement.executeUpdate(String sql) calls prepare, execute, and unprepare on the underlying statement context in order to implement direct execution.
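
The lifecycle can be approximated as a small state machine. The sketch below is an assumption-laden model, not the real statement context: only the unprepared and cancel-pending states are named in the text, so the other state names, the method names, and the simplification that every cancel during query execution goes through cancel-pending are all hypothetical.

```java
// Hypothetical model of a statement context's lifecycle.
final class StmtLifecycleSketch {
    enum State { UNPREPARED, PREPARED, EXECUTING, CANCEL_PENDING, CLOSED }

    private State state = State.UNPREPARED;

    State state() {
        return state;
    }

    // DDL is interpreted: it runs during prepare, and the context then
    // returns to the unprepared state right away.
    void prepare(boolean isDdl) {
        require(State.UNPREPARED);
        state = isDdl ? State.UNPREPARED : State.PREPARED;
    }

    // DML completes inside execute; a query stays in the executing state
    // while the caller fetches from the cursor.
    void execute(boolean isQuery) {
        require(State.PREPARED);
        state = isQuery ? State.EXECUTING : State.PREPARED;
    }

    // Simplification: models only a cancel that arrives while a fetch is running.
    void cancel() {
        if (state == State.EXECUTING) {
            state = State.CANCEL_PENDING;
        }
    }

    // Cancels any in-progress execution and releases resources.
    void unprepare() {
        if (state == State.CLOSED) {
            throw new IllegalStateException("context is closed");
        }
        state = State.UNPREPARED;
    }

    void close() {
        require(State.UNPREPARED);
        state = State.CLOSED;
    }

    // Mirrors the direct-execution sequence described for executeUpdate(String sql).
    void executeUpdateDirect() {
        prepare(false);
        execute(false);
        unprepare();
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException(
                "expected " + expected + " but was " + state);
        }
    }
}
```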

--Jvs 01:54, 17 January 2009 (EST): explain special case for DDL cancelation
