FarragoDml

From Eigenpedia

Jump to: navigation, search

This page describes the representation for DML operators in Farrago throughout query preparation.


Contents

Parsing

The JavaCC parser definition in farrago/src/org/eigenbase/sql/parser/CommonParser.jj defines the grammar for DML statements alongside of SQL queries since many of the elements (e.g. WHERE clause) are common. The parser result is in the form of the org.eigenbase.sql.SqlNode object model, with DML subclasses as follows:

  • SqlInsert: operands are target table, source query, and target column list
  • SqlUpdate: operands are target table identifier (with optional alias), source expressions, target column list, and search condition
  • SqlDelete: operands are target table identifier (with optional alias) and search condition

org.eigenbase.sql.fun.SqlStdOperatorTable has corresponding operator definitions insertOperator, updateOperator, and deleteOperator.

The class definitions for SqlUpdate and SqlDelete have an additional operand slot for "source query". This is initialized as null by the parser, then filled in by the validator, as we'll see in the next section.


Validation

The first step in SQL statement validation (org.eigenbase.sql.validate.SqlValidatorImpl.performUnconditionalRewrites) edits the SqlNode tree to normalize it into a form which makes the rest of validation more uniform. In particular, this is where the source query operand gets filled in for SqlUpdate and SqlDelete:

  • For an SqlDelete representing DELETE FROM T WHERE x=2, the source query is of the form SELECT * FROM T WHERE x=2
  • For an SqlUpdate representing UPDATE T SET x=y+1, the source query is of the form SELECT *, y+1 FROM T

So, for DELETE, the entire "old row" is selected. For UPDATE, the "old row" is concatenated with the new expressions. This provides the maximum possible information for physical implementations. For example, a transactional implemenation of DELETE (e.g. FTRS) needs to log each old row image to make it available during rollback; index key columns are also needed during index update.

After this comes the real validation (methods validateInsert, validateUpdate, and validateDelete). In all cases, the source query is validated by itself; additionally, for INSERT and UPDATE, the types of the target columns are used to infer types for any null values and dynamic parameters in the source query. For example, with INSERT INTO T(x,y) VALUES(?, null), the type for the dynamic parameter is inferred from the type of column x, and the type for the null value is inferred from the type of column y.

After the source query has been validated, additional checks are performed:

  • The existence of the target table is checked; for INSERT and UPDATE, this is also used to obtain the target row type, and to validate the names of the columns in the target column list (if any).
  • For INSERT and UPDATE, assignment compatibility is checked between the fields of the source row type and the target column types.
  • For INSERT, the number of source and target columns is compared. By default, the validator requires these quantities to be equal. However, validator subclasses may override this behavior to allow for the presence of //pseudocolumns// such as ROWID.
  • For all operators, access rights are checked (method validateAccess).

Translation to Relational Algebra

Translation is accomplished by methods convertInsert, convertUpdate, and convertDelete in org.eigenbase.sql2rel.SqlToRelConverter.

In the world of RelNodes, all DML operators are represented abstractly as instances of TableModificationRel, along with an attribute specifying the operator type (INSERT, UPDATE, or DELETE). TableModificationRel takes a single child. For DELETE and UPDATE, this child is simply the translation of the source select into relational algebra. For INSERT, it's a little more complicated (method convertColumnList):

  • If no target column list is specified, the child is simply the translation of the source select
  • Otherwise, the child is constructed by combining source expressions with default values for unreferenced columns, and reordering them to match the target table column definition order. For example, for a table T(A,B,C), and statement INSERT INTO T(B) SELECT X FROM S, the child is the translation of SELECT defaultValue(A), X, defaultValue(C) FROM S.
  • In either case, the child is interpreted as a full "new row" matching the table definition.

Optimization and Physical Implementation

From here, it is impossible to say much more about DML query processing, since the details are dependent on the local data wrapper used to implement the target table. The wrapper must supply an optimizer rule for converting TableModificationRel into a physical RelNode (e.g. FtrsTableModificationRule produces FtrsTableModificationRel, which is capable of generating ExecStream definitions for all of the DML operators).

In addition, the optimizer rule may be able to discard unnecessary information. For example, a local data wrapper supporting only a primary key index and no secondary indexes might be able to project the child of DELETE down to just the primary key columns, ignoring the other fields because they are irrelevant for finding and deleting matching rows.

Personal tools