FarragoMedJdbcPlugin

From Eigenpedia

Jump to: navigation, search

Contents

Farrago SQL/MED Plugin for JDBC

Farrago provides a plugin implementing SQL/MED access to any foreign DBMS for which a JDBC driver is available.

Farrago supports the SQL standard three-level qualifier scheme for the names of tables: catalog.schema.table. When using the SQL/MED plugin for JDBC to make foreign data namespaces accessible from Farrago SQL queries, it is important to keep a number of points in mind:

  • Different JDBC drivers support different qualifier schemes. Some support catalogs and schemas; others only support schemas; and still others support neither catalogs nor schemas. Default values may or may not be in effect depending on the server connection URL. For this reason, the user must be careful when defining the foreign server in Farrago to get the desired mapping.
  • When referencing foreign tables directly via a virtual catalog (without importing their metadata into a local schema), use the foreign schema and table name. In this case, the catalog name will be the name given the foreign server via CREATE SERVER.
  • When referencing foreign table via locally defined or imported metadata, use the local schema and table name. In this case, the catalog name will be LOCALDB.

JDBC Foreign Data Wrapper Definition

Because this plugin is included as part of the standard Farrago distribution, a corresponding foreign data wrapper instance named SYS_JDBC is predefined by Farrago's initialization scripts. Normally, there is no need to define additional instances.

However, it may be useful to define additional wrapper instances corresponding to specific DBMS types. For the JDBC plugin, all SQL/MED options declared on the wrapper are propagated to the server, allowing common option settings to be factored out of individual server definitions. (Note that this is not necessarily true for other plugins.) Option settings on a server always take precedence over settings inherited from a wrapper.

JDBC Foreign Server Definition

In order to access a particular JDBC data source, a corresponding foreign server must be defined. The following server-level options can be specified:

Name Description
JNDI_NAME Name of the JNDI resource that contains a javax.sql.DataSource suitable for use with the SQL/MED JDBC foreign data wrapper. The DataSource must not block when connections are requested. This option is mutually exclusive with the existing DRIVER_CLASS, URL, USER_NAME, PASSWORD, and EXTENDED_OPTIONS parameters. The following parameters are ignored if JNDI_NAME is specified: LOGIN_TIMEOUT, VALIDATION_QUERY. No default value.
DRIVER_CLASS Fully qualified name of the JDBC driver class to load; this must be available on the classpath. Currently this requires either changing the classpath used to invoke Farrago, or rolling the JDBC driver classes into a new plugin jar combined with the default Farrago plugin contents. (We plan to eventually support the SQL:2003 jar dependencies feature to eliminate this repackaging burden.)
URL JDBC URL specifying the data source to access
USER_NAME User name for login to foreign server
PASSWORD Password for login to foreign server
USE_SCHEMA_NAME_AS_FOREIGN_QUALIFIER If set to 'TRUE', then for tables without a table-level SCHEMA_NAME specified, the server-level SCHEMA_NAME is used. The default is 'FALSE'.
SCHEMA_NAME If USE_SCHEMA_NAME_AS_FOREIGN_QUALIFIER is set to 'FALSE':
Artificial foreign schema name. This option is only required for servers which don't support schemas. When specified, the plugin will accept this schema name as a foreign qualifier when importing foreign tables or accessing them directly, but will omit all schema references when sending SQL to the foreign server. For servers that support schemas, this option should never be specified; reference the real foreign schema names instead.
If USE_SCHEMA_NAME_AS_FOREIGN_QUALIFIER is set to 'TRUE':
Server must support schemas. This SCHEMA_NAME will be used if the table-level SCHEMA_NAME is not set.
SCHEMA_MAPPING Semicolon-separated list of schema_name:mapped_schema_name pairs. This option is only valid for servers which support schemas. This allows for a single schema view of tables spanning multiple schemas. If set, TABLE_MAPPING and TABLE_PREFIX_MAPPING must not be set. A schema or table name may be quoted by the double quote character. The double quote character can be used in a name, if escaped by itself (the double quote character). A single quote character can be used, if escaped by itself (the single quote character).
TABLE_MAPPING Semicolon-separated list of source_schema.source_table:mapped_schema.mapped_table pairs. This option is only valid for servers which support schemas. This allows for a single schema view of the specified tables spanning multiple schemas. If set, SCHEMA_MAPPING and TABLE_PREFIX_MAPPING must not be set. A schema or table name may be quoted by the double quote character. The double quote character can be used in a name, if escaped by itself (the double quote character). A single quote character can be used, if escaped by itself (the single quote character).
TABLE_PREFIX_MAPPING Semicolon-separated list of source_schema.source_table_prefix:mapped_schema.mapped_table_prefix pairs. This allows multiple source tables, with a common prefix, to be mapped into a specific target schema with a different prefix. If set, SCHEMA_MAPPING and TABLE_MAPPING must not be set. If artificial foreign schema names are in use, the source schema name is ignored. A schema or table name may be quoted by the double quote character. The double quote character can be used in a name, if escaped by itself (the double quote character). A single quote character can be used, if escaped by itself (the single quote character). If the last character of either the source or target table prefix is a percent sign (%), it is interpreted as a wild card and stripped off. This allows a prefix to be appended to or removed from all tables (e.g., mapping definitions like 'SRC_SCHEMA.%:TGT_SCHEMA.PREFIX_%' or 'SRC_SCHEMA.PREFIX_%:TGT_SCHEMA.%'). As a consequence, if a prefix ends in % it must be doubled.
QUALIFYING_CATALOG_NAME Foreign catalog name to which this server should correspond. This option should only be used for servers which support multiple catalogs. Normally, a default catalog is always defined for a connection, so this option isn't usually required. To access multiple catalogs from the same physical server, create multiple logical servers, each with a different value for this option.
TABLE_TYPES Comma-separated list of table types used to restrict metadata results for operations such as IMPORT FOREIGN SCHEMA. For example, some servers mix system tables into the same schema with user tables and views. To hide them, specify a value like 'TABLE,VIEW' for this option. To retrieve only tables and system tables, but no views, use 'TABLE,SYSTEM TABLE'. In general, table type names may vary across JDBC drivers, so check your driver's documentation or call your driver's DatabaseMetaData.getTableTypes() method to list the available types. This option is never required; if not specified, metadata for all table types is returned. No spaces are allowed in the list unless they make up part of the type name, e.g. 'SYSTEM TABLE'.
EXTENDED_OPTIONS If set to 'TRUE', unrecognized options are passed through to the JDBC driver as properties; otherwise, they are ignored. The default is 'FALSE'.
TYPE_SUBSTITUTION If set to 'TRUE' (the default), datatypes which are not supported by Farrago are replaced with the best available substitute. If set to 'FALSE', unsupported datatypes result in errors. Note that any TYPE_MAPPING specified is executed before TYPE_SUBSTITUTION is run.
TYPE_MAPPING Semicolon-separated list of orig_datatype:subst_datatype pairs. If set, Farrago will replace the type of foreign columns matching orig_datatype with the corresponding subst_datatype. The format for both datatype strings is DATATYPE_NAME[(PRECISION[,SCALE])] with no spaces allowed. On orig_datatype, precision and scale (if specified) are used to more precisely limit the matching datatype. In other words, "DECIMAL" will match any decimal type, but "DECIMAL(10,2)" will only match that specific precision and scale. If precision and scale are not specified on subst_datatype, the original precision and scale are kept. If the 'subst_datatype' is not a supported Farrago datatype, the mapping is ignored. Note that TYPE_MAPPING is run before TYPE_SUBSTITUTION and does not depend on TYPE_SUBSTITUTION.
LENIENT If set to 'TRUE', match target columns specified in the foreign table definition with data source columns using case-sensitive string comparison on column name. Specified target columns which are missing in the source are filled in with null. Likewise, specified target columns with types that are cast-incompatible from matching source columns are filled in with null. Other matching columns are cast to the target type (either the type specified explicitly, or the type stored in the catalog when the foreign table descriptor is created with implicit column types). The default is 'FALSE', meaning columns are mapped by position rather than name, and mismatches in the mapping may lead to runtime errors.

Lenience makes the foreign table definition resilient to changes in the data source; strictness ensures that data is not accidentally lost or fabricated due to a mismapping.

SKIP_TYPE_CHECK If set to 'TRUE', assume target columns specified in the foreign table definition are compatible with data source columns in name, type and number. The default is 'FALSE'. This setting is ignored when LENIENT is 'TRUE'. The purpose of this setting is to make it possible to skip metadata checks (possibly expensive depending on source DBMS) when referencing the foreign table (or even when defining it if column types are explicitly specified or imported). The downside is that if the source is actually incompatible with the local catalog definition, then errors will not be detected until query execution, and diagnostics may be poor as a result. Not available in versions before v0.9.2
LOGIN_TIMEOUT The maximum time in seconds to wait while attempting to connect to foreign server. Ignored if JNDI_NAME is set.
VALIDATION_QUERY The SQL query to use to validate connections in the JDBC wrapper cache. If not specified, no connection validation will be done. If specified, must be a query (in a dialect understood by the foreign JDBC driver being used) which will return at least one row without error. Ignored if JNDI_NAME is set.
DISABLED_PUSHDOWN_REL_PATTERN A regular expression matching the names of rules or logical relational expressions for which pushdown to data source should be disabled. To enable all pushdown, the value can be left as empty string (the default). To disable all pushdown, the value can be set to a match-anything regular expression ('.*'). To disable a particular expression, the value can be set to the corresponding optimizer RelNode class name, e.g. 'FilterRel', or internal rule description, e.g. '.*on proj'.
ASSUME_PUSHDOWN_VALID A boolean (true/false) expression that controls whether the Wrapper will check the pushdown SQL prior to executing it. Setting it to true will execute the SQL without first preparing and checking the SQL is valid (vie PreparedStatement.getMetadata()). Setting it to false (the default) will check the SQL first, including some very unnatural Oracle hacks since their thin driver does not support ps.getMetadata(). In general, this should be set to false, but if your JDBC driver lacks ps.getMetadata() this might be necessary, e.g. Oracle.
FETCH_SIZE An integer fetch size to set on cursors via Statement.setFetchSize. Default is -1 (don't call setFetchSize at all).
AUTOCOMMIT If set to 'TRUE' (the default), leave connection in autocommit mode. Otherwise, request transaction mode. For read-only usage, this normally doesn't matter, since the wrapper closes cursors as soon as it is done with them, but some JDBC drivers are sensitive (e.g. the PostgreSQL driver, which requires an explicit FETCH_SIZE and non-autocommit to avoid exhausting memory for large result sets).
MAX_IDLE_CONNECTIONS When creating a connection pool implicitly (e.g. JNDI_NAME is not set), this many connections are kept open. If a connection is returned to the pool and this many idle connections are already in the pool, the newly returned connection is immediately closed. Each MedJdbcDataServer instance may hold this many connections open. Ignored if JNDI_NAME is specified. Defaults to 1.
EVICTION_TIMER_PERIOD_MILLIS Controls how frequently idle connections are pruned from the pool. If set to a value N > 0 (zero), idle connections are removed from the pool and closed every N milliseconds. Ignored if JNDI_NAME is specified. Defaults to -1 (disabled).
MIN_EVICTION_IDLE_MILLIS Controls how long a connection must be idle before it is evicted. If EVICTION_TIMER_PERIOD_MILLIS is disabled, this setting has no effect. Otherwise, when the eviction timer fires, idle connections are only closed if they have been idle for at least this many milliseconds. Ignored if JNDI_NAME is specified. Defaults to -1 (disabled).
VALIDATION_TIMING Controls when connections are validated. Multiple values may be specified if separated by commas. The values are case insensitive. Possible values and their meanings: ON_BORROW (validate when connection is borrowed from the connection pool), ON_RETURN (validate when connection is returned to the connection pool), or WHILE_IDLE (validate idle connections when the eviction timer fires and the connection has not been idle long enough to warrant eviction). The WHILE_IDLE value only has effect when both EVICTION_TIMER_PERIOD_MILLIS and MIN_EVICTION_IDLE_MILLIS is positive. This parameter is ignored if JNDI_NAME is specified. This parameter has no effect if VALIDATION_QUERY is not specified. Default value is ON_BORROW.
DISABLE_CONNECTION_POOL Reverts connection management behavior to pre-connection pool semantics. A single connection is re-used for all purposes. Default value is false.

For example:

create server ora_sales_server
foreign data wrapper sys_jdbc
options(
    driver_class 'oracle.jdbc.driver.OracleDriver',
    url 'jdbc:oracle:thin:@localhost:1521:ora1',
    user_name 'scott',
    password 'tiger',
    type_mapping 'DATE:TIMESTAMP;DECIMAL(22,0):DOUBLE',
    validation_query 'select 1 from dual');

(The Oracle JDBC driver incorrectly exposes datetime types without nanoseconds as DATE, and exposes floating-point numerics as DECIMAL(22,0), so the mappings above are usually required.)

The following JDBC API calls are made in order to validate a foreign server definition:

  • DriverManager.getConnection
  • Connection.getMetaData (if supported)
  • Connection.setAutoCommit (only if AUTOCOMMIT is set to false)
  • DatabaseMetaData.XXX (various calls to test for supported features)

JDBC Foreign Table Definition

The only option always required when defining a JDBC foreign table is TABLE_NAME (or OBJECT):

create schema ora_sales;

create foreign table ora_sales.dept
server ora_sales_server
options(table_name 'DEPT');

For servers which support multiple schemas, the SCHEMA_NAME option can be specified at the table level to choose a schema other than the default:

create schema sqlsrvr_sales;

create foreign table sqlsrvr_sales.dept
server sqlsrvr_sales_server
options(schema_name 'DBO', table_name 'DEPT');

The following JDBC API calls are made in order to validate a foreign table definition:

  • Connection.getMetaData
  • Connection.prepareStatement (if supported)
  • PreparedStatement.getMetaData (if supported)
  • PreparedStatement.executeQuery (if supported)
  • Connection.createStatement (only if PreparedStatement not supported)
  • Statement.executeQuery (only if PreparedStatement not supported)
  • ResultSet.getMetaData (only if PreparedStatement not supported)
  • ResultSetMetaData.getColumnCount
  • ResultSetMetaData.getColumnName
  • ResultSetMetaData.getColumnType
  • ResultSetMetaData.getPrecision
  • ResultSetMetaData.getScale
  • ResultSetMetaData.isNullable

Querying JDBC Foreign Tables

Once defined, JDBC foreign tables may be queried just like local tables. The following two queries are equivalent (the first one references the virtual catalog, while the second one references the local metadata copy):

select * from ora_sales_server.scott.dept;

select * from ora_sales.dept;

Currently, relational operators such as filters, projections, aggregations, and joins are not pushed down into the SQL query used to access the foreign server. For example, to implement the following query, all dept rows are fetched from Oracle, and then the filter is applied locally inside of Farrago:

select * from ora_sales.dept
where deptno=10;

True distributed query optimization will be implemented in a future Farrago release.

The following JDBC API calls are made in order to execute a query against a foreign table:

  • Connection.createStatement
  • Statement.executeQuery
  • Statement.setFetchSize (only if FETCH_SIZE is set)
  • ResultSet.getXXX (where XXX depends on datatype of column queried)

By default, connections to JDBC data sources are in autocommit mode, meaning that each foreign query runs in its own read-only transaction which ends automatically when the top-level Farrago query is closed. (If a single Farrago query accesses multiple foreign tables, then multiple foreign queries may be issued, even if all of the foreign tables are from the same data source.) The default isolation level is DBMS-dependent, and so the effect of SQL/MED queries on foreign concurrency control (e.g. read locks) also varies by DBMS.

JDBC Metadata Import

Multiple foreign table definitions can be imported into Farrago automatically with the IMPORT FOREIGN SCHEMA statement:

import foreign schema sales
from server ora_sales_server
into ora_sales;

select * from ora_sales.emp;

The following JDBC API calls are made in order to perform the import:

  • DatabaseMetaData.getTables
  • DatabaseMetaData.getColumns

Updating JDBC Foreign Tables

NOTE: update functionality is not yet implemented; this section describes the planned support.

The Farrago SQL/MED plugin for JDBC will support the standard INSERT/UPDATE/DELETE DML statements with a foreign table as the target, e.g.

insert into ora_sales.dept 
select * from sales.depts;

delete from ora_sales.dept
where deptno=10;

Each row to be modified by the query corresponds to an individual execution of a prepared statement. The existence of a primary key is required for DELETE or UPDATE.

New SQL/MED options govern the transactional characteristics of the DML:

Name Scope Description
DISTRIBUTED_TRANSACTIONS server A setting of 0 (the default) means that when a non-autocommit Farrago transaction accessing this server spans multiple DML statements, all of those statements must target the same server. A setting of 1 requests that non-autocommit Farrago transactions which access this server should be implemented as distributed transactions (this requires all foreign servers involved to support the javax.sql.XAConnection interface).
READ_ONLY server 0 to request DML support. 1 for read-only connection (the default).
ISOLATION_LEVEL server Isolation level to use for all access to this server. If unspecified, default is DBMS-dependent. This setting is ignored for distributed transactions (for which the isolation level is based on the Farrago session).
TRANSACTION_ROW_LIMIT server or table (table overrides server) Number of rows to modify before committing transaction and starting a new one. Default is 0 (only commit once all rows have been processed). This option is ignored when autocommit is disabled in the Farrago session, because then the Farrago-level transaction governs the foreign transaction. This option is also ignored when this wrapper's AUTOCOMMIT option is set to 'TRUE' (the default), because then every update is committed immediately.
BATCH_ROW_LIMIT server or table (table overrides server) Number of rows to send to foreign server per batch. Default is 1.

The following JDBC API calls are made in order to execute DML against a foreign table:

  • Connection.prepareStatement
  • PreparedStatement.setXXX (where XXX depends on datatype of column updated)
  • PreparedStatement.executeUpdate
  • PreparedStatement.addBatch (if BATCH_ROW_LIMIT specified)
  • Statement.executeBatch (if BATCH_ROW_LIMIT specified)
  • Connection.setReadOnly (if supported)
  • Connection.setTransactionIsolation (if ISOLATION_LEVEL specified)
  • Connection.commit (if AUTOCOMMIT set to false)
  • Connection.rollback (if AUTOCOMMIT set to false)
  • DatabaseMetaData.getPrimaryKeys (for UPDATE/DELETE)

Examples

MySQL

Create a remote server connection profile first. This profile will later be used to show foreign tables and interact with the MySQL server.

create server mysql_jbond
foreign data wrapper sys_jdbc
options(
    driver_class 'com.mysql.jdbc.Driver',
    url 'jdbc:mysql://172.27.0.204/sandbox',
    user_name 'jbond',
    password 'MyEyesOnly',
    validation_query 'select 1 from dual');

mysql_jbond - name given to the 'server connection' which is saved in LucidDB.
com.mysql.jdbc.Driver - is a specific java class name of the MySQL JDBC driver.
jbond - user that connects to the remote MySQL server.
MyEyesOnly - password belonging to the user connecting to the remote MySQL server.
select 1 from dual - SQL query that will succeed if a connection is established.


create schema bond_schema;


Next we want to have a place for the foreign tables to show up.
bond_schema - is a schema where Mr. Bond wants his tables to show up from the remote server.

Finally we tell LucidDB to create a table called SOURCE_BUDGET_FINANCIALS in the newly created bond_schema with the structure and data being provided through the connection called mysql_jbond which reaches a MySQL server, digs in a schema called evil_operations with a specific table also called SOURCE_BUDGET_FINANCIALS.

create foreign table bond_schema.SOURCE_BUDGET_FINANCIALS 
server mysql_jbond
options(schema_name 'evil_operations', table_name 'SOURCE_BUDGET_FINANCIALS');

Personal tools
Product Documentation