FarragoCatalogChangeRules

From Eigenpedia

Jump to: navigation, search

This page defines the rules developers must follow when making changes to the Farrago catalog; adhering to these rules will ensure that existing catalog instances can be upgraded whenever new versions of Farrago-based systems are installed. For background on how an upgrade procedure can be incorporated into an installation program, see FarragoCatalogUpgrade.


Contents

Catalog Contents

The Farrago catalog contains various kinds of metadata:

  • The bootstrap definition of MOF (the generic metamodel in terms of which all other metadata is defined). This is an immutable part of MDR so it never changes. (If we upgrade to later versions of MOF in the future, we will need to review the migration impact carefully.)
  • FEM (the Farrago Extension Metamodel). Developers modify this by editing farrago/catalog/xmi/FarragoExtMetamodel.uml. All changes to FEM must be additive only, as defined in section "Additive Changes Only" in CatalogUml.
  • System-owned bootstrap objects either created by running the main method of class net.sf.farrago.db.FarragoDatabase, or by running the UDR, sys_boot.sys_boot.update_system_objects(), which is called by the script farrago/initsql/createStorageWrappers.sql. These objects are those which cannot be created via catalog initialization SQL scripts (either because they have no corresponding DDL command, e.g. type descriptors, or because they are required to exist before those scripts can run, e.g. the sa user). The UDR is used to create type descriptors. It handles upgrades so only types that do not already exist in the catalogs are created when the UDR is run on an existing database.
  • System-owned objects created by running catalog initialization scripts. Developers make changes by editing those initialization scripts. Such changes are the subject of the rest of this page.
  • User-defined objects. These may depend on system-defined objects in arbitrary ways, so the purpose of many of the change rules is to avoid accidentally causing (a) failed upgrade due to dependencies from user-defined objects or (b) silently broken or invalidated user-defined object definitions resulting from a seemingly successful upgrade.
  • System parameters are defined in FarragoExtMetamodel.uml in the FEM Config package. Normally, default values for the parameters are specified in farrago/catalog/xmi/FarragoConfTemplate.xmi for newly created catalogs. In order to upgrade existing catalogs with new system parameter values, the parameters need to be specified with a multiplicity of 0..1 in the UML model to account for cases where the parameters don't yet exist. In that case, the parameter values are set by sys_boot.sys_boot.update_configuration(), which is called by the script farrago/initsql/createStorageWrappers.sql.

Extension projects are likely to have their own counterparts to some of the above, for which similar rules must be observed.


Regressions

We would eventually like to create a Farrago-level backwards-compatibility regression test, as described in FarragoCatalogUpgrade. This would catch breaking changes as far upstream from packaged releases as possible. Since this test does not yet exist, it is currently the responsibility of anyone maintaining an extension project installation program to test for regressions within that downstream context. When a regression is detected, the offending developer should be notified so that he or she can help fix the broken initialization script, restoring backwards compatibility.


Relevant Initialization SQL Scripts

Change rules apply to the following scripts in farrago/initsql:

  • createStorageWrappers.sql
  • createSqljSchema.sql
  • createJdbcViews.sql
  • createMgmtViews.sql

Note that script createSalesSchema.sql is not subject to change rules because it is not expected to be run by catalog initialization procedures for installable systems (its only purpose is to create global fixtures used in Farrago unit tests).

New scripts may be added over time; the list above should be changed accordingly.

Change rules also apply to the following scripts in luciddb/initsql (used in intializing the catalog for extension project LucidDB):

  • installApplib.sql
  • createDbaViews.sql
  • createTestHarness.sql
  • createForeignWrappers.sql



Change Rules For Initialization SQL Scripts

The subsections below state the rules which must be followed by all developers.


Scripts must be idempotent

It is sometimes necessary for a catalog upgrade to be re-attempted due to a failure on the first try, so the script must have the same final effect when run through completely from the beginning regardless of whether it has previously been partially executed.


Scripts must not assume anything about the default catalog or schema.

Individual scripts will typically be concatenated into a larger upgrade script as part of installer packaging, so each one must set its own context. This can be done by

  • Setting a catalog explicitly (set catalog 'catalog-name';)
  • Setting a schema explicitly (set schema 'catalog-name.schema-name';)
  • or
  • Creating objects with explicit qualification (create or replace function catalog-name.schema-name.function-name ...;)



Creation and call of update_system_objects must be the very first actions in createStorageWrappers.sql.

The implementation of this procedure must effectively follow the other rules listed here (as if it were a script).



Scripts must not create new schemas within default catalog LOCALDB

This catalog is intended to remain clean as the container for user-defined schemas (plus the standard schemas defined by SQL:2003, SQLJ and INFORMATION_SCHEMA).


Instead, create new system-owned schemas within catalog SYS_BOOT. However, before doing so, it's a good idea to explain why on the farrago-dev@eigenbase.org mailing list and make sure everyone agrees. It might be more appropriate to add your new objects directly to one of the existing schemas under SYS_BOOT:

  • JDBC_METADATA: contains metadata views and supporting objects needed to implement the JDBC metadata API
  • MGMT: contains system-management views and routines


(Note that catalog initialization scripts for extension projects may choose to define additional user-visible schemas within catalog LOCALDB. For example, LucidDB defines schema SYS_ROOT as a container for its out-of-the-box DBA views and procedures.)



Object creation DDL must use CREATE OR REPLACE

This allows an existing definition of an object to be overwritten.


Objects must not be dropped

This could break user-defined objects. Eventually, we may come up with a protocol for officially deprecating and eventually dropping system-defined objects, but for now, just leave old object definitions behind. Add comments (both in the SQL script and via the DESCRIPTION clause for objects that support it) to indicate that they are deprecated.

For Java-based objects such as external routines and SQL/MED data wrappers, be careful not to delete entry points from the jar if existing system objects depend on those entry points.


Objects must not be renamed

Farrago supports create or replace rename to new_name object_type old_name functionality, but this must not be executed by catalog initialization scripts, since users may maintain scripts which depend by name on the existence of system objects. Plus, this would make the script non-idempotent. Instead, create a new object and leave the existing one alone. Examples are jars, wrappers, servers, schemas, views, tables, columns, and routines.


Existing associations cannot be made more restrictive

If an existing association is defined as 0..* (i.e., 1 to many), it cannot be modified to a 0..1 (i.e., 1-1) association.

The reverse is acceptable. An existing 1-1 association can be changed to a 1 to many association. However, if there are any existing system views that reference the attribute corresponding to the parent end of the association, (which is valid because of the previous 1-1 association), then that reference becomes invalid with the less restrictive 1 to many association. The view definitions will need to be modified to take this into account. Moreover, migration from earlier catalog versions will need to be addressed because the older catalogs will still have the original, now invalid, view definitions.

Szuercher 15:56, 3 November 2008 (PST): This doesn't automatically work in Enki/Hibernate repositories. It would require intervention during upgrade to move the association data across repository storage tables. In addition, Enki/Hibernate can't handle switching associations between ordered/unordered/high cardinality without intervention.


Scripts must not create tables

Physical tables are problematic for a number of reasons. Updating their contents in an idempotent fashion is difficult. And different extension projects may use different physical table storage schemes. So, create a view with a multi-row VALUES clause instead. (See view table_types_view_internal in createJdbcViews.sql for an example.)


Existing columns in views must not be changed

Add new columns at the end, and do not delete or reorder existing columns.
(Technically, adding a new column could break a user script which uses select *, in that it will now return unexpected information, but avoiding this would be unnecessarily restrictive. Users who desire isolation from such changes should create a view and select from that in their script instead.)


Never change the datatype family of a view column. Narrowing precision or scale within a family (e.g. from INTEGER down to SMALLINT) may be OK, but avoid it if you can.


Routines require case-by-case consideration

If there is a chance that changing the semantics of an existing routine could break applications which call it, then make the change in additive fashion instead. (The exception is when the change is a bug fix; but even here, if it is likely that applications have started accidentally relying on the buggy behavior, it may be better to make the change additive.)

For routines, an additive change can mean either creating an entirely new routine name, or overloading an existing routine name. Overloading is only OK if the change is based on a new parameter signature which cannot possibly introduce ambiguity with respect to existing invocation scripts. According to SQL:2003, procedures can only be overloaded on number of parameters, whereas functions can be overloaded on both number and type.

Be especially careful with changing the semantics of an existing UDX, since its output is an entire table. Follow the rules given above for "existing column ordering in views must not be changed", treating the output as if it were a view. But in addition, treat adding a new column to the output as a change in semantics (since for a UDX, invocation scripts often need to rely on select * as the number of columns produced may be dynamic based on input). In other words, if such a change cannot be achieved via a new overload, come up with a new UDX name instead.

Never change the specific name of an existing routine overload, and never change its parameter signature. Overloading is very sensitive to parameter types, even within datatype families. However, widening precision or scale may be OK.

For the return type of a function, follow the rules given above for the datatype of a view column.


SQL/MED options require case-by-case consideration

The concerns here are similar to routines; changing the options might be a bugfix, or it might be a semantic change.


Scripts must assume default Farrago CREATE OR REPLACE behavior

By default, Farrago processes CREATE OR REPLACE as follows:

  1. Validate the new object definition and save it (without committing it)
  2. Revalidate all existing objects which depend on the modified object; revalidation verifies that the new objects are still valid with respect to the new object definition
  3. If any validation fails, rollback the CREATE OR REPLACE (causing the script to fail); otherwise, commit

Extension projects may support less restrictive behavior, e.g. allowing dependents to become temporarily invalid, and then revalidating them when the script executes CREATE OR REPLACE for them too. However, Farrago catalog initialization scripts must not rely on such behavior. (Catalog initialization scripts which are private to an extension project are free to make whatever assumptions they want.)

Here's an example. (This happens to break another rule above, namely modifying the existing columns of a view.) Suppose the old version of the catalog script had

create or replace view v1 as select a,b from t;
create or replace view v2 as select a from v1 where b='X';


The new version has

create or replace view v1 as select a,c from t;
create or replace view v2 as select a from v1 where c='X';


The first statement will break with the default Farrago behavior. Farrago will attempt to revalidate the old definition of v2, and that revalidation will fail because column b no longer exists in the new version of v1. (With the less restrictive behavior described above, the first statement would succeed, leaving v2 invalid, and then the second statement would revalidate v2, repairing it.) Don't do this.



Scripts may not assume that the objects they create are private

Even if a script does not grant any access to users or roles, it is not OK to assume that the objects created are private and therefore not subject to dependency constraints from user-defined objects as described above. Extension projects may create system-level dependencies on such objects; system administrators may do likewise.


Developers must be forward-looking when making catalog changes

Given all of these restrictions, it is very easy to quickly accumulate catalog flotsam and jetsam which will be difficult or impossible to eliminate. So, think long and hard before adding new objects; that way you won't have to change them again later. Seek review. Broadcast your proposed changes to farrago-dev.


TBD

  • rules for changing granted access
Personal tools