MdrJdbcMaintenance

From Eigenpedia

Jump to: navigation, search

Contents

Background

This page describes some maintenance work undertaken on the MDRJDBC module which stores the Farrago catalog. This layering diagram from the MDR architecture shows how MDRJDBC plugs into MDR; it is the box at the bottom with JDBC in italics:

architecture.gif

By default, Farrago (an MDR client) is configured to use hsqldb as the JDBC storage for its catalog; this HOWTO explains the process for configuring Farrago to use another DBMS such as PostgreSQL instead.

MDRJDBC Overview

See the package documentation.

Source Code

The latest source code for MDRJDBC can be found in //open/mdrjdbc/... in perforce.eigenbase.org. This Sun Public License (SPL) codebase was forked out of its original location in the Netbeans CVS repository for ease of maintenance, and to avoid having to add another license (CDDL) to Farrago's compatibility requirements. The Netbeans team was not actively maintaining it, so there is little chance that further changes will need to be propagated from there.

The buildfile depends on the Eigenbase copies of the MDR binaries under //open/dev/thirdparty (which also contains the updated version of mdrjdbc.jar, inside of mdrextras.tar.gz).

Constraining Factors

JVS has not been successful in building the latest MDR source code, and has not yet been able to resurrect a sandbox with the code as of the last time the binaries checked into Eigenbase were built. For this reason, and the fact that the MDR code is quite intricate, the MDRJDBC maintenance work is constrained to touch only MDRJDBC itself--the calling code from MDR cannot be changed for now.

Performance Problems Addressed

Changes were made to address two serious performance issues with metadata deletion. The net effect was quite dramatic; time for DROP SCHEMA FOODMART (after analyzing to create histograms for all columns) went from ten minutes down to under 30 seconds in one test.

Deletion from Ordered Associations

The first problem has to do with the way ordered associations are represented. Such an association is stored via a table with at least three columns:

  • key (MOFID of object on "one" end of association)
  • value (MOFID of object on "many" end of association)
  • ordinal

For example:

Key Value Ordinal
obj1 obj2 0
obj1 obj3 1
obj4 obj5 0
obj4 obj6 1
obj4 obj7 2

Here, obj1 and obj4 might be tables, while the other objects are columns.

The performance problem arose from the fact that ordinals have to be reassigned whenever objects are removed from the association. For example, if obj5 is deleted, the ordinals for obj6 and obj7 both have to be shifted down by 1. MDRJDBC was implementing this by issuing one UPDATE statement per deletion. This meant that when all child objects were being deleted for a particular parent (e.g. the columns affected by DROP TABLE), the number of row updates was O(n^2). The performance was especially bad for large hsqldb repositories. More serious than the table->column association was the column->histogram association, since Farrago stores 100 histogram bars for each analyzed column.

The solution only deals with the common case where all children are being deleted. It is incredibly ugly due to the requirement to only modify MDRJDBC, without touching MDR itself. To make this work, the deletion code looks up the stack to determine the context in which it is being invoked; if it is being invoked from the MDR code which purges all links from an object being deleted, then it skips the ordinal UPDATE invocations altogether, and also uses a single DELETE instead of one per child. Here it is, from JdbcMultivaluedOrderedIndex.ItemListIterator:

        // implement ListIterator
        public void remove()
        {
            if (lastPos == -1) {
                throw new IllegalStateException();
            }
            // remove from iter first, in case we just did a previous() from end
            iter.remove();
            if (!deletingAllChecked) {
                deletingAllChecked = true;
                Throwable t = new Throwable();
                t.fillInStackTrace();
                StackTraceElement [] st = t.getStackTrace();
                if (st.length > 3) {
                    if (st[3].getMethodName().equals("deleteLinksAndComponents")
                        && st[2].getMethodName().equals("remove")
                        && st[1].getMethodName().equals("remove"))
                    {
                        deletingAll = true;
                        // Delete everything in one SQL call,
                        // and then skip SQL for subsequent
                        // iterations.
                        try {
                            removeImpl(key);
                        } catch (StorageException ex) {
                            throw new RuntimeStorageException(ex);
                        }
                    }
                }
            }
            if (!deletingAll) {
                try {
                    // if at end (!iter.hasNext()), no need to shift ordinals
                    removeImpl(
                        key,lastPos,iter.hasNext());
                } catch (StorageException ex) {
                    throw new RuntimeStorageException(ex);
                }
            }
            lastPos = -1;
        }

Looking up the stack in this way may break if either the JVM or the MDR implementation changes. So besides being ugly, it is also brittle. The breakage effect would be that the optimization would just not kick in; i.e. performance would go back to being bad, but incorrect behavior should not result.

Deletion from Class Extents

One of the tables created by MDR is called ObjectsByClasses. It stores extents; i.e. for each class, a list of all of the objects which directly instantiate that class:

Key Value
Table obj1
Table obj4
Column obj2
Column obj3
Column obj5
Column obj6
Column obj7

MDRJDBC indexing did not cover the value column. What this meant was that for a class with a large number of instances (e.g. Column or HistogramBar), en masse deletion of a large number of them became very slow: O(n^2) in the number of instances.

The solution was to special-case the indexing and search predicate for this table (detected by its reserved name). We can rely on the fact that the same object will never show up more than once in the Value column, since an object only directly instantiates one class. So, the fixed version of MDRJDBC creates a separate UNIQUE constraint on the Value column, and uses only that column during search.

In most cases, there should be no need to do anything special to take advantage of this physical schema change; the FarragoCatalogUpgrade procedure relies on a logical export/import via XMI using a freshly created catalog instance for the import. However, if only mdrjdbc.jar is replaced (against an existing installed catalog), then the new index needs to be created explicitly, e.g.

CREATE UNIQUE INDEX xyz ON "MDR"."ObjectsByClasses"("IDX_MVAL_MOFID");

Syntax may vary by DBMS.

For hsqldb in the default in-memory configuration, you can shut down Farrago, and then in file catalog/FarragoCatalog.script, change the following line:

CREATE MEMORY TABLE "ObjectsByClasses"(IDX_KEY_MOFID BIGINT NOT NULL,IDX_MVAL_MOFID BIGINT NOT NULL,IDX_SUR BIGINT NOT NULL,PRIMARY KEY(IDX_KEY_MOFID,IDX_SUR))

to

CREATE MEMORY TABLE "ObjectsByClasses"(IDX_KEY_MOFID BIGINT NOT NULL,IDX_MVAL_MOFID BIGINT NOT NULL,IDX_SUR BIGINT NOT NULL,PRIMARY KEY(IDX_KEY_MOFID,IDX_SUR),UNIQUE(IDX_MVAL_MOFID))

(all on one line) and then restart.

Personal tools