FarragoIteratorRedesign

From Eigenpedia

Farrago Iterator Redesign Discussion

This page is a work in progress. Please make comments as you see fit.

Rationale

Currently, Farrago uses java.util.Iterator to implement the iterator calling convention for Java-based execution objects (XOs). Unfortunately, the semantics of query processing mean that an execution object sometimes does not have enough information to indicate whether another row of data is (or will be) available. An implementation of java.util.Iterator.hasNext() must therefore either break the interface's contract by returning false before the end of all data, or block while waiting for more data or an end-of-data indication.

As a result, Farrago requires separate threads for Java-based execution objects, which is sometimes undesirable. Requiring a separate Java thread prevents Fennel's scheduler architecture from being used to schedule Java-based execution objects. Our goal is to provide a new iterator-like interface that makes it possible for Fennel to schedule Java XOs.
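
As an illustration of the problem described above, here is a minimal sketch of a java.util.Iterator fed by a queue. The class name BlockingRowIterator, the END sentinel, and the use of a BlockingQueue are assumptions made for this example; none of this is Farrago code. Note that hasNext() has to block in take() until either a row or the end marker arrives, which is why the caller needs its own thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.BlockingQueue;

// Hypothetical example class, not part of Farrago.
class BlockingRowIterator implements Iterator<Object> {
    // Assumed end-of-data marker placed on the queue by the producer.
    static final Object END = new Object();

    private final BlockingQueue<Object> queue;
    private Object lookahead; // row fetched by hasNext() but not yet returned

    BlockingRowIterator(BlockingQueue<Object> queue) {
        this.queue = queue;
    }

    @Override
    public boolean hasNext() {
        if (lookahead == null) {
            try {
                // Blocks the calling thread until a row or END is available.
                lookahead = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for a row", e);
            }
        }
        return lookahead != END;
    }

    @Override
    public Object next() {
        // next() depends on hasNext() to put the row in place.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Object row = lookahead;
        lookahead = null;
        return row;
    }
}
```

The only non-blocking alternative within this interface would be for hasNext() to return false when the queue is momentarily empty, which breaks the Iterator contract.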

Requirements

  1. Replaces java.util.Iterator in generated Java XOs.
  2. Shouldn't require complicated implementations: currently hasNext() often performs the work of next() in order to return the correct result, and next() ends up calling hasNext() to ensure the next row is in place.
  3. Allows caller to distinguish between row currently available, row not available, and end-of-data.
  4. Allows restart of iterator, which in turn facilitates query re-use.
  5. Clearly describes the lifetime of objects returned by the iterator (e.g., whether the returned object is re-used).
  6. Provides mechanism for release of resources in an orderly manner. (May be possible to defer.)

Design

To start the ball rolling on an alternate implementation, here's an interface describing a replacement for java.util.Iterator and org.eigenbase.runtime.RestartableIterator. Please add comments. As usual, I'm not wed to the names I've chosen for the interface or its methods.

public interface FarragoTupleIter extends FarragoAllocation
{
    /**
     * NoDataReason provides a reason why no data was returned by a
     * call to {@link #fetchNext()}.
     */
    public enum NoDataReason
    {
        /**
         * End of data.  No more data will be returned unless the
         * iterator is reset by a call to {@link #restart()}.
         */
        END_OF_DATA,

        /**
         * Data underflow. No more data will be returned until the
         * underlying data source provides more input rows.
         */
        UNDERFLOW,
    }

    /**
     * Returns the next element in the iteration.  This method returns
     * the next value in the iteration, if there is one.  If not, it
     * returns a value from the {@link NoDataReason} enumeration
     * indicating why no data was returned.
     *
     * <p>If this method returns {@link NoDataReason#END_OF_DATA}, no
     * further data will be returned by this iterator unless
     * {@link #restart()} is called.
     *
     * <p>If this method returns {@link NoDataReason#UNDERFLOW}, no
     * data is currently available, but may become available in the
     * future.  It is possible for consecutive calls to return
     * UNDERFLOW and then END_OF_DATA.
     *
     * <p>The object returned by this method may be re-used for each
     * subsequent call to <code>fetchNext()</code>.  In other words,
     * callers must either make certain that the returned value is no
     * longer needed or is copied before any subsequent calls to
     * <code>fetchNext()</code>.
     *
     * @return the next element in the iteration, or an instance of
     *         {@link NoDataReason}.
     */
    public Object fetchNext();

    /**
     * Restarts this iterator, so that a subsequent call to
     * {@link #fetchNext()} returns the first element in the collection
     * being iterated.
     */
    public void restart();

    /**
     * Closes this iterator, allowing it to release its resources.  No
     * further calls to {@link #fetchNext()} or {@link #restart()} may
     * be made once the iterator is closed.
     *
     * (In practice this abstract method comes from FarragoAllocation,
     * and is only here for illustrative purposes.)
     */
    public void closeAllocation();
}
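
To make the calling convention concrete, here is a rough sketch of how a caller might consume such an iterator. TupleIterSketch is a simplified stand-in for the proposed FarragoTupleIter (it omits FarragoAllocation so the example compiles on its own), and ScriptedTupleIter and TupleIterConsumer are hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the proposed FarragoTupleIter (assumption:
// FarragoAllocation is left out so this sketch is self-contained).
interface TupleIterSketch {
    enum NoDataReason { END_OF_DATA, UNDERFLOW }

    Object fetchNext();

    void restart();

    void closeAllocation();
}

// Hypothetical test source: replays a fixed script of results. A script
// entry may itself be a NoDataReason, which simulates an underflow.
class ScriptedTupleIter implements TupleIterSketch {
    private final List<Object> script;
    private int pos = 0;

    ScriptedTupleIter(Object... script) {
        this.script = Arrays.asList(script);
    }

    public Object fetchNext() {
        if (pos >= script.size()) {
            return NoDataReason.END_OF_DATA;
        }
        return script.get(pos++);
    }

    public void restart() {
        pos = 0;
    }

    public void closeAllocation() {
        // Nothing to release in this sketch.
    }
}

// Hypothetical caller.
class TupleIterConsumer {
    // Moves rows into sink until the iterator has no data to return, then
    // reports why. The rows here are immutable strings; a real caller would
    // have to copy each row, since the iterator may re-use the object.
    static TupleIterSketch.NoDataReason drain(TupleIterSketch iter, List<Object> sink) {
        while (true) {
            Object next = iter.fetchNext();
            if (next instanceof TupleIterSketch.NoDataReason) {
                return (TupleIterSketch.NoDataReason) next;
            }
            sink.add(next);
        }
    }
}
```

On UNDERFLOW the caller simply returns to its scheduler and calls drain() again later; no thread has to block. On END_OF_DATA the caller either closes the iterator or calls restart() to re-use it.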

Implementation Plan

  1. //(done)// p4-integrate current rules to saffron -- don't attempt to fix, jhyde will fix them at some point
  2. //(done)// write new iter api
  3. //(done)// define global constant 'boolean ENABLE_NEW_ITER = false'
  4. //(done)// add code to generate new style of code, protected by 'if (ENABLE_NEW_ITER)'. Stephan and JVS can work on this in parallel: JVS on the MDR rels, Stephan on the rest. Farrago should stay working the whole time.
  5. //(done)// set ENABLE_NEW_ITER=true as soon as the whole of Farrago works
  6. //(done)// delete old iterator code after all dependencies are gone

Additionally, JVS says:

> I can take on more than just MDR for #4. One way to split it up is one person handles the "pure Java sources" which can never produce underflow (MDR extent, UDX, MedMock, OneRow, ResultSet). These can all be handled via an adapter. The other person handles calc, union, and Fennel.
>
> (MDR join is not a source so it probably can't use the adapter approach.)
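
The adapter idea for pure Java sources could look roughly like the following. This is only a sketch under stated assumptions: SimpleTupleIter is a stand-in for the proposed interface (repeated here so the example compiles on its own), IteratorTupleIterAdapter is a hypothetical name, and using a Supplier to obtain a fresh java.util.Iterator on restart is one possible choice, not necessarily what the real adapter does.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Simplified stand-in for the proposed FarragoTupleIter (assumption:
// FarragoAllocation is left out so this sketch is self-contained).
interface SimpleTupleIter {
    enum NoDataReason { END_OF_DATA, UNDERFLOW }

    Object fetchNext();

    void restart();

    void closeAllocation();
}

// Hypothetical adapter for sources that always know whether another row
// exists, so it never returns UNDERFLOW.
class IteratorTupleIterAdapter implements SimpleTupleIter {
    private final Supplier<Iterator<?>> source; // produces a fresh iterator
    private Iterator<?> current;                // null once closed

    IteratorTupleIterAdapter(Supplier<Iterator<?>> source) {
        this.source = source;
        this.current = source.get();
    }

    public Object fetchNext() {
        checkOpen();
        return current.hasNext() ? current.next() : NoDataReason.END_OF_DATA;
    }

    public void restart() {
        checkOpen();
        // Restart by asking the source for a new iterator over the same data.
        current = source.get();
    }

    public void closeAllocation() {
        current = null;
    }

    private void checkOpen() {
        if (current == null) {
            throw new IllegalStateException("iterator is closed");
        }
    }
}
```

For example, new IteratorTupleIterAdapter(rows::iterator) would wrap any java.util.List named rows.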
