FarragoUdx
From Eigenpedia
For background on user-defined routines in general, see the UDT/UDR overview.
Contents |
Overview
A UDX is a user-defined transformation which takes normal UDF scalar parameters plus zero or more query expressions as cursor inputs, and produces a well-defined virtual table as output. Currently, we only support UDX's implemented as external methods written in Java. The easiest way to understand them is to see them in action. The example below illustrates a simple UDX with zero relational inputs.
create schema udx; set schema 'udx'; set path 'udx'; create function ramp(n int) returns table(i int) language java parameter style system defined java no sql external name 'class net.sf.farrago.test.FarragoTestUDR.ramp'; select * from table(ramp(5)); +----+ | I | +----+ | 0 | | 1 | | 2 | | 3 | | 4 | +----+
Note the usage of PARAMETER STYLE SYSTEM DEFINED JAVA instead of the usual PARAMETER STYLE JAVA. The reason for this difference is that Farrago uses a non-standard calling convention when invoking the Java UDX from SQL. The SQL:2003 standard calling convention is not supported in this area because it isvery inefficient and difficult to use.
The ramp behavior is implemented by an underlying Java class (normally this would be deployed and loaded as a jar instead of as a direct reference from CREATE FUNCTION to a method on the classpath):
package net.sf.farrago.test;
import java.sql.*;
public abstract class FarragoTestUDR
{
public static void ramp(int n, PreparedStatement resultInserter)
throws SQLException
{
for (int i = 0; i < n; ++i) {
resultInserter.setInt(1, i);
resultInserter.executeUpdate();
}
}
}
The invoked method must take an extra parameter of type java.sql.PreparedStatement. This is used to write the results back to the invoking SQL query plan by setting parameter values corresponding by position to the declared output table structure, and then calling executeUpdate for each row to be written. It effectively implements INSERT INTO RESULT(X,Y, ..., Z) VALUES(?,?, ..., ?) where RESULT represents the output set. In the ramp example, it would be INSERT INTO RESULT(I) VALUES(?). Besides executeUpdate, the only methods currently supported are the setXXX parameter setter methods, getParameterMetaData, and clearParameters (which sets all parameters to null). Other methods will throw UnsupportedOperationException.
Relational Inputs
Now let's look at a UDX which takes cursor input:
create schema udx;
set schema 'udx';
set path 'udx';
create function stringify(c cursor, delimiter varchar(128))
returns table(v varchar(65535))
language java
parameter style system defined java
no sql
external name 'class net.sf.farrago.test.FarragoTestUDR.stringify';
select upper(v)
from table(
stringify(
cursor(select * from sales.depts),
':'));
+---------------+
| EXPR$0 |
+---------------+
| 10:SALES |
| 20:MARKETING |
| 30:ACCOUNTS |
+---------------+
stringify takes rows with multiple columns as input and produces rows with a single column as output by concatenating the input columns, using a delimiter string specified as a normal (non-cursor) parameter. The cursor constructor is required to make it clear that the input should be provided to the UDX in ResultSet form.
Here's the code for stringify:
public static void stringify(
ResultSet inputSet,
String delimiter,
PreparedStatement resultInserter)
throws SQLException
{
// Validate ParameterMetaData (require exactly one output)
assert(resultInserter.getParameterMetaData().getParameterCount() == 1);
// Use ResultSetMetaData to get input column count
int n = inputSet.getMetaData().getColumnCount();
StringBuilder sb = new StringBuilder();
while (inputSet.next()) {
sb.setLength(0);
for (int i = 1; i <= n; ++i) {
sb.append(inputSet.getString(i));
if (i < n) {
sb.append(delimiter);
}
}
resultInserter.setString(1, sb.toString());
resultInserter.executeUpdate();
}
}
A UDX can take any number of relational inputs, and can process their rows in any order (fetching all of one first, and then moving on to the next, or interleaving fetches). Cursor input ResultSets are forward-only and read-only, and do not need to be closed by the UDX (that is done automatically when the top-level statement execution is closed). An ORDER BY can be used on a cursor constructor to control the order of rows returned by a ResultSet, e.g.
select upper(v)
from table(
stringify(
cursor(select * from sales.depts order by deptno),
':'));
UDX invocations can be nested just like any other query construct. There is a proposal for FarragoUdxResultTypeDerivation. FRG-289 tracks the need to allow the user to supply costing metadata to the optimizer as part of a UDX implementation.
Threading and Cancellation
Each UDX method is invoked in its own thread. This is what allows it to a run to completion as a procedure rather than via repeated invocations of a state machine (the normal case for iterators in query execution). The thread communicates result rows to the invoking query via a synchronized queue.
In the following cases, the query may be canceled while the UDX thread is still producing rows:
- asynchronous abort request from calling program
- cursor closed explicitly by calling program
- error encountered in some other iterator
- higher-level iterator returns end-of-stream (e.g. a join may see end-of-stream on its left-hand-side, at which point it can return end-of-stream even if a UDX on the right-hand-side is still merrily producing rows)
In all such cases, the cursor cannot be fully closed until the UDX completes. This may take a long time if the UDX has a lot of rows to produce. Farrago uses a passive abort technique to terminate the UDX. Instead of attempting to kill the thread (which could have messy side-effects), Farrago throws an exception the next time the UDX calls back into executeUpdate. This means that it is very important for the UDX implementation to avoid suppressing exceptions from this call; instead, it must propagate them back up to its invoker, where the thread will terminate cleanly. Of course, the UDX should perform its own cleanup in a finally block.
Standards Conformance
SQL:2003 refers to UDX's as table functions and does not support relational inputs. Table functions are treated as returning a MULTISET of ROW where the row structure corresponds to the output table. The syntax TABLE(RAMP(5)) converts the MULTISET value into a relational expression usable in the FROM clause.
The standard JAVA calling convention for invocation of an array-returning function such as a table function is very different from Farrago's non-standard convention. In the standard, the query plan is supposed to call the external method once for each row to be produced, passing in state variables such as the row number to inform the external method of the position in the iteration. This makes writing such external methods very painful because
- They must be written in state-machine style
- No context is maintained along with each call, making it very difficult to write non-trivial functions
For these reasons, Farrago is not standards-conformant in this area.

