FarragoUdxResultTypeDerivation
From Eigenpedia
This page documents proposed extensions to FarragoUdx to allow the result type of a UDX invocation to be derived dynamically based on its parameters, and to give the optimizer information about allowable rewrites. The approach is declarative, so that no code needs to be written to achieve most common patterns. We'll go through some examples to illustrate the requirements.
A basic requirement is the ability to propagate the type of an entire input row, possibly inserting additional fixed columns before or after. Here's an example of appending a 128-byte digest to each row:
create function append_digest(c cursor) returns table(c.*, digest binary(128)) ...; -- produces result (deptno, name, digest) select * from table(append_digest(cursor(select deptno, name from sales.depts))));
NOTE jvs 27-Jan-2007: The functionality above is already working.
A more advanced requirement is the ability to identify column subsets from the input and treat them differently. Suppose that we want to preserve a subset of the input columns, but collapse the rest into a single string:
create function collapse(c cursor, r select from c) returns table(r.*, etc varchar(1024)) ...; -- produces result (empno, name, etc) select * from table(collapse(cursor(select * from sales.emps), row(empno, name)));
Here the "r select from c" parameter definition indicates that a row of column names will be specified for each invocation, and these column names must be a subset of the names present in the referenced cursor. In the invocation, the row operator is used to bracket the list of column names. The invoked Java method must have a corresponding parameter of type List<String> which receives the column names.
NOTE jvs 27-Jan-2007: The ability to declare a column-list parameter and receive the list of column names at invocation time is already working.
NOTE zfong 29-Dec-2008: The ability to propagate only that column list to the output also works. The rest of this page is not yet implemented.
Sometimes row construction needs to be done in a subtractive fashion rather than additively. A UDX may implement a custom join operator, and the author of the UDX wants to discard the redundant join keys from one side:
create function snazzy_join(c1 cursor, c2 cursor, k1 select from c1, k2 select from c2)
returns table (c1.*, c2.* excluding k2.*)
...;
-- produces result (deptno,dname,ename)
select * from table(snazzy_join(
cursor(select deptno,name as dname from depts),
cursor(select deptno,name as ename from emps),
row(deptno),
row(deptno)));
The validator would require that after the full row type has been constructed, all result column names must be unique. This may require aliasing on the cursor input queries.
A related enhancement would be a way to constrain datatypes on the columns referenced in column lists. Currently, UDX's can do this programatically, but it only happens at runtime, whereas something metadata-driven and prepare-time would be preferable in many cases.
That's it for dynamic type derivation. Now, how about syntax for letting the optimizer know about potential rewrites?
In the append_digest example above, we'd like to be able to tell the optimizer that filters can be pushed down:
create function append_digest(c cursor) returns table(c.*, digest binary(128)) restrict input with filter(c.*) ...; -- as a result, optimizer knows that select * from table(append_digest(cursor(select * from sales.depts))) where deptno=10; -- is equivalent to select * from table(append_digest(cursor(select * from sales.depts where deptno=10)));
But maybe snazzy_join does not allow for filter pushdown on join keys (while anything else is OK):
create function snazzy_join(c1 cursor, c2 cursor, k1 select from c1, k2 select from c2) returns table (c1.*, c2.* excluding k2.*) restrict input with filter(c1.* excluding k1.*, c2.* excluding k2.*) ...;
Similarly, we can tell the optimizer how to handle projection:
create function collapse(c cursor, r select from c) returns table(r.*, etc varchar(1024)) restrict input with select(r.*) ...; -- as a result, optimizer knows that select empno, etc from table(collapse(cursor(select * from sales.emps), row(empno, name))); -- is equivalent to select empno, etc from table(collapse(cursor(select empno from sales.emps), row(empno))); -- but that select deptno, etc from table(collapse(cursor(select * from sales.emps), row(empno, name))); -- is NOT equivalent to select deptno, etc from table(collapse(cursor(select deptno from sales.emps), row())); -- (because etc is derived from the other columns)
We might want to come up with a way to express the fact that if etc is projected away, then all of the underlying columns can be projected; a similar construct would also be useful for tracing column-level dataflow dependencies through columns fabricated by the UDX.

