FarragoMedUdx
From Eigenpedia
Suppose you'd like to write a SQL/MED foreign data wrapper to allow users to query the Internet Movie Database (IMDb) via SQL. Being an accomplished Java programmer, you already know how to call IMDb via HTTP and parse HTML results (or maybe you know about the secret web-service API); now you need to figure out how to expose that functionality via SQL/MED. This document describes the easy way: write a FarragoUdx which encapsulates your request/response logic, and then tell SQL/MED how to rewrite queries against a foreign table into invocations of your UDX.
We'll assume you've already created a skeletal foreign data wrapper implementation, probably by copying the implementation from package net.sf.farrago.namespace.mock and adapting it to your purposes. In fact, the mock wrapper provides an example of how to rewrite to a UDX, so let's take a look at that. Here's a SQL transcript which shows it in action:
0: jdbc:farrago:> create schema mock_schema; 0: jdbc:farrago:> 0: jdbc:farrago:> set schema 'mock_schema'; 0: jdbc:farrago:> 0: jdbc:farrago:> create function ramp(n int) . . . . . . . . > returns table(i int) . . . . . . . . > language java . . . . . . . . > parameter style system defined java . . . . . . . . > no sql . . . . . . . . > external name 'class net.sf.farrago.test.FarragoTestUDR.ramp'; 0: jdbc:farrago:> 0: jdbc:farrago:> create foreign table mock_ramp_udx_table( . . . . . . . . > id int not null) . . . . . . . . > server sys_mock_foreign_data_server . . . . . . . . > options ( . . . . . . . . > executor_impl 'JAVA', . . . . . . . . > udx_specific_name 'mock_schema.ramp', . . . . . . . . > row_count '3'); 0: jdbc:farrago:> 0: jdbc:farrago:> select * from mock_ramp_udx_table; +-----+ | ID | +-----+ | 0 | | 1 | | 2 | +-----+ 0: jdbc:farrago:>
The script first creates the UDX, and then references it in the definition of a foreign table via the UDX_SPECIFIC_NAME option. This option is interpreted by the mock wrapper as a request to implement the table by calling the UDX. The code which achieves that is in method MedMockColumnSet.toRel:
RexBuilder rexBuilder = cluster.getRexBuilder();
RexNode arg = rexBuilder.makeExactLiteral(new BigDecimal(nRows));
// Call to super handles the rest.
return toUdxRel(
cluster,
connection,
udxSpecificName,
null,
new RexNode [] { arg });
The toRel method is responsible for constructing the UDX argument list (in this case the desired row-count from member variable nRows) as an array of RexNode objects. The rest (catalog lookup, relational algebra construction) is handled by the toUdxRel call implemented by superclass MedAbstractColumnSet. That's all there is to it. Usually, all of the arguments will be literals, so it's just a matter of finding the right factory methods on the RexBuilder interface.
Let's use EXPLAIN PLAN to see the "before and after":
0: jdbc:farrago:> explain plan without implementation for
. . . . . . . . > select * from mock_ramp_udx_table;
'column0'
'ProjectRel(ID=[$0])'
' TableAccessRel(table=[[LOCALDB, MOCK_SCHEMA, MOCK_RAMP_UDX_TABLE]])'
2 rows selected (0.075 seconds)
0: jdbc:farrago:>
0: jdbc:farrago:> explain plan with implementation for
. . . . . . . . > select * from mock_ramp_udx_table;
'column0'
'IterCalcRel(expr#0=[{inputs}], expr#1=[CAST($t0):INTEGER NOT NULL], ID=[$t1])'
' FarragoJavaUdxRel(invocation=[CAST(RAMP(CAST(3):INTEGER)):RecordType(INTEGER I) NOT NULL])'
2 rows selected (0.083 seconds)
Before plan implementation, we see a TableAccessRel to the foreign table. Afterwards, we see it rewritten as a FarragoJavaUdxRel, with the call to the RAMP function plus some redundant argument and result datatype casts.
Now, the popcorn is getting cold, so back to IMDb. Let's assume you can get as far as writing a UDX which retrieves a particular entity from IMDb, say actors. So you test it out, first retrieving a particular actor by URL, and then requesting all actors by passing in null for the URL:
create schema imdb;
set schema 'imdb';
create function retrieve_actor(url varchar(256))
returns table(url varchar(256), name varchar(256), dob date, ...)
language java
parameter style system defined java
no sql
external name 'imdb_plugin:com.acme.ImdbUDX.retrieveActor';
select name,dob from table(retrieve_actor('http://www.imdb.com/name/nm0000148/'));
'NAME','DOB'
'Harrison Ford', '1942-08-13'
select name,dob from table(retrieve_actor(null));
'NAME','DOB'
'Harrison Ford', '1942-08-13'
'Ruth Gordon', '1896-10-30'
...
To turn this into a foreign table, you can follow the pattern from MedMockColumnSet; in your ImdbColumnSet.toRel, once you established that the table being queried was the actors table, you would do something like:
RexBuilder rexBuilder = cluster.getRexBuilder();
RexNode arg = rexBuilder.constantNull();
// Call to super handles the rest.
return toUdxRel(
cluster,
connection,
"imdb.retrieve_actor",
null,
new RexNode [] { arg });
The code passes a literal null (constructed via constantNull) for arg so that querying the table will return all actors.
Then, once you'd created a foreign server imdb_server which instantiated your data wrapper, you could do:
create foreign table actors server imdb_server options (entity_name 'ACTOR'); select name,dob from actors; 'NAME','DOB' 'Harrison Ford', '1942-08-13' 'Ruth Gordon', '1896-10-30' ...
At this point, you're probably wondering why you would want to jump through all of these hoops. After all, you could just as easily have done:
create view actors as select * from table(retrieve_actor(null)); select name,dob from actors;
No foreign data wrapper required, and you get the same results. But read on...
There are at least three good reasons to go to the trouble of writing a foreign data wrapper:
- Optimization potential. Once you have the basics working, it's possible to start adding rules for optimizations such as pushing filters down into the foreign server. For example, a query against the actors view above will return rows for all actors before doing any further processing. So if you execute a query like select name from actors where url='http://www.imdb.com/name/nm0000148/', Farrago will pull out all of the actor entries from IMDb and then apply the URL filter locally. This will be very slow, and the people at IMDb probably won't be happy if you run this over and over. But with a foreign data wrapper, you can provide an optimizer rule which recognizes URL filters, extracts the URL, and supplies it as the argument to the UDX invocation, so only one row will come back. More advanced patterns such as joins can also be pushed down in this way (e.g. "find all movies in which Li Gong acts").
- Dynamic schemas. Some data sources may extend their schemas over time. For example, IMDb might start providing box office and video revenue information. If a foreign data wrapper is capable of querying metadata, it can use this to dynamically derive the available tables and columns. This is not possible with fixed views built on top of fixed UDX's.
- UDX reuse. Instead of writing one UDX per entity to be queried, you can write just one and reuse it for all entities. In the IMDb example, you might write a retrieve_url UDX capable of producing a different result set for each entity. Because foreign table rewrite happens after SQL validation, it is not subject to the usual rule that every UDX has to produce a fixed table result set.
For a concrete example, see FarragoMedMqlPlugin.
Final note. Implementation via UDX is not necessarily the most efficient way to accomplish external data access. Farrago provides two other approaches:
- Generation of Java iterator code. There's no threading/queueing overhead, and code generation can produce very efficient runtime code. This is the approach used for the JDBC foreign data wrapper.
- Generation of Fennel ExecStream plans. In the right hands, native code provides extreme efficiency and access to API's not available from Java. This is the approach used for the flat-file foreign data wrapper.
However, writing a UDX is much easier than either of these approaches, so it's usually best to start with that and move to one of the other methods only if required.

