LucidDbStreamingLoad
From Eigenpedia
Contents |
Introduction
This page describes experimental support for bulk loading data into LucidDB without having to save it on disk in a flatfile first, and without having to write a new foreign data wrapper.
The approach is explained in terms of using LucidDB as an ETL target for Pentaho Data Integration (PDI, formerly Kettle).
NOTE: A PDI plugin now exists which takes care of all of the messy bits, so if you actually want to bulk load from PDI, you should use that instead. However, the information in this page is still useful as an example for duct-taping together a bulk load from any source, e.g. LucidDbTalend.
Setup
These instructions currently only work on Linux. Windows support via named pipes may be developed in the future.
Streaming load is based on fifo support in LucidDB's flatfile reader. This is available since release 0.8.0, so set up a copy of that before continuing (see LucidDbGettingStarted if this is your first time setting up a LucidDB instance). However, we'll use luciddb-0.7.5 as the example installation directory, since these instructions were originally based on a prerelease with that version number.
Create a directory luciddb-0.7.5/fifo; this is where the named pipe will be created, along with some other related files.
Using sqllineClient, create a new user:
create user pdi authorization 'unused'; !quit
Reconnect with sqllineClient -n PDI to create a target schema for this user, as well as a foreign server for the fifo directory (adjusting the path accordingly):
create schema target;
create table target.entropy(random_uuid varchar(128));
create or replace server fifo_server
foreign data wrapper sys_file_wrapper
options (
directory '/path/to/luciddb-0.7.5/fifo/',
file_extension 'csv',
with_header 'yes',
num_rows_scan '0',
lenient 'no');
!quit
The num_rows_scan setting is very important for the fifo to work correctly; without it, you will get mysterious errors or hangs later.
We will use PDI to fill the empty table with randomly generated UUID's.
Download a copy of PDI and install it.
You can tell PDI how to connect to LucidDB by giving it LucidDB's client JDBC driver and connection information, following steps similar to those in LucidDbAggregateDesigner#Install the Client (except the JDBC driver directory is libext instead of drivers). However, that is not actually necessary for streaming load to work, so we won't go into details here.
Create Generic Loader Script
Create a new file named lucidDbLoader in your luciddb-0.7.5/bin directory, with the following contents:
#!/bin/bash
usage() {
echo "Usage: lucidDbLoader <jdbc-url> <user-name> <password> <fifo-dir> <fifo-server-name> <schema-name> <table-name>"
echo "Example: lucidDbLoader jdbc:luciddb:rmi://localhost sa unused /var/tpch FFSRV TPCH ORDERS"
}
if [ "$#" != 7 ]; then
usage;
exit -1;
fi
set +e
binDir=$(cd `dirname $0`; pwd)
jdbcUrl="$1"
userName="$2"
password="$3"
fifoDir="$4"
fifoServerName="$5"
schemaName="$6"
tableName="$7"
controlFileName="${fifoDir}/${tableName}.bcp"
fifoName="${fifoDir}/${tableName}.csv"
if [ ! -f ${controlFileName} ]; then
echo "Control file ${controlFileName} does not exist"
exit -1;
fi
rm -f ${fifoName}
mkfifo ${fifoName}
cat > ${fifoDir}/lucidDbLoader.sql <<EOF
insert into "${schemaName}"."${tableName}"
select * from "$fifoServerName"."DEFAULT"."${tableName}";
!quit
EOF
${binDir}/sqllineClient -u ${jdbcUrl} -n ${userName} -p ${password} > ${fifoDir}/lucidDbLoader.log 2>&1 --run=${fifoDir}/lucidDbLoader.sql &
cat - > ${fifoName}
wait
rm -f ${fifoName}
Then make it executable:
chmod a+x lucidDbLoader
(Eventually a script like this will come standard with LucidDB releases, but for now you have to create it yourself.)
Create Wrapper Script
Now, in the luciddb-0.7.5/fifo directory, create a script named loadEntropy as follows (adjusting paths as usual):
#!/bin/bash set +e cat - | /path/to/luciddb-0.7.5/bin/lucidDbLoader jdbc:luciddb:rmi://localhost PDI unused /path/to/luciddb-0.7.5/fifo FIFO_SERVER TARGET ENTROPY
Make it executable also:
chmod a+x loadEntropy
The only reason this script is needed is that PDI doesn't have a dedicated step for loading LucidDB yet, so we're going to have to repurpose the generic flatfile output step. Once PDI has been enhanced with a dedicated LucidDB loader step, this script will no longer be necessary.
Create BCP Control File
In the luciddb-0.7.5/fifo directory, create a file named ENTROPY.bcp as follows:
9.0 1 1 SQLCHAR 0 128 "rn" 1 RANDOM_UUID ""
For general information on BCP control file format, see LucidDbFlatFileWrapper#Column Descriptions.
Again, this will no longer be needed once a dedicated step is available which knows how to generate the control file automatically based on transformation metadata.
Define PDI Transformation
OK, now we have everything we need for setting up the load in PDI.
Create a new transformation like this:
For the second page of the text file output step, fill the information in as follows:
Test the Transformation
Execute the transformation, and if all goes well, it will load 9 UUID rows into LucidDB.
Verify with sqllineClient -n PDI:
0: jdbc:luciddb:rmi://localhost> select * from target.entropy; +------------------------------------------------------------------------------+ | RANDOM_UUID | +------------------------------------------------------------------------------+ | 3gorklolnkke2 | | 4s44n5cabmocp | | 491i7gbh90l5q | | 4lrn9mdblh9nr | | 4454ckbti41gh | | hnff3toh1322 | | 2mimnlo6llqfp | | fjg1lgkvlohg | | 33s3gj8qcp7it | +------------------------------------------------------------------------------+
Troubleshooting
Fifos can be a bit of a pain since if one end dies or does something wrong, the other end is left hanging.
The file luciddb-0.7.5/fifo/lucidDbLoader.sql contains the INSERT statement generated to execute the load.
The file luciddb-0.7.5/fifo/lucidDbLoader.log contains the result of executing that INSERT statement.
While the loader script is running the fifo luciddb-0.7.5/fifo/ENTROPY.csv will exist.
The PDI execution window has a logging tab to show the output of commands executed. (It does not treat a return code of -1 from a command execution as a failure, so you have to check the logging window yourself instead of relying on the execution result pane.)
Within sqllineClient, you can use select * from sys_root.dba_sql_statements to see what is executing in LucidDB.
If all else fails, use ps to see what processes are running; kill -9 is your friend.



