LucidDbAppLib SPLIT STRINGS

From Eigenpedia

Jump to: navigation, search

Contents

Purpose

Split one or more input strings, by a provided separator string, into separate rows.

Syntax

There are six variants of this UDX, taking as input either a string, a one-column table, or a multi-column table, all optionally generating a column of sequence numbers:

Single-string input:

select * from table(
  applib.split_string_to_rows(
    IN_STRING,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS)
);

Single-column input:

select * from table(
  applib.split_rows(
    IN_CURSOR,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS)
);

Multi-column input:

select * from table(
  applib.split_rows(
    IN_CURSOR,
    COL_NAME,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS)
);

Single-string input with sequence numbers:

select * from table(
  applib.split_string_to_rows(
    IN_STRING,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS,
    START_NUM,
    INCREMENT_BY)
);

Single-column input with sequence numbers:

select * from table(
  applib.split_rows(
    IN_CURSOR,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS,
    START_NUM,
    INCREMENT_BY)
);

Multi-column input with sequence numbers:

select * from table(
  applib.split_rows(
    IN_CURSOR,
    COL_NAME,
    SEPARATOR_CHAR,
    ESCAPE_CHAR,
    TRIM_TOKENS,
    START_NUM,
    INCREMENT_BY)
);

Single-string input

Parameters

Input:

  • INPUT_STRING varchar(65535): a string to be split
  • SEPARATOR_CHAR char(1): a single-character separator to split by.
  • ESCAPE_CHAR char(1): an escape character to escape the separator when needed
  • TRIM_TOKENS boolean: trim leading and trailing whitespace from the split tokens, TRUE or FALSE

Output:

  • OUT_STRINGS: a single-column result set containing the split tokens from INPUT_STRING.
    • If INPUT_STRING is empty, or null, or if it's whitespace-only and TRIM_TOKENS is TRUE, no output rows will be produced.

Example

Input

  • SEPARATOR_CHAR: "~"
  • ESCAPE_CHAR: '!'
  • TRIM_TOKENS: TRUE
  • IN_STRING: "AA~BB~CC"

Query

select * from table(
  applib.split_string_to_rows(
    'AA~BB~CC',
    '~',
    '!',
    TRUE)
);

Output

OUT_STRINGS
-----------
AA
BB
CC

Single-column input

Parameters

Input:

  • INPUT_CURSOR cursor: a single-column cursor containing strings to be split.
  • SEPARATOR_CHAR varchar(1): a single-character separator to split by
  • ESCAPE_CHAR varchar(1): an escape character to escape the separator when needed
  • TRIM_TOKENS boolean: trim leading and trailing whitespace from the split tokens, TRUE or FALSE

Output:

  • a single-column result set containing the split tokens from all strings from INPUT_CURSOR, named the same as the single column in INPUT_CURSOR.

Example

Input

  • SEPARATOR_CHAR: "~"
  • ESCAPE_CHAR: '!'
  • TRIM_TOKENS: TRUE
  • INPUT_CURSOR:
MY_COLUMN_NAME
--------------
AA~BB
CC~DD

Query

select * from table(
  applib.split_rows(
    cursor(select MY_COLUMN_NAME from ...),
    '~',
    '!',
    TRUE)
);

Output

MY_COLUMN_NAME
--------------
AA
BB
CC
DD

Multi-column input

Parameters

Input:

  • INPUT_CURSOR cursor: a multi-column cursor containing, among other things, a column of strings to be split
  • COL_NAME: a ROW reference to the single column to be split
  • SEPARATOR_CHAR char(1): a single-character separator to split by
  • ESCAPE_CHAR char(1): an escape character to escape the separator when needed
  • TRIM_TOKENS boolean: trim leading and trailing whitespace from the split tokens, TRUE or FALSE

Output:

  • a multi-column result set, with the split column containing the split tokens from all strings of that column in INPUT_CURSOR, with the other columns' values duplicated.

Example

Input

  • SEPARATOR_CHAR: '~'
  • ESCAPE_CHAR: '!'
  • TRIM_TOKENS: TRUE
  • COL_NAME: row(COL2)
  • INPUT_CURSOR:
COL1 |COL2  |COL3
-----+------+-------
A1   |AA~BB |C1
A2   |CC~DD |C2

Query

select * from table(
  applib.split_rows(
    cursor(select COL1, COL2, COL3 from ...),
    row(COL2),
    '~',
    '!',
    TRUE)
);

Output

COL1 |COL2  |COL3
-----+------+----
A1   |AA    |C1
A1   |BB    |C1
A2   |CC    |C2
A2   |DD    |C2

Sequence Numbers

The variants with sequence numbers all add a column (SEQ_NUM bigint) to the resulting table, filled with a sequence of numbers.

Parameters

  • START_NUM bigint: the sequence number used for the first row. If NULL, this will default to 1.
  • INCREMENT_BY bigint: added to the previous sequence number to generate the next. Can be positive (for increasing sequence) or negative (for decreasing sequence). Must not be 0. If NULL, will default to 1.

Escaping

The "escape" parameter sets which character to be used for escaping the splitter character in cases where that character legally appears in tokens. For example, if exclamation mark, '!', is used, a splitter character preceded by an exclamation mark will not cause a string split but instead be included in the output string, unescaped:

  • INPUT_STRING: AA!~B!B~CC
  • out:
OUT_STRINGS
-----------
AA~B!B
CC

Exceptions

Exceptions are thrown in the following situations:

  • (SEPARATOR_CHAR is more or less than one single character) - currently disabled due to FRG-261
  • (ESCAPE_CHAR is more or less than one single character) - currently disabled due to FRG-261
  • INPUT_CURSOR is more than one single column in single-column mode
  • COL_NAME is more than one column
  • START_NUM and INCREMENT_BY are present and INCREMENT_BY=0.

Source Code

http://p4web.eigenbase.org/open/dev/luciddb/src/com/lucidera/luciddb/applib/string/SplitStringUdx.java

Personal tools