NumericType

From Eigenpedia

Jump to: navigation, search

This document describes the functional design of the DECIMAL/NUMERIC type.
See NumericTypeDesign for internals of the decimal implementation.

Our discussions are also online.

Contents

Decimal Properties


In Farrago, a DECIMAL type is an numeric SQL type, with a fixed number of base 10 digits. Some of these digits are dedicated to the fractional portion of a number. Decimals are characterized by the following properties:

Precision. the total number of digits
Scale. the number of digits after the decimal point
Radix. base 10 is used

Decimals are declared as DECIMAL(precision, scale). For example, DECIMAL(6,2) represents a six digit number, with two digits after the decimal point (and up to four digits before).

“123.45” can be represented exactly by DECIMAL(6,2)
“12345” is too big
“12.345” has too many digits after the decimal point.

Decimal Representation


Internally, decimals are represented as signed integers, with a constant decimal shift. For example, the DECIMAL(6,2) value “123.45” is shifted by two digits (multiplied by 10^2) to get an integer representation of “12345”. The DECIMAL(6,3) value “123.45” is adjusted by 3 digits to get an integer representation of “123450”.

All decimals are represented by 64-bit signed integers. The maximum precision of a decimal is 19. Note that most, but not all 19-digit numbers can be represented.


For reference, the maximum range of integer values are as follows:

32-bit integer. +/- 2.1 * 10^9. can represent precision 9, a few numbers of precision 10
64-bit integer. +/- 9.2 * 10^18. can represent precision 18, most of precision 19
128-bit integer. +/- 1.7 * 10^38. can represent precision 38, a bit of precision 39
double precision floating point. accurately represents about 15 decimal digits

Decimal Arithmetic: Result Type


Type matching rules
Floating point numbers have the greatest range of any numbers. Consequently, arithmetic between a floating point number and a decimal number is treated as floating point arithmetic. On the other hand, decimals have more expressive power than integer numeric types. INTEGER and BIGINT are special cases of DECIMAL (with scale = 0). Consequently, arithmetic between decimals and integers is treated as decimal arithmetic.

Data type precedence. floating-point > fixed-point > integral

In addition to precision and scale, let us also introduce the following concepts for our discussion of decimal arithmetic:


(whole) Digits. the number of digits before the decimal point; in other words (Precision –Scale)
Abbreviations. P=precision, S=scale, D=digits
Suffixes. for example P1, P2=(precision of operands); Pout=(precision of result)

Natural result types
The results of arithmentic between two decimals have the following natural constraints:


Addition/Subtraction
Dout = max(D1,D2)+1
Sout = max(S1,S2)
Example: 999.0 + 1.01 = 1000.01

Multiplication
Dout = D1+D2
Sout = S1+S2
Example: 90.1 * 5.01 = 451.401


Division
Dout = Dtop + Sbottom
Sout = can be infinite
Example: 1/3 = 0.33333...
Example: 12.34 / 0.01 = 1234

Notice that results have a broader range than operands. This presents a challenge to our database. During validation, it assigns result types to arithmetic, but it is constrained by system limits (19 decimal digits). It must choose between integral digits or fractional digits. Insufficient integral digits may lead to result overflow. Insufficient fractional digits leads to loss of precision. Neither choice is ideal. Other database vendors have provided the following heuristics:

Limited result types


Like Farrago, the database systems below have a limited max precision, Pmax. They generally have a preferred result type for arithmetic operations (precision, number of whole digits, and/or scale). The preferred type is described by equations. However, if the preferred precision is greater than the maximum precision, either the whole digits or the scale must be reduced.

Table 1: Result types of popular databases.


Pmax ADD/SUB MULT DIV
SQL 2003 Spec Pout=implementation Sout=max(S1,S2) Pout=implementation Sout=S1+S2 Pout=implementation Sout=implementation
(causes overflow) Leading digit lost Leading digit lost Leading digit lost [round or trunc accepted]
DB2 31 Dout=max(D1,D2) Sout=max(S1,S2) Pout=P1+P2 [up to 31] Sout=S1+S1 [up to 31] Pout=31 Dout=Dtop+Sbottom [up to 31] Sout=31-Dtop
Lower Dout to preserve fractional part of result. Sout will be at least 3 if the database parameter MIN_DEC_DIV_3 is set.
Sql Server 38 Dout=max(D1,D2) [+1 for add] Sout=max(S1,S2) Pout=P1+P2+1 [up to 38] Sout=S1+S2 [up to 38] Dout=Dtop+Sbottom [up to 38] Sout=max(6,Stop+Pbottom+1)
Lower Sout to preserve integral part of result. Lower Sout to preserve integral part of result. Lower Sout to preserve integral part of result. Sout is always at least 6.
Broadbase 19 Dout=max(D1,D2) Sout=max(S1,S2) Pout=P1+P2 [up to 19] Sout=S1+S2 [up to 19] Pout=max(D1,D2)+max(S1,S2) [up to 19] Sout=max(S1,S2)
Lower Sout to preserve integral part of result. Lower Sout to preserve integral part of result. Lower Sout to preserve integral part of result. If the parameter divParam is set, Sout will be at least divParam.
Farrago 19 Dout=max(D1,D2)+1 Sout=max(S1,S2) Pout=P1+P2 [up to 19] Sout=S1+S2 [up to 19] Dout=Dtop+Sbottom [up to 19] Sout=max(6,Stop+Pbottom+1)
Lower Dout to preserve the fractional part of result. Lower Sout to preserve integral part of result.
LucidDb 19 Dout=max(D1,D2)+1 Sout=max(S1,S2) Pout=P1+P2 [up to 19] Sout=S1+S2 [up to 19] Dout=Dtop+Sbottom [up to 19] Sout=6
Lower Dout to preserve the fractional part of result. Sout is capped at 6 to preserve integral part of result. Implemented with doubles to avoid overflow. Sout is capped at 6 to preserve integral part of result. Implemented with doubles to avoid overflow.

Tai: Good, let's avoid the DB parameter

  • Oracle uses a generic NUMBER type (which represents up to 38 digits without rounding) and has no need to specify scale/precision
  • Postgres/EnterpriseDB use an arbitrary precision number (up to 1000 digits)
  • MySQL uses an arbitrary precision number (previously double precision, now up to 65 digits exact precision)


In general, the SQL Spec could be described as cautious. Its result type does not allow rounding errors, except on division. As a result, exceptions are thrown whenever arithmetic would cause a loss of any digits. DB2 faithfully implements the SQL Spec. (Note: examples assume a maximum precision of 19).

DECIMAL(19,0) + 1.000 [DECIMAL(4,3)] => DECIMAL(19,3) // loss of integral digits (range)
DECIMAL(19,3) * 1.000 [DECIMAL(4,3)] => DECIMAL(19,6) // loss of integral digits (range)


The SQL Server and Broadbase implementations could be described as permissive. When necessary, the scale is lowered in an effort to prevent the result from overflowing. The cost is that small digits may be prematurely cut from the results of arithmetic.

DECIMAL(19,0) + 1.000 [DECIMAL(4,3)] => DECIMAL(19,0) // loss of small digits (less precise)
DECIMAL(19,3) * 1.000 [DECIMAL(4,3)] => DECIMAL(19,2) // loss of small digits (less precise)


Division. DB2 and SQL Server attempt to support the maximum result of a division, sacrificing scale as necessary. Some implementations, DB2 and Broadbase, provide a database parameter which guarantees a minimum number of decimal digits (for example, 3). SQL Server specifies that the maximum number of decimal digits should be 6.

Proposed behavior. Attempt to follow the SQL Spec and therefore DB2. Note that the max precision of 19 digits currently falls short of other major databases. We'll follow a division scheme similar to SqlServer which provides a practical heuristic (similar to the significant digits of scientific arithmetic) without too many side effects. (In DB2, dividing Dec(1,0) by Dec(1,0) yields Dec(19,18). Subsequent addition of small number like Dec(2,0) may cause an overflow).

Table 2. SQL 2003 specification for results of other operations


Operation Behavior
round result is integer; either + or - 0.5 is okay
trunc/floor, ceiling result is integer; truncate after decimal, maybe add one
% not applicable, unless scale is 0
ln, e, pow, sqrt returns implementation defined real


Decimal Arithmetic: Implementation


During validation, arithmetic between operands is expanded so that the operand types are compatible and the return type is inferred. The function calls are mapped to more tangible typed function calls. During this process, the decimal arithmetic will be mapped to more basic integer and floating point arithmetic.

Example of decimal addition:

3.5 + 1.49 [initial decimal values]
3.50 + 1.49 [are set to same scale]
350 + 149 [then are reinterpreted as long values]
499 [the long values are added]
4.99 [and the result is reinterpreted as a decimal]


Notes
1. literal values such as pow(10,S) are computed once per query
2. overflow is often avoided by using a result type that can accomodates all possible result values

3. otherwise overflow happens if and only if basic arithmetic fails or cast fails

Table 3. Mapping decimal arithmetic to integer and floating point arithmetic

Addition/Subtraction

reinterpretCast(decimal(Pout,Sout))
    addBigint
        decimal1 * pow(10, Sout-S1)
        decimal2 * pow(10, Sout-S2)
Overflow:
(1) adjusting scale overflows or
(2) bigint addition overflows
[note: result type is guaranteed to be sufficient or max]

Multiplication

reinterpretCast(decimal(Pout,Sout))
    multiplyBigint
        decimal1
        decimal2
Overflow:
(1) multiplication overflows
[note: result type is guaranteed to be sufficient or max]

Division

reinterpretCast(decimal(Pout,Sout))
    cast float to bigint
        multiplyFloat
            divideFloat
                cast decimal1 as float
                cast decimal2 as float
            pow(10,Sout-S1+S2) // may be less than 1
Overflow:
(1) float division overflows (impossible?)
(2) cast to integer type overflows
(3) multiplication overflows
[note: float range is much larger than decimal, but inexact]
[note: may be less precise; how else might we implement?]

Rounding (rounds half away from zero)

divideBigint
    case decimal > 0
        addBigint
            decimal
            pow(10,S)/2
    else
        subtractBigint
            decimal
            pow(10,S)/2
    endif
    pow(10,S)
Overflow:
(1) bigint addition fails

Truncation/Floor

divideBigint
    decimal
    pow(10,S)

Ceiling

divideBigint
    addBigint
        decimal
        pow(10,S) - 1
    pow(10,S)
Overflow:
(1) bigint addition fails

Mod

error unless (scale = 0)
same as mod Bigint

Exponential functions: ln, e, pow, sqrt
All other functions

Decimals are casted as doubles.

It's important to have this feature because decimals are the default type for almost all numeric literals containing a decimal point.

Casting


Note: unlike arithmetic operations, casts DO require special overflow checks
Note: casts rely on the round function (described above)
(Refer to SQL 2003 Part 2 Section 4.4.2 Number characteristics)

Int to Decimal

multiplyBigint
    int
    pow(10,S)

Real to Decimal

cast to Bigint
    multiplyFloat
        real
        pow(10,S)

Decimal to Int

round
    decimal
    0

Decimal to Real

divideFloat
    cast decimal as float
    pow(10,s)

Decimal to Decimal

if (Sin == Sout)
    outValue = inValue
else if (Sin > Sout)
    outValue = inValue * pow(10,Sin-Sout)
else
    outValue = round(decimal, Sout)
end


Union and CASE

SQL:2003 Part 2 Section 9.3, "Data types of results of aggregations", defines the rules to be used for deriving a possibly widened result type in the case where a query column's values will consist of the union of values from a number of different yet compatible types. This is relevant for relational set operators such as UNION and INTERSECTION, as well as CASE expressions (where each THEN clause may supply a value of a different type).


According to Syntax Rule 3.c, "If all of the data types in DTS [the set of types being combined] are exact numeric, then the result data type is exact numeric with implementation-defined precision and with scale equal to the maximum of the scales of the data types in DTS." As with multiplication, the standard is erring on the side of safety in preserving scale, with overflow resulting in errors if insufficient precision remains for the whole digits.

Farrago currently violates the standard here, instead choosing to preserve whole digits instead of scale (similar to LucidDB multiplication type derivation). We will address this as part of work on strict-vs-pragmatic mode.

UDF/UDT


The (2s-complement) integer representation of decimals is readily converted into java.math.BigInteger, then BigDecimal.

Psuedocode (Java rather than JNI).

[int internalDecimal]
BigDecimal arg = new BigDecimal(new BigInteger(&internalDecimal), scale)

(Refer to SQL 2003 Part 10 Section 9.11 SQLJ Datatype Properties for type mapping)

JDBC


Values can be casted from BigDecimal to other types (strings, integers, doubles).

Design Questions


Will we use 32-bit integers or only 64-bit integers?
The more compact 32-bit integers have the benefits of less storage space, possibly faster execution. On the other hand, they might complicate the model, because they would necessitate the handling of multiple internal types. But then again, it might save us time in the long run, if we eventually decided to use multiple internal types. For now, we'll keep things simple and only use the 64-bit types unless there are good reasons to go 32-bit. (Question: is there a compelling reason to do 32-bit?) Other databases use proprietary internal representations for their numeric data.
Tai: Keep it simple, let’s start with 64-bit storage.

What precision/scale are we going to support?
Other databases allow a high precision/scale because they probably have a larger binary datatype than 64 bit. If we are only doing that much it is only possible to really support 19 digits of precision or perhaps 18. (Might as well support almost 19 and allow numbers to overflow naturally when the internal representation cannot support them.) Similarly scale can range from 0-19. Negative scale is not supported in the SQL 2003 spec and would complicate things, so maybe we could skip it for now. (Question: is there a compelling reason to do negative scale?) Oracle supports a large range of scales because their generic NUMBER type can dynamically become approximate at extreme values.

What are the return types of arithmetic?
If we would like to follow the SQL 2003 specification, our approach would be similar to DB2's. This also allows us to have the correct overflow behavior without too much additional work. The most controversial arithmetic operation is probably DIVISION, for which the standard provides little guidance. Allowing the maximum integral digits to support the result (as DB2 and SQL Server do) may mean sacrificing fractional digits (especially with our limited precision), and might mean a messy database parameter to guarantee a specified number of fractional digits. Perhaps we could require a set number as minimum (like 3).


References


SQL 2003 Part 2 Section 6.26 (the draft found through this link is close to the actual standard)
[1]

DB2 8.1 SQL Reference Vol I Ch 2
[2]

MSDN Transact SQL Reference
[3]

Personal tools