Small. Fast. Reliable.
Choose any three.

Datatypes In SQLite Version 3

Most SQL database engines (every SQL database engine other than SQLite, as far as we know) uses static typing. With static typing, the datatype of a value is determined by its container - the particular column the value is stored in.

SQLite uses a more general dynamic type system. In SQLite, the datatype of a value is associated with the value itself, not with the container in which it is stored. The dynamic type system of SQLite is backwards compatible with the more common static type systems of other database engines in the sense that SQL statement that work on statically typed databases should would the same way in SQLite. However, the dynamic typing in SQLite allowed it to do things which are not possible in traditional statically typed databases.

1. Storage Classes

Each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:

Any column in a version 3 database, except an INTEGER PRIMARY KEY column, may be used to store any type of value.

All values supplied to SQLite, whether as literals embedded in SQL statements or values bound to pre-compiled SQL statements are assigned a storage class before the SQL statement is executed. Under circumstances described below, the database engine may convert values between numeric storage classes (INTEGER and REAL) and TEXT during query execution.

Storage classes are initially assigned as follows:

The storage class of a value that is the result of an SQL scalar operator depends on the outermost operator of the expression. User-defined functions may return values with any storage class. It is not generally possible to determine the storage class of the result of an expression at compile time.

2. Column Affinity

In SQLite version 3, the type of a value is associated with the value itself, not with the column or variable in which the value is stored. (This is sometimes called manifest typing or duck typing.) All other SQL databases engines that we are aware of use the more restrictive system of static typing where the type is associated with the container, not the value. To look at it another way, SQLite provides dynamic datatypes such as one finds in "script" programming languages such as Awk, Perl, Tcl, Python, and Ruby, whereas other SQL database engines provide only compile-time fixed, static typing such as found in Pascal, C++, and Java.

In order to maximize compatibility between SQLite and other database engines, SQLite support the concept of "type affinity" on columns. The type affinity of a column is the recommended type for data stored in that column. The key here is that the type is recommended, not required. Any column can still store any type of data, in theory. It is just that some columns, given the choice, will prefer to use one storage class over another. The preferred storage class for a column is called its "affinity".

Each column in an SQLite 3 database is assigned one of the following type affinities:

A column with TEXT affinity stores all data using storage classes NULL, TEXT or BLOB. If numerical data is inserted into a column with TEXT affinity it is converted to text form before being stored.

A column with NUMERIC affinity may contain values using all five storage classes. When text data is inserted into a NUMERIC column, an attempt is made to convert it to an integer or real number before it is stored. If the conversion is successful (meaning that the conversion occurs without loss of information), then the value is stored using the INTEGER or REAL storage class. If the conversion cannot be performed without loss of information then the value is stored using the TEXT storage class. No attempt is made to convert NULL or blob values.

A column that uses INTEGER affinity behaves in the same way as a column with NUMERIC affinity, except that if a real value with no fractional component and a magnitude that is less than or equal to the largest possible integer (or text value that converts to such) is inserted it is converted to an integer and stored using the INTEGER storage class.

A column with REAL affinity behaves like a column with NUMERIC affinity except that it forces integer values into floating point representation. (As an internal optimization, small floating point values with no fractional component are stored on disk as integers in order to take up less space and are converted back into floating point as the value is read out.)

A column with affinity NONE does not prefer one storage class over another. No attempt is made to coerce data from one storage class into another. The data is stored on disk exactly as specified.

2.1 Determination Of Column Affinity

The type affinity of a column is determined by the declared type of the column, according to the following rules:

  1. If the datatype contains the string "INT" then it is assigned INTEGER affinity.

  2. If the datatype of the column contains any of the strings "CHAR", "CLOB", or "TEXT" then that column has TEXT affinity. Notice that the type VARCHAR contains the string "CHAR" and is thus assigned TEXT affinity.

  3. If the datatype for a column contains the string "BLOB" or if no datatype is specified then the column has affinity NONE.

  4. If the datatype for a column contains any of the strings "REAL", "FLOA", or "DOUB" then the column has REAL affinity

  5. Otherwise, the affinity is NUMERIC.

If a table is created using a "CREATE TABLE <table> AS SELECT..." statement, then all columns have no datatype specified and they are given no affinity.

2.2 Column Affinity Example

CREATE TABLE t1(
    t  TEXT,
    nu NUMERIC, 
    i  INTEGER,
    no BLOB
);

-- Storage classes for the following row:
-- TEXT, REAL, INTEGER, TEXT
INSERT INTO t1 VALUES('500.0', '500.0', '500.0', '500.0');

-- Storage classes for the following row:
-- TEXT, REAL, INTEGER, REAL
INSERT INTO t1 VALUES(500.0, 500.0, 500.0, 500.0);

3. Comparison Expressions

Like SQLite version 2, version 3 features the binary comparison operators '=', '<', '<=', '>=' and '!=', an operation to test for set membership, 'IN', and the ternary comparison operator 'BETWEEN'.

The results of a comparison depend on the storage classes of the two values being compared, according to the following rules:

SQLite may attempt to convert values between the numeric storage classes (INTEGER and REAL) and TEXT before performing a comparison. Whether or not any conversions are attempted before the comparison takes place depends on the nominal affinity assigned to the expressions on either side of the binary operator. Affinities are assigned to expressions in the following cases:

Conversions are applied before the comparison as described below. In the following bullet points, the two operands are refered to as expression A and expression B. Expressions A and B may appear as either the left or right operands - the following statements are true when considering both "A <op>B" and "B <op>A".

In SQLite, the expression "a BETWEEN b AND c" is equivalent to "a >= b AND a <= c", even if this means that different affinities are applied to 'a' in each of the comparisons required to evaluate the expression.

Expressions of the type "a IN (SELECT b ....)" are handled by the three rules enumerated above for binary comparisons (e.g. in a similar manner to "a = b"). For example if 'b' is a column value and 'a' is an expression, then the affinity of 'b' is applied to 'a' before any comparisons take place.

SQLite treats the expression "a IN (x, y, z)" as equivalent to "a = +x OR a = +y OR a = +z". The values to the right of the IN operator (the "x", "y", and "z" values in this example) are considered to be expressions, even if they happen to be column values. If the value of the left of the IN operator is a column, then the affinity of that column is used. If the value is an expression then no conversions occur.

3.1 Comparison Example

CREATE TABLE t1(
    a TEXT,
    b NUMERIC,
    c BLOB
);

-- Storage classes for the following row:
-- TEXT, REAL, TEXT
INSERT INTO t1 VALUES('500', '500', '500');

-- 60 and 40 are converted to '60' and '40' and values are compared as TEXT.
SELECT a < 60, a < 40 FROM t1;
1|0

-- Comparisons are numeric. No conversions are required.
SELECT b < 60, b < 600 FROM t1;
0|1

-- Both 60 and 600 (storage class NUMERIC) are less than '500'
-- (storage class TEXT).
SELECT c < 60, c < 600 FROM t1;
0|0

4. Operators

All mathematical operators (which is to say, all operators other than the concatenation operator "||") apply NUMERIC affinity to all operands prior to being carried out. If one or both operands cannot be converted to NUMERIC then the result of the operation is NULL.

For the concatenation operator, TEXT affinity is applied to both operands. If either operand cannot be converted to TEXT (because it is NULL or a BLOB) then the result of the concatenation is NULL.

5. Sorting, Grouping and Compound SELECTs

When values are sorted by an ORDER by clause, values with storage class NULL come first, followed by INTEGER and REAL values interspersed in numeric order, followed by TEXT values usually in memcmp() order, and finally BLOB values in memcmp() order. No storage class conversions occur before the sort.

When grouping values with the GROUP BY clause values with different storage classes are considered distinct, except for INTEGER and REAL values which are considered equal if they are numerically equal. No affinities are applied to any values as the result of a GROUP by clause.

The compound SELECT operators UNION, INTERSECT and EXCEPT perform implicit comparisons between values. Before these comparisons are performed an affinity may be applied to each value. The same affinity, if any, is applied to all values that may be returned in a single column of the compound SELECT result set. The affinity applied is the affinity of the column returned by the left most component SELECTs that has a column value (and not some other kind of expression) in that position. If for a given compound SELECT column none of the component SELECTs return a column value, no affinity is applied to the values from that column before they are compared.

6. Other Affinity Modes

The above sections describe the operation of the database engine in 'normal' affinity mode. SQLite version 3 will feature two other affinity modes, as follows:

7. User-defined Collation Sequences

By default, when SQLite compares two text values, the result of the comparison is determined using memcmp(), regardless of the encoding of the string. SQLite v3 provides the ability for users to supply arbitrary comparison functions, known as user-defined "collation sequences" or "collating functions", to be used instead of memcmp().

Aside from the default collation sequence BINARY, implemented using memcmp(), SQLite features two extra built-in collation sequences intended for testing purposes, the NOCASE and RTRIM collations:

7.1 Assigning Collation Sequences from SQL

Each column of each table has a default collation type. If a collation type other than BINARY is required, a COLLATE clause is specified as part of the column definition to define it.

Whenever two text values are compared by SQLite, a collation sequence is used to determine the results of the comparison according to the following rules. Sections 3 and 5 of this document describe the circumstances under which such a comparison takes place.

For binary comparison operators (=, <, >, <= and >=) if either operand is a column, then the default collation type of the column determines the collation sequence to use for the comparison. If both operands are columns, then the collation type for the left operand determines the collation sequence used. If neither operand is a column, then the BINARY collation sequence is used. For the purposes of this paragraph, a column name preceded by one or more unary "+" operators is considered a column name.

The expression "x BETWEEN y and z" is equivalent to "x >= y AND x <= z". The expression "x IN (SELECT y ...)" is handled in the same way as the expression "x = y" for the purposes of determining the collation sequence to use. The collation sequence used for expressions of the form "x IN (y, z ...)" is the default collation type of x if x is a column, or BINARY otherwise.

An ORDER BY clause that is part of a SELECT statement may be assigned a collation sequence to be used for the sort operation explicitly. In this case the explicit collation sequence is always used. Otherwise, if the expression sorted by an ORDER BY clause is a column, then the default collation type of the column is used to determine sort order. If the expression is not a column, then the BINARY collation sequence is used.

7.2 Collation Sequences Example

The examples below identify the collation sequences that would be used to determine the results of text comparisons that may be performed by various SQL statements. Note that a text comparison may not be required, and no collation sequence used, in the case of numeric, blob or NULL values.

CREATE TABLE t1(
    a,                 -- default collation type BINARY
    b COLLATE BINARY,  -- default collation type BINARY
    c COLLATE REVERSE, -- default collation type REVERSE
    d COLLATE NOCASE   -- default collation type NOCASE
);

-- Text comparison is performed using the BINARY collation sequence.
SELECT (a = b) FROM t1;

-- Text comparison is performed using the NOCASE collation sequence.
SELECT (d = a) FROM t1;

-- Text comparison is performed using the BINARY collation sequence.
SELECT (a = d) FROM t1;

-- Text comparison is performed using the REVERSE collation sequence.
SELECT ('abc' = c) FROM t1;

-- Text comparison is performed using the REVERSE collation sequence.
SELECT (c = 'abc') FROM t1;

-- Grouping is performed using the NOCASE collation sequence (i.e. values
-- 'abc' and 'ABC' are placed in the same group).
SELECT count(*) GROUP BY d FROM t1;

-- Grouping is performed using the BINARY collation sequence.
SELECT count(*) GROUP BY (d || '') FROM t1;

-- Sorting is performed using the REVERSE collation sequence.
SELECT * FROM t1 ORDER BY c;

-- Sorting is performed using the BINARY collation sequence.
SELECT * FROM t1 ORDER BY (c || '');

-- Sorting is performed using the NOCASE collation sequence.
SELECT * FROM t1 ORDER BY c COLLATE NOCASE;


This page last modified 2009/06/08 12:57:30 UTC