SQL Data Definition and Data Types
SQL uses the terms table, row, and column for the formal relational model terms relation, tuple, and attribute, respectively. We will use
the corresponding terms inter-changeably. The main SQL command for data
definition is the CREATE statement, which can be used to create schemas, tables (relations), and
domains (as well as other constructs such as views, assertions, and triggers).
Before we describe the rel-evant CREATE
statements, we discuss schema and catalog concepts in Section 4.1.1 to place
our discussion in perspective. Section 4.1.2 describes how tables are created,
and Section 4.1.3 describes the most important data types available for
attribute specification. Because the SQL specification is very large, we give a
description of the most important features. Further details can be found in the
various SQL stan-dards documents (see end-of-chapter bibliographic notes).
1. Schema and Catalog
Concepts in SQL
Early versions of SQL did not include the concept of a relational
database schema; all tables (relations) were considered part of the same
schema. The concept of an SQL schema was incorporated starting with SQL2 in
order to group together tables and other constructs that belong to the same
database application. An SQL schema
is identified by a schema name, and
includes an authorization identifier
to indicate the user or account who owns the schema, as well as descriptors for each element in the schema. Schema elements include tables, constraints, views, domains, and other
constructs (such as authorization grants) that describe the schema. A schema is
created via the CREATE
SCHEMA statement, which can include all
the schema elements’ definitions. Alternatively, the schema can be assigned a
name and authorization identifier, and the elements can be defined later. For
example, the following statement creates a schema called COMPANY, owned by the user with authorization iden-tifier ‘Jsmith’. Note that
each statement in SQL ends with a semicolon.
CREATE SCHEMA COMPANY
AUTHORIZATION ‘Jsmith’;
In general, not all users are authorized to create schemas and schema
elements. The privilege to create schemas, tables, and other constructs must be
explicitly granted to the relevant user accounts by the system administrator or
DBA.
In addition to the concept of a schema, SQL uses the concept of a catalog—a named collection of schemas
in an SQL environment. An SQL environment
is basically an installation of an SQL-compliant RDBMS on a computer system. A catalog always contains a
special schema called INFORMATION_SCHEMA, which provides information on
all the schemas in the catalog and all the element descriptors in these
schemas. Integrity constraints such as referential integrity can be defined
between relations only if they exist in schemas within the same catalog.
Schemas within the same catalog can also share certain elements, such as domain
definitions.
2. The CREATE TABLE
Command in SQL
The CREATE
TABLE command is used to specify a new relation by
giving it a name and specifying its attributes and initial constraints. The
attributes are specified first, and each attribute is given a name, a data type
to specify its domain of values, and any attribute constraints, such as NOT NULL. The key, entity integrity, and referential integrity constraints can
be specified within the CREATE TABLE statement after the attributes
are declared, or they can be added later using the ALTER TABLE command (see Chapter 5). Figure 4.1 shows sample data definition
statements in SQL for the COMPANY relational database schema shown
in Figure 3.7.
Typically, the SQL schema in which the relations are declared is
implicitly specified in the environment in which the CREATE TABLE statements are executed. Alternatively, we can explicitly attach the
schema name to the relation name, sepa-rated by a period. For example, by
writing
CREATE TABLE COMPANY.EMPLOYEE
...
rather than
CREATE TABLE EMPLOYEE ...
as in Figure 4.1, we can explicitly (rather than implicitly) make the EMPLOYEE table part of the COMPANY schema.
The relations declared through CREATE TABLE
statements are called base tables
(or base relations); this means that the relation and its tuples are actually
created and stored as a file by the DBMS. Base relations are distinguished from
virtual relations, created through
the CREATE
VIEW statement (see Chapter 5), which may or may not
correspond to an actual physical file. In SQL, the attributes in a base table
are considered to be ordered in the
sequence in which they are specified in the CREATE TABLE statement. However, rows (tuples) are not considered to be ordered
within a
relation.
It is important to note that in Figure 4.1, there are some foreign keys that may cause errors because they are specified either
via circular references or because they refer to a table that has not yet been created. For example, the foreign
key Super_ssn in the EMPLOYEE table is a circular reference because it refers to the table itself.
The foreign key Dno in the EMPLOYEE table refers to the DEPARTMENT table, which has
not been created yet. To deal with this type of problem, these constraints
can be left out of the initial CREATE TABLE statement, and then added later
using the ALTER TABLE
statement (see Chapter 5). We displayed all the
foreign keys in Figure 4.1 to show the complete COMPANY schema in one place.
3. Attribute Data Types
and Domains in SQL
The basic data types
available for attributes include numeric, character string, bit string,
Boolean, date, and time.
Numeric data types include integer
numbers of various sizes (INTEGER or INT, and
SMALLINT) and floating-point (real) numbers
of various precision (FLOAT or REAL, and DOUBLE
PRECISION). Formatted numbers can be
declared by using DECIMAL(i,j)—or DEC(i,j) or NUMERIC(i,j)—where i, the precision, is the total number of
decimal digits and j, the scale, is the number of digits after the decimal point. The
default for scale is zero, and the default for precision is
implementation-defined.
Character-string data types are either fixed length—CHAR(n) or CHARACTER(n), where n is the number of characters—or varying length— VARCHAR(n) or CHAR VARYING(n) or CHARACTER VARYING(n), where n is the maximum number of characters. When specifying a literal string value, it is placed between single quotation marks (apostrophes), and it is case sensi-tive (a distinction is made between uppercase and lowercase). For fixed-length strings, a shorter string is padded with blank characters to the right. For example, if the value ‘Smith’ is for an attribute of type CHAR(10), it is padded with five blank characters to become ‘Smith ’ if needed. Padded blanks are generally ignored when strings are compared. For comparison purposes, strings are considered ordered in alphabetic (or lexicographic) order; if a string str1 appears before another string str2 in alphabetic order, then str1 is considered to be less than str2. There is also a concatenation operator denoted by || (double vertical bar) that can concatenate two strings in SQL. For example, ‘abc’ || ‘XYZ’ results in a single string ‘abcXYZ’. Another variable-length string data type called CHARACTER LARGE OBJECT or CLOB is also available to specify columns that have large text values, such as documents. The CLOB maximum length can be specified in kilobytes (K), megabytes (M), or gigabytes (G). For example, CLOB(20M) specifies a max-imum length of 20 megabytes.
Bit-string data types are either of fixed length n—BIT(n)—or varying length—BIT VARYING(n), where n is the maximum number of bits. The default for n, the length of a character string or
bit string, is 1. Literal bit strings are placed between single quotes but
preceded by a B to distinguish them from character strings; for example, B‘10101’. Another variable-length bitstring data type called BINARY LARGE OBJECT or BLOB is also available to specify columns that have large binary values,
such as images. As for CLOB, the maximum length of a BLOB can be specified in kilobits (K), megabits
(M), or gigabits (G). For example, BLOB(30G)
specifies a maxi-mum length of 30 gigabits.
A Boolean data type has the traditional values of TRUE or FALSE. In SQL, because of the presence of NULL values,
a three-valued logic is used, so a third possible value for a Boolean data type
is UNKNOWN. We discuss the need for UNKNOWN and the
three-valued logic in Chapter 5.
The DATE data type has ten positions, and its components are YEAR, MONTH, and
DAY in the form YYYY-MM-DD. The TIME data type has at least eight positions, with the
components HOUR, MINUTE, and SECOND in the form HH:MM:SS. Only valid dates and times should be allowed by
the SQL implementation. This implies that months should be between 1 and 12 and
dates must be between 1 and 31; furthermore, a date should be a valid date for
the corresponding month. The < (less than) comparison can be used with dates
or times—an earlier date is
considered to be smaller than a later date, and similarly with time. Literal
values are represented by single-quoted strings preceded by the keyword DATE or TIME; for example, DATE ‘2008-09-27’ or TIME ‘09:12:47’. In addition, a data type TIME(i), where i is called time fractional seconds precision, specifies i + 1 additional positions for TIME—one position for an additional period (.)
separator character, and i positions
for specifying decimal fractions of a second. A TIME WITH TIME ZONE data type includes an additional six positions for specifying the displacement from the standard universal
time zone, which is in the range +13:00 to –12:59 in units of HOURS:MINUTES. If WITH
TIME ZONE is not included, the default is
the local time zone for the SQL session.
Some additional data types are discussed below. The list of types
discussed here is not exhaustive; different implementations have added more
data types to SQL.
A timestamp data type (TIMESTAMP)
includes the DATE and TIME fields, plus a minimum of six positions for decimal fractions of
seconds and an optional WITH TIME ZONE qualifier. Literal values are
represented by single-quoted strings preceded by the keyword TIMESTAMP, with a blank space between data and time; for example, TIMESTAMP ‘2008-09-27 09:12:47.648302’.
Another data type related to DATE, TIME, and TIMESTAMP is the INTERVAL data type. This specifies an interval—a
relative value that can be used to
increment or decrement an absolute value of a date, time, or timestamp.
Intervals are qualified to be either YEAR/MONTH intervals
or DAY/TIME intervals.
The format of DATE, TIME, and TIMESTAMP can be considered as a special type of string. Hence, they can
generally be used in string comparisons by being cast (or coerced or
converted) into the equivalent strings.
It is possible to specify the data type of each attribute directly, as
in Figure 4.1; alter-natively, a domain can be declared, and the domain name
used with the attribute specification. This makes it easier to change the data
type for a domain that is used by numerous attributes in a schema, and improves
schema readability. For example, we can create a domain SSN_TYPE by the following statement:
CREATE DOMAIN SSN_TYPE AS
CHAR(9);
We can use SSN_TYPE in place of CHAR(9) in Figure 4.1 for the
attributes Ssn and
Super_ssn of
EMPLOYEE, Mgr_ssn of DEPARTMENT,
Essn of WORKS_ON, and Essn of DEPENDENT. A domain can also have an optional default specification via a DEFAULT clause, as we discuss later for attributes. Notice that domains may not
be available in some implementations of SQL.
Related Topics
Privacy Policy, Terms and Conditions, DMCA Policy and Compliant
Copyright © 2018-2023 BrainKart.com; All Rights Reserved. Developed by Therithal info, Chennai.