Database Programming: Techniques and Issues
We now turn our attention to the techniques that have been developed for accessing databases from programs and, in particular, to the issue of how to access SQL data-bases from application programs. Our presentation of SQL in Chapters 4 and 5 focused on the language constructs for various database operations—from schema definition and constraint specification to querying, updating, and specifying views. Most database systems have an interactive interface where these SQL commands can be typed directly into a monitor for execution by the database system. For example, in a computer system where the Oracle RDBMS is installed, the command SQLPLUS starts the interactive interface. The user can type SQL commands or queries directly over several lines, ended by a semicolon and the Enter key (that is, “; <cr>"). Alternatively, a file of commands can be created and executed through the interactive interface by typing @<filename>. The system will execute the commands written in the file and display the results, if any.
The interactive interface is quite convenient for schema and constraint creation or for occasional ad hoc queries. However, in practice, the majority of database inter-actions are executed through programs that have been carefully designed and tested. These programs are generally known as application programs or database applications, and are used as canned transactions by the end users, as discussed in Section 1.4.3. Another common use of database programming is to access a database through an application program that implements a Web interface, for example, when making airline reservations or online purchases. In fact, the vast majority of Web electronic commerce applications include some database access commands. Chapter 14 gives an overview of Web database programming using PHP, a scripting language that has recently become widely used.
In this section, first we give an overview of the main approaches to database programming. Then we discuss some of the problems that occur when trying to access a database from a general-purpose programming language, and the typical sequence of commands for interacting with a database from a software program.
1. Approaches to Database Programming
Several techniques exist for including database interactions in application pro-grams. The main approaches for database programming are the following:
Embedding database commands in a general-purpose programming language. In this approach, database statements are embedded into the host programming language, but they are identified by a special prefix. For example, the prefix for embedded SQL is the string EXEC SQL, which precedes all SQL commands in a host language program. A precompiler or preproccessor scans the source program code to identify database statements and extract them for processing by the DBMS. They are replaced in the program by function calls to the DBMS-generated code. This technique is generally referred to as embedded SQL.
Using a library of database functions. A library of functions is made avail-able to the host programming language for database calls. For example, there could be functions to connect to a database, execute a query, execute an update, and so on. The actual database query and update commands and any other necessary information are included as parameters in the function calls. This approach provides what is known as an application programming interface (API) for accessing a database from application programs.
Designing a brand-new language. A database programming language is designed from scratch to be compatible with the database model and query language. Additional programming structures such as loops and conditional statements are added to the database language to convert it into a full-fledged programming language. An example of this approach is Oracle’s PL/SQL.
In practice, the first two approaches are more common, since many applications are already written in general-purpose programming languages but require some data-base access. The third approach is more appropriate for applications that have intensive database interaction. One of the main problems with the first two approaches is impedance mismatch, which does not occur in the third approach.
2. Impedance Mismatch
Impedance mismatch is the term used to refer to the problems that occur because of differences between the database model and the programming language model. For example, the practical relational model has three main constructs: columns (attributes) and their data types, rows (also referred to as tuples or records), and tables (sets or multisets of records). The first problem that may occur is that the data types of the programming language differ from the attribute data types that are available in the data model. Hence, it is necessary to have a binding for each host programming language that specifies for each attribute type the compatible programming language types. A different binding is needed for each programming language because different languages have different data types. For example, the data types available in C/C++ and Java are different, and both differ from the SQL data types, which are the standard data types for relational databases.
Another problem occurs because the results of most queries are sets or multisets of tuples (rows), and each tuple is formed of a sequence of attribute values. In the pro-gram, it is often necessary to access the individual data values within individual tuples for printing or processing. Hence, a binding is needed to map the query result data structure, which is a table, to an appropriate data structure in the programming language. A mechanism is needed to loop over the tuples in a query result in order to access a single tuple at a time and to extract individual values from the tuple. The extracted attribute values are typically copied to appropriate program variables for further processing by the program. A cursor or iterator variable is typically used to loop over the tuples in a query result. Individual values within each tuple are then extracted into distinct program variables of the appropriate type.
Impedance mismatch is less of a problem when a special database programming language is designed that uses the same data model and data types as the database model. One example of such a language is Oracle’s PL/SQL. The SQL standard also has a proposal for such a database programming language, known as SQL/PSM. For object databases, the object data model (see Chapter 11) is quite similar to the data model of the Java programming language, so the impedance mismatch is greatly reduced when Java is used as the host language for accessing a Java-compatible object database. Several database programming languages have been implemented as research prototypes (see the Selected Bibliography).
3. Typical Sequence of Interaction in Database Programming
When a programmer or software engineer writes a program that requires access to a database, it is quite common for the program to be running on one computer system while the database is installed on another. Recall from Section 2.5 that a common architecture for database access is the client/server model, where a client program handles the logic of a software application, but includes some calls to one or more database servers to access or update the data. When writing such a pro-gram, a common sequence of interaction is the following:
When the client program requires access to a particular database, the pro-gram must first establish or open a connection to the database server. Typically, this involves specifying the Internet address (URL) of the machine where the database server is located, plus providing a login account name and password for database access.
Once the connection is established, the program can interact with the data-base by submitting queries, updates, and other database commands. In general, most types of SQL statements can be included in an application program.
When the program no longer needs access to a particular database, it should terminate or close the connection to the database.
A program can access multiple databases if needed. In some database programming approaches, only one connection can be active at a time, whereas in other approaches multiple connections can be established simultaneously.
In the next three sections, we discuss examples of each of the three main approaches to database programming. Section 13.2 describes how SQL is embedded into a programming language. Section 13.3 discusses how function calls are used to access the database, and Section 13.4 discusses an extension to SQL called SQL/PSM that allows general-purpose programming constructs for defining modules (procedures and functions) that are stored within the database system. Section 13.5 compares these approaches.