Chapter: Fundamentals of Database Systems : Transaction Processing, Concurrency Control, and Recovery : Introduction to Transaction Processing Concepts and Theory

Characterizing Schedules Based on Recoverability

When transactions execute concurrently in an interleaved fashion, the order of execution of operations from all the various transactions is known as a schedule (or history). In this section, we first define the concept of schedules and then characterize the types of schedules that facilitate recovery when failures occur. In Section 21.5, we characterize schedules in terms of the interference of participating transactions, leading to the concepts of serializability and serializable schedules.

1. Schedules (Histories) of Transactions

A schedule (or history) S of n transactions T₁, T₂, ..., T_n is an ordering of the operations of the transactions. Operations from different transactions can be interleaved in the schedule S. However, for each transaction T_i that participates in the schedule S, the operations of T_i in S must appear in the same order in which they occur in T_i. The order of operations in S is considered to be a total ordering, meaning that for any two operations in the schedule, one must occur before the other. It is theoretically possible to deal with schedules whose operations form partial orders (as we discuss later), but for now we will assume a total ordering of the operations in a schedule.

For the purpose of recovery and concurrency control, we are mainly interested in the read_item and write_item operations of the transactions, as well as the commit and abort operations. A shorthand notation for describing a schedule uses the symbols b, r, w, e, c, and a for the operations begin_transaction, read_item, write_item, end_transaction, commit, and abort, respectively, and appends as a subscript the transaction id (transaction number) to each operation in the schedule. In this notation, the database item X that is read or written follows the r and w operations in parentheses. In some schedules, we will only show the read and write operations, whereas in other schedules, we will show all the operations. For example, the schedule in Figure 21.3(a), which we shall call S_a, can be written as follows in this notation:

S_a: r₁(X); r₂(X); w₁(X); r₁(Y); w₂(X); w₁(Y);

Similarly, the schedule for Figure 21.3(b), which we call S_b, can be written as follows, if we assume that transaction T₁ aborted after its read_item(Y) operation:

S_b: r₁(X); w₁(X); r₂(X); w₂(X); r₁(Y); a₁;
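The shorthand notation is regular enough to be parsed mechanically. The following Python sketch is a minimal illustration, not part of any DBMS: it turns a schedule string into a list of (kind, txn, item) tuples. The function name parse_schedule and the tuple representation are assumptions introduced here; the other sketches in this section use the same tuple shape. Subscript digits are accepted either as printed (r₁) or as plain digits (r1), and an optional written value such as w₁(X, 5) is accepted but ignored.

```python
import re

# Map Unicode subscript digits (as printed: r₁, w₂) to ASCII so either form parses.
_SUBSCRIPTS = str.maketrans("₀₁₂₃₄₅₆₇₈₉", "0123456789")

# One operation: kind letter, transaction number, optional "(item)" or "(item, value)".
_OP = re.compile(r"([brweca])(\d+)(?:\((\w+)(?:,\s*[^)]*)?\))?")


def parse_schedule(text):
    """Turn shorthand such as 'r1(X); w1(X); a1;' into (kind, txn, item) tuples."""
    ops = []
    for token in text.translate(_SUBSCRIPTS).split(";"):
        token = token.strip()
        if not token:
            continue  # the trailing ';' leaves an empty token
        m = _OP.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognized operation: {token!r}")
        # item is None for operations without parentheses (b, e, c, a)
        ops.append((m.group(1), int(m.group(2)), m.group(3)))
    return ops


S_b = parse_schedule("r₁(X); w₁(X); r₂(X); w₂(X); r₁(Y); a₁;")
print(S_b)
```

The abort a₁ comes out as ("a", 1, None), since commit and abort operations do not name a data item.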

Two operations in a schedule are said to conflict if they satisfy all three of the following conditions: (1) they belong to different transactions; (2) they access the same item X; and (3) at least one of the operations is a write_item(X). For example, in schedule S_a, the operations r₁(X) and w₂(X) conflict, as do the operations r₂(X) and w₁(X), and the operations w₁(X) and w₂(X). However, the operations r₁(X) and r₂(X) do not conflict, since they are both read operations; the operations w₂(X) and w₁(Y) do not conflict because they operate on distinct data items X and Y; and the operations r₁(X) and w₁(X) do not conflict because they belong to the same transaction.
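The three conditions translate directly into a predicate. This is a sketch under the assumption that each operation is a (kind, txn, item) tuple such as ("r", 1, "X") for r₁(X); the name conflicts is hypothetical. Applied to every ordered pair of operations in S_a, it should report exactly the three conflicting pairs listed above.

```python
def conflicts(op1, op2):
    """Apply the three conflict conditions to two (kind, txn, item) operations."""
    kind1, txn1, item1 = op1
    kind2, txn2, item2 = op2
    return (
        txn1 != txn2                              # (1) different transactions
        and item1 is not None and item1 == item2  # (2) same item X
        and "w" in (kind1, kind2)                 # (3) at least one write_item(X)
    )


# S_a from the text: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
S_a = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"),
       ("r", 1, "Y"), ("w", 2, "X"), ("w", 1, "Y")]

# Every pair (p, q) where p precedes q in S_a and the two operations conflict.
pairs = [(p, q) for i, p in enumerate(S_a) for q in S_a[i + 1:] if conflicts(p, q)]
for p, q in pairs:
    print(p, q)
```

The loop prints r₁(X)/w₂(X), r₂(X)/w₁(X), and w₁(X)/w₂(X), in the order they first appear in S_a.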

Intuitively, two operations are conflicting if changing their order can result in a different outcome. For example, if we change the order of the two operations r₁(X); w₂(X) to w₂(X); r₁(X), then the value of X that is read by transaction T₁ changes, because in the second order the value of X is changed by w₂(X) before it is read by r₁(X), whereas in the first order the value is read before it is changed. This is called a read-write conflict. The other type is called a write-write conflict, and is illustrated by the case where we change the order of two operations such as w₁(X); w₂(X) to w₂(X); w₁(X). For a write-write conflict, the last value of X will differ because in one case it is written by T₂ and in the other case by T₁. Notice that two read operations are not conflicting because changing their order makes no difference in outcome.
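The "different outcome" argument can be checked by executing both orders on a toy in-memory database. In this sketch, operations carry a fourth field holding the value a write stores; the initial value 10 and the written values 20, 5, and 8 are hypothetical, chosen only for illustration. The function run returns what each read saw and the final state of the items.

```python
def run(schedule, initial):
    """Execute reads and writes in schedule order on a dict-based 'database'.

    Each operation is (kind, txn, item, value); value is only used by writes.
    Returns the list of (txn, item, value_seen) for every read, and the final state.
    """
    db = dict(initial)
    reads = []
    for kind, txn, item, value in schedule:
        if kind == "r":
            reads.append((txn, item, db[item]))
        elif kind == "w":
            db[item] = value
    return reads, db


start = {"X": 10}  # hypothetical initial value of X

# Read-write conflict: swapping r1(X) and w2(X) changes what T1 reads.
print(run([("r", 1, "X", None), ("w", 2, "X", 20)], start)[0])  # [(1, 'X', 10)]
print(run([("w", 2, "X", 20), ("r", 1, "X", None)], start)[0])  # [(1, 'X', 20)]

# Write-write conflict: swapping w1(X) and w2(X) changes the final value of X.
print(run([("w", 1, "X", 5), ("w", 2, "X", 8)], start)[1])  # {'X': 8}
print(run([("w", 2, "X", 8), ("w", 1, "X", 5)], start)[1])  # {'X': 5}
```

Swapping two reads, by contrast, leaves both the values read and the final state unchanged.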

The rest of this section covers some theoretical definitions concerning schedules. A schedule S of n transactions T₁, T₂, ..., T_n is said to be a complete schedule if the following conditions hold:

1. The operations in S are exactly those operations in T₁, T₂, ..., T_n, including a commit or abort operation as the last operation for each transaction in the schedule.

2. For any pair of operations from the same transaction T_i, their relative order of appearance in S is the same as their order of appearance in T_i.

3. For any two conflicting operations, one of the two must occur before the other in the schedule.

The preceding condition (3) allows for two nonconflicting operations to occur in the schedule without defining which occurs first, thus leading to the definition of a schedule as a partial order of the operations in the n transactions.¹¹ However, a total order must be specified in the schedule for any pair of conflicting operations (condition 3) and for any pair of operations from the same transaction (condition 2). Condition 1 simply states that all operations in the transactions must appear in the complete schedule. Since every transaction has either committed or aborted, a complete schedule will not contain any active transactions at the end of the schedule.
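The three conditions of a complete schedule can be checked as follows, assuming the total-order representation used in the earlier sketches: a Python list of (kind, txn, item) tuples, which already orders every pair of operations, so condition 3 needs no separate test. The name is_complete and the dictionary mapping each transaction id to its own operation list are assumptions made for this illustration. The example schedule is S_a with commit operations appended, built from the transactions of Figure 21.3(a).

```python
def is_complete(schedule, transactions):
    """Check whether `schedule` is a complete schedule of `transactions`.

    Assumed representation: operations are (kind, txn, item) tuples, and
    `transactions` maps each transaction id to its own ordered operation list.
    Condition 3 needs no check here: a Python list totally orders all pairs.
    """
    for txn, ops in transactions.items():
        # Condition 1 (part): the last operation must be a commit or an abort.
        if not ops or ops[-1][0] not in ("c", "a"):
            return False
        # Condition 2: T_i's operations appear in S in the order they occur in T_i.
        if [op for op in schedule if op[1] == txn] != ops:
            return False
    # Condition 1 (rest): S contains exactly these operations and nothing else.
    return len(schedule) == sum(len(ops) for ops in transactions.values())


T1 = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"), ("c", 1, None)]
T2 = [("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]

# r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
S = [("r", 1, "X"), ("r", 2, "X"), ("w", 1, "X"), ("r", 1, "Y"),
     ("w", 2, "X"), ("c", 2, None), ("w", 1, "Y"), ("c", 1, None)]

print(is_complete(S, {1: T1, 2: T2}))       # True
print(is_complete(S[:-1], {1: T1, 2: T2}))  # False: T1's commit is missing
```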

In general, complete schedules are rarely encountered in a transaction processing system, because new transactions are continually being submitted to the system. Hence, it is useful to define the concept of the committed projection C(S) of a schedule S, which includes only the operations in S that belong to committed transactions, that is, transactions T_i whose commit operation c_i is in S.
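Computing C(S) amounts to a filter over the schedule. The sketch below assumes the same (kind, txn, item) tuples as before; the name committed_projection and the example snapshot, in which T₁ has committed while T₂ is still active, are hypothetical.

```python
def committed_projection(schedule):
    """Return C(S): the operations of S that belong to committed transactions.

    Operations are assumed to be (kind, txn, item) tuples; a transaction counts
    as committed when its commit operation ("c", txn, None) appears in S.
    """
    committed = {txn for kind, txn, _ in schedule if kind == "c"}
    return [op for op in schedule if op[1] in committed]


# Hypothetical snapshot: T1 has committed, T2 is still active.
S = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("c", 1, None), ("w", 2, "X")]
print(committed_projection(S))  # [('r', 1, 'X'), ('w', 1, 'X'), ('c', 1, None)]
```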

2. Characterizing Schedules Based on Recoverability

For some schedules it is easy to recover from transaction and system failures, whereas for other schedules the recovery process can be quite involved. In some cases, it is not even possible to recover correctly after a failure. Hence, it is important to characterize the types of schedules for which recovery is possible, as well as those for which recovery is relatively simple. These characterizations do not actually provide the recovery algorithm; they only attempt to characterize the different types of schedules theoretically.

First, we would like to ensure that, once a transaction T is committed, it should never be necessary to roll back T. This ensures that the durability property of transactions is not violated (see Section 21.3). The schedules that theoretically meet this criterion are called recoverable schedules; those that do not are called nonrecoverable and hence should not be permitted by the DBMS. The definition of recoverable schedule is as follows: A schedule S is recoverable if no transaction T in S commits until all transactions T′ that have written some item X that T reads have committed. A transaction T reads from transaction T′ in a schedule S if some item X is first written by T′ and later read by T. In addition, T′ should not have been aborted before T reads item X, and there should be no transactions that write X after T′ writes it and before T reads it (unless those transactions, if any, have aborted before T reads X).

Some recoverable schedules may require a complex recovery process, as we shall see, but if sufficient information is kept (in the log), a recovery algorithm can be devised for any recoverable schedule. The (partial) schedules S_a and S_b from the preceding section are both recoverable, since they satisfy the above definition. Consider the schedule S_a′ given below, which is the same as schedule S_a except that two commit operations have been added to S_a:

S_a′: r₁(X); r₂(X); w₁(X); r₁(Y); w₂(X); c₂; w₁(Y); c₁;

S_a′ is recoverable, since neither transaction reads an item written by the other, even though it suffers from the lost update problem; this problem is handled by serializability theory (see Section 21.5). However, consider the (partial) schedules S_c, S_d, and S_e that follow:

S_c: r₁(X); w₁(X); r₂(X); r₁(Y); w₂(X); c₂; a₁;

S_d: r₁(X); w₁(X); r₂(X); r₁(Y); w₂(X); w₁(Y); c₁; c₂;

S_e: r₁(X); w₁(X); r₂(X); r₁(Y); w₂(X); w₁(Y); a₁; a₂;

S_c is not recoverable because T₂ reads item X from T₁, but T₂ commits before T₁ commits. The problem arises if T₁ aborts after the c₂ operation in S_c: the value of X that T₂ read is then no longer valid, so T₂ would have to be aborted after it has already committed, which is why the schedule is not recoverable. For the schedule to be recoverable, the c₂ operation in S_c must be postponed until after T₁ commits, as shown in S_d. If T₁ aborts instead of committing, then T₂ should also abort, as shown in S_e, because the value of X it read is no longer valid. In S_e, aborting T₂ is acceptable since it has not committed yet, which is not the case for the nonrecoverable schedule S_c.
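The definition of a recoverable schedule, including the reads-from relation and its two exceptions (a writer that aborted before the read, and intermediate writers that aborted), can be sketched as follows. The functions reads_from and is_recoverable are hypothetical names, and operations are again assumed to be (kind, txn, item) tuples. The sketch flags S_c as nonrecoverable and accepts S_d and S_e, matching the discussion above.

```python
def reads_from(schedule):
    """Yield (reader, writer) pairs: reader reads some item X from writer.

    Per the definition in the text, the writer is the last transaction that
    wrote X before the read, skipping writers that aborted before the read.
    Reading one's own write does not count as reading from another transaction.
    """
    aborted_at = {t: i for i, (k, t, _) in enumerate(schedule) if k == "a"}
    for i, (kind, reader, item) in enumerate(schedule):
        if kind != "r":
            continue
        for j in range(i - 1, -1, -1):
            wkind, writer, witem = schedule[j]
            if wkind != "w" or witem != item:
                continue
            if aborted_at.get(writer, len(schedule)) < i:
                continue  # this write was undone before the read happened
            if writer != reader:
                yield reader, writer
            break  # the most recent surviving write determines the source


def is_recoverable(schedule):
    """No transaction T commits until every T' that T reads from has committed."""
    committed_at = {t: i for i, (k, t, _) in enumerate(schedule) if k == "c"}
    for reader, writer in reads_from(schedule):
        if reader not in committed_at:
            continue  # an uncommitted (or aborted) reader cannot violate the rule
        if committed_at.get(writer, len(schedule)) > committed_at[reader]:
            return False  # writer is uncommitted, or commits only after the reader
    return True


S_c = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
       ("w", 2, "X"), ("c", 2, None), ("a", 1, None)]
S_d = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
       ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]
S_e = S_d[:-2] + [("a", 1, None), ("a", 2, None)]

print(list(reads_from(S_c)))  # [(2, 1)]
print(is_recoverable(S_c), is_recoverable(S_d), is_recoverable(S_e))  # False True True
```

Note that S_e is recoverable only vacuously: neither transaction commits, so no committed transaction can ever need to be rolled back.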

In a recoverable schedule, no committed transaction ever needs to be rolled back, and so the definition of committed transaction as durable is not violated. However, it is possible for a phenomenon known as cascading rollback (or cascading abort) to occur in some recoverable schedules, where an uncommitted transaction has to be rolled back because it read an item from a transaction that failed. This is illustrated in schedule S_e, where transaction T₂ has to be rolled back because it read item X from T₁, and T₁ then aborted.

Because cascading rollback can be quite time-consuming, since numerous transactions can be rolled back (see Chapter 23), it is important to characterize the schedules where this phenomenon is guaranteed not to occur. A schedule is said to be cascadeless, or to avoid cascading rollback, if every transaction in the schedule reads only items that were written by committed transactions. In this case, no value that has been read can later be invalidated by an abort, so no cascading rollback will occur. To satisfy this criterion, the r₂(X) command in schedules S_d and S_e must be postponed until after T₁ has committed (or aborted), thus delaying T₂ but ensuring no cascading rollback if T₁ aborts.
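The cascadeless condition is stricter than recoverability because it looks at the moment of the read rather than the moment of the commit. In this sketch (hypothetical name is_cascadeless, same tuple representation as before), S_d fails the test because r₂(X) reads T₁'s uncommitted write. The variant S_d_delayed, in which r₂(X) and the rest of T₂ are postponed until after c₁ as the paragraph suggests, is a hypothetical schedule constructed for this example.

```python
def is_cascadeless(schedule):
    """True if every read sees a value written by an already-committed transaction.

    Operations are assumed to be (kind, txn, item) tuples, as in the earlier sketches.
    """
    committed_at = {t: i for i, (k, t, _) in enumerate(schedule) if k == "c"}
    aborted_at = {t: i for i, (k, t, _) in enumerate(schedule) if k == "a"}
    for i, (kind, reader, item) in enumerate(schedule):
        if kind != "r":
            continue
        for j in range(i - 1, -1, -1):
            wkind, writer, witem = schedule[j]
            if wkind != "w" or witem != item:
                continue
            if aborted_at.get(writer, len(schedule)) < i:
                continue  # write already rolled back; look at an earlier one
            if writer != reader and committed_at.get(writer, len(schedule)) > i:
                return False  # reading a value written by an uncommitted transaction
            break
    return True


S_d = [("r", 1, "X"), ("w", 1, "X"), ("r", 2, "X"), ("r", 1, "Y"),
       ("w", 2, "X"), ("w", 1, "Y"), ("c", 1, None), ("c", 2, None)]

# Hypothetical variant of S_d with r2(X) (and the rest of T2) postponed until after c1.
S_d_delayed = [("r", 1, "X"), ("w", 1, "X"), ("r", 1, "Y"), ("w", 1, "Y"),
               ("c", 1, None), ("r", 2, "X"), ("w", 2, "X"), ("c", 2, None)]

print(is_cascadeless(S_d), is_cascadeless(S_d_delayed))  # False True
```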

Finally, there is a third, more restrictive type of schedule, called a strict schedule, in which transactions can neither read nor write an item X until the last transaction that wrote X has committed (or aborted). Strict schedules simplify the recovery process. In a strict schedule, the process of undoing a write_item(X) operation of an aborted transaction is simply to restore the before image (old_value or BFIM) of data item X. This simple procedure always works correctly for strict schedules, but it may not work for recoverable or cascadeless schedules. For example, consider schedule S_f :

S_f: w₁(X, 5); w₂(X, 8); a₁;

Suppose that the value of X was originally 9, which is the before image stored in the system log along with the w₁(X, 5) operation. If T₁ aborts, as in S_f, the recovery procedure that restores the before image of an aborted write operation will restore the value of X to 9, even though it has already been changed to 8 by transaction T₂, thus leading to potentially incorrect results. Although schedule S_f is cascadeless, it is not a strict schedule, since it permits T₂ to write item X even though the transaction T₁ that last wrote X had not yet committed (or aborted). A strict schedule does not have this problem.
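The S_f scenario can be replayed with a toy undo procedure that logs a before image (BFIM) for every write and, on abort, restores the aborting transaction's before images. Both functions below, is_strict and undo_by_before_image, are hypothetical names for this illustration; operations are assumed to be (kind, txn, item, value) tuples, and the log is a plain Python list rather than a real system log. The schedule S_f_strict, where T₂ writes X only after T₁ has aborted, is a hypothetical strict variant.

```python
def is_strict(schedule):
    """True if no item X is read or written while X's last writer is still active.

    Operations are assumed to be (kind, txn, item, value) tuples; value is only
    meaningful for writes. "c" (commit) and "a" (abort) end a transaction.
    """
    ended_at = {t: i for i, (k, t, *_) in enumerate(schedule) if k in ("c", "a")}
    for i, (kind, txn, item, *_) in enumerate(schedule):
        if kind not in ("r", "w"):
            continue
        for j in range(i - 1, -1, -1):  # find the last write of this item
            wkind, writer, witem, *_ = schedule[j]
            if wkind == "w" and witem == item:
                if writer != txn and ended_at.get(writer, len(schedule)) > i:
                    return False  # last writer has neither committed nor aborted
                break
    return True


def undo_by_before_image(schedule, initial):
    """Run the writes, logging each before image (BFIM); on abort, restore them."""
    db = dict(initial)
    log = []  # (txn, item, before_image) entries, newest last
    for kind, txn, item, value in schedule:
        if kind == "w":
            log.append((txn, item, db[item]))
            db[item] = value
        elif kind == "a":
            for ltxn, litem, bfim in reversed(log):  # undo newest writes first
                if ltxn == txn:
                    db[litem] = bfim
    return db


S_f = [("w", 1, "X", 5), ("w", 2, "X", 8), ("a", 1, None, None)]
print(is_strict(S_f), undo_by_before_image(S_f, {"X": 9}))  # False {'X': 9}

# Hypothetical strict variant: T2 writes X only after T1 has aborted.
S_f_strict = [("w", 1, "X", 5), ("a", 1, None, None), ("w", 2, "X", 8)]
print(is_strict(S_f_strict), undo_by_before_image(S_f_strict, {"X": 9}))  # True {'X': 8}
```

In S_f the restored value 9 silently discards T₂'s write of 8, while in the strict variant the same simple undo procedure leaves X at 8.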

It is important to note that any strict schedule is also cascadeless, and any cascadeless schedule is also recoverable. Suppose we have i transactions T₁, T₂, ..., T_i, with n₁, n₂, ..., n_i operations, respectively. If we form the set of all possible schedules of these transactions, we can divide the schedules into two disjoint subsets: recoverable and nonrecoverable. The cascadeless schedules will be a subset of the recoverable schedules, and the strict schedules will be a subset of the cascadeless schedules. Thus, all strict schedules are cascadeless, and all cascadeless schedules are recoverable.
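The containment strict ⊆ cascadeless ⊆ recoverable can be observed by brute force on a small case. This sketch enumerates every interleaving of two hypothetical transactions, T₁ = r₁(X); w₁(X) and T₂ = r₂(X); w₂(X), each ending in either a commit or an abort, which gives 4 × 20 = 80 schedules. It then checks that every strict schedule is cascadeless and every cascadeless schedule is recoverable. The helper names are assumptions, and the checks are compact restatements of the three definitions over (kind, txn, item) tuples; this is a sanity check on one example, not a proof.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each transaction's own order."""
    if not t1 or not t2:
        yield list(t1) + list(t2)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest


def positions(s, kinds):
    """Map each transaction id to the position of its operation whose kind is in kinds."""
    return {t: p for p, (k, t, _) in enumerate(s) if k in kinds}


def last_writer(s, i, skip_aborted):
    """Transaction that last wrote s[i]'s item before position i, or None."""
    aborted = positions(s, ("a",))
    for j in range(i - 1, -1, -1):
        kind, txn, item = s[j]
        if kind == "w" and item == s[i][2]:
            if skip_aborted and aborted.get(txn, len(s)) < i:
                continue  # that write was rolled back before position i
            return txn
    return None


def is_recoverable(s):
    commit = positions(s, ("c",))
    for i, (kind, txn, _) in enumerate(s):
        w = last_writer(s, i, skip_aborted=True) if kind == "r" else None
        if w not in (None, txn) and txn in commit:
            if commit.get(w, len(s)) > commit[txn]:
                return False  # T commits before the T' it read from
    return True


def is_cascadeless(s):
    commit = positions(s, ("c",))
    for i, (kind, txn, _) in enumerate(s):
        w = last_writer(s, i, skip_aborted=True) if kind == "r" else None
        if w not in (None, txn) and commit.get(w, len(s)) > i:
            return False  # read of an uncommitted write
    return True


def is_strict(s):
    ended = positions(s, ("c", "a"))
    for i, (kind, txn, _) in enumerate(s):
        w = last_writer(s, i, skip_aborted=False) if kind in ("r", "w") else None
        if w not in (None, txn) and ended.get(w, len(s)) > i:
            return False  # access to X while its last writer is still active
    return True


# Two small transactions, each ending in either a commit or an abort.
schedules = []
for end1 in ("c", "a"):
    for end2 in ("c", "a"):
        t1 = [("r", 1, "X"), ("w", 1, "X"), (end1, 1, None)]
        t2 = [("r", 2, "X"), ("w", 2, "X"), (end2, 2, None)]
        schedules.extend(interleavings(t1, t2))

recoverable = [s for s in schedules if is_recoverable(s)]
cascadeless = [s for s in schedules if is_cascadeless(s)]
strict = [s for s in schedules if is_strict(s)]

assert all(is_cascadeless(s) for s in strict)       # strict => cascadeless
assert all(is_recoverable(s) for s in cascadeless)  # cascadeless => recoverable
print(len(schedules), len(recoverable), len(cascadeless), len(strict))
```

The enumeration also contains schedules that separate each level, for example r₁(X); w₁(X); r₂(X); w₂(X); c₁; c₂; (recoverable but not cascadeless) and r₁(X); r₂(X); w₁(X); w₂(X); c₁; c₂; (cascadeless but not strict).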
