Chapter: Cryptography and Network Security Principles and Practice : Asymmetric Ciphers : Public-Key Cryptography and RSA

The RSA Algorithm

THE RSA ALGORITHM

The pioneering paper by Diffie and Hellman [DIFF76b] introduced a new approach to cryptography and, in effect, challenged cryptologists to come up with a crypto- graphic algorithm that met the requirements for public-key systems. A number of algorithms have been proposed for public-key cryptography. Some of these, though initially promising, turned out to be breakable.4

One of the first successful responses to the challenge was developed in 1977 by Ron Rivest, Adi Shamir, and Len Adleman at MIT and first published in 1978 [RIVE78].5 The Rivest-Shamir-Adleman (RSA) scheme has since that time reigned supreme as the most widely accepted and implemented general-purpose approach to public-key encryption.

The RSA scheme is a block cipher in which the plaintext and ciphertext are integers between 0 and n - 1 for some n. A typical size for n is 1024 bits, or 309 dec- imal digits. That is, n is less than 21024. We examine RSA in this section in some detail, beginning with an explanation of the algorithm. Then we examine some of the computational and cryptanalytical implications of RSA.

Description of the Algorithm

RSA makes use of an expression with exponentials. Plaintext is encrypted in blocks, with each block having a binary value less than some number n. That is, the block size must be less than or equal to log2(n) + 1; in practice, the block size is i bits, where 2i 6 n £ 2i+1. Encryption and decryption are of the following form, for some plaintext block M and ciphertext block C.

C = Me mod n

M = Cd mod n = 1Me d mod n = Med mod n

Both sender and receiver must know the value of n. The sender knows the value of e, and only the receiver knows the value of d. Thus, this is a public-key encryption algorithm with a public key of PU = {e, n} and a private key of PR = {d, n}. For this algorithm to be satisfactory for public-key encryption, the following require- ments must be met.

1. It is possible to find values of e, d, n such that Med mod n = M for all M < n.

2. It is relatively easy to calculate Me mod n and Cd mod n for all values of M < n.

3. It is infeasible to determine d given e and n.

For now, we focus on the first requirement and consider the other questions later. We need to find a relationship of the form

Med mod n = M

The preceding relationship holds if e and d are multiplicative inverses modulo

ϕ(n), where ϕ(n) is the Euler totient function. It is shown in Chapter 8 that for p, q prime, ϕ (pq) = (p - 1)(q - 1). The relationship between e and d can be expressed as

ed mod ϕ(n) = 1 (9.1)

This is equivalent to saying

That is, e and d are multiplicative inverses mod ϕ(n). Note that, according to the rules of modular arithmetic, this is true only if d (and therefore e) is relatively prime to ϕ(n). Equivalently, gcd(ϕ(n), d) = 1. See Appendix 9A for a proof that Equation (9.1) satisfies the requirement for RSA.

We are now ready to state the RSA scheme. The ingredients are the following:

p, q, two prime numbers (private, chosen)

n = pq (public, calculated)

e, with gcd(ϕ(n), e) = 1; 1 < e < ϕ(n) (public, chosen)

d K e-1 (mod ϕ(n)) (private, calculated)

The private key consists of {d, n} and the public key consists of {e, n}. Suppose that user A has published its public key and that user B wishes to send the message M to A. Then B calculates C = Me mod n and transmits C. On receipt of this cipher- text, user A decrypts by calculating M = Cd mod n.

Figure 9.5 summarizes the RSA algorithm. It corresponds to Figure 9.1a: Alice generates a public/private key pair; Bob encrypts using Alice’s public key; and Alice decrypts using her private key. An example from [SING99] is shown in Figure 9.6. For this example, the keys were generated as follows.

1. Select two prime numbers, p = 17 and q = 11.

2. Calculate n = pq = 17 ´ 11 = 187.

3. Calculate ϕ(n) = (p - 1)(q - 1) = 16 ´ 10 = 160.

4. Select e such that e is relatively prime to ϕ(n) = 160 and less than f(n); we choose e = 7.

5. Determine d such that de K 1 (mod 160) and d < 160. The correct value is d = 23, because 23 ´ 7 = 161 = (1 ´ 160) + 1; d can be calculated using the extended Euclid’s algorithm (Chapter 4).

The resulting keys are public key PU = {7, 187} and private key PR = {23, 187}. The example shows the use of these keys for a plaintext input of M = 88. For encryp- tion, we need to calculate C = 887 mod 187. Exploiting the properties of modular arithmetic, we can do this as follows.

887 mod 187 = [(884 mod 187) ´ (882 mod 187)

´ (881 mod 187)] mod 187

881 mod 187 = 88

882 mod 187 = 7744 mod 187 = 77

884 mod 187 = 59,969,536 mod 187 = 132

887 mod 187 = (88 ´ 77 ´ 132) mod 187 = 894,432 mod 187 = 11

For decryption, we calculate M = 1123 mod 187:

1123 mod 187 = [(111 mod 187) ´ (112 mod 187) ´ (114 mod 187)

´ (118 mod 187) ´ (118 mod 187)] mod 187

111 mod 187 = 11

112 mod 187 = 121

114 mod 187 = 14,641 mod 187 = 55

118 mod 187 = 214,358,881 mod 187 = 33

1123 mod 187 = (11 ´ 121 ´ 55 ´ 33 ´ 33) mod 187 = 79,720,245 mod 187 = 88

We now look at an example from [HELL79], which shows the use of RSA to process multiple blocks of data. In this simple example, the plaintext is an alphanumeric string. Each plaintext symbol is assigned a unique code of two decimal digits (e.g., a = 00, A = 26).6 A plaintext block consists of four decimal digits, or two alphanumeric characters. Figure 9.7a illustrates the sequence of events for the encryption of multiple blocks, and Figure 9.7b gives a specific example. The circled numbers indicate the order in which operations are performed.

Computational Aspects

We now turn to the issue of the complexity of the computation required to use RSA. There are actually two issues to consider: encryption/decryption and key genera- tion. Let us look first at the process of encryption and decryption and then consider key generation.

EXPONENTIATION IN MODULAR ARITHMETIC Both encryption and decryption in RSA involve raising an integer to an integer power, mod n. If the exponentiation is done over the integers and then reduced modulo n, the intermediate values would be gargantuan. Fortunately, as the preceding example shows, we can make use of a property of modular arithmetic:

[(a mod n) * (b mod n)] mod n = (a * b) mod n

Thus, we can reduce intermediate results modulo n. This makes the calculation practical.

Another consideration is the efficiency of exponentiation, because with RSA, we are dealing with potentially large exponents. To see how efficiency might be increased, consider that we wish to compute x16. A straightforward approach requires 15 multiplications:

x16 = x * x * x * x * x * x * x * x * x * x * x * x * x * x * x * x

However, we can achieve the same final result with only four multiplications if we repeatedly take the square of each partial result, successively forming (x2, x4, x8, x16). As another example, suppose we wish to calculate x11 mod n for some integers x and

n. Observe that x11 = x1+2+8 = (x)(x2)(x8). In this case, we compute x mod n, x2 mod n, x4 mod n, and x8 mod n and then calculate [(x mod n) ´ (x2 mod n) ´ (x8 mod n)] mod n.

More generally, suppose we wish to find the value ab with a and m positive integers. If we express b as a binary number bkbk-1. . . b0, then we have

We can therefore develop the algorithm7 for computing ab mod n, shown in Figure 9.8. Table 9.4 shows an example of the execution of this algorithm. Note that the variable c is not needed; it is included for explanatory purposes. The final value of c is the value of the exponent.

EFFICIENT OPERATION USING THE PUBLIC KEY To speed up the operation of the RSA algorithm using the public key, a specific choice of e is usually made. The most common choice is 65537 (216 + 1); two other popular choices are 3 and 17. Each of these choices has only two 1 bits, so the number of multiplications required to perform exponentiation is minimized.

However, with a very small public key, such as e = 3, RSA becomes vulnerable to a simple attack. Suppose we have three different RSA users who all use the value e = 3 but have unique values of n, namely (n1, n2, n3). If user A sends the same encrypted message M to all three users, then the three ciphertexts are C1 = M3 mod n1, C2 = M3 mod n2, and C3 = M3 mod n3. It is likely that n1, n2, and n3 are pairwise rela- tively prime. Therefore, one can use the Chinese remainder theorem (CRT) to com- pute M3 mod (n1n2n3). By the rules of the RSA algorithm, M is less than each of the ni; therefore M3 < n1n2n3. Accordingly, the attacker need only compute the cube root of M3. This attack can be countered by adding a unique pseudorandom bit string as padding to each instance of M to be encrypted. This approach is discussed subsequently.

The reader may have noted that the definition of the RSA algorithm (Figure 9.5) requires that during key generation the user selects a value of e that is relatively prime to f(n).Thus, if a value of e is selected first and the primes p and q are generated, it may turn out that gcd(f(n), e) Z 1. In that case, the user must reject the p, q values and generate a new p, q pair.

Table 9.4 Result of the Fast Modular Exponentiation Algorithm for ab mod n, where a = 7,

b = 560 = 1000110000, and n = 561

EFFICIENT OPERATION USING THE PRIVATE KEY We cannot similarly choose a small constant value of d for efficient operation. A small value of d is vulnerable to a brute- force attack and to other forms of cryptanalysis [WIEN90]. However, there is a way to speed up computation using the CRT. We wish to compute the value M = Cd mod n. Let us define the following intermediate results:

Vp = Cd mod p Vq = Cd mod q

Following the CRT using Equation (8.8), define the quantities

Xp = q * (q - 1 mod p) Xq = p * (p - 1 mod q)

The CRT then shows, using Equation (8.9), that

M = (Vp Xp + Vq Xq) mod n

Furthermore, we can simplify the calculation of Vp and Vq using Fermat’s theorem, which states that ap-1 K 1 (mod p) if p and a are relatively prime. Some thought should convince you that the following are valid.

Vp = Cd mod p = Cd mod (p - 1) mod p Vq = Cd mod q = Cd mod (q - 1) mod q

The quantities d mod (p - 1) and d mod (q - 1) can be precalculated. The end result is that the calculation is approximately four times as fast as evaluating M = Cd mod n directly [BONE02].

KEY GENERATION Before the application of the public-key cryptosystem, each participant must generate a pair of keys. This involves the following tasks.

• Determining two prime numbers, p and q.

• Selecting either e or d and calculating the other.

First, consider the selection of p and q. Because the value of n = pq will be known to any potential adversary, in order to prevent the discovery of p and q by exhaustive methods, these primes must be chosen from a sufficiently large set (i.e., p and q must be large numbers). On the other hand, the method used for finding large primes must be reasonably efficient.

At present, there are no useful techniques that yield arbitrarily large primes, so some other means of tackling the problem is needed. The procedure that is generally used is to pick at random an odd number of the desired order of magni- tude and test whether that number is prime. If not, pick successive random numbers until one is found that tests prime.

A variety of tests for primality have been developed (e.g., see [KNUT98] for a description of a number of such tests). Almost invariably, the tests are prob- abilistic. That is, the test will merely determine that a given integer is probably prime. Despite this lack of certainty, these tests can be run in such a way as to make the probability as close to 1.0 as desired. As an example, one of the more efficient and popular algorithms, the Miller-Rabin algorithm, is described in Chapter 8. With this algorithm and most such algorithms, the procedure for test- ing whether a given integer n is prime is to perform some calculation that involves n and a randomly chosen integer a. If n “fails” the test, then n is not prime. If n “passes” the test, then n may be prime or nonprime. If n passes many such tests with many different randomly chosen values for a, then we can have high confidence that n is, in fact, prime.

In summary, the procedure for picking a prime number is as follows.

1. Pick an odd integer n at random (e.g., using a pseudorandom number generator).

2. Pick an integer a < n at random.

3. Perform the probabilistic primality test, such as Miller-Rabin, with a as a parameter. If n fails the test, reject the value n and go to step 1.

4. If n has passed a sufficient number of tests, accept n; otherwise, go to step 2.

This is a somewhat tedious procedure. However, remember that this process is performed relatively infrequently: only when a new pair (PU, PR) is needed.

It is worth noting how many numbers are likely to be rejected before a prime number is found. A result from number theory, known as the prime number theorem, states that the primes near N are spaced on the average one every (ln N) integers. Thus, on average, one would have to test on the order of ln(N) integers before a prime is found. Actually, because all even integers can be immediately rejected, the

correct figure is ln(N)/2. For example, if a prime on the order of magnitude of 2200 were sought, then about ln(2200)/2 = 70 trials would be needed to find a prime.

Having determined prime numbers p and q, the process of key generation is completed by selecting a value of e and calculating d or, alternatively, selecting a value of d and calculating e. Assuming the former, then we need to select an e such that gcd(f(n), e) = 1 and then calculate d K e-1 (mod f(n)). Fortunately, there is a single algorithm that will, at the same time, calculate the greatest common divisor of two integers and, if the gcd is 1, determine the inverse of one of the integers modulo the other. The algorithm, referred to as the extended Euclid’s algorithm, is explained in Chapter 4. Thus, the procedure is to generate a series of random num- bers, testing each against f(n) until a number relatively prime to f(n) is found. Again, we can ask the question: How many random numbers must we test to find a usable number, that is, a number relatively prime to f(n)? It can be shown easily that the probability that two random numbers are relatively prime is about 0.6; thus, very few tests would be needed to find a suitable integer (see Problem 8.2).

The Security of RSA

Four possible approaches to attacking the RSA algorithm are

• Brute force: This involves trying all possible private keys.

• Mathematical attacks: There are several approaches, all equivalent in effort to factoring the product of two primes.

• Timing attacks: These depend on the running time of the decryption algorithm.

• Chosen ciphertext attacks: This type of attack exploits properties of the RSA algorithm.

The defense against the brute-force approach is the same for RSA as for other cryptosystems, namely, to use a large key space. Thus, the larger the number of bits in d, the better. However, because the calculations involved, both in key generation and in encryption/decryption, are complex, the larger the size of the key, the slower the system will run.

In this subsection, we provide an overview of mathematical and timing attacks.

THE FACTORING PROBLEM We can identify three approaches to attacking RSA mathematically.

1. Factor n into its two prime factors. This enables calculation of ϕ(n) = (p - 1) ´

(q - 1), which in turn enables determination of d K e-1 (mod ϕ(n)).

2. Determine ϕ(n) directly, without first determining p and q. Again, this enables determination of d K e-1 (mod ϕ(n)).

3. Determine d directly, without first determining ϕ(n).

Most discussions of the cryptanalysis of RSA have focused on the task of fac- toring n into its two prime factors. Determining f(n) given n is equivalent to factor- ing n [RIBE96]. With presently known algorithms, determining d given e and n appears to be at least as time-consuming as the factoring problem [KALI95]. Hence, we can use factoring performance as a benchmark against which to evaluate the security of RSA.

For a large n with large prime factors, factoring is a hard problem, but it is not as hard as it used to be. A striking illustration of this is the following. In 1977, the three inventors of RSA dared Scientific American readers to decode a cipher they printed in Martin Gardner’s “Mathematical Games” column [GARD77]. They offered a $100 reward for the return of a plaintext sentence, an event they predicted might not occur for some 40 quadrillion years. In April of 1994, a group working over the Internet claimed the prize after only eight months of work [LEUT94]. This challenge used a public key size (length of n) of 129 decimal digits, or around 428 bits. In the meantime, just as they had done for DES, RSA Laboratories had issued challenges for the RSA cipher with key sizes of 100, 110, 120, and so on, digits. The latest challenge to be met is the RSA-200 challenge with a key length of 200 decimal digits, or about 663 bits. Table 9.5 shows the results to date. The level of effort is measured in MIPS-years: a

Table 9.5 Progress in Factorization

million-instructions-per-second processor running for one year, which is about 3 ´x 1013 instructions executed. A 1 GHz Pentium is about a 250-MIPS machine.

A striking fact about Table 9.5 concerns the method used. Until the mid-1990s, factoring attacks were made using an approach known as the quadratic sieve. The attack on RSA-130 used a newer algorithm, the generalized number field sieve (GNFS), and was able to factor a larger number than RSA-129 at only 20% of the computing effort.

The threat to larger key sizes is twofold: the continuing increase in computing power and the continuing refinement of factoring algorithms. We have seen that the move to a different algorithm resulted in a tremendous speedup. We can expect further refinements in the GNFS, and the use of an even better algorithm is also a possibility. In fact, a related algorithm, the special number field sieve (SNFS), can factor numbers with a specialized form considerably faster than the generalized number field sieve. Figure 9.9 compares the performance of the two algorithms. It is reasonable to expect a breakthrough that would enable a general factoring performance in about the same time as SNFS, or even better [ODLY95]. Thus, we need to be careful in choosing a key size for RSA. For the near future, a key size in the range of 1024 to 2048 bits seems reasonable.

In addition to specifying the size of n, a number of other constraints have been suggested by researchers. To avoid values of n that may be factored more easily, the algorithm’s inventors suggest the following constraints on p and q.

1. p and q should differ in length by only a few digits. Thus, for a 1024-bit key (309 decimal digits), both p and q should be on the order of magnitude of 1075 to 10100.

2. Both (p - 1) and (q - 1) should contain a large prime factor.

3. gcd( p - 1, q - 1) should be small.

In addition, it has been demonstrated that if e < n and d < n1/4, then d can be easily determined [WIEN90].

TIMING ATTACKS If one needed yet another lesson about how difficult it is to assess the security of a cryptographic algorithm, the appearance of timing attacks provides a stunning one. Paul Kocher, a cryptographic consultant, demonstrated that a snooper can determine a private key by keeping track of how long a computer takes to decipher messages [KOCH96, KALI96b]. Timing attacks are applicable not just to RSA, but to other public-key cryptography systems. This attack is alarming for two reasons: It comes from a completely unexpected direction, and it is a ciphertext-only attack.

A timing attack is somewhat analogous to a burglar guessing the combination of a safe by observing how long it takes for someone to turn the dial from number to number. We can explain the attack using the modular exponentiation algorithm of Figure 9.8, but the attack can be adapted to work with any implementation that does not run in fixed time. In this algorithm, modular exponentiation is accomplished bit by bit, with one modular multiplication performed at each iteration and an addi- tional modular multiplication performed for each 1 bit.

As Kocher points out in his paper, the attack is simplest to understand in an extreme case. Suppose the target system uses a modular multiplication function that is very fast in almost all cases but in a few cases takes much more time than an entire average modular exponentiation. The attack proceeds bit-by-bit starting with the

leftmost bit, bk. Suppose that the first j bits are known (to obtain the entire exponent, start with j = 0 and repeat the attack until the entire exponent is known). For a given ciphertext, the attacker can complete the first j iterations of the for loop. The operation of the subsequent step depends on the unknown exponent bit. If the bit is set, d ; (d ´ a) mod n will be executed. For a few values of a and d, the modular multiplication will be extremely slow, and the attacker knows which these are.Therefore, if the observed time to execute the decryption algorithm is always slow when this particular iteration is slow with a 1 bit, then this bit is assumed to be 1. If a number of observed execution times for the entire algorithm are fast, then this bit is assumed to be 0.

In practice, modular exponentiation implementations do not have such extreme timing variations, in which the execution time of a single iteration can exceed the mean execution time of the entire algorithm. Nevertheless, there is enough variation to make this attack practical. For details, see [KOCH96].

Although the timing attack is a serious threat, there are simple countermea- sures that can be used, including the following.

• Constant exponentiation time: Ensure that all exponentiations take the same amount of time before returning a result. This is a simple fix but does degrade performance.

• Random delay: Better performance could be achieved by adding a random delay to the exponentiation algorithm to confuse the timing attack. Kocher points out that if defenders don’t add enough noise, attackers could still succeed by collecting additional measurements to compensate for the random delays.

• Blinding: Multiply the ciphertext by a random number before performing exponentiation. This process prevents the attacker from knowing what cipher- text bits are being processed inside the computer and therefore prevents the bit-by-bit analysis essential to the timing attack.

RSA Data Security incorporates a blinding feature into some of its products.

The private-key operation M = Cd mod n is implemented as follows.

1. Generate a secret random number r between 0 and n - 1.

2. Compute C' = C(re) mod n, where e is the public exponent.

3. Compute M' = (C')d mod n with the ordinary RSA implementation.

4. Compute M = M'r-1 mod n. In this equation, r-1 is the multiplicative inverse of r mod n; see Chapter 4 for a discussion of this concept. It can be demonstrated that this is the correct result by observing that red mod n = r mod n.

RSA Data Security reports a 2 to 10% performance penalty for blinding.

CHOSEN CIPHERTEXT ATTACK AND OPTIMAL ASYMMETRIC ENCRYPTION PADDING The

basic RSA algorithm is vulnerable to a chosen ciphertext attack (CCA). CCA is defined as an attack in which the adversary chooses a number of ciphertexts and is then given the corresponding plaintexts, decrypted with the target’s private key. Thus, the adversary could select a plaintext, encrypt it with the target’s public key, and then be able to get the plaintext back by having it decrypted with the private key. Clearly, this provides the adversary with no new information. Instead, the adversary exploits properties of RSA and selects blocks of data that, when processed using the target’s private key, yield information needed for cryptanalysis.

A simple example of a CCA against RSA takes advantage of the following property of RSA:

E(PU, M1) ´ E(PU, M2) = E(PU, [M1 ´ M2]) (9.2)

We can decrypt C = Me mod n using a CCA as follows.

1. Compute X = (C ´ 2e) mod n.

2. Submit X as a chosen ciphertext and receive back Y = Xd mod n. But now note that

X = (C mod n) * (2e mod n)

= (Me mod n) * (2e mod n)

= (2M)e mod n

Therefore, Y = (2M) mod n. From this, we can deduce M. To overcome this simple attack, practical RSA-based cryptosystems randomly pad the plaintext prior to encryption. This randomizes the ciphertext so that Equation (9.2) no longer holds. However, more sophisticated CCAs are possible, and a simple padding with a random value has been shown to be insufficient to provide the desired security. To counter such attacks, RSA Security Inc., a leading RSA vendor and former holder of the RSA patent, recommends modifying the plaintext using a procedure known as optimal asymmetric encryption padding (OAEP). A full discussion of the threats and OAEP are beyond our scope; see [POIN02] for an introduction and [BELL94a] for a thorough analysis. Here, we simply summarize the OAEP procedure.

Figure 9.10 depicts OAEP encryption. As a first step, the message M to be encrypted is padded.A set of optional parameters, P, is passed through a hash function, H.8 The output is then padded with zeros to get the desired length in the overall data

block (DB). Next, a random seed is generated and passed through another hash func- tion, called the mask generating function (MGF). The resulting hash value is bit-by-bit XORed with DB to produce a maskedDB. The maskedDB is in turn passed through the MGF to form a hash that is XORed with the seed to produce the masked seed.The concatenation of the maskedseed and the maskedDB forms the encoded message EM. Note that the EM includes the padded message, masked by the seed, and the seed, masked by the maskedDB. The EM is then encrypted using RSA.

Study Material, Lecturing Notes, Assignment, Reference, Wiki description explanation, brief detail

Cryptography and Network Security Principles and Practice : Asymmetric Ciphers : Public-Key Cryptography and RSA : The RSA Algorithm |