Implementing RSA in Python from Scratch (Part 1)

Implementing RSA in Python from Scratch (Part 1)
Photo by Chris Ried / Unsplash

I'm sure many programmers, particularly web developers have heard of the RSA cryptography system. RSA is an asymmetric cryptography system, meaning that one key is used for encryption and the other for decryption. I've seen a lot of articles explaining the general principles of asymmetric cryptography, but I have not seen any that give easy-to-understand explanations of the mathematical background behind these algorithms, so I decided to write this article.

Explanation of the math behind RSA

For start, as I've mentioned in the paragraph above, to transmit an RSA-encrypted message, you need 2 keys. One that can only encrypt and one that can decrypt. Let encryption key be denoted as `e`, the decryption key will be denoted as `d` and the message will be denoted as `m`. The key principle behind RSA is that in the following notion:

(m^e)^d ≡ m (mod n)

The difficulty of finding `d`, knowing `e` and `n` can be very hard if these numbers are chosen properly (as will be demonstrated below).

But first, we need to introduce some new concepts in number theory.

  • a ≡ b (mod n) means that there is an integer x such that a + nx = b. A more intuitive explanation is that the reminder of a / n equals the reminder of b / n.
  • gcd(a, b) is the greatest number that divides both a and b.
  • lcm(a, b) is the smallest number that is a multiple of both a and b.
  • λ(n) is a number such that x^λ(n) ≡ 1 (mod n) for any x such that gcd(x, n) = 1. From this you can conclude that x^a ≡ x^b (mod n) if a ≡ b (mod λ(n))

Generating keys

Let's make n = pq where p and q are 2 prime numbers. Since λ(p) = p - 1 and λ(q) = q - 1 (lookup Fermat's little theorem proof for why), λ(n) = (p - 1) * (q - 1).

Now we have to slove ed ≡ 1 (mod λ(n)). e can be some prime number (usually 65537) and d can be calculated using extended Euclidian's algorithm (will be written and explained in the code section) from the equation ed + xλ(n) = gcd(e, λ(n)) (d and x are coefficients to be calculated).

pair (e, n) is used for encryption ((m^e) % n) and pair (d, n) is used for decryption ((m^d) % n)

After computing the keys, p and q can be thrown away. Notice that without p and q, finding λ(n) would mean factorizing n, which is not an easy problem to solve for values of n up to 2^2048, which are regularily used.

Implementing RSA in Python

First to list procedures and their steps:

keys generation:

  • find 2 random prime numbers, p and q
  • compute n = p * q and λ(n) = (p - 1) * (q - 1)
  • make e equal some prime number, e.g. e = 35537
  • compute d from equation ed + λ(n)x = gcd(e, λ(n)) = 1 using Extended Euclidian Algorithm (from this point on we will call it eucalg)
Encryption:
  • divide the message into sections (of 256 bits if n is 2048-bit)
  • each section (denoted as m) is encrypted as (m^e) % n

Decryption:

  • divide the message into sections, same as above
  • each section (denoted as m) is decrypted as (m^d) % n

Implementation of Extended Euclidian Algorithm

This algorithm relies on the fact that if x divides both a and b, there will be a pair of coefficients (k, l) such that:
a * k + b * l = x
The algorithm finds these coefficients (and x) by repeatedly substracting the lesser argument from the bigger one, until the lesser one becomes 0. Instead of repeatedly substracting, you can calculate how many times b can be substracted from a (k = a // b) and then substratct b*k. This optimization makes the algorithm run in O(log max(a, b)) which is a lot quicker.
Below is an implementation of the algorithm in Python:

def eucalg(a, b):
	# make a the bigger one and b the lesser one
	swapped = False
	if a < b:
		a, b = b, a
		swapped = True
	# ca and cb store current a and b in form of
	# coefficients with initial a and b
	# a' = ca[0] * a + ca[1] * b
	# b' = cb[0] * a + cb[1] * b
	ca = (1, 0)
	cb = (0, 1)
	while b != 0:
		# k denotes how many times number b
		# can be substracted from a
		k = a // b
		# a  <- b
		# b  <- a - b * k
		# ca <- cb
		# cb <- (ca[0] - k * cb[0], ca[1] - k * cb[1])
		a, b, ca, cb = b, a-b*k, cb, (ca[0]-k*cb[0], ca[1]-k*cb[1])
	if swapped:
		return (ca[1], ca[0])
	else:
		return ca

Notice that the function above can produce negative numbers for coefficients, so in case that happens, all that we need to do is set d to λ(n) - d.

Implementing fast exponentiation with modulo

Some might suggest to just use (b ** e) % n, but this approach is not very good time and memory -wise because b ** e can be very big despite only the last 2000 bits or so being needed. The solution is to reinplement exponentiation with calculating modulo after every division.

Exponentiation implementation below has logarithmic time complexity. Instead of itterating from 1 to e and multiplying r (result) by b, it itterates from the most significant bit of e to the least significant bit of e, and at each bit it does r = r*r + bit. This works because if r equals b^x and you're appending a bit to the end of x, new x will be x * 2 + bit.

def modpow(b, e, n):
	# find length of e in bits
	tst = 1
	siz = 0
	while e >= tst:
		tst <<= 1
		siz += 1
	siz -= 1
	# calculate the result
	r = 1
	for i in range(siz, -1, -1):
		r = (r * r) % n
		if (e >> i) & 1: r = (r * b) % n
	return r

Key generation, encryption and decryption

Keys generation, encryption and decryption have all been explained in the math section and the code below is simply implementation of that.

def keysgen(p, q):
	n = p * q
	lambda_n = (p - 1) * (q - 1)
	e = 35537
	d = eucalg(e, lambda_n)[0]
	if d < 0: d += lambda_n
        # both private and public key must have n stored with them
	return {'priv': (d, n), 'pub': (e, n)}

def numencrypt(m, pub):
	return modpow(m, pub[0], pub[1])

def numdecrypt(m, priv):
	return modpow(m, priv[0], priv[1])

Testing the code we have until now

I stored the code into a file named rsa.py and run the following in Python shell:

>>> import rsa
>>> keys = rsa.keysgen(31337, 31357)
>>> keys
{'priv': (720926705, 982634309), 'pub': (35537, 982634309)}
>>> priv = keys['priv']
>>> pub = keys['pub']
>>> msg = 80087
>>> enc = rsa.numencrypt(msg, priv)
>>> enc
34604568
>>> dec = rsa.numdecrypt(enc, pub)
>>> dec
80087
>>> 

Conclusion

In the end of writing this article I realized that implementation of a usable Python RSA program is more complicated than I initially speculated. Because of that, I decided to split the topic up in multiple articles. With the code presented in this article you have all core parts of RSA written. But you still need a random prime generator and plaintext encryption (numencrypt and numdecrypt are suitable only for numbers m smaller in size than n). All these will be included in the next article that will be published shortly.

[PART 2]