--- title: "Fun with bignum: how RSA encryption works" date: "`r Sys.Date()`" vignette: > %\VignetteEngine{knitr::rmarkdown} %\VignetteIndexEntry{Fun with bignum: how RSA encryption works} \usepackage[utf8]{inputenc} output: html_document --- ```{r setup, include=FALSE} library(openssl) knitr::opts_chunk$set(echo = TRUE) ``` Primitive types such as `int` or `double` store numbers in exactly 4 or 8 bytes, with finite precision. This suffices for most applications, but cryptography requires arithmetic on very large numbers, without loss of precision. Therefore OpenSSL uses a __bignum__ data type which holds arbitrary sized integers and implements all basic arithmetic and comparison operators such as `+`, `-`, `*`, `^`, `%%`, `%/%`, `==`, `!=`, `<`, `<=`, `>` and `>=`. One special case, the [__modular exponent__](https://en.wikipedia.org/wiki/Modular_exponentiation) `a^b %% m` can be calculated using `bignum_mod_exp`, even when `b` is too large for calculating `a^b`. ```{r} # create a bignum y <- bignum("123456789123456789") z <- bignum("D41D8CD98F00B204E9800998ECF8427E", hex = TRUE) # size grows print(y * z) # Basic arithmetic div <- z %/% y mod <- z %% y z2 <- div * y + mod stopifnot(z2 == z) stopifnot(div < z) ``` RSA involves a public key and a private key. The public key should be known by everyone and is used for encrypting messages. Messages encrypted with the public key can only be decrypted in a reasonable amount of time using the private key. In RSA, this asymmetry is based on the practical difficulty of factoring the product of two large prime numbers. ### RSA key generation An RSA key-pair is generated as follows (adapted from [wikipedia](https://en.wikipedia.org/wiki/RSA_(cryptosystem))): - Choose two distinct prime numbers $p$ and $q$. Keep these secret. - Compute the product $n = p*q$. This $n$ value is public and used as the modulus. - Compute $\phi(n) = (p − 1)(q − 1)$. - Choose an integer $e$ smaller than $\phi(n)$ such that $e$ and $\phi(n)$ are coprime. OpenSSL always uses $65537$. - Compute a value for $d$ such that $(d * e)\pmod{\phi(n)} = 1$. OpenSSL has a key generator that does these things for us. ```{r} (key <- rsa_keygen(512)) (pubkey <- key$pubkey) ``` Usually we would use `rsa_encrypt` and `rsa_decrypt` to perform the encryption: ```{r} msg <- charToRaw("hello world") ciphertext <- rsa_encrypt(msg, pubkey) rawToChar(rsa_decrypt(ciphertext, key)) ``` Let's look at how this works under the hood. ### How RSA encryption works The `data` field of the private key extracts the underlying bignum integers: ```{r} key$data ``` You can verify that the equations above hold for this key. The public key is simply a subset of the key which only contains $n$ and $e$: ```{r} pubkey$data ``` In order to encrypt a message into ciphertext we have to treat the message data as an integer. The message cannot be larger than the key size. For example convert the text `hello world` into an integer: ```{r} m <- bignum(charToRaw("hello world")) print(m) ``` To encrypt this message $m$ into ciphertext $c$ we calculate $c = m^e\pmod n$. Using the public key from above: ```{r} e <- pubkey$data$e n <- pubkey$data$n c <- (m ^ e) %% n print(c) ``` This number represents our encrypted message! It is usually exchanged using base64 notation for human readability: ```{r} base64_encode(c) ``` The ciphertext can be decrypted using $d$ from the corresponding private key via $m = c^d \pmod{n}$. Note that `c^d` is too large to calculate directly so we need to use `bignum_mod_exp` instead. ```{r} d <- key$data$d out <- bignum_mod_exp(c, d, n) rawToChar(out) ``` The only difference with the actual `rsa_encrypt` and `rsa_decrypt` functions is that these add some additional padding to the data.