# Demystifying the LFSR

, Programming

Are linear feedback shift registers really that hard to understand? I think not! All you need to know to understand LFSRs is high-school level mathematics and an understanding of binary:

• You know basic algebra and you can work with polynomials like x^2 + x + 1.
• You know how to do long division—or at least, you vaguely remember what long division is, and you can muddle through it.
• You know what prime numbers are, and how any number can be factored into prime factors, like 20 = 2^2 \cdot 5.
• You know some digital logic, like what “XOR” means.
• If you know some group theory, great! Some of this will be familiar.

## What is an LFSR?

A linear feedback shift register is a simple way to generate a sequence of pseudorandom bits by taking a register, shifting its contents by one bit, and mixing in some new data using an XOR operation. This is not a rigorous definition, I just want to get you thinking about how an LFSR actually works—either as a digital circuit or as software.

LFSRs are often used as random number generators, especially in constrained software and hardware environments. If you travel back in time to the 1980s or 1970s, the LFSR was an easy and reliable way to generate random numbers on 8-bit CPUs like the 8080 or 6502. If you’re working in hardware, you can implement an LFSR with only a handful of gates and get a fast, reliable random number generator.

There are two different ways to set up an LFSR: the “Fibonacci LFSR” and the “Galois LFSR”. Both versions work by taking certain bits from the shift register, XORing them together, and putting them back into the shift register. The two versions just work in opposite directions.

How to Pronounce “Fibonacci” and “Galois”

“Fibonacci” is pronounced:

• /ˌfɪbəˈnɑːtʃi/ (or /ˌfiːb-/) in English, “fi-buh-na-chee”, and
• /fi.boˈnat.t͡ʃi/ in Italian.

“Galois” is pronounced:

• /ɡælˈwɑː/ in English, “gal-waa”, and
• /ɡaˈlwa/ in French.

These are both LFSRs. It turns out that the Galois configuration is easier to implement in software, and it has higher performance when implemented in hardware, so I’m just going to focus on the Galois configuration. The Galois configuration and the Fibonacci configuration are equivalent, but for this article, I’m just going to ignore the Fibonacci configuration.

### Code

It is simple to implement an LFSR in code. I’m choosing to use the Galois configuration, and to shift the register left each step.

#include <stdint.h>

// Advance the LFSR by one step.
uint32_t lfsr_advance(uint32_t value, int width, uint32_t taps) {
const uint32_t high_bit = (uint32_t)1 << width;
value <<= 1;
// If a 1 bit gets shifted out, then...
if ((value & high_bit) != 0) {
// Clear that bit, and flip the taps.
value ^= high_bit | taps;
}
return value;
}

You’ll often hard-code the width of the LFSR and the taps, because it’s not something most people care to configure.

// Calculate the next step of a 16-bit LFSR with the taps at
// position 0, 2, 3, and 5. This LFSR has maximum period.
//
// These taps are 101101 in binary, or 0x2d in hexadecimal.
}

This works as a reasonable random number generator, although it only generates one bit at a time.

I chose to shift the LFSR to the left to make the explanation a little easier. You could just as easily shift to the right instead. It will produce the same stream of bits, in the same order, but the bits in the register will be numbered in reverse.

// Advance a 16-bit LFSR by one step, shifting to the right. This is
// the same 16-bit LFSR as above, with taps at position 0, 2, 3, and
// 5, but the taps are numbered in the opposite direction.
uint32_t new_value = value >> 1;
if ((value & 1) != 0) {
new_value ^= 0xb400;
}
return new_value;
}

## Questions

The questions we have are:

• Where do we put the taps?
• How many times can you shift before it starts repeating? (What is the period of the shift register?)

Just based on our knowledge of digital logic, we can figure out that if there are k bits in the shift register, there are 2^k possible states, and the maximum possible period cannot 2^k.

We also realize that if the register contains all zeroes, then the next state will also be all zeroes. If our shift register gets into the zero state, it gets stuck. That means that, at best, our shift register could only loop through the other 2^k-1 possible states. Is it possible to make a shift register that has a period of 2^k-1?

First of all, we can crib that answer from Wikipedia. Wikipedia says that we can construct an LFSR with maximum period if the “reciprocal characteristic polynomial” is a “primitive polynomial”. If you just need to implement an LFSR, you can get a list of “primitive polynomials” from Xilinx (Application Note XAPP-052) for any length up to 168 bits, and you’re done!

Read on if you want to understand what that means and learn how to construct your own primitive polynomials.

## Brute Force

Of course, you don’t actually need to know about polynomials, for smaller LFSRs. You can find out the period of an LFSR easily enough by running it until it repeats.

#include <stdlib.h>

// Calculate the length of an LFSR's period.
int lfsr_period(uint32_t initial_state, int width, uint32_t taps) {
// Record when each state is first visited.
int *states = calloc(1 << width, sizeof(*states));
if (states == NULL) {
abort();
}
uint32_t state = initial_state;
for (int t = 1;; t++) {
if (states[state] != 0) {
return t - states[state];
}
states[state] = t;
}
}

Let’s take a moment to step back and talk about addition in binary, because we’re going to be doing addition a little differently.

You may have seen an explanation of how to add ordinary numbers in binary. For example, in binary, we can add 1011 + 110 = 10001:

This is “normal” addition, with carrying. What happens when we don’t carry? We get 1011 + 110 = 1121… which is not binary, but that’s okay for now.

This is just like adding two polynomials together. In fact, mathematicians prefer to call this polynomial addition.

What happens when we do addition modulo 2? In modulo 2 arithmetic, 1 + 1 = 0. Addition modulo 2 is the same as the XOR operation in digital logic. Modulo 2, we get 1011 + 110 = 1101.

Mathematicians usually call this “polynomial addition” too, but you can see that it doesn’t matter if you write out the polynomial as x^2+x or if you just write the coefficients as 110.

We’re going to be using modulo 2 arithmetic to design LFSRs.

### Code

It is easy enough to write a function for polynomial addition modulo 2. Here is the C code:

#include <stdint.h>

// Add two polynomials, modulo 2.
uint32_t mod2_add(uint32_t p, uint32_t q) {
return p ^ q;
}

This only gives us 32 bits for polynomial coefficients. You may need to use larger types if you want to work with larger polynomials and larger LFSRs.

## Subtraction in Binary

What happens when you subtract modulo 2? Do you get negative numbers? Think about what happens when you subtract 1000 - 1011. Is 1011 bigger than 1000?

It turns out that “bigger” or “smaller” isn’t really something we think about when we are working modulo 2. Subtracting 1000 - 1011 we get 11, and we know that this is right because 1000 + 11 = 1011. In modulo 2 arithmetic, x - y = x + y. Addition and subtraction are the same operation, in modulo 2 arithmetic.

Or we can say x = -x, which means the same thing..

## Multiplication in Binary

Let’s continue exploring modulo 2 arithmetic by talking about multiplication. Addition is just repeated multiplication, so with our understanding of modulo 2 arithmetic, we can calculate 1011 \cdot 110 = 111010.

Keep in mind that this is special, modulo-2 multiplication, without carry. It’s not the same kind of multiplication you get when you write 0b1011 * 0b110 in a program.

If you’re a programmer, you’ll recognize that multiplying by 10 is the same as shifting to the left by one bit. For example, we can shift 1001101 to the left by one bit:

If we think of it as a polynomial, shifting to the left by one bit is the same as multiplying the polynomial by x (and the polynomial x has coefficients 10).

### Code

The straightforward way to multiply two polynomials, mod 2, is with long multiplication.

#include <stdint.h>

// Multiply two polynomials, mod 2.
uint32_t mod2_multiply(uint32_t p, uint32_t q) {
uint32_t result = 0;
for (int i = 0; i < 32; i++) {
if (((q >> i) & 1) != 0) {
result ^= p << i;
}
}
return result;
}

It turns out that this kind of multiplication is important enough that various CPUs have instructions for it, to make it faster to multiply polynomials modulo 2. The x86 has pclmulqdq and ARM has vmull.p8.

## Division in Binary

The final piece of the arithmetic puzzle is division. You can use long division to divide numbers modulo 2. Division gives us the quotient and remainder. We’re not interested in fractions, so the quotient will always be an integer.

For example, we can divide 10001101 ÷ 1011 and get 10110, with a remainder of 111.

Note that unlike normal division, we just need to look at the highest bit in each division step.

See also Wikipedia: Polynomial long division for a more general discussion of how to divide polynomials, although the article doesn’t cover modulo 2 arithmetic.

### Code

To divide modulo 2, we just have to translate the old, familiar long division algorithm to code. This algorithm divides p by q. In each step, we shift q to the left so the highest 1 bit lines up with the remainder, and then subtract (which is XOR).

#include <assert.h>
#include <stdint.h>

// Return the index of the highest bit set, 0-31. Returns -1 if no
// bits are set.
int highest_bit(uint32_t p) {
if (p == 0) {
return -1;
}
int bit = 0;
while (p > 1) {
p >>= 1;
bit++;
}
return bit;
}

// The result of dividing two polynomials, modulo 2.
struct division_result {
uint32_t quotient;  // p / q
uint32_t remainder; // p % q
};

// Divide the polynomial p by q, modulo 2.
struct division_result mod2_divide(uint32_t p, uint32_t q) {
assert(q != 0);
uint32_t quotient = 0;
uint32_t remainder = p;
int remainder_width = highest_bit(p);
int q_width = highest_bit(q);
while (remainder_width >= q_width) {
int shift = remainder_width - q_width;
remainder ^= q << shift;
quotient |= 1 << shift;
remainder_width = highest_bit(remainder);
}
return (struct division_result){
.quotient = quotient,
.remainder = remainder,
};
}

If you want to check that the division algorithm worked correctly, there are two things that we want to check. First, we check that the remainder is smaller than q. If it’s not, then we haven’t finished dividing. Second, we multiply the quotient by q, add the remainder, and check that you get the original value of p.

// Divide the polynomial p by q, modulo 2, and check that the result
// is correct.
void mod2_divide_check(uint32_t p, uint32_t q) {
struct division_result r = mod2_divide(p, q);

// Check that the remainder is strictly lower order than q.
assert(highest_bit(r.remainder) < highest_bit(q));

// Check that we get the original value of p back.
}

## LFSRs, Now with More Mathematics!

If we take a look at LFSRs again, we can explain what LFSRs are doing in terms of polynomials modulo 2.

We can think of the state of an LFSR with k bits as a polynomial with degree smaller than k.

For example, a 16-bit LFSR will have a state that contains coefficients for x^0 up to x^{15}. If the LFSR shifts left, then the state 0x870C is the polynomial P(x):

P(x) = x^{15} + x^{10} + x^9 + x^8 + x^3 + x^2

When we shift the polynomial left one bit, it’s the same as multiplying the polynomial by x:

x P(x) = x^{16} + x^{11} + x^{10} + x^9 + x^4 + x^3

We can think of the taps for the LFSR as a 16-th degree polynomial, equal to x^{16}, plus x^n for each tap at position n. With our taps at 0, 2, 3, and 5, the taps represent the polynomial Q(x):

Q(x) = x^{16} + x^5 + x^3 + x^2 + 1

This is called the characteristic polynomial of the LFSR. Note that the number of bits in the LFSR is equal to the degree of the characteristic polynomial.

Propagating the high bit to the taps is the same as dividing by Q(x) and taking the remainder. This means we have a simple formula for the next state, P^\prime(x):

\begin{split} P^\prime(x) &= x P(x) \bmod Q(x) \\ &= x^{11} + x^{10} + x^9 + x^5 + x^4 + x^2 + 1 \end{split}

Now, how do we pick the taps? How can we calculate the period of an LFSR?

## Irreducible Polynomials

An irreducible polynomial is a polynomial that can’t be factored into the product of two non-constant polynomials. For example, the polynomial P below is irreducible, but Q is reducible because it can be factored.

P(x) = x^4 + x + 1

\begin{split} Q(x) &= x^4 + x^3 + x^2 + 1 \\ &= (x^3 + x + 1) (x + 1) \end{split}

Any LFSR with maximum period must use an irreducible polynomial for the taps. Why?

Think about what happens if you have a polynomial which can be factored,

Q(x) = Q_1(x)Q_2(x)

Suppose that the LFSR starts in state P(x)=Q_1(x) Each time this LFSR advances, if the original state is a multiple of Q_1(x), then the next state will also be a multiple of Q_1(x).

If Q_1(x) and Q_2(x) have no common factors, then you are getting the same result as if you were running two LFSRs in parallel—one for Q_1, and one for Q_2. Running two shorter LFSRs will give a shorter period than you would get for a single, maximum-period LFSR with the same number of bits.

### Finding Irreducible Polynomials

Like many problems, the way that you find irreducible polynomials depends on how big the polynomials should be.

For smaller polynomials, it’s easy enough to try and factor them. If a polynomial with degree n has factors, then at least one of the factors must have degree n/2 or smaller. We can list all of the irreducible polynomials up to degree n/2 and use this list to test whether polynomials of degree n are irreducible.

This requires an exponential increasing amount of CPU power and memory as the polynomials grow larger, but in practice, it works fine if you just need to find a polynomial of degree no more than, say, 40. There are faster algorithms which will let you find much larger irreducible polynomials.

### Code

Here is some code for finding an irreducible polynomial of degree 31 or lower. For higher degree polynomials, the code will require some modification… but this approach is too slow for polynomials of high degree anyway.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// A dynamic list of polynomials.
struct poly_list {
uint32_t *polys;
int count;
int alloc_size;
};

// Append a single polynomial to the end of the list, growing the
// array if necessary.
void poly_append(struct poly_list *list, uint32_t poly) {
if (list->count >= list->alloc_size) {
int new_size = list->alloc_size == 0 ? 8 : list->alloc_size * 2;
uint32_t *new_array =
realloc(list->polys, new_size * sizeof(*new_array));
if (new_array == NULL) {
abort();
}
list->polys = new_array;
list->alloc_size = new_size;
}
list->polys[list->count] = poly;
list->count++;
}

// Number of lists of irreducible polynomials.
#define LIST_COUNT 16

// List of irreducible polynomials of each degree, up to degree 15.
struct poly_list irreducible[LIST_COUNT];

// Test if a polynomial is irreducible. Requires that the lists of
// irreducible polynomials be initialized up to and including
// degree/2.
bool is_irreducible(uint32_t poly, int degree) {
// Test polynomials with degree no larger than degree/2. If this
// polynomial is divisible by none of those polynomials, it is
// irreducible.
for (int n = 1; n <= degree / 2; n++) {
uint32_t *list = irreducible[n].polys;
int count = irreducible[n].count;
for (int i = 0; i < count; i++) {
struct division_result r = mod2_divide(poly, list[i]);
if (r.remainder == 0) {
return false;
}
}
}
return true;
}

// Initialize the list of irreducible polynomials in order to find
// irreducible polynomials of the given degree.
void init_irreducible(int degree) {
assert(degree / 2 < LIST_COUNT);
for (int n = 1; n <= degree / 2; n++) {
uint32_t start = (uint32_t)1 << n;
uint32_t end = start << 1;
for (uint32_t poly = start; poly < end; poly++) {
if (is_irreducible(poly, degree)) {
poly_append(&irreducible[n], poly);
}
}
}
}

// Find an irreducible polynomial with the given degree.
uint32_t find_irreducible(int degree) {
init_irreducible(degree);
uint32_t start = (uint32_t)1 << degree;
uint32_t end = start << 1;
for (uint32_t poly = start; poly < end; poly++) {
if (is_irreducible(poly, degree)) {
return poly;
}
}
// Mathematically impossible! For any degree, it is always
// possible to find an irreducible polynomial.
assert(false);
}

## Primitive Polynomials

Here’s where we get into the real math. Not all irreducible polynomials give us a maximum period LFSRs, but we know that the characteristic polynomial must be irreducible. It turns out that the polynomial must not only be irreducible, but also a primitive polynomial. There are different but equivalent ways of defining “primitive polynomial”, but for our purposes, we can just say that a primitive polynomial is a polynomial that makes an LFSR with maximum possible period.

To simplify the notation, I’ll write polynomials as p or q instead of P(x) or Q(x).

Let q be the LFSR’s characteristic polynomial, with degree k. The possible states for the LFSR are, \mathcal{P}_k(\mathbb{F}_2), the set of polynomials with degree less than k. Let S(p) be the set of all states reachable from initial state p.

S(p) = \{ px^n \bmod q : n \in \mathbb{N} \}

If q is a primitive polynomial, then S(p) will contain 2^k-1 states, and our LFSR has maximum period.

### Period and Initial State

We can prove that all sets S(p) for p \ne 0 have the same size. In other words, the period of the LFSR does not depend on which state you start in, as long as you don’t start in the zero state.

First, we can redefine S(p) in terms of S(1):

\begin{split} S(p) &= \{p x^n \bmod q : n \in \mathbb{N} \} \\ &= \{p (x^n \bmod q) \bmod q : n \in \mathbb{N} \} \\ &= \{p s \bmod q : s \in S(1) \} \end{split}

Written this way, it’s obvious that S(p) cannot be larger than S(1). If S(p) is smaller, then it must mean that two different elements, s_1, s_2 \in S(1) become the same element in S(p), which means that

ps_1 = ps_2 \mod q.

By rearranging,

p(s_1 - s_2) = 0 \mod q.

Since q is irreducible, it must either be a factor of p or a factor of s_1 - s_2. If it’s a factor of p, then p=0. If it’s a factor of s_1 - s_2, then s_1 = s_2. Both possibilities contradict the premise, so the size of S(1) and S(p) is the same, if p \ne 0.

### Partitioning All States

We can prove that any set of states, S(p_1) and S(p_2), are either equal to each other or disjoint.

We can exclude S(0) because it is the only set which contains 0.

Suppose that some element is in two sets, p \in S(p_1) \cap S(p_2), where p \ne 0.

From the way these sets are constructed, both S(p_1) and S(p_2) must contain all states reachable from p, S(p). Since all these sets are the same size, they must be the same set.

### LFSR Period

This brings us to the final piece of the puzzle, where we can prove what the period of the LFSR is.

Consider the set of all sets of reachable states, excluding the zero state.

T = \{ S(p) : p \in \mathcal{P}(\mathbb{F}_2), p \ne 0 \}

Each element in T is a set of states that the LFSR can be stuck in. Every nonzero state must appear in at least one of these sets, because p \in \mathcal{P}(\mathbb{F}_2). However, no state appears in more than one of these sets, because we proved that these sets are either disjoint or equal.

Since all of the sets in T are the same size, then the total number of nonzero states must be equal to the number of sets in T, multiplied by the number of states reachable from a nonzero state.

\lvert T \rvert \cdot \lvert S(1) \rvert = 2^k-1

In other words, the period of an LFSR which uses an irreducible polynomial must be a divisor of 2^k-1.

How does this help us?

### Finding Primitive Polynomials

We already have a system for finding irreducible polynomials. Once we find an irreducible polynomial, we can test to see if it is primitive by checking that the period is 2^k-1.

Instead of iterating through 2^k-1 possible states, there’s a shortcut. Since the period of an LFSR with an irreducible characteristic polynomial must be a divisor of 2^k-1, we only have to check the divisors of 2^k-1. In fact, we don’t have to check all the divisors, we just find every prime factor a, and check that the LFSR period is not a divisor of (2^k-1)/a.

For some number n, we can check whether the LFSR’s period is a factor of n by calculating x^n \bmod q. If the period is a factor of n, then x^n \bmod q = 1.

### Code

First, we need to find the factors of 2^k-1 which we want to test.

// Maximum number of factors to test. Note that 32-bit number can
// only have 9 prime factors, at most.
#define MAX_FACTORS 16

// Given an integer n, find all unique factors of the form n/a,
// where a is a prime number and a != n. Returns the number of
// factors.
int factorize(int *factors, int n) {
int count = 0;
int rem = n;
int factor = 2;
for (; rem >= factor * factor; factor++) {
if ((rem % factor) == 0) {
assert(count < MAX_FACTORS);
factors[count] = n / factor;
count++;
while ((rem % factor) == 0) {
rem /= factor;
}
}
}
// If rem == n, then n is prime, and we will return 0 factors.
if (rem > 1 && rem != n) {
assert(count < MAX_FACTORS);
factors[count] = n / rem;
count++;
}
return count;
}

Next, we write a function to calculate x^n \bmod q. Ther’s an algorithm for this that runs in \mathrm{O}(\log n) time. It works by repeatedly squaring polynomials. First, we can write a function for multiplying two polynomials, modulo q.

// Multiply two polynomials, mod 2, and return the result, mod q.
uint32_t multiply_modq(uint32_t a, uint32_t b, uint32_t q,
int width) {
uint32_t result = 0;
// p = b * x^i mod q.
uint32_t p = b;
for (int i = 0; i < width; i++) {
// Calculate p = p * x mod q.
p <<= 1;
if (((p >> width) & 1) != 0) {
p ^= q;
}
// Accumulate the result.
if (((a >> i) & 1) != 0) {
result ^= p;
}
}
return result;
}

The exponentiation algorithm works by observing that:

a^b = \begin{cases} 1 &\text{if } b = 0 \\ (a^2)^{b/2} &\text{if $b$ even, $b \ne 0$} \\ aa^{b-1} &\text{if $b$ odd} \end{cases}

This formula still holds with modular arithmetic, polynomials, and LFSRs.

// Compute pow(a, b), mod q.
uint32_t pow_modq(uint32_t a, int b, uint32_t q, int width) {
if (b == 0) {
return 1;
}
// Loop invariant: The final result is always equal to:
// accum * power ^ exponent.
uint32_t accum = 1;
uint32_t power = a;
for (int exponent = b; exponent > 1; exponent >>= 1) {
if ((exponent & 1) != 0) {
accum = multiply_modq(accum, power, q, width);
}
power = multiply_modq(power, power, q, width);
}
return multiply_modq(accum, power, q, width);
}

Using these functions, we can find a primitive polynomial with any order we want, and therefore a maximum period LFSR of any size.

// Return true if a polynomial is a primitive polynomial. The
// factors must be the result of factorize.
bool is_primitive(uint32_t poly, int degree, int *factors,
int factor_count) {
if (!is_irreducible(poly, degree)) {
return false;
}
for (int i = 0; i < factor_count; i++) {
// This calculates x^n mod poly, where n is the factor we are
// testing. "2" is the representation of the polynomial x.
if (pow_modq(2, factors[i], poly, degree) == 1) {
return false;
}
}
return true;
}

// Find a primitive polynomial with the given degree.
uint32_t find_primitive(int degree) {
// List all factors we want to test.
int factors[MAX_FACTORS];
int factor_count = factorize(factors, (1 << degree) - 1);

// Loop over irreducible polynomials.
init_irreducible(degree);
uint32_t start = (uint32_t)1 << degree;
uint32_t end = start << 1;
for (uint32_t poly = start; poly < end; poly++) {
if (is_primitive(poly, degree, factors, factor_count)) {
return poly;
}
}
// Mathematically impossible! For any degree, it is always
// possible to find a primitive polynomial.
assert(false);
}

## Results

With the code above, we can search for a primitive polynomial, then check that the period of an LFSR with that polynomial has the maximum possible length. Here are the smallest primitive polynomials of various orders up to 24:

SizePolynomial (hex)
80x11d
90x211
100x409
110x805
120x1053
140x402b
160x1002d
200x100009
240x100001b