The Palindrome
thepalindrome.org

1111111111 is a random sequence

Biases in our perception of randomness

Tivadar Danka
Mar 6, 2023
You are (probably) wrong about randomness.

I’ll show it to you. Let’s do an experiment. Pick up a pen, paper, or your favorite text editor, and manually generate a random sequence of ten characters, composed of ones and zeros.

My guess is, your answer will be:

0110100110

Probabilistically speaking, I should be right in only about 1 out of 1024 cases.
Statistically speaking, I am right way more frequently.

There is nothing wrong with 0110100110. It is a perfectly valid result. What’s striking is that it contains no long runs of repeated 0s or 1s. Intuitively, we consider 1111111111 not random at all, yet it has the same probability as any other ten-character 0-1 sequence!

Don’t believe me? I have even performed an experiment to find out what you think is random.

Tivadar Danka @TivadarDanka
There is a question about randomness that I am obsessed in lately, and I am asking for your help. Your task is simple: reply with a random 0-1 sequence of ten characters that you come up with. (No random generators are allowed.) I'll discuss the results in an upcoming thread.
11:25 AM ∙ Feb 28, 2023

The result is (not) surprising: your random 0-1 sequences are not as random as you think. Read on to find out why.


Random sequences…

Let’s go back to our initial experiment: generating random 0-1 sequences.

This is the thought process going on inside my head:

Ok, I’ll start with 0. I should continue with 1, as 00 is not random enough. On the same note, 01 should also continue as 011, as 010 is alternating, thus it is not random enough.

As three 1-s in a row is again not random, 0110 is the way to go. But I can’t continue with another 0: that’s two blocks of two repeated characters next to each other. Not random.

So 01101 it is…

And it goes on like that. What is the issue? That my perception of randomness biases my thinking. Things like

  • blocks of alternating characters,

  • blocks of repeated characters,

  • and blocks of identical blocks

are considered not random, whatever that might mean. Paradoxically, avoiding them produces sequences that are highly predictable, like 0110100110.

What is random, then?

Think of assembling a random 0-1 sequence as flipping coins. Heads, and we write down 1. Tails, and it’s 0. Each outcome has probability 1/2, and the tosses are independent of each other.

The probability of randomly obtaining 0110100110

We can model our sequence-generating process with independent, identically distributed Bernoulli random variables.

Bernoulli-distributed random variables describing the sequence-generation process

As the Xᵢ-s are independent, we can compute the probability of a given sequence by multiplying the individual probabilities.

Probability of obtaining a given sequence

And thus, the probability of obtaining the “random” 0110100110 and the “not random” 1111111111 is the same.
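To make this concrete, here is a minimal sketch (the function name is mine) that multiplies the per-character probabilities:

```python
def sequence_probability(seq, p=0.5):
    """Probability of a specific 0-1 string under i.i.d. Bernoulli(p) draws.

    Each '1' contributes a factor p, each '0' a factor 1 - p.
    """
    prob = 1.0
    for ch in seq:
        prob *= p if ch == "1" else 1 - p
    return prob
```

With a fair coin, `sequence_probability("0110100110")` and `sequence_probability("1111111111")` both come out to (1/2)¹⁰ = 1/1024.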

This seems paradoxical at first, but there is a clear resolution: true and perceived randomness is not the same.

Surprisingly, one of the telltale signs of true randomness is the presence of runs, that is, subsequences of repeated characters. Let’s talk about them.

…and repeated characters

So, if we randomly generate a 0-1 sequence of ten characters, 0111111110 is just as likely as 0110100110. However, the former contains a block of eight ones. There are exactly ten ten-character sequences containing a run of exactly eight repeated characters, making sequences such as 0111111110 exotic.

This begs the question: what is the frequency of runs?

To make the problem simpler, we’ll only study the longest run of a given sequence. (Recall that a run is simply a subsequence of repeated characters.) For instance, the length of the longest run in 0111110011 is five, as it contains 11111, but no larger run.
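Finding the longest run is a one-liner with `itertools.groupby`; a quick sketch:

```python
from itertools import groupby

def longest_run(seq):
    """Length of the longest block of identical consecutive characters."""
    return max(len(list(g)) for _, g in groupby(seq))
```

For the example above, `longest_run("0111110011")` returns 5 (the groups are 0, 11111, 00, 11).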

As all 0-1 sequences of a fixed size have equal probability, it is enough to flex our combinatorial muscles and count the ones whose longest run has length k.

This is not a simple task. There is an explicit formula (see the ancient landmark paper The Distribution Theory of Runs by A. M. Mood), but as I don’t want to shock you (or myself) with towering mathematical expressions, I’ll just generate a million random sequences and count the longest runs by sheer brute force. I know, I am getting old. (If you are interested in the code, here is the Google Colab notebook. Be warned though: writing clean code was not my aim.)
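A brute-force sketch along these lines (the names and trial count are mine; the author's notebook differs):

```python
import random
from collections import Counter
from itertools import groupby

def longest_run_distribution(n=10, trials=100_000, seed=42):
    """Estimate the distribution of the longest run in random 0-1
    sequences of length n by brute-force sampling."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        bits = "".join(rng.choice("01") for _ in range(n))
        counts[max(len(list(g)) for _, g in groupby(bits))] += 1
    return {k: counts[k] / trials for k in sorted(counts)}
```

The returned dictionary maps each longest-run length to its relative frequency; plotting it reproduces the figure below.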

Here is (a very close approximation of) the distribution.

As you can read from the plot, the probability of having larger runs is much more significant than what we intuitively expect.

The distribution of longest runs in a 0-1 sequence of 10 characters

(I just found out that you can style matplotlib figures like XKCD comics, and I am in love.)

To help you quantify this, here are the tails of the distribution; that is, the probabilities of a ten-character sequence having at least k-long runs.

The probability of at least k-long runs

Some concrete numbers. The chance of at least one 4-long run is 46%; at least one 5-long run is 21%. The expected length of the maximal run is ~3.661436.
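In fact, for sequences this short, no simulation is needed: a sequence whose runs all have length ≤ m corresponds to a composition of 10 into parts of size ≤ m (each part is a run; the first run's symbol can be chosen in two ways, and the rest alternate). A sketch of that exact count (function names are mine):

```python
def tail_probs(n):
    """Exact P(longest run >= k), k = 1..n, for random 0-1 sequences
    of length n, via compositions of n into parts of size <= m."""
    def count_max_at_most(m):
        if m == 0:
            return 0
        comps = [1] + [0] * n  # comps[j]: compositions of j, parts <= m
        for j in range(1, n + 1):
            comps[j] = sum(comps[j - i] for i in range(1, min(m, j) + 1))
        return 2 * comps[n]    # 2 choices for the first run's symbol
    total = 2 ** n
    return {k: (total - count_max_at_most(k - 1)) / total
            for k in range(1, n + 1)}
```

For n = 10, this gives P(at least a 4-long run) = 476/1024 ≈ 0.465 and P(at least a 5-long run) = 222/1024 ≈ 0.217, matching the simulated 46% and 21%; summing the tail probabilities yields the exact expectation 3750/1024 ≈ 3.662.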

These are in stark contrast with what we perceive as random. For humans, 0111110101 looks like a pattern, yet similar runs randomly happen 21% of the time.

What do we think is random, then?

Humans are not so random

To find out, I have performed an experiment on Twitter.

I have asked you to give me a 0-1 sequence of ten characters. What I was mostly interested in were the longest runs. I conjectured two things:

  1. the length of the longest run in most sequences will be 2 or 3,

  2. and sequences with 9- or 10-long runs will be significantly over-represented compared to truly random ones.

Why? Simple. Either you are aware of our biases towards randomness and you want to troll me, or you are biased.

Both of my guesses were confirmed. Check out the result:

The distribution of longest runs in human-generated 0-1 sequences

To simplify the comparison, here are the distributions for true random vs human-generated sequences, plotted on top of each other.

See the difference?

Let’s do some statistics to see if the difference is real. One way to check is Pearson’s chi-squared test, which tells us whether a set of categorical observations comes from a specified distribution.

In our case, the categories are the lengths of the longest runs: 1, 2, …, 9, and 10.

The chi-squared test works by calculating the normalized sum of squared differences between the observed and expected distributions.

Pearson's chi-squared test

Our null hypothesis (H₀) is that the observed and expected distributions are the same. If H₀ is true, the test statistic follows a chi-squared distribution with n-1 degrees of freedom, where n is the number of values our distributions can take. (Which is ten in our case.)

The smaller the value of 𝜒² is, the more likely the null hypothesis.
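The statistic itself is easy to compute by hand; here is a minimal sketch with made-up counts (the real observed data lives in the author's notebook):

```python
def chi_squared(observed, expected_probs):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over
    categories, where E = n * p is the expected count under H0."""
    n = sum(observed.values())
    return sum((observed.get(k, 0) - n * p) ** 2 / (n * p)
               for k, p in expected_probs.items())
```

The p-value is then the tail probability of the chi-squared distribution with n-1 degrees of freedom at this value (e.g. via `scipy.stats.chi2.sf`).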

Pearson’s chi-squared test reveals that human-generated random 0-1 sequences come from a different distribution, with a whopping 99.97967% confidence.

So, your random numbers are definitely not random.

(The user-generated data is far from perfect. There are multiple issues, mainly,

  • as my audience is more knowledgeable about probability and statistics than the average human, many participants were conscious of their biases,

  • and I was unable to enforce the rule that no random generators be used (some sequences were obtained from quantizing the digits of π or converting random integers to base two).

This experiment also fails to directly address our bias against runs. An “adversarial” approach would be better for that purpose: participants would be given a list of 0-1 sequences which they have to label “random” or “not random”.

Moreover, as Twitter cut off its free API access, I had to manually copy and paste your answers. Yeah, I know. I suspect that there are more answers than the Twitter UI is showing me, so I’ll redo the plots when I get API access.)

As the sequences grow

What happens as the size of the 0-1 sequences grows?

For one, runs can be confidently used to determine whether a sequence is indeed random. This is called the Wald-Wolfowitz runs test. If the number of runs significantly deviates from the expected count, the consecutive elements are (probably) not independent; that is, the sequence is not random.

So, how long a run should we expect?

Erdős and Rényi, two legendary mathematicians, showed that in N-long random 0-1 sequences, the probability of finding a log₂N-long run of 1-s is asymptotically 1.

You can read the result in its full glory in the paper On a new law of large numbers by Paul Erdős and Alfréd Rényi. It’s technical and convoluted, so allow me to unpack it. Fasten your seatbelts.

Rather than studying runs, Erdős and Rényi were interested in the density of 1-s. These are given by partial and running sums. The n-th partial sum Sₙ is defined by simply summing the sequence up to its n-th member.

Partial sums of a 0-1 sequence

Runs can be expressed in terms of running sums, that is, differences of partial sums.

Running sums are obtained by defining a window, then sliding that window through our sequence and summing up the values inside it.

Running sums of a 0-1 sequence

If a particular running sum equals the size of the window, that window is a run. The average of the running sums describes the density of 1-s.
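In code, the definitions above look like this (a sketch; the names are mine):

```python
def running_sums(bits, window):
    """Sums over all `window`-sized sliding blocks of a 0-1 list.

    A running sum equal to `window` means that block is a run of 1-s.
    """
    return [sum(bits[i:i + window]) for i in range(len(bits) - window + 1)]

def max_running_sum(bits, window):
    """The largest running sum over all window positions."""
    return max(running_sums(bits, window))
```

For instance, with `bits = [0,1,1,0,1,0,0,1,1,0]` and a window of 3, the running sums are [2, 2, 2, 1, 1, 1, 2, 2], so the maximal running sum is 2 and no window is a run.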

Erdős and Rényi were interested in the maximal running sum of a given sequence.

The maximal running sum

Just like all the previous quantities, the maximal running sum is a random variable.

Phew. After a lengthy setup, here comes the result. If N (the length of the sequence) goes to infinity, the probability of finding a log₂N -long run goes to 1.

On a new law of large numbers by Paul Erdős and Alfréd Rényi

Once more, I have simulated some sequences and estimated the distribution of the maximal running sums over log₂N-sized windows. Here is the result:

Maximal running sums in random 0-1 sequences

I know, this is not the most exciting figure ever, but this is exactly the point: as the length grows, we will find a log-sized run with increasing probability.
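You can check the growth rate yourself with a quick Monte Carlo sketch (the function name and trial count are mine): the average longest run of either symbol in a random sequence of length N hovers around log₂N.

```python
import random
from itertools import groupby

def avg_longest_run(n, trials=200, seed=0):
    """Average length of the longest run (of either symbol) over
    `trials` random 0-1 sequences of length n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        bits = "".join(rng.choice("01") for _ in range(n))
        total += max(len(list(g)) for _, g in groupby(bits))
    return total / trials
```

For n = 1024, log₂N = 10, and the average indeed comes out near 10.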

Conclusion

Humans are instinctively biased about randomness. We tend to over-represent short runs and under-represent long ones. When asked, we generate predictable and boring sequences such as 0110100110.

However, true randomness doesn’t distinguish between 0110100110 and 1111111111: both are equally probable, even though we perceive the latter as less random. In fact, the lack of long runs signals a lack of randomness.

References

  • A. M. Mood, The Distribution Theory of Runs, The Annals of Mathematical Statistics, 1940

  • Paul Erdős and Alfréd Rényi, On a new law of large numbers, Journal d'Analyse Mathématique, 1970
