Chebyshev's Inequality and 25SD Events
In 1867, Russian mathematician Pafnuty Chebyshev proved a general property of random variables governing their spread, building on earlier work by his friend and colleague Irénée-Jules Bienaymé. Later, his student Andrey Markov (of Markov chain fame) would expand upon his study.
This inequality has come to be known as Chebyshev’s inequality, though this name is also often used to refer to Markov’s inequality. Here, I prove both. The statements of the inequalities are as follows:
Statement of Markov’s Inequality
For any non-negative integrable random variable $X$ and any $a > 0$, we have:
$$P(X \geq a) \leq \frac{E[X]}{a}$$
Statement of Chebyshev’s Inequality
For any random variable $X$ with finite nonzero variance $\sigma^2$ and finite mean $\mu$, and any $k > 0$, we have:
$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$
Proof of Markov’s Inequality
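Consider the indicator random variable $I$, defined by
$$I = \begin{cases} 1 & \text{if } X \geq a, \\ 0 & \text{otherwise.} \end{cases}$$
Notice that $aI \leq X$ no matter the value of $X$: if $X \geq a$ then $aI = a \leq X$, and otherwise $aI = 0 \leq X$ because $X$ is non-negative. Taking expectations yields
$$a \, P(X \geq a) = E[aI] \leq E[X],$$
and dividing by $a > 0$ proves the inequality.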
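The pointwise step of this proof is easy to check numerically. The sketch below is only an illustration, not part of the proof: the function name `markov_check`, the choice of Exponential(1) samples, and the seed are all assumptions made for the example. It verifies that the threshold times the indicator never exceeds the sample, and compares the empirical tail frequency with the empirical mean divided by the threshold.

```python
import random

def markov_check(a, n=100_000, seed=0):
    """Compare the empirical tail P(X >= a) with the Markov bound E[X]/a."""
    rng = random.Random(seed)
    # Exponential(1) samples: non-negative with mean 1 (an arbitrary test case).
    xs = [rng.expovariate(1.0) for _ in range(n)]
    # Pointwise step of the proof: a * 1{x >= a} <= x for every sample.
    pointwise_ok = all((a if x >= a else 0.0) <= x for x in xs)
    tail = sum(x >= a for x in xs) / n   # empirical P(X >= a)
    bound = (sum(xs) / n) / a            # empirical E[X] / a
    return pointwise_ok, tail, bound

for a in (0.5, 1.0, 2.0, 5.0):
    ok, tail, bound = markov_check(a)
    print(f"a={a}: pointwise={ok}, tail={tail:.4f}, bound={bound:.4f}")
```

For small thresholds the bound exceeds 1 and says nothing; it only becomes informative once the threshold is larger than the mean.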
Proof of Chebyshev’s Inequality
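Since $|X - \mu|$ and $k\sigma$ are non-negative, we can square the inequality inside the brackets of the left hand side to obtain the equivalent condition (and hence equivalent probability):
$$P(|X - \mu| \geq k\sigma) = P\left((X - \mu)^2 \geq k^2\sigma^2\right)$$
Now, apply Markov’s inequality (multiplied through by $a$) to the non-negative random variable $(X - \mu)^2$ with $a = k^2\sigma^2$:
$$k^2\sigma^2 \, P\left((X - \mu)^2 \geq k^2\sigma^2\right) \leq E\left[(X - \mu)^2\right]$$
The right hand side is, by definition, the variance of $X$, which is $\sigma^2$. Recall that $\sigma^2 > 0$, so we can divide through by $k^2\sigma^2$ to obtain the inequality
$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}$$
exactly as required.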
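As a sanity check, the inequality can be tested on any data set, because a finite sample is itself a distribution with a mean and a (population) standard deviation. The sketch below assumes log-normal draws as a skewed test case; the helper name `chebyshev_check`, the sample size, and the seed are illustrative choices only.

```python
import random
import statistics

def chebyshev_check(samples, k):
    """Return (empirical tail frequency, Chebyshev bound 1/k^2) for a data set."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples, mu)  # population SD of the sample itself
    tail = sum(abs(x - mu) >= k * sigma for x in samples) / len(samples)
    return tail, 1.0 / k**2

rng = random.Random(0)  # fixed seed, chosen arbitrarily
# Log-normal draws: a skewed, fairly heavy-tailed test case (an assumption).
samples = [rng.lognormvariate(0.0, 1.0) for _ in range(20_000)]

for k in (1.5, 2, 3, 5):
    tail, bound = chebyshev_check(samples, k)
    print(f"k={k}: tail={tail:.5f} <= bound={bound:.5f}")
```

Because the mean and standard deviation are computed from the sample itself, the tail frequency cannot exceed the bound whatever the seed; the gap between the two columns gives a first hint of how loose the bound usually is.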
Black Swans
When people talk about highly unexpected events (black swans, per Nassim Nicholas Taleb), they often refer to events which are some extreme number of standard deviations “out of distribution”. David Viniar, the CFO of Goldman Sachs during the 2008 financial crisis, famously remarked that the firm’s analysts “were seeing things that were 25-standard deviation events, several days in a row”.
Of course, most financial models are based on an approximately normal (or perhaps log-normal) distribution. Here, the probability of an event more than 25 standard deviations from the mean is less than one part in $10^{137}$. For context, if each atom in the universe generated a trillion normally distributed random variables every picosecond since the start of the universe, the chance that we would have seen a 25SD event by now is less than one in a quadrillion!
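The normal tail figure can be reproduced with the complementary error function, since the two-sided tail of a standard normal beyond $k$ is $\operatorname{erfc}(k/\sqrt{2})$. In the sketch below the physical constants (about $10^{80}$ atoms, a universe roughly 13.8 billion years old) are rough, assumed orders of magnitude, and the expected number of events is used as an upper bound on the chance of seeing at least one.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| >= k) for a standard normal Z, via the complementary error function."""
    return math.erfc(k / math.sqrt(2))

p = normal_two_sided_tail(25)

# Rough, assumed orders of magnitude for the thought experiment:
atoms = 1e80                     # atoms in the observable universe
draws_per_second = 1e12 * 1e12   # a trillion draws per picosecond, per atom
age_seconds = 4.35e17            # about 13.8 billion years

# Expected number of 25SD draws; this bounds the probability of at least one.
expected_events = p * atoms * draws_per_second * age_seconds

print(f"P(|Z| >= 25)         = {p:.2e}")
print(f"expected 25SD events = {expected_events:.2e}")
```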
Chebyshev’s Looseness
What Chebyshev’s inequality shows us is that a 25SD event need not be quite as rare as the normal distribution would have you believe. Setting $k = 25$ bounds its probability by $1/25^2 = 1/625$. One in 625 is still moderately rare (0.16%), but certainly not as extreme as one in $10^{137}$. However, how rare are 25SD events in practice? That is, is the Chebyshev bound at all tight? We consider a few other distributions.
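For the exponential distribution with rate $\lambda$, we have $\mu = \sigma = 1/\lambda$, and so (since the distribution is non-negative) we only look at the right tail. Here, we seek the probability that $X \geq \mu + 25\sigma = 26/\lambda$, which is given by $e^{-26}$, equal to roughly one part in 200 billion ($\approx 5.1 \times 10^{-12}$).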
Meanwhile, consider a famously fat-tailed distribution, like Student’s t-distribution with 3 degrees of freedom (the smallest whole number of degrees of freedom that gives finite variance). This distribution has $\sigma = \sqrt{3}$. Its survival function decays like a power law (inverse-cubically); the probability of our 25SD event is about $2.7 \times 10^{-5}$ counting both tails, or roughly one in 37 thousand (about one in 74 thousand for a single tail).
For an even sillier example, the uniform distribution on an interval has standard deviation equal to $L/\sqrt{12}$, where $L$ is the length of the interval. But in fact this distribution has bounded support, so $X$ can only differ from the mean by at most half the length of the interval, $L/2 = \sqrt{3}\,\sigma \approx 1.73\sigma$. This means we cannot possibly have even a 2SD event, let alone a 25SD event!
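The tail probabilities for these examples can be put side by side using closed forms only. This is a sketch under stated assumptions: the function names are made up for the example, the exponential formula assumes $k \geq 1$ (so the left tail is empty), and the t-distribution tail uses the standard closed-form CDF for 3 degrees of freedom, rewritten with $\arctan(1/x)$ to avoid subtracting two nearly equal numbers.

```python
import math

def chebyshev_bound(k):
    return 1.0 / k**2

def normal_tail(k):
    # Two-sided tail of a standard normal.
    return math.erfc(k / math.sqrt(2))

def exponential_tail(k):
    # mean = sd = 1/rate, so the event is X >= (1 + k)/rate (valid for k >= 1).
    return math.exp(-(1 + k))

def t3_tail(k):
    # Student's t with 3 degrees of freedom has sd = sqrt(3), so the threshold
    # k * sd corresponds to x = (k * sqrt(3)) / sqrt(3) = k in the closed form
    # P(T > t) = (atan(1/x) - x / (1 + x^2)) / pi, with x = t / sqrt(3).
    x = float(k)
    one_sided = (math.atan(1.0 / x) - x / (1.0 + x * x)) / math.pi
    return 2.0 * one_sided  # both tails, by symmetry

def uniform_tail(k):
    # The support ends sqrt(3) standard deviations from the mean.
    return max(0.0, 1.0 - k / math.sqrt(3))

K = 25
rows = [
    ("Chebyshev bound", chebyshev_bound(K)),
    ("Student t (3 df)", t3_tail(K)),
    ("exponential", exponential_tail(K)),
    ("normal", normal_tail(K)),
    ("uniform", uniform_tail(K)),
]
for name, p in rows:
    print(f"{name:<18} {p:.3e}")
```

Changing `K` to a smaller value such as 2 or 3 shows the same ordering, though with a much smaller gap between the bound and the actual tails.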
So even for fairly fat-tailed distributions, Chebyshev’s inequality is really quite a loose bound on the improbability of tail events. However, it is still cool that we have any sort of nontrivial bound whatsoever that is totally independent of the shape of the distribution!