In these first few posts I’ll cover a few common distributions and note their interesting properties. I’ll try to follow standard notation here, so that capital letters $X,Y,Z$ will denote random variables, which are represented as functions $X\colon\Omega\to\mathbb R$ for some sample probability space $\Omega$. $P(A)$ will be the probability of event $A$ and, as is custom, I will abuse notation and write things like $P(X=0)$ for $P(\{x\mid X(x)=0\})$.

This post is part of my series on distributions:

The Poisson distribution is a very popular distribution for modelling discrete data. This distribution can model situations where we’re dealing with a large amount of Bernoulli trials spread over time, each with a very small success rate, and we’re then counting the number of successes in a particular time interval. The Poisson has a single parameter $\lambda$, which is how many trials are performed at every time unit.

The density function of this distribution is $f\colon\{0,1,2,\dots\}\to\mathbb R$, given as

$f(k) := \frac{e^{-\lambda}\lambda^k}{k!}$

Here are a couple of plots for various values of $\lambda$, with its associated code.

from scipy.stats import poisson
from matplotlib import pyplot as plt
from random import uniform

N = 50 # number of time intervals
mus = [uniform(0, N) for i in range(10)] # uniformly random values for mu
Xs = map(poisson, mus) # Poisson distributed random variables
xs = range(N+1) # x values in our distributions

# create a (2,5) grid of subplots, sharing x- and y-axes
fig, ax = plt.subplots(2, 5, sharex='col', sharey='row', figsize=(15,4))

# fill in every subplot with the density (pmf) of the random variables
for i in range(2):
for j in range(5):
ax[i,j].bar(xs, next(Xs).pmf(xs))
ax[i,j].title.set_text(f"mu = {round(mus[i*5+j],2)}")

# set overall title and show the plot
fig.suptitle("Poisson distribution for various values of mu",
fontsize=20, y=1.1)
plt.show()


Alright, fine. There are a few questions that probably have arisen at this point:

1. Why does the above function really constitute a density function?
2. How is it connected to the intuition of the distribution mentioned above?
3. Why is this distribution so interesting?

For the first bit, note that

$\sum_{k=0}^\infty f(k) = e^{-\lambda} \sum_{k=0}^\infty\frac{\lambda^k}{k!} = e^{-\lambda}e^\lambda = 1.$

The second and third questions turn out to have the same answer, as there’s an important detail I should emphasise: the success rates of the Bernoulli trials I mentioned above are allowed to be different. More precisely, we have the following theorem.

Theorem (Law of small numbers). If $A_1,\dots,A_n$ are independent events with $p_i:=P(A_i)$, $X:=\sum_{i=1}^n I(A_i)$, $\lambda:=\sum_{i=1}^n p_i$ and $Y\sim\text{Pois}(\lambda)$ then for any $B\subseteq\mathbb N$,

$|P(X\in B)-P(N\in B)|\leq\min(1,\tfrac{1}{\lambda})\sum_{i=1}^n p_i^2.$

We actually get an even stronger result, in that the $A_i$’s don’t even have to be independent! Namely, an analogous result still holds if the variables are “locally dependent”; see e.g. chapter 4 in this paper.

We won’t show either of those two results here, but merely the special case when all the $p_i$’s are equal to some $p$, which will take us to the binomial distribution. Recall that a random variable $X$ follows the binomial distribution with parameters $n$ and $p$ if $X$ counts the number of successes out of $n$ Bernoulli trials with success rate $p$, so in this case we have that $X\sim\text{Bin}(n,p)$.

Theorem (Special case of the law of small numbers). If $X_n\sim\text{Bin}(n,\tfrac{\lambda}{n})$ for some $\lambda$ then $X_n\to\text{Pois}(\lambda)$ in distribution.

To see that this special case holds, first recall that the density function of $\text{Bin}(n,p)$ is $f\colon\{0,\dots,n\}\to\mathbb R$ given as

$f(k) := {n\choose k}p^k(1-p)^{n-k},$

so that by plugging in and letting $n\to\infty$ we get that

$f(k) = {n\choose k}\tfrac{\lambda^k}{n^k}(1-\tfrac{\lambda}{n})^{n-k} \to \frac{e^{-\lambda}\lambda^k}{k!},$

where we used that $(1-\tfrac{\lambda}{n})^n\to e^{-\lambda}$ along the way. This is precisely the density function for $\text{Pois}(\lambda)$. QED

So, we have (not really but kind of) shown that the Poisson distribution behaves as promised. This means that a Poisson distribution can used to model a vast variety of phenomena:

• The number of emails received in a given hour
• The number of earthquakes on the earth in a given year
• The number of winners of the lottery in a given month

The reason why a Poisson distribution might be a good model for these questions is that we are doing many trials per time unit (many people could send you emails, an earthquake can happen in many places around the world, and there are many people buying lottery tickets) but the individual probability for every trial is small (the chance of a given person sending you an email, or of an earthquake happening at a particular location, or of a particular person owning a lottery ticket winning).