The Types of Distribution You Need to Know

Benjamin Chen
February 10, 2022
Updated: January 22, 2024

In this story, we will go over the three main types of distribution:

Discrete Probability Distribution
Binomial Distribution
Continuous Probability Distribution

You could probably already infer what each of these distributions represents from its name. Let's go through them one by one!

Discrete Probability Distribution

We can illustrate a discrete probability distribution with a coin toss example. Consider an experiment where we toss 2 coins, and let X be the number of heads in each experiment.

Because we are only tossing 2 coins, it’s easy to map out all of the possible outcomes. The four possible outcomes are:

Tail and Tail
Tail and Head
Head and Tail
Head and Head

For each of these possible outcomes, we can easily calculate the number of heads out of the two tosses.

Tail and Tail (0 heads)
Tail and Head (1 head)
Head and Tail (1 head)
Head and Head (2 heads)

Since each outcome has an equal probability (0.25) of occurring, we can work out the probability of each value of X. Recall that X is the number of heads.

The probability of landing 0 heads (just outcome 1) is 0.25
The probability of landing 1 head (outcomes 2 and 3) is 0.5
The probability of landing 2 heads (outcome 4) is 0.25
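To double-check these numbers, the enumeration can be done in a few lines of Python. This is a minimal sketch that assumes fair coins and uses only the standard library (`itertools.product` to list the outcomes, `Fraction` to keep the probabilities exact); the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate every outcome of tossing 2 fair coins: ('T','T'), ('T','H'), ('H','T'), ('H','H')
outcomes = list(product("TH", repeat=2))

# The 4 outcomes are equally likely, so each has probability 1/4
p_outcome = Fraction(1, len(outcomes))

# X = number of heads; count how many outcomes give each value of X
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) * 1/4
pmf = {x: counts[x] * p_outcome for x in sorted(counts)}

for x, p in pmf.items():
    print(f"P(X = {x}) = {p} = {float(p)}")
# P(X = 0) = 1/4 = 0.25
# P(X = 1) = 1/2 = 0.5
# P(X = 2) = 1/4 = 0.25
```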

Let’s plot it out to visualize the distribution of X.

Note that X is a discrete variable (a discrete numerical variable, to be specific). If you don’t remember what discrete variables are, you may refer back to the story on variables (Statistics 2: Numeric and Categorical Variables). In short, we know that X is a discrete variable because it is impossible to land 1.5 heads from two coin tosses: X can only take whole-number values (0, 1, or 2 heads). Because X is a discrete variable, the distribution that we just built is a discrete probability distribution.

Pretty easy, right? By definition, a discrete probability distribution describes the probability of occurrence of each value (0, 1, and 2 in our example) of a discrete variable. There are some important properties of a discrete probability distribution that we must remember.

The outcomes are mutually exclusive and collectively exhaustive
The probabilities of all the outcomes sum to 1

These properties should be self-explanatory. If you don’t remember what they mean, you may refer back to the story on probabilities.
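If you prefer to see the two properties as a check, here is a small Python sketch. The helper `is_valid_pmf` is a hypothetical name chosen for this example, not a library function. Listing each value of X once as a dictionary key stands in for the "mutually exclusive and collectively exhaustive" requirement; the code itself only verifies that each probability lies between 0 and 1 and that they add up to exactly 1.

```python
from fractions import Fraction

def is_valid_pmf(pmf):
    """Hypothetical helper: check that `pmf` (value -> probability) is a valid distribution.

    It checks that every probability lies in [0, 1] and that the
    probabilities sum to exactly 1 (Fractions avoid rounding error).
    """
    probs = list(pmf.values())
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# Distribution of X (number of heads) from the two-coin example
two_coins = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
print(is_valid_pmf(two_coins))  # True

# A table whose probabilities only sum to 3/4 is not a valid distribution
print(is_valid_pmf({0: Fraction(1, 4), 1: Fraction(1, 2)}))  # False
```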

Expected Value, Variance, and Standard Deviation

Alright! Next, let’s introduce a few more terms in the context of distributions:

Expected Value
Variance
Standard Deviation

The expected value is the probability-weighted average of all the possible values of a random variable. Intuitively, if we perform our experiment (tossing 2 coins) many times, the expected value is the long-run average value of X over those trials.

To calculate the expected value, we use the following formula:

$$\mu = E(X) = \sum_{i} x_i \, P(X = x_i)$$

Let’s apply the formula to our example and see for ourselves:

$$E(X) = 0(0.25) + 1(0.5) + 2(0.25) = 1$$

We’re essentially summing up the product of each X value and its corresponding probability. The expected value that we calculated is 1, which means that if we perform our experiment many times, the average value of X will be close to 1. You can also see this in the distribution plot: the distribution is symmetric about X = 1.
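Here is a minimal Python sketch of the formula applied to the two-coin distribution, followed by a quick simulation of the "average over many experiments" interpretation. The function name `expected_value`, the random seed, and the number of trials are arbitrary choices for illustration; the simulated average should land close to 1 but not exactly on it.

```python
import random
from fractions import Fraction

# Distribution of X = number of heads when tossing 2 fair coins
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(pmf):
    """E(X): the sum of x * P(X = x) over every value x."""
    return sum(x * p for x, p in pmf.items())

print(expected_value(pmf))  # 1

# Long-run interpretation: repeat the 2-coin experiment many times and average X
random.seed(0)  # fixed seed so the run is reproducible
trials = 100_000
total_heads = sum(
    sum(random.random() < 0.5 for _ in range(2))  # heads in one experiment
    for _ in range(trials)
)
print(total_heads / trials)  # close to 1, but not exactly 1
```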

You should be familiar with the other two terms, variance and standard deviation. If a variable has a distribution, its values are spread out, and the variance and standard deviation measure exactly this spread. We can calculate the two measures with the following formulas:

Variance

$$\sigma^2 = \sum_{i} (x_i - \mu)^2 \, P(X = x_i)$$

Standard Deviation

$$\sigma = \sqrt{\sigma^2} = \sqrt{\sum_{i} (x_i - \mu)^2 \, P(X = x_i)}$$

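If we plug the values from the previous example (with μ = 1) into the equations, we get:

Variance

$$\sigma^2 = (0 - 1)^2(0.25) + (1 - 1)^2(0.5) + (2 - 1)^2(0.25) = 0.5$$

Standard Deviation

$$\sigma = \sqrt{0.5} \approx 0.707$$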

These calculations should be straightforward as all you have to do is plug the correct values into the formula.
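The same plug-in calculation as a Python sketch, using the two-coin distribution. `Fraction` keeps the variance exact, and `math.sqrt` returns the standard deviation as a float.

```python
import math
from fractions import Fraction

# Distribution of X = number of heads when tossing 2 fair coins
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Expected value: sum of x * P(X = x)
mu = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mu)^2 * P(X = x)
variance = sum((x - mu) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance
std_dev = math.sqrt(variance)

print(variance)           # 1/2
print(round(std_dev, 4))  # 0.7071
```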

Binomial Distribution

A binomial distribution is a special type of discrete probability distribution. A discrete probability distribution is considered a binomial distribution when the following conditions are met:

n repeated, identical trials
Two mutually exclusive outcomes (success and failure) in each trial
Constant probability of success (π) in each trial
Trials are independent (the outcome of one trial does not affect the other trials)

Ok, let’s illustrate these conditions with an example.

Say we have three students. Each student can either pass or fail the test, and each has a 0.7 probability of passing. What is the probability that exactly two of the three students passed the test?

Before we calculate the probability, I want to clarify that this example satisfies the conditions of a binomial distribution.

We have three students, and we can treat each student as one of n = 3 identical trials.
There are two mutually exclusive outcomes: a student either passes or fails the test.
The probability of passing stays constant at 0.7.
The test result of one student does not affect the test results of the other students (independent trials).

All four conditions are satisfied in this example. This means that the number of students who passed (number of successes) will follow a binomial distribution. The number of students who passed (number of successes) is our X variable. Ok, let that sink in because this is where most students get confused.

The X variable in a binomial distribution is the number of successes.

Don't worry if you still don't understand. Let's clarify by solving the example. Because we only have three students in this simple example, it’s easy to map out all of the possibilities. There are only three cases in which exactly two of the three students pass: pass-pass-fail, pass-fail-pass, and fail-pass-pass. Because the students' results are independent, each case has probability 0.7 × 0.7 × 0.3 = 0.147.

Adding up the probabilities of the three cases gives the probability that two out of three students passed the test: 3 × 0.147 = 0.441.
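This brute-force approach can be sketched in Python by listing all 2³ = 8 pass/fail combinations and keeping those with exactly two passes. The sketch assumes, as the example does, that results are independent and that each student passes with probability 0.7; the `"P"`/`"F"` labels are just an illustrative encoding.

```python
from itertools import product

p_pass = 0.7  # probability that any one student passes

total = 0.0
# Each student either passes ("P") or fails ("F"): 2**3 = 8 possible results
for result in product("PF", repeat=3):
    if result.count("P") == 2:
        # Independent trials, so multiply the individual probabilities
        prob = 1.0
        for r in result:
            prob *= p_pass if r == "P" else 1 - p_pass
        print("".join(result), round(prob, 3))  # each case: 0.147
        total += prob

print(round(total, 3))  # 0.441
```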

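Now, imagine if there were a hundred students. The number of possible cases would be too many to map out. This is where we introduce the binomial probability formula:

$$P(X = x \mid n, \pi) = \frac{n!}{x!\,(n - x)!}\,\pi^{x}(1 - \pi)^{n - x}$$

Here, n is the number of trials, π is the probability of success in each trial, and x is the number of successes.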

Now let’s attempt to calculate the probability that two out of three students passed again, but this time using the binomial probability.

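Find P(X = 2), remembering that X is the number of successes:

$$P(X = 2 \mid 3, 0.7) = \frac{3!}{2!\,1!}(0.7)^{2}(1 - 0.7)^{1} = 3 \times 0.49 \times 0.3 = 0.441$$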

Hooray! We got the same answer without manually mapping out the possibilities! Using the same formula, we can also calculate the probability that 0 students passed (0.027), 1 student passed (0.189), and all 3 students passed (0.343). We can’t go any higher because we only have three students in this example.
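The binomial probability formula translates directly into Python. This is a minimal sketch: `binomial_pmf` is a hypothetical helper name, and `math.comb` (Python 3.8+) computes n! / (x!(n − x)!).

```python
from math import comb

def binomial_pmf(x, n, pi):
    """P(X = x) = C(n, x) * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 3, 0.7  # three students, each passes with probability 0.7
for x in range(n + 1):
    print(f"P(X = {x}) = {binomial_pmf(x, n, pi):.3f}")
# P(X = 0) = 0.027
# P(X = 1) = 0.189
# P(X = 2) = 0.441
# P(X = 3) = 0.343
```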

Plotting these probabilities gives us the binomial distribution.

Coming back to the previous statement:

The X variable in a binomial distribution is the number of successes.

If you didn't understand what the statement meant, you should now have a much better idea.

If we are asked to calculate the probability that at least 2 students passed, we can simply add the probability that 2 passed and the probability that 3 passed: P(X ≥ 2) = 0.441 + 0.343 = 0.784.
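A short sketch of the "at least 2" calculation, plus the equivalent complement route (1 minus the probability of 0 or 1 passes). Same assumptions as the example, n = 3 and π = 0.7; `binomial_pmf` is an illustrative helper, not a library function.

```python
from math import comb

n, pi = 3, 0.7

def binomial_pmf(x):
    # Binomial probability of exactly x successes out of n trials
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# "At least 2 passed" means X = 2 or X = 3; these are mutually exclusive, so add them
p_at_least_2 = binomial_pmf(2) + binomial_pmf(3)
print(round(p_at_least_2, 3))  # 0.784

# Equivalent: 1 minus the probability of the complementary event (X = 0 or X = 1)
print(round(1 - (binomial_pmf(0) + binomial_pmf(1)), 3))  # 0.784
```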

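Mean, Variance and STD for Binomial Distribution

The binomial distribution is a type of discrete distribution, so the general formulas above still apply, but there are simpler formulas for its mean, variance, and standard deviation.

Mean

$$\mu = E(X) = n\pi$$

Variance

$$\sigma^2 = n\pi(1 - \pi)$$

Standard Deviation

$$\sigma = \sqrt{n\pi(1 - \pi)}$$

Let’s plug the values from our example (n = 3, π = 0.7) into these equations.

$$\mu = 3(0.7) = 2.1$$

$$\sigma^2 = 3(0.7)(0.3) = 0.63$$

$$\sigma = \sqrt{0.63} \approx 0.794$$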
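As a sanity check, the shortcut formulas can be compared against the general discrete formulas from earlier, applied to the full binomial distribution. A minimal Python sketch, assuming n = 3 and π = 0.7 as in the example:

```python
import math

n, pi = 3, 0.7

# Shortcut formulas for a binomial distribution
mean = n * pi                  # n * pi
variance = n * pi * (1 - pi)   # n * pi * (1 - pi)
std_dev = math.sqrt(variance)

# Cross-check with the general discrete formulas on the full distribution
pmf = {x: math.comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)}
mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())

print(round(mean, 3), round(variance, 3), round(std_dev, 3))  # 2.1 0.63 0.794
print(round(mu, 3), round(var, 3))                            # 2.1 0.63
```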

Easy, right? Next, let’s talk about the final type of distribution: the continuous probability distribution.

Continuous Probability Distribution

From the name of the distribution, you could probably already tell that this is the distribution for continuous numerical variables. Let’s quickly review how we usually represent continuous numerical variables.

Frequency Distribution and Histogram

Pretty familiar, right? You may recall from Statistics 3: Basic Data Visualization that we represent continuous numerical variables with frequency distributions and histograms. Here, these visualizations represent the variable X, Amount of Fill (liters).

Now pay close attention to the histogram. If we were to find the probability that the Amount of Fill is less than 1.042 liters (marked by the vertical line on the histogram), then we would have to add up the probability of the first four intervals and a portion of the fifth interval. In other words, we need to find the area under the curve to the left of the vertical line. I’ll mark this area in green.

The areas of the first four intervals are given by the relative frequency column of the frequency distribution. Notice also that the relative frequencies sum to 1, which means the total area under the histogram is also 1. As for the portion of the area in the fifth interval, we can find it with some simple math.

The takeaway here is that the area under a distribution represents probability.

However, this also means that the probability of any single point is zero. Let’s see what this sentence actually means.

If I were to find the probability that the Amount of Fill is exactly 1.042 liters, the probability would be zero.

This is because X = 1.042 is a vertical line with zero width, so the area under it is 0. In other words, we can only find the probability for a range of X values. I understand this might be slightly confusing, but it is very important because it marks the difference between probability and likelihood. In case you're curious about the difference, you may refer to Probability vs. Likelihood.
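To see the "probability of a single point is zero" idea numerically, the sketch below uses Python's `statistics.NormalDist` (Python 3.8+). It assumes, purely for illustration, that Amount of Fill follows a normal distribution with mean 1.0 liter and standard deviation 0.05 liter; these numbers are hypothetical and are not taken from the histogram. As the interval starting at 1.042 shrinks, its probability shrinks toward 0, while the height of the curve at that point (the density) does not.

```python
from statistics import NormalDist

# HYPOTHETICAL model, for illustration only: Amount of Fill ~ N(1.0, 0.05^2) liters
fill = NormalDist(mu=1.0, sigma=0.05)
a = 1.042

# P(a <= X <= a + width) is the area under the curve over an interval of that width
for width in [0.01, 0.001, 0.0001, 0.0]:
    prob = fill.cdf(a + width) - fill.cdf(a)
    print(f"width = {width}: P = {prob:.6f}")
# The last line prints P = 0.000000: an interval of zero width has zero area

# The height of the curve at a single point is a density, not a probability;
# with these hypothetical parameters it is roughly 5.6, i.e. larger than 1
print(round(fill.pdf(a), 2))
```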

Now let’s come back to the histogram. The histogram is made up of many intervals. Imagine that we shrink the intervals. As the intervals get narrower, the outline of the histogram becomes smoother, and eventually the histogram becomes a smooth curve. Below, we have another example that illustrates this process.

This smooth curve is what we commonly refer to as the continuous probability distribution. The y-axis is labeled ‘density’ because this curve is also called the probability density function (pdf).

There is one important form of density function that I’d like to go over: the normal density function.

Normal Density Function

The normal density function is what we commonly refer to as the bell curve or normal distribution. Below is an example of a normal distribution.

We denote a normal distribution with the following notation:

$$X \sim N(\mu, \sigma^2)$$

This notation can be read as: X follows a normal distribution with mean µ and variance σ².

Characteristics of a Normal Distribution

There are several important characteristics that belong to a normal distribution.

Infinite theoretical range
Bell-shaped
Symmetrical about X = µ
Mean, median, and mode are all equal to µ
Spread is determined by σ
Follows the empirical rule

All of these should be pretty self-explanatory, except for the last one which you may have never heard of.
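Most of these characteristics can be checked numerically. Here is a minimal sketch with Python's `statistics.NormalDist`, using hypothetical parameters µ = 50 and σ = 10 (the mean equals µ by construction, so it is not re-checked). It checks symmetry, that half of the area lies to the left of µ (median), that the density peaks at µ (mode), and that doubling σ halves the height of the peak.

```python
import math
from statistics import NormalDist

# Hypothetical parameters, chosen only for illustration
mu, sigma = 50.0, 10.0
X = NormalDist(mu, sigma)

# Symmetrical about mu: the density is the same at mu - d and mu + d
print(all(math.isclose(X.pdf(mu - d), X.pdf(mu + d)) for d in (1, 5, 20)))  # True

# Median equals mu: exactly half of the area lies to the left of mu
print(X.cdf(mu))  # 0.5

# Mode equals mu: the density is higher at mu than at nearby points
print(all(X.pdf(mu) > X.pdf(mu + d) for d in (-5, -1, 1, 5)))  # True

# Spread is determined by sigma: doubling sigma halves the height of the peak
wider = NormalDist(mu, 2 * sigma)
print(math.isclose(wider.pdf(mu), X.pdf(mu) / 2))  # True
```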

The empirical rule is a quick shortcut that applies to all normal distributions. It basically says that for a normal distribution,

The area under the curve within µ ± σ equals about 68%
The area under the curve within µ ± 2σ equals about 95%
The area under the curve within µ ± 3σ equals about 99.7%

This is why the empirical rule is also sometimes referred to as the 68-95-99.7 rule. This rule comes in handy in many situations, and maybe we’ll see how this can be applied in future topics.
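The 68-95-99.7 numbers can be reproduced from the normal CDF (the area to the left of a point). A minimal Python sketch using `statistics.NormalDist`; the three (µ, σ) pairs are arbitrary examples, included to show that the areas do not depend on which normal distribution we pick.

```python
from statistics import NormalDist

# Arbitrary example parameters: the rule should hold for every normal distribution
for mu, sigma in [(0, 1), (100, 15), (-3, 0.2)]:
    X = NormalDist(mu, sigma)
    # Area within mu ± k*sigma is F(mu + k*sigma) - F(mu - k*sigma), where F is the CDF
    areas = [X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma) for k in (1, 2, 3)]
    print(mu, sigma, [round(area, 4) for area in areas])
# Each row ends with [0.6827, 0.9545, 0.9973]
```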

Computing Normal Probabilities

Earlier, when we were talking about histograms, we mentioned that probability is measured by the area under the distribution. We also said that the total area under the curve is 1 and the probability of any individual point is equal to 0. So for the distribution below, if I want to find P(a <= X <= b), then my goal is to find the area in pink.

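If you have studied calculus, you know that we can calculate the pink area by integrating the normal density function from a to b. The formula would be:

$$P(a \le X \le b) = \int_{a}^{b} \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \, dx$$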

Complicated, right? Don’t worry! This is a series on statistics, so we’ll find the area under the curve using the statistical approach instead. We’ll use a method called standardization. Standardization is a SUPER important topic, so I'm going to dedicate an entire story to it. For now, just know that standardization is a method to find the area (probability) under a normal distribution.
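For readers who want to see the calculus route without doing it by hand, the integral can be approximated numerically. This sketch is not the standardization method; it applies the trapezoidal rule to the normal density and compares the result with `statistics.NormalDist.cdf`. The helpers `normal_pdf` and `area_under_pdf`, the example distribution N(100, 15²), and the interval [90, 120] are all hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def area_under_pdf(a, b, mu, sigma, steps=10_000):
    """Approximate the integral of the pdf from a to b with the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

# Hypothetical example: X ~ N(100, 15^2), find P(90 <= X <= 120)
mu, sigma, a, b = 100.0, 15.0, 90.0, 120.0
approx = area_under_pdf(a, b, mu, sigma)
dist = NormalDist(mu, sigma)
exact = dist.cdf(b) - dist.cdf(a)
print(round(approx, 4), round(exact, 4))  # the two values agree
```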

Conclusion

Phew! We made it to the end of another exciting story. In this one, we went over the three main types of distribution and discussed the meaning of many terms and concepts. Here is a brief recap:

Discrete Probability Distribution
Expected Value, Variance, and STD for Discrete Distribution
Binomial Distribution
Binomial Probability
Mean, Variance and STD for Binomial Distribution
Continuous Probability Distribution
Normal Distribution
Empirical Rule

That's a lot of material indeed, but we're not done yet! Remember, we still have to cover standardization, the statistical approach to calculating the area under the curve of a normal distribution. It's one of the most important prerequisites for later topics, so study hard!