What's the difference between the Z Distribution and the t-distribution?
- Benjamin Chen
- March 24, 2022
- 5 min read
Updated: January 22, 2024
When students study confidence intervals and hypothesis testing, one question they often encounter is "What in the world is a t-distribution?". In this story, I will attempt to answer that question. Before you read on, I strongly recommend going over Statistics 8: Standardization and Statistics 9: Sampling Distribution Clearly Explained! first.
Z Distribution
Let's start by defining the Z distribution first! The Z distribution is just another term for the standard normal distribution. A Z distribution (or standard normal distribution, whichever you prefer) is a normal distribution with a mean of 0 and a standard deviation of 1. There are two things you have to know about the Z distribution.
First, all normal distributions, regardless of their mean or standard deviation, can be standardized to a Z distribution. I demonstrated this process in Statistics 8: Standardization. Say I have a variable X that follows a normal distribution with a mean of 10 and a standard deviation of 2.

X ~ N(μ = 10, σ = 2)
I can standardize the distribution of variable X into a Z distribution by applying the following formula (which also explains why the resulting variable is denoted Z):

Z = (X - μ) / σ
Now the standardized distribution will look like this:

Z ~ N(μ = 0, σ = 1)
Note: The two distributions look the same but focus on the change in the notation of the variable (X to Z) and the values on the X-axis.
Second, the Z-table applies specifically to the Z distribution (hence the name Z-table). The Z-table enables us to find the area under the Z distribution, which I also demonstrated in Statistics 8: Standardization.
Standardization therefore allows us to find the area under any normal distribution. We just have to standardize the normal distribution into a Z distribution, and then we can consult the Z-table to find the area under the curve.
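To make this concrete, here is a minimal sketch in Python (my own illustration, not from the original series) using scipy.stats; the value x = 13 is an arbitrary example. It shows that standardizing and consulting the standard normal gives the same area as working with the original distribution directly:

```python
# Minimal sketch: standardize X ~ N(mu=10, sigma=2), then find a left-tail area.
from scipy.stats import norm

mu, sigma = 10, 2
x = 13                                 # an arbitrary example value of X

z = (x - mu) / sigma                   # standardization: z = 1.5
area_via_z = norm.cdf(z)               # area to the LEFT of z on the Z distribution

# The same area, computed directly from the original normal distribution:
area_direct = norm.cdf(x, loc=mu, scale=sigma)

print(z, area_via_z, area_direct)      # 1.5 0.9331... 0.9331...
```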
Sampling Distribution
Now, let's turn our focus to the sampling distribution. A sampling distribution is the distribution of a sample statistic (e.g., the sample mean). The sampling distribution of the sample mean, for example, would be composed of many, many different sample means compiled from many, many different samples. Each sample would consist of slightly different values, which explains why there will be slight variation among the sample means (and that variation is what ultimately forms the sampling distribution).
The mean and standard deviation of the sampling distribution of the sample mean can be derived from the population mean and population standard deviation:

μ_x̄ = μ
σ_x̄ = σ / √n
We would denote a normal sampling distribution with the following notations:

x̄ ~ N(μ_x̄, σ_x̄)
x̄ ~ N(μ, σ / √n)
The first is denoted in terms of the mean and standard deviation of the sampling distribution (of sample mean), whereas the second is denoted in terms of the population mean μ and population standard deviation σ.
Now let's go back to our previous example, where variable X is normally distributed with a (population) mean of 10 and a (population) standard deviation of 2. Say we were to construct a sampling distribution (of the sample mean) with a sample size of 9. We could derive the mean and standard deviation of the sampling distribution using the formulas we just discussed:

μ_x̄ = μ = 10
σ_x̄ = σ / √n = 2 / √9 = 2/3 ≈ 0.67
We would also be able to determine that the sampling distribution of sample mean is normal because the (population) distribution of X is also normal.
Note: There are two ways to determine whether a sampling distribution is normal:
If the population distribution is normal, then the sampling distribution is also normal (for any sample size)
If the sample size is at least 30, then the sampling distribution is approximately normal by the Central Limit Theorem, even if the population distribution is not
Now, we can denote our sampling distribution of the sample mean as:

x̄ ~ N(μ = 10, σ = 2/3)
And it should look like the distribution below:

(Figure: a normal curve centered at 10 with a standard deviation of 2/3, narrower than the original distribution of X.)
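If you'd like to verify these numbers yourself, here is a small simulation sketch (my own addition, with an assumed seed and sample count) that builds the sampling distribution empirically:

```python
# Simulation sketch: the sampling distribution of the sample mean
# for X ~ N(mu=10, sigma=2) with sample size n = 9.
import numpy as np

rng = np.random.default_rng(0)
n, n_samples = 9, 100_000

# Draw many samples of size 9 and record each sample's mean.
sample_means = rng.normal(loc=10, scale=2, size=(n_samples, n)).mean(axis=1)

print(sample_means.mean())   # close to mu = 10
print(sample_means.std())    # close to sigma / sqrt(n) = 2/3 ≈ 0.667
```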
Now let's recall the statement that "all normal distributions can be standardized," which of course includes this sampling distribution. This is where it gets tricky. The standardization formula, when standardizing the sampling distribution (of the sample mean), becomes:

Z = (x̄ - μ_x̄) / σ_x̄ = (x̄ - μ) / (σ / √n)
This shouldn't be too hard to understand. The formula is exactly the same, except that the notation is in terms of the mean and standard deviation of the sampling distribution. We can see that the sampling distribution (of the sample mean) can also be standardized to a Z distribution. But there's a catch!
The sampling distribution will only be standardized to a Z distribution when the population standard deviation σ is given.
In many scenarios, the population standard deviation σ is actually unknown. When σ is unknown, we estimate it with the sample standard deviation S. The problem with using S in place of σ is that the sampling distribution (of the sample mean) will no longer standardize to a Z distribution. Instead, it will standardize to a t-distribution (hence the notation t below):

t = (x̄ - μ) / (S / √n)
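Here is a short sketch (again my own illustration, with an assumed seed) that contrasts the two statistics on a single simulated sample; note that the only difference is whether σ or S appears in the denominator:

```python
# Sketch: Z statistic (sigma known) vs. t statistic (sigma estimated by S).
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n = 10, 2, 9
sample = rng.normal(mu, sigma, size=n)

x_bar = sample.mean()
s = sample.std(ddof=1)                     # sample standard deviation S (divides by n - 1)

z = (x_bar - mu) / (sigma / np.sqrt(n))    # follows a Z distribution
t = (x_bar - mu) / (s / np.sqrt(n))        # follows a t-distribution with df = n - 1

print(z, t)
```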
t-Distribution
A t-distribution is very similar to a Z distribution, except that its shape varies with its degrees of freedom. The degrees of freedom of a t-distribution are n - 1, where n is the sample size. Thus, in a sense, the shape of a t-distribution depends on the sample size. Let's take a look at how the shape of a t-distribution varies.

(Figure: t-distribution curves for several degrees of freedom, alongside the Z distribution.)
We can see that the smaller the sample size (i.e., the smaller the degrees of freedom), the fatter the tails of the t-distribution. As the sample size increases, the t-distribution slowly approaches the standard normal distribution (Z distribution); the code sketch after the takeaways below illustrates this convergence. I like to see this trait as a penalty on sampling distributions with small sample sizes: fatter tails produce wider confidence intervals, which implies a greater estimation error for the population parameter. Don't worry if this is getting confusing. The main takeaway here is that:
If the population standard deviation σ is known, the standardized sampling distribution will follow a Z distribution.
If the population standard deviation σ is unknown (and we estimate it with the sample standard deviation S), the standardized sampling distribution will follow a t-distribution with n - 1 degrees of freedom.
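One quick way to see the convergence mentioned above is to compare critical values. The sketch below (my own, using scipy.stats) prints the 97.5th-percentile cutoff for growing degrees of freedom:

```python
# Sketch: t critical values approach the Z critical value as df grows.
from scipy.stats import norm, t

print(norm.ppf(0.975))              # ~1.960 for the Z distribution
for df in (2, 8, 29, 100, 1000):
    print(df, t.ppf(0.975, df))     # 4.303, 2.306, 2.045, 1.984, 1.962
```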
After we determine whether the standardized sampling distribution follows a Z distribution or a t-distribution, we then use either the Z-table or the t-table to find the area under the distribution.
Let's quickly compare the t-table and the Z-table to see what's different. In the Z-table, the first column gives the Z value to one decimal place and the first row gives its second decimal place. The values within the table are the areas under the curve to the LEFT of the Z value (as demonstrated in the small illustration in the top right-hand corner).
(Figure: the Z-table.)
The t-table looks slightly different. We know that t-distributions are shaped differently depending on their degrees of freedom (n - 1). So first, we locate the degrees of freedom of our t-distribution in the first column. Then, in the first row, we find the upper-tail areas. The values within the t-table are the t-values (equivalent to the Z values in the Z-table) that correspond to each upper-tail area (which is also why all of the t-values are positive).
(Figure: the t-table.)
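If you prefer code to printed tables, here is a small sketch (my own addition) of the two lookup conventions expressed in scipy terms:

```python
# Sketch: the Z-table and t-table conventions, via scipy.stats.
from scipy.stats import norm, t

# Z-table convention: given a Z value, the body holds the area to its LEFT.
print(norm.cdf(1.96))               # ~0.975

# t-table convention: given an upper-tail area and df, the body holds the t value.
upper_tail, df = 0.025, 8           # e.g. a sample size of 9 gives df = 8
print(t.ppf(1 - upper_tail, df))    # ~2.306
```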
Conclusion
Phew! That was a lot to cover. Hopefully, this story was able to answer your question of "what in the world is a t-distribution?" In short, the t-distribution is similar to the Z distribution (standard normal distribution) but has heavier tails, with a shape that depends on its degrees of freedom. The important thing is that you can determine whether a sampling distribution, after standardization, follows a Z distribution or a t-distribution.