Unlocking the Secrets: Drawing Inferences for Paired Means

  • Benjamin Chen
  • February 26, 2023
  • 2 min read

Updated: January 19, 2024

In the previous two stories on comparing two population parameters, the statistical theory was built on one important assumption: the samples had to be independent of each other. But what if the samples aren't independent? For example, let's say I'm interested in seeing how a certain diet plan affects a person's weight (before and after). Here the samples are not independent: each measurement in the 'before' group corresponds to a measurement in the 'after' group taken on the same person. Such corresponding measurements are called paired values.


Paired values violate the assumption of independence, so how exactly do we deal with them?

Paired Means (Dependent values)


Fortunately, the solution is quite simple. Since we're interested in the difference between the 'before' and 'after', we could actually condense them into one measurement by simply taking the difference between the two values. TA DA!


Now we are back to the one-sample case. We can apply everything we learned previously for the one-sample mean to the difference values (between 'before' and 'after').
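
As a minimal sketch of that idea, the snippet below turns a 'before' list and an 'after' list into a single list of differences. The weights are hypothetical numbers invented purely for illustration, and the variable names are my own.

```python
from statistics import mean, stdev

# Hypothetical weights in kg for five people, invented purely for illustration.
before = [82.0, 75.5, 91.2, 68.4, 77.9]
after = [80.1, 75.0, 88.7, 68.9, 76.2]

# Pair each 'after' value with the same person's 'before' value.
differences = [a - b for b, a in zip(before, after)]

# From here on, `differences` is an ordinary one-sample dataset.
n = len(differences)
d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (n - 1 in the denominator)

print(f"n = {n}, mean difference = {d_bar:.2f}, sd = {s_d:.2f}")
```

The order of subtraction (after minus before) is a choice; what matters is that it stays the same for every pair.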


Everything from here on should just be a review, but let's take a look at an example just in case.


Example


A test was carried out over different bodies of water to determine whether the true average zinc concentration in the bottom water exceeds that of the surface water. The two measurements recorded for each body of water are as follows:


Screenshot from Penn State Online

Now looking at the two samples here (bottom and surface), it should be easy to tell that we have paired values. The samples are not independent, which means that in order to test whether there is a difference between the two groups, we have to condense them into one value. By taking the difference between the corresponding values (the bottom and surface zinc concentrations of the same body of water), we are left with only one measurement per body of water.

Sample    Difference
1         0.015
2         0.028
3         0.177
4         0.121
5         0.102
6         0.107
7         0.019
8         0.066
9         0.058
10        0.111

Now we can simply follow the steps for carrying out a one-sample t-test. (We know this is a t-test because the population standard deviation of the difference measurement is unknown.) For a quick refresher, see the earlier story on the one-sample t-test.


We start by defining our hypotheses.
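
One way to write them for this example is shown below. The symbol \mu_d (the true mean difference, assumed here to be bottom minus surface) is my own notation, and the one-sided alternative follows from the question of whether the bottom concentration exceeds the surface concentration.

```latex
H_0: \mu_d = 0 \qquad \text{versus} \qquad H_a: \mu_d > 0
```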

We then check the normality assumption. In this question, we have no information that the differences follow a normal distribution, nor is the sample size (only 10) large enough to fall back on the central limit theorem. Therefore, we should check normality using a normal probability plot. In short, the normal probability plot sets the sorted values in the dataset against the corresponding quantiles of a standard normal distribution.

Screenshot from Penn State Online

Normally distributed data should fall along a roughly straight line on the normal probability plot. In our plot, the data points are roughly straight, so it is reasonable to treat the differences as approximately normal.
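
For readers who want to rebuild the plot's coordinates themselves, here is a rough sketch using only the Python standard library. The plotting positions (i - 0.5) / n are one common convention and an assumption on my part; other software may use slightly different positions, so the points need not match the screenshot exactly. The helper `pearson_r` is my own and only gives a rough numeric sense of how straight the points are.

```python
from math import sqrt
from statistics import NormalDist, mean

# The ten differences from the table above.
differences = [0.015, 0.028, 0.177, 0.121, 0.102,
               0.107, 0.019, 0.066, 0.058, 0.111]

n = len(differences)
ordered = sorted(differences)

# Plotting positions (i - 0.5) / n: one common convention, assumed here.
std_normal = NormalDist()  # mean 0, standard deviation 1
theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Each (theoretical quantile, observed value) pair is one point on the plot.
for q, d in zip(theoretical, ordered):
    print(f"{q:7.3f}  {d:.3f}")


def pearson_r(xs, ys):
    """Pearson correlation, written out so only the standard library is needed."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# A correlation close to 1 means the points lie close to a straight line.
r = pearson_r(theoretical, ordered)
print(f"correlation of plot points: {r:.3f}")
```

A high correlation is only an informal check; with ten points, looking at the plot itself is still the main tool.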


Then we calculate the t-test statistic and compare it with the critical value.
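
For reference, the statistic is the usual one-sample t applied to the differences. The notation below (\bar{d} for the sample mean of the differences, s_d for their sample standard deviation, n for the number of pairs) is mine, and the rejection rule assumes a one-sided test at the 5% significance level:

```latex
t = \frac{\bar{d} - 0}{s_d / \sqrt{n}}, \qquad \text{reject } H_0 \text{ if } t > t_{0.05,\, n-1}
```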

Our t-test statistic is far beyond the critical value, meaning that it falls within the rejection region (at the 5% significance level, i.e. a 95% confidence level). Therefore, we reject the null hypothesis. In conclusion, we have significant evidence that the zinc concentration is higher in the bottom water than in the surface water.
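
To check the arithmetic, the calculation can be sketched in a few lines of standard-library Python. This assumes a one-sided test at the 5% level with a hypothesized mean difference of 0; the critical value is copied from a standard t table rather than computed, and the variable names are my own.

```python
from math import sqrt
from statistics import mean, stdev

# The ten differences from the table above.
differences = [0.015, 0.028, 0.177, 0.121, 0.102,
               0.107, 0.019, 0.066, 0.058, 0.111]

n = len(differences)
d_bar = mean(differences)   # sample mean of the differences
s_d = stdev(differences)    # sample standard deviation (n - 1 in the denominator)

# One-sample t statistic for H0: mean difference = 0.
t_stat = (d_bar - 0) / (s_d / sqrt(n))

# One-sided critical value for alpha = 0.05 and n - 1 = 9 degrees of freedom,
# taken from a standard t table (an assumption about the test set-up).
t_critical = 1.833

reject_null = t_stat > t_critical

print(f"mean difference = {d_bar:.4f}")
print(f"standard deviation = {s_d:.4f}")
print(f"t statistic = {t_stat:.2f}")
print(f"reject H0: {reject_null}")
```

With these ten differences the statistic comes out at roughly 4.86, well above 1.833, which is what "far beyond the critical value" means here.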

Conclusion


This story should be relatively easy to follow. If you're having trouble, it's probably because your fundamentals of hypothesis testing are still shaky. Review hypothesis testing with the t-test if you need to! In our next story, we'll go into the proper way of comparing two variances.


©2022 by Ben's Blog. Proudly created with Wix.com
