How Do You Determine an Appropriate Sample Size?

  • Benjamin Chen
  • March 3, 2022
  • 3 min read

Updated: January 22, 2024

In this story, we will discuss how to determine an appropriate sample size for confidence interval estimation. This topic builds on confidence intervals, so if you haven’t studied them yet, I recommend doing that first (Statistics 10: Confidence Interval Estimation).

Sample Size Determination


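We’ve learned that the confidence interval for the population mean μ, when the population standard deviation σ is known, is $\bar{X} \pm Z_{\alpha/2} \cdot \sigma/\sqrt{n}$.

The term after the ± sign, $E = Z_{\alpha/2} \cdot \sigma/\sqrt{n}$, is the sampling error (also called the margin of error).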

The illustration below might give you a better idea of what sampling error stands for.

So, what exactly does sampling error mean? Recall that a confidence interval is just an estimate of a population parameter expressed as an interval, and the sampling error is half the width of that interval.

This means the greater the sampling error, the greater the uncertainty in the estimate.


Now consider the formula for the sampling error carefully. As the sample size n increases, the sampling error E decreases. This makes sense because as n increases:

  • The standard error σ/√n (the standard deviation of the sampling distribution) decreases, so the sampling distribution becomes narrower.

  • The sampling error, which is a multiple of the standard error, decreases as well.

With a smaller sampling error, our estimation of the population parameter becomes more precise.
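To make the relationship concrete, here is a minimal Python sketch of the sampling error for a mean with known σ. The function name `sampling_error` and the value σ = 10 are illustrative assumptions, not figures from this story.

```python
from math import sqrt
from statistics import NormalDist


def sampling_error(sigma, n, confidence=0.95):
    """Margin of error E = z * sigma / sqrt(n) for a mean with known sigma."""
    alpha = 1 - confidence
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / sqrt(n)


# sigma = 10 is an arbitrary illustrative value.
for n in (25, 100, 400, 1600):
    print(n, round(sampling_error(sigma=10, n=n), 3))
```

Each time n is multiplied by four, E is cut in half, because E shrinks with the square root of n.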


We can see that there is a trade-off between sample size and sampling error. So we often need to decide on an appropriate sample size: one that is too small leads to an imprecise estimate, while one that is too large is unnecessarily costly to collect.


To address this problem, researchers and statisticians will usually predetermine an acceptable sampling error and calculate the appropriate sample size to achieve that level of sampling error. All it takes is a little manipulation of the formula of sampling error.
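As a sketch of that manipulation, assuming the known-σ margin of error $E = Z_{\alpha/2} \cdot \sigma/\sqrt{n}$, we can solve for n step by step:

```latex
\begin{aligned}
E &= Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{Z_{\alpha/2}\,\sigma}{E} \\
n &= \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^{2}
\end{aligned}
```

Because E appears squared in the denominator, halving the acceptable sampling error roughly quadruples the required sample size.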

Let’s take a look at an example:


A random sample of 332 CB graduates from 2018 gave a mean monthly salary of HKD 17,747 with a standard deviation of HKD 3,875. To be 90% confident of estimating the mean monthly salary of all CB graduates to within ±HKD 750, what sample size is needed?

Because we don’t know σ, we use the sample standard deviation S in its place. With Z ≈ 1.645 for 90% confidence, n = (1.645 × 3875 / 750)² ≈ 72.24, which rounds up to 73.
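The same calculation can be checked with a short Python sketch. The helper name `required_sample_size` is hypothetical, and the critical value comes from the standard normal distribution rather than a rounded table value.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, error, confidence):
    """Smallest n with z * sigma / sqrt(n) <= error, i.e. ceil((z * sigma / error)^2)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up so the margin of error does not exceed the target.
    return ceil((z * sigma / error) ** 2)


# S = 3875 stands in for the unknown sigma, as in the example above.
n = required_sample_size(sigma=3875, error=750, confidence=0.90)
print(n)  # 73
```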

Note that the final sample size is always rounded up. Remember that the greater the sample size, the more precise the estimate. If we rounded down instead, the sampling error would end up slightly above the limit requested by the problem.


Additionally, when the population standard deviation is unknown, we guess the value based on other prior information such as the sample standard deviation, which is exactly what was done in this case. If even the sample data is unavailable, one practical approach is to develop an estimate of the range of the data and then estimate the standard deviation as range/4. Let’s take a look at another example that illustrates this approach.


Suppose you want to estimate the mean GPA of all the students at your university with a margin of error of 0.3 at 95% confidence. How many students should be sampled?

Because we know neither σ nor S, we use range/4 as an estimate of σ.
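Here is a minimal Python sketch of the range/4 approach. It assumes GPA runs from 0.0 to 4.0, which is my assumption rather than something stated in the problem, and the helper name `sample_size_from_range` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_from_range(low, high, error, confidence):
    """Sample size for a mean when sigma is guessed as (high - low) / 4."""
    sigma_guess = (high - low) / 4  # rule-of-thumb estimate of sigma
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up so the margin of error stays within the target.
    return ceil((z * sigma_guess / error) ** 2)


# Assumed GPA scale of 0.0 to 4.0, so sigma is guessed as 1.0.
print(sample_size_from_range(low=0.0, high=4.0, error=0.3, confidence=0.95))  # 43
```

Under that assumed scale, about 43 students would be needed; a different GPA scale would change the σ guess and therefore the answer.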

Conclusion

Hooray! We are done with another major topic in this statistics series. In this relatively short story, we went over the topic of determining an appropriate sample size to control sampling error within an acceptable limit. In our next story, we will move on to hypothesis testing.


©2022 by Ben's Blog. Proudly created with Wix.com
