Inferring the Slope: Understanding Regression Models

  • Benjamin Chen
  • February 16, 2022
  • 4 min read

Updated: January 22, 2024

The slope is, without doubt, one of the most important values in regression. Recall that in the previous story we interpreted the value of the slope (b1).

In this example, the slope is 0.0629, which means that “for every 1-unit increase in the X variable, Y is expected to increase on average by 0.0629.” The slope tells us critical information about the relationship between our predictor and response variables. Generally,

  • if the slope is positive, then X and Y have a positive linear relationship.

  • if the slope is negative, then X and Y have a negative linear relationship.

  • if the slope is zero, then there is no linear relationship between X and Y.

On many occasions, all we are interested in is whether a relationship exists between X and Y (regardless of whether it is positive or negative). This is where inference about the slope comes into play. We want to test whether the slope is significantly different from 0, so that we can conclude whether there is a linear relationship between X and Y.

Inferential Statistics


Let’s quickly review a little bit of inferential statistics in the context of slope. The population regression line is denoted as:
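
In the usual textbook notation (assumed here: i indexes the observations and ε_i is the random error term), the population line can be written as:

```latex
\[
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i
\]
```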

The population y-intercept and population slope (these are population parameters) are denoted by β0 and β1. But since population data are usually not entirely available, our regression line is usually built from sample data. The sample regression line is denoted as:
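
Using the same assumed notation, with a hat marking the fitted (predicted) value, the sample line is commonly written as:

```latex
\[
\hat{Y}_i = b_0 + b_1 X_i
\]
```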

The sample y-intercept and sample slope (these are sample statistics) are denoted by lowercase b0 and b1. Let’s put our focus on the slope b1 and ignore the y-intercept b0 for now.
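
To make b0 and b1 concrete, here is a minimal sketch of how they can be computed from a sample by least squares, using only the Python standard library. The data and the helper name `fit_line` are hypothetical, made up purely for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1), the least-squares intercept and slope."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of squared deviations of X, and sum of cross-products of X and Y
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical sample, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```

A different sample from the same population would give slightly different values of b0 and b1.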


The sample slope is a point estimate of the population slope. To estimate the population slope using the sample slope, we must consider the sampling distribution of the sample slope b1. (For those who need to review sampling distribution, click this link.) In short, the value of the sample slope depends on the sample you draw from the population, so the sample slope, as an estimator, varies from sample to sample. The sampling distribution is simply the distribution of the sample slope across all possible samples.
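
The idea that the sample slope varies from sample to sample can be seen in a small simulation. The sketch below assumes a made-up population model (the values of `BETA0`, `BETA1`, and `SIGMA` are hypothetical) and draws many samples from it, recording the slope each time.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population model, chosen only for this demonstration
BETA0, BETA1, SIGMA = 2.0, 0.5, 1.0
x = list(range(1, 21))  # predictor values held fixed across samples

def sample_slope():
    """Draw one sample of Y values and return its least-squares slope b1."""
    y = [BETA0 + BETA1 * xi + random.gauss(0, SIGMA) for xi in x]
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx

# Each repetition mimics drawing a fresh sample from the population
slopes = [sample_slope() for _ in range(2000)]

print("first three slopes:", [round(b, 3) for b in slopes[:3]])
print("average slope:", round(mean(slopes), 3))
print("spread (std dev):", round(stdev(slopes), 3))
```

The individual slopes differ, but they cluster around the population slope; a histogram of `slopes` would approximate the sampling distribution of b1.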

Sampling Distribution of b1


We can derive the sampling distribution of b1 from the population regression model. Recall the population regression line:

We usually assume the error term to be normally distributed with:
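
Under the standard simple linear regression assumptions (independent errors with mean zero and constant variance σ²), this is commonly written as:

```latex
\[
\varepsilon_i \sim N(0, \sigma^2), \qquad \varepsilon_i \text{ independent}
\]
```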

Because of this assumption, the dependent variable Yi would also be independent and normally distributed with:
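
Treating the X_i values as fixed, which is the usual textbook assumption, the resulting distribution is:

```latex
\[
Y_i \sim N(\beta_0 + \beta_1 X_i,\ \sigma^2)
\]
```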

Since Yi is normal, and b1 is a linear combination of the Yi values, we can further conclude that the sampling distribution of b1 is also normal with the following mean and variance:
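
In the standard result for simple linear regression (notation assumed, with X̄ the sample mean of X), these are:

```latex
\[
E(b_1) = \beta_1, \qquad
\operatorname{Var}(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\]
```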

σ² (the variance of the error term of the population regression line), however, isn't usually available. Therefore, we have to estimate it with the variance of the errors around the sample regression line, also called the mean squared error (MSE). In case you haven't realized, MSE is an unbiased estimator of σ².

Because we're estimating the variance of the error, the estimated variance of the sampling distribution of b1 consequently becomes:
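
Replacing σ² with MSE in the variance formula gives the usual estimate (s²(b1) is assumed notation for the estimated variance, and SSE for the sum of squared residuals):

```latex
\[
s^2(b_1) = \frac{MSE}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
MSE = \frac{SSE}{n - 2} = \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - 2}
\]
```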

Okay, that was a lot of numbers and formulas. Don't worry too much about it! The key point here is that we derived the mean and the variance (and hence the standard deviation) of the sampling distribution of b1. Once we have these two values, we can easily derive the confidence interval and perform hypothesis testing for the population regression slope.
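
For readers who prefer code to formulas, the following sketch computes MSE and the estimated standard deviation of b1 (its standard error) step by step. The sample is hypothetical and the variable names are chosen here for illustration.

```python
import math
from statistics import mean

# Hypothetical sample, invented for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Least-squares estimates of the slope and intercept
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals around the fitted line
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

mse = sse / (n - 2)        # n - 2 because two parameters were estimated
var_b1 = mse / sxx         # estimated variance of b1
se_b1 = math.sqrt(var_b1)  # standard error of b1

print(f"MSE = {mse:.4f}, s(b1) = {se_b1:.4f}")
```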

Confidence Intervals for the Population Regression Slope


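Once we know that the sampling distribution of b1 is normal, along with its mean and variance, we can derive the confidence interval for the population regression slope. Because σ² is estimated by MSE, the standardized slope follows a t distribution with n − 2 degrees of freedom rather than a standard normal distribution.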
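
In symbols, with s(b1) as assumed notation for the estimated standard deviation of b1, the standardized slope is:

```latex
\[
t = \frac{b_1 - \beta_1}{s(b_1)} \sim t_{\,n-2}
\]
```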

Therefore, the confidence interval for the population regression slope would be:
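
In standard form, assuming a confidence level of 1 − α and writing t_{α/2, n−2} for the corresponding critical value of the t distribution:

```latex
\[
b_1 \pm t_{\alpha/2,\ n-2}\; s(b_1)
\]
```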

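Our interpretation of the confidence interval would be: we are (1 − α) × 100% confident that the population slope β1 lies between the lower and upper limits of the interval.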




For those who need to review confidence interval estimation, click this link
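
As a rough sketch of the interval calculation, the code below builds a 95% confidence interval for the slope from a hypothetical sample. The helper name `slope_confidence_interval` is made up here, and the critical values are hard-coded from a standard t table because the Python standard library has no t distribution.

```python
import math
from statistics import mean

# Two-sided 95% critical values t(0.025, df), copied from a standard t table
T_CRIT_95 = {3: 3.182, 6: 2.447, 8: 2.306, 10: 2.228}

def slope_confidence_interval(x, y):
    """Return (lower, upper), a 95% confidence interval for the slope."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of b1
    margin = T_CRIT_95[n - 2] * se_b1       # only the listed df are supported
    return b1 - margin, b1 + margin

# Hypothetical sample, invented for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 2.4, 4.8, 3.9, 6.5, 4.7, 7.2, 6.0]

lower, upper = slope_confidence_interval(x, y)
print(f"95% CI for the slope: ({lower:.3f}, {upper:.3f})")
```

If the interval does not contain 0, that already suggests the slope is significantly different from 0 at the 5% level.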

Hypothesis Testing for Population Regression Slope


Remember that our ultimate objective is to test whether the slope is significantly different from 0. To test this, we follow the typical procedure of a hypothesis test. For those who need to review hypothesis testing, click this link.


Step 1: Define your null and alternative hypotheses
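
For the two-sided test of whether a linear relationship exists, the hypotheses are conventionally stated as:

```latex
\[
H_0: \beta_1 = 0 \qquad \text{versus} \qquad H_1: \beta_1 \neq 0
\]
```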

Step 2: Compute the t test statistic
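
Under the null hypothesis β1 = 0, the statistic takes the standard form below (s(b1) is assumed notation for the standard error of b1), with n − 2 degrees of freedom:

```latex
\[
t = \frac{b_1 - 0}{s(b_1)} = \frac{b_1}{s(b_1)}, \qquad df = n - 2
\]
```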

Step 3: Compare the t test statistic to the critical value (critical value approach) or compute the p-value and compare it to the level of significance α (p-value approach)


Step 4: Conclude


Assuming a 5% significance level (95% confidence), if the absolute value of the t test statistic is greater than the critical value of the t distribution with n − 2 degrees of freedom, which approaches 1.96 as the sample size grows (critical value approach), or if the p-value is less than 0.05 (p-value approach), then we reject the null hypothesis. When we reject the null hypothesis, we are concluding that the slope is significantly different from 0 and a linear relationship exists between the response and predictor variable.
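
The four steps can be sketched in code using the critical value approach. The sample and the helper name `slope_t_statistic` are hypothetical, and the critical value 2.447 is the standard t-table entry for a two-sided 5% test with 6 degrees of freedom, which matches this sample of 8 points.

```python
import math
from statistics import mean

def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: beta1 = 0 against H1: beta1 != 0."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)  # standard error of b1
    return b1 / se_b1, n - 2

# Hypothetical sample, invented for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 2.4, 4.8, 3.9, 6.5, 4.7, 7.2, 6.0]

t_stat, df = slope_t_statistic(x, y)
t_crit = 2.447  # t(0.025, 6) from a standard t table; valid for df = 6 only

# Two-sided test: compare the absolute value of t to the critical value
reject_h0 = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

With a much larger sample the critical value would shrink toward 1.96, which is where the familiar rule of thumb comes from.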

Regression Summary Output


When we perform regression analysis using statistical software, the regression output will typically look like this:

By now you should be able to understand what most of these numbers represent. Since this topic focuses on inferences about the slope, the value of interest is the p-value, which instantly tells us whether the slope between the X variable and the Y variable is significantly different from 0. In the output above, the p-value for variable X is 0.0123, which is less than 0.05. At the 5% significance level, we therefore reject the null hypothesis and conclude that the slope is significantly different from 0.
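
Regression software typically reports, for each coefficient, the estimate, its standard error, the t statistic, and the p-value. The sketch below reproduces that row for the slope of a hypothetical sample. It is not how any particular package works internally: the helper names are made up, and the p-value is obtained by numerically integrating the t density with Simpson's rule, since the Python standard library has no t distribution.

```python
import math
from statistics import mean

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| > |t_stat|), using Simpson's rule on [0, |t_stat|]."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t_stat|)
    return 1 - 2 * area

def slope_summary(x, y):
    """Return the slope estimate, standard error, t statistic, and p-value."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / sxx)
    t_stat = b1 / se_b1
    return {"coef": b1, "se": se_b1, "t": t_stat,
            "p": two_sided_p_value(t_stat, n - 2)}

# Hypothetical sample, invented for illustration only
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 2.4, 4.8, 3.9, 6.5, 4.7, 7.2, 6.0]

row = slope_summary(x, y)
print(f"{'':4}{'coef':>8}{'std err':>9}{'t':>8}{'P>|t|':>8}")
print(f"{'X':4}{row['coef']:8.4f}{row['se']:9.4f}{row['t']:8.3f}{row['p']:8.4f}")
```

Reading the last column against 0.05 is exactly the p-value approach described in the hypothesis testing steps.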

Conclusion


Yes! We are done with another chapter on simple linear regression. We talked about inferences about the slope and why the slope is often the value of interest for researchers. In today’s era, calculations are mostly performed automatically by computers, so knowing how to interpret a regression summary output is more practically useful (e.g. looking at the p-value and knowing how to interpret it). Nonetheless, it’s still important to understand the computational logic behind these numbers in order to truly understand their meaning. In our next story, we will go over the Analysis of Variance (ANOVA) table and the F-test.

©2022 by Ben's Blog. Proudly created with Wix.com
