

In this post we describe the residuals vs fitted plot, which allows us to detect several types of violations of the linear regression assumptions. We'll describe what we can learn from the plot, then make it for several R datasets and analyze them. Here, one plots the fitted values on the x-axis and the residuals on the y-axis. You may also be interested in Q-Q plots, scale-location plots, or the residuals vs leverage plot.

As a simple regression example, suppose a manager determines that an employee's score on a job skills test can be predicted using the regression model y = 130 + 4.3x. In the equation, x is the hours of in-house training (from 0 to 20) and y is the test score. The coefficient, or slope, is 4.3, which indicates that, for every additional hour of training, the mean test score increases by 4.3 points. If the coefficient is positive, as the term increases, the mean value of the response increases. If the coefficient is negative, as the term increases, the mean value of the response decreases. For more information on coefficients, go to Regression coefficients.
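To make the training example concrete, here is a minimal Python sketch of the model y = 130 + 4.3x. Only the intercept and slope come from the text; the `predict` helper and the observed scores are hypothetical, made up for illustration. Each (fitted, residual) pair computed below is one point on a residuals vs fitted plot.

```python
# Regression model from the text: y = 130 + 4.3x
INTERCEPT = 130.0
SLOPE = 4.3


def predict(hours):
    """Mean test score predicted for a given number of training hours."""
    return INTERCEPT + SLOPE * hours


# Hypothetical (hours, observed score) pairs, invented for this sketch.
observations = [(0, 128.0), (5, 153.5), (10, 171.0), (15, 196.5), (20, 214.0)]

# Fitted value goes on the x-axis, residual (observed minus fitted) on the y-axis.
points = [(predict(hours), score - predict(hours)) for hours, score in observations]

for fitted, residual in points:
    print(f"fitted={fitted:6.1f}  residual={residual:+5.1f}")

# One extra hour of training changes the predicted mean score by the slope.
slope_effect = predict(11) - predict(10)
print(f"change per extra hour: {slope_effect:.1f}")
```

With real data the coefficients would be estimated rather than fixed, but the residual is always the observed response minus the fitted value.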

If the p-value of the term is significant, you can examine the regression equation and the coefficients to understand how the term is related to the response. Use the regression equation to describe the relationship between the response and the terms in the model. The regression equation is an algebraic representation of the regression line. For the linear model, it takes the form Y = b₀ + b₁x₁, where Y is the response variable, b₀ is the constant or intercept, b₁ is the estimated coefficient for the linear term (also known as the slope of the line), and x₁ is the value of the term. The coefficient of the term represents the change in the mean response for a one-unit change in that term. The sign of the coefficient indicates the direction of the relationship between the term and the response.

Look for any outliers, which can have a strong effect on the results. Try to identify the cause of any outliers. Correct any data entry or measurement errors. Consider removing data values that are associated with abnormal, one-time events (special causes). For more information on detecting outliers, go to Unusual observations.
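As a rough illustration of both ideas, the Python sketch below estimates the intercept and slope by ordinary least squares and then flags observations whose residual is large relative to the residual standard error. The function names, the data, and the cutoff of 2 are assumptions made for this sketch, not rules taken from the text; proper standardized residuals also account for leverage, which is ignored here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for Y = b0 + b1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def flag_large_residuals(xs, ys, threshold=2.0):
    """Indices whose |residual| exceeds `threshold` residual standard errors.

    Simplification (assumption): residuals are scaled by one common
    standard error and leverage is ignored.
    """
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    sse = sum(r * r for r in residuals)
    s = math.sqrt(sse / (len(xs) - 2))
    if s == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) / s > threshold]


# Hypothetical data: scores near 130 + 4.3 * hours, with one score
# (index 5) inflated by 40 points to mimic a data entry error.
hours = [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
noise = [1, -1, 0.5, -0.5, 1, 40, -1, 0.5, -0.5, 1, -1]
scores = [130 + 4.3 * h + e for h, e in zip(hours, noise)]

b0, b1 = fit_line(hours, scores)
print(f"b0={b0:.2f}, b1={b1:.2f}")
print("flagged indices:", flag_large_residuals(hours, scores))
```

A flagged index is only a prompt to investigate the observation. Whether to correct or remove it depends on the cause, as described above.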

If you fit a linear model and see curvature in the data, repeat the analysis and select the quadratic or cubic model. To determine which model is best, examine the plot and the goodness-of-fit statistics. Check the p-value for the terms in the model to make sure they are statistically significant, and apply process knowledge to evaluate practical significance.
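The comparison between a linear and a quadratic fit can be sketched in plain Python as follows. The data are synthetic (an exact quadratic, chosen for illustration), and `polyfit`, `evaluate`, and `r_squared` are hypothetical helpers written for this sketch rather than functions from any library. R² stands in for the goodness-of-fit statistics. The residuals of the linear fit show the U-shaped pattern that a residuals vs fitted plot would reveal.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented matrix [A | b]: A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    aug = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - tail) / aug[i][i]
    return coeffs


def evaluate(coeffs, x):
    """Value of the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    """Proportion of the variation in ys explained by the fitted polynomial."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic curved data (assumption for illustration): y = 2 + 0.5 * x^2.
xs = list(range(11))
ys = [2 + 0.5 * x * x for x in xs]

linear = polyfit(xs, ys, 1)
quadratic = polyfit(xs, ys, 2)
r2_linear = r_squared(xs, ys, linear)
r2_quadratic = r_squared(xs, ys, quadratic)
print(f"R^2 linear: {r2_linear:.4f}, R^2 quadratic: {r2_quadratic:.4f}")

# Residuals of the linear fit: positive at both ends, negative in the
# middle, which is the curvature signature on a residuals vs fitted plot.
linear_residuals = [y - evaluate(linear, x) for x, y in zip(xs, ys)]
print([round(r, 2) for r in linear_residuals])
```

In this sketch the linear fit still has a fairly high R², so the statistic alone would not expose the problem. The systematic pattern in the residuals is what points to the quadratic model.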
