Last time I posted, I showed you how to divide a data set into training and validation samples in Minitab with the promise that next time I would show you a way to use the validation sample. Regression is a good analysis for this, because a validation data set can help you to verify that you’ve selected the best model. I’m going to use a hypothetical example so that you can see how it works when we really know the correct model to use. This will let me show you how Minitab Statistical Software’s Predict makes it easy to get the numbers that you need to evaluate your model with the training data set. (The steps I used to set up the data appear at the end, if you want to follow along. If you do, consider skipping the steps where I set the base for the random numbers: if you produce different random numbers, the conclusion of the exercise will still be the same for almost everyone!)

Let’s say that we have some data where we know that Y = A + B + C + D + E + F + G. In regression, we usually cannot measure or identify all of the predictor variables that influence a response variable. For example, we can make a good guess about the number of points a basketball player will score in his next game based on the player's historical performance, the opponent's quality, and various other factors. But it's impossible to account for every variable that affects the number of points scored every game.

For our example, we’re going to assume that the data we can collect for prediction are only A, B, C, and D. The remaining predictors, E, F, and G, are real variables, but they’re going to become part of the error variation in our analysis. E, F, and G are independent of the variables that we can include in the model. Let’s say that we collect 500 data points and decide that we can use half to train the model and half to validate the model.

Then we’ll do regression on the training sample to identify some models we think are the most like the real relationship. For clarity, I'll append _1 to the variable names when I'm using the training data set, and _2 to the names when I'm using the validation data set. To start, I'll try fitting a model that has all the predictors that we can use in the training data set, and all of the interactions between those terms. Here's how to fit that model in Minitab Statistical Software:

1. Choose Stat > Regression > Regression > Fit Regression Model.
2. In Continuous predictors, enter 'A_1'-'D_1'.
3. Under Add terms using selected predictors and model terms, in Interactions through order, select 4.

The predictions for the model are now stored in the worksheet. Remember that we know that this model is wrong. No interaction effects are in the equation for the response that we defined as Y = A + B + C + D + E + F + G.
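
If you'd like to experiment with the same idea outside Minitab, here is a minimal Python sketch of the setup described in this post. It is not the procedure I used to build the worksheet. The distributions (independent standard normal draws for A through G), the seed, the first-half/second-half split, and the helper names `design` and `fit_and_score` are all assumptions made for illustration. It fits a main-effects model and a model with interactions through order 4 on the training half, then scores both on the validation half.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)  # assumed seed; any value works

n = 500
# Assumption: all seven predictors are independent standard normal draws.
names = ["A", "B", "C", "D", "E", "F", "G"]
data = {name: rng.standard_normal(n) for name in names}
y = sum(data[name] for name in names)  # Y = A + B + C + D + E + F + G

# Assumed split: first half trains the model, second half validates it.
train = np.arange(n) < n // 2
valid = ~train


def design(rows, order):
    """Intercept plus every product of A-D up to the given interaction order."""
    cols = [np.ones(rows.sum())]
    for k in range(1, order + 1):
        for combo in combinations("ABCD", k):
            cols.append(np.prod([data[v][rows] for v in combo], axis=0))
    return np.column_stack(cols)


def fit_and_score(order):
    """Least-squares fit on the training rows; MSE on both halves."""
    X_train, X_valid = design(train, order), design(valid, order)
    coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
    mse_train = np.mean((y[train] - X_train @ coef) ** 2)
    mse_valid = np.mean((y[valid] - X_valid @ coef) ** 2)
    return X_train.shape[1], mse_train, mse_valid


for order in (1, 4):
    n_terms, mse_train, mse_valid = fit_and_score(order)
    print(f"order {order}: {n_terms} terms, "
          f"train MSE {mse_train:.3f}, validation MSE {mse_valid:.3f}")
```

Because the main-effects model is nested inside the order-4 model, the extra interaction terms can never raise the training error. That is why the validation error is the fairer number to compare.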