In-sample and out-of-sample forecasting are two approaches for evaluating the performance of predictive models. In-sample forecasting uses the same historical data that went into fitting the model: the model is trained on a dataset and its predictions are then assessed against that very dataset. This shows how accurately the model reproduces outcomes it has already observed, but it can overstate real performance because the model has "seen" the data before.
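To make this concrete, here is a minimal sketch of in-sample evaluation, assuming Python with NumPy and scikit-learn are available; the synthetic data and the linear model are illustrative stand-ins, not a prescribed setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical dataset: 200 observations, 3 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

model = LinearRegression().fit(X, y)       # fit on the full dataset
in_sample_preds = model.predict(X)         # predict on the SAME data used for fitting
in_sample_mse = mean_squared_error(y, in_sample_preds)
print(f"In-sample MSE: {in_sample_mse:.4f}")  # typically an optimistic estimate
```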
On the other hand, out-of-sample forecasting refers to predicting outcomes on data that was not used during model training. This typically involves splitting the data into training and testing sets: the model is trained on one portion (the training set) and evaluated on the remaining portion (the testing set). This gives a better indication of how the model will perform in real-world scenarios, where it constantly encounters new, unseen data. For example, if you build a model to predict stock prices from ten years of history, you would train it on the first nine years and then evaluate it on the final year, which the model never saw during fitting.
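The sketch below illustrates that idea with a chronological train/test split, again assuming NumPy and scikit-learn; the simulated "price" series and the one-lag linear model are hypothetical placeholders for real historical data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Simulated price series standing in for ~10 "years" of weekly observations
rng = np.random.default_rng(1)
prices = np.cumsum(rng.normal(loc=0.1, scale=1.0, size=520)) + 100

X = prices[:-1].reshape(-1, 1)   # predictor: price at time t
y = prices[1:]                   # target: price at time t + 1

split = int(len(X) * 0.9)        # first ~9 "years" for training, last ~1 "year" held out
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

model = LinearRegression().fit(X_train, y_train)   # fit on the training portion only
out_of_sample_preds = model.predict(X_test)        # predict on data the model never saw
out_of_sample_mse = mean_squared_error(y_test, out_of_sample_preds)
print(f"Out-of-sample MSE: {out_of_sample_mse:.4f}")
```

Note that the split preserves time order rather than shuffling: the test set consists of the most recent observations, so no future information leaks into training.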
In summary, the key difference lies in the data used for evaluation. In-sample forecasting tests the model on the same data it was trained with, which may not reliably reflect its predictive capabilities. In contrast, out-of-sample forecasting uses separate data to assess how the model generalizes to new situations. For effective model evaluation, relying primarily on out-of-sample data is crucial, as it mimics practical applications more closely and helps identify potential overfitting issues.