Using Regression Imputation to Fill in Missing Values
A step-by-step guide to the regression imputation method in R
Understanding Missing Data
Before we get started with imputation methods, let's define what missing data is. Missing data is the absence of a value where one is expected. It can arise for many reasons, such as problems during data collection or errors during data entry.
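In R, a missing entry is stored as `NA`, so a good first step is to find out where the gaps are. The sketch below uses a small, made-up data frame (the columns `age` and `income` and their values are hypothetical) and only base R functions to count missing values per column and flag the rows that are complete.

```r
# Hypothetical toy data: missing entries are coded as NA in R
df <- data.frame(
  age    = c(23, 35, NA, 41, 29),
  income = c(42000, NA, 51000, NA, 38000)
)

# Number of missing values in each column
missing_per_col <- colSums(is.na(df))
print(missing_per_col)

# TRUE for rows with no missing values, FALSE otherwise
complete_rows <- complete.cases(df)
print(complete_rows)
```

Here `age` has one missing value and `income` has two, so only the first and last rows are complete.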
Types of Missing Data
There are three types of missing data:
- Missing Completely at Random (MCAR): The missingness has no relationship with any values, observed or missing.
- Missing at Random (MAR): The missingness is systematically related to the observed values but not to the missing values themselves.
- Not Missing at Random (NMAR, also written MNAR): The missingness depends on the missing values themselves, even after accounting for the observed data.
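To make the three mechanisms concrete, here is a small simulation sketch. The variables, coefficients, and missingness rules are all hypothetical choices for illustration: `income` rises with `age`, and we delete income values (1) completely at random, (2) more often for older people, since `age` is observed, and (3) more often for high earners, which depends on the missing income itself.

```r
set.seed(42)
n <- 1000

# Hypothetical data: income rises with age
age    <- rnorm(n, mean = 40, sd = 10)
income <- 20000 + 1000 * age + rnorm(n, sd = 5000)

# MCAR: every income value has the same 20% chance of going missing
income_mcar <- income
income_mcar[runif(n) < 0.2] <- NA

# MAR: the chance of missingness rises with age, which is fully observed
income_mar <- income
income_mar[runif(n) < plogis((age - 50) / 5)] <- NA

# NMAR: the chance of missingness rises with the (unseen) income itself
income_nmar <- income
income_nmar[runif(n) < plogis((income - 70000) / 5000)] <- NA

# Mean of the observed incomes under each mechanism vs. the true mean
observed_means <- c(
  true = mean(income),
  mcar = mean(income_mcar, na.rm = TRUE),
  mar  = mean(income_mar,  na.rm = TRUE),
  nmar = mean(income_nmar, na.rm = TRUE)
)
print(round(observed_means))
```

Under MCAR the observed mean should land close to the true mean, while under MAR and NMAR it is pulled downward because higher incomes are lost more often. The practical difference is that the MAR bias can be corrected using the observed `age` column, whereas under NMAR the information needed to correct it is exactly what is missing.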
There are many imputation methods to choose from. Some of the most commonly used are mean, median, and mode imputation. However, these only look at the distribution of the variable with missing entries and ignore every other variable. If we know the variable with missing values is correlated with other variables, we can often get better guesses by regressing the missing…
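As a rough illustration of why using a correlated variable helps, the sketch below compares mean imputation with a simple deterministic regression imputation on simulated data. It is a minimal base-R sketch, not necessarily the exact procedure this guide goes on to use: the data, the 30% missing-completely-at-random rate, and the helper `rmse` are hypothetical. The regression model is fit with `lm()` on the complete rows (its default `na.action` drops rows with `NA`), and `predict()` fills in the missing `y` values from `x`.

```r
set.seed(1)
n <- 200

# Hypothetical data: y is strongly correlated with x
x <- rnorm(n, mean = 50, sd = 10)
y <- 5 + 2 * x + rnorm(n, sd = 4)
df <- data.frame(x = x, y = y)

# Remove 30% of y completely at random (the true values stay in `y`)
miss <- runif(n) < 0.3
df$y[miss] <- NA

# Mean imputation: every missing y gets the same value
mean_imp <- df$y
mean_imp[miss] <- mean(df$y, na.rm = TRUE)

# Regression imputation: fit y ~ x on the complete rows,
# then predict each missing y from its observed x
fit <- lm(y ~ x, data = df)
reg_imp <- df$y
reg_imp[miss] <- predict(fit, newdata = df[miss, , drop = FALSE])

# Hypothetical helper: root-mean-squared error against the true values
rmse <- function(a, b) sqrt(mean((a - b)^2))
print(c(
  mean_imputation       = rmse(mean_imp[miss], y[miss]),
  regression_imputation = rmse(reg_imp[miss], y[miss])
))
```

Because `y` depends strongly on `x`, the regression-imputed values should sit much closer to the true values than the single mean. One known drawback of this deterministic version is that every imputed point lies exactly on the fitted line, so the spread of `y` is understated; stochastic regression imputation addresses this by adding a random residual to each prediction.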