

Using Regression Imputation to Fill in Missing Values

A step-by-step guide to the regression imputation method in R

Nina Chen
6 min read · Dec 6, 2023


Understanding Missing Data

Before we get to the imputation methods, let’s be clear about what missing data is. Missing data means that a value we expected to record for an observation is absent. This can happen for many reasons, such as problems during data collection or errors during data entry.
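In R, a missing value is represented by `NA`. As a minimal sketch (the data frame `df` and its columns `age` and `income` are hypothetical toy data, not from the article), base R can locate and count the gaps before we decide how to fill them:

```r
# Hypothetical toy data frame; NA marks a value that is missing
df <- data.frame(
  age    = c(25, 32, NA, 41, 29),
  income = c(48000, NA, 61000, 75000, NA)
)

is.na(df)            # logical matrix: TRUE where a value is missing
colSums(is.na(df))   # number of missing values per column
complete.cases(df)   # TRUE for rows that have no missing values
```

Counting missing values per column is usually the first step, since it shows which variables need imputation at all.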

Types of Missing Data

There are three types of missing data:

  1. Missing Completely at Random (MCAR): The probability that a value is missing is unrelated to any values in the data, observed or missing.
  2. Missing at Random (MAR): The probability that a value is missing depends systematically on observed values, but not on the missing values themselves.
  3. Not Missing at Random (NMAR, also written MNAR): The probability that a value is missing depends on the missing values themselves.
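To make the three mechanisms concrete, here is a small simulation sketch. The variables `x` and `y`, the linear relationship between them, and the logistic missingness probabilities are all illustrative assumptions chosen for the example; only the definitions of MCAR, MAR, and NMAR come from the list above.

```r
set.seed(42)
n <- 1000
x <- rnorm(n, mean = 50, sd = 10)   # fully observed covariate
y <- 2 * x + rnorm(n, sd = 5)       # variable that will get missing values

# MCAR: every y has the same 20% chance of going missing
y_mcar <- ifelse(runif(n) < 0.2, NA, y)

# MAR: y is more likely to be missing when the *observed* x is large
p_mar <- plogis((x - 50) / 5)       # depends on x only
y_mar <- ifelse(runif(n) < 0.4 * p_mar, NA, y)

# NMAR: y is more likely to be missing when y *itself* is large
p_nmar <- plogis((y - 100) / 10)    # depends on the unobserved y
y_nmar <- ifelse(runif(n) < 0.4 * p_nmar, NA, y)

# Compare the mean of the observed values with the true mean of y
c(true = mean(y),
  mcar = mean(y_mcar, na.rm = TRUE),
  mar  = mean(y_mar,  na.rm = TRUE),
  nmar = mean(y_nmar, na.rm = TRUE))
```

Under MCAR the observed mean should stay close to the true mean. Under MAR and NMAR, large values drop out more often, so the observed mean tends to be too low. The MAR case is the one regression imputation can help with, because the missingness is explained by `x`, which we still observe.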

There are many imputation methods we can use. Commonly cited ones are mean, median, and mode imputation. However, these look only at the distribution of the variable with missing entries and ignore every other variable. If we know the variable with missing values is correlated with other variables, we can often get better guesses by regressing the missing…
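The contrast between mean imputation and regression imputation can be sketched in base R. This is an illustrative sketch, not the author's full method: the simulated data frame `df` with predictor `x` and incomplete variable `y`, and the choice to remove 40 values completely at random, are assumptions made for the example.

```r
set.seed(1)
n  <- 200
df <- data.frame(x = rnorm(n, mean = 50, sd = 10))
df$y <- 3 + 2 * df$x + rnorm(n, sd = 5)
df$y[sample(n, 40)] <- NA            # remove 40 values of y at random (assumed MCAR)

miss <- is.na(df$y)

# Mean imputation: every gap gets the same value, ignoring x
y_mean_imp <- df$y
y_mean_imp[miss] <- mean(df$y, na.rm = TRUE)

# Regression imputation: fit y ~ x on the complete rows, then predict the gaps
fit <- lm(y ~ x, data = df)          # lm() drops rows with NA by default (na.omit)
y_reg_imp <- df$y
y_reg_imp[miss] <- predict(fit, newdata = df[miss, , drop = FALSE])

# Mean imputation yields a single constant; regression imputation tracks x
length(unique(y_mean_imp[miss]))
cor(df$x[miss], y_reg_imp[miss])     # predictions lie on the fitted line, so this is 1 up to rounding
```

One design caveat: deterministic regression imputation places every filled-in value exactly on the fitted line, so it understates the natural spread of `y`. A common refinement, stochastic regression imputation, adds a random residual to each prediction.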


Written by Nina Chen

Passionate about AWS, ML, data science, sustainability, and everything in between.
