
Welcome to miceforest’s documentation!

miceforest imputes missing data using random forests in an iterative method known as Multiple Imputation by Chained Equations (MICE). It was designed to be:

  • Fast: Uses lightgbm as a backend and has efficient mean matching solutions.

  • Memory Efficient: Capable of performing multiple imputation without copying the dataset. If the dataset can fit in memory, it can (probably) be imputed.

  • Flexible: Can handle pandas DataFrames and numpy arrays. The imputation process can be completely customized. Categorical data is handled automatically.

  • Used In Production: Kernels can be saved and used to impute new, unseen datasets. Imputing new data this way is often orders of magnitude faster than including the new data in a new MICE procedure. Imputation models can be built off of a kernel dataset, even if it has no missing values. New data can also be imputed in place.
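To make the chained-equations idea and mean matching more concrete, here is a minimal sketch in plain Python. It is not miceforest's implementation. It assumes a two-column numeric table and swaps in a one-predictor least-squares line where miceforest would train a lightgbm model. The names `fit_line` and `mice_sketch` are hypothetical and exist only for this illustration.

```python
import random
from statistics import fmean


def fit_line(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def mice_sketch(rows, iterations=5, seed=0):
    """Impute None entries in a two-column table by chained equations.

    Illustration only: a straight line stands in for the lightgbm model.
    """
    rng = random.Random(seed)
    n_cols = 2
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]

    # Step 1: fill each gap with a random observed value from its column.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        for i, row in enumerate(data):
            if missing[i][j]:
                row[j] = rng.choice(observed)

    # Step 2: cycle through the columns, re-imputing each from the other.
    for _ in range(iterations):
        for target in range(n_cols):
            other = 1 - target
            # Fit only on rows where the target was originally observed.
            obs = [i for i in range(len(data)) if not missing[i][target]]
            slope, intercept = fit_line(
                [data[i][other] for i in obs],
                [data[i][target] for i in obs],
            )
            pred = [slope * row[other] + intercept for row in data]
            # Mean matching: borrow the real value of the observed row
            # whose prediction is closest to this row's prediction.
            for i in range(len(data)):
                if missing[i][target]:
                    donor = min(obs, key=lambda k: abs(pred[k] - pred[i]))
                    data[i][target] = data[donor][target]
    return data


rows = [(1.0, 2.1), (2.0, None), (3.0, 6.2), (None, 8.1), (5.0, 9.9), (6.0, 12.2)]
completed = mice_sketch(rows)
```

Because mean matching copies values from donor rows, every imputed entry in this sketch is a value that was actually observed in the same column, and the originally observed entries are left untouched.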

Extensive beginner and advanced tutorials are available in the GitHub README. Below is a table of contents for the topics covered:

Tutorial Table of Contents: