Welcome to miceforest’s Documentation!

This documentation describes class methods and parameters only. For a thorough walkthrough of usage, please see the GitHub README.

In general, the user will only be interacting with these two classes:

Classes:

How miceforest Works

Multiple Imputation by Chained Equations ‘fills in’ (imputes) missing data in a dataset through an iterative series of predictive models. In each iteration, each specified variable in the dataset is imputed using the other variables in the dataset.

Within an iteration, this process continues until all specified variables have been imputed. Iterations should be run until the imputations appear to have converged. Additional iterations can be run if the average imputed values have not yet stabilized, although more than 5 iterations are rarely necessary.
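The loop described above can be sketched in a few lines of plain Python. This is only an illustration of the chained-equations idea, not miceforest's implementation: the function name `mice_sketch` is hypothetical, the lightgbm model is swapped for a 1-nearest-neighbour donor lookup, and the sketch assumes numeric data in which every column has at least one observed value.

```python
def mice_sketch(rows, iterations=5):
    """Toy chained-equations loop. `rows` is a list of lists; None marks a missing cell."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[value is None for value in row] for row in rows]
    data = [list(row) for row in rows]  # work on a copy, leave the input untouched

    # Starting point: fill each missing cell with the mean of its column's observed values.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        column_mean = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][j]:
                data[i][j] = column_mean

    def distance(a, b, skip):
        # Squared distance between rows a and b on every column except `skip`.
        return sum((data[a][k] - data[b][k]) ** 2 for k in range(n_cols) if k != skip)

    history = []  # imputed values after each iteration, used to check convergence
    for _ in range(iterations):
        for j in range(n_cols):  # one iteration visits every variable once
            donors = [i for i in range(n_rows) if not missing[i][j]]
            for i in range(n_rows):
                if missing[i][j]:
                    # Stand-in "model": copy column j from the most similar observed row.
                    nearest = min(donors, key=lambda d: distance(i, d, j))
                    data[i][j] = data[nearest][j]
        history.append([data[i][j] for i in range(n_rows)
                        for j in range(n_cols) if missing[i][j]])
    return data, history


# Small made-up dataset with one missing value in each column.
rows = [
    [1.0, 2.0],
    [2.2, None],
    [3.0, 6.0],
    [4.0, 8.0],
    [None, 10.5],
    [6.0, 12.0],
]
completed, history = mice_sketch(rows, iterations=3)
print(completed)
print(history)  # one list of imputed values per iteration
```

Comparing consecutive entries of `history` is a crude version of the convergence check: once the imputed values stop changing between iterations, further iterations add nothing. A real run would use a stochastic model and inspect the average imputed values rather than exact equality.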

This package provides fast, memory-efficient Multiple Imputation by Chained Equations (MICE) with lightgbm. An R version of this package is also available.

miceforest was designed to be:

Fast
  - Uses lightgbm as a backend
  - Has efficient mean matching solutions
  - Can utilize GPU training

Flexible
  - Can impute pandas dataframes
  - Handles categorical data automatically
  - Fits into a sklearn pipeline
  - User can customize every aspect of the imputation process

Production Ready
  - Can impute new, unseen datasets quickly
  - Kernels are efficiently compressed during saving and loading
  - Data can be imputed in place to save memory
  - Can build models on non-missing data