Welcome to miceforest’s Documentation!
This documentation is meant to describe class methods and parameters only, for a thorough walkthrough of usage, please see the Github README.
In general, the user will only be interacting with these two classes:
Classes:
How miceforest Works
Multiple Imputation by Chained Equations ‘fills in’ (imputes) missing data in a dataset through an iterative series of predictive models. In each iteration, each specified variable in the dataset is imputed using the other variables in the dataset. These iterations should be run until it appears that convergence has been met.
This process is continued until all specified variables have been imputed. Additional iterations can be run if it appears that the average imputed values have not converged, although no more than 5 iterations are usually necessary.
This package provides fast, memory efficient Multiple Imputation by Chained Equations (MICE) with lightgbm. The R version of this package may be found here.
miceforest was designed to be:
- Fast
Uses lightgbm as a backend
Has efficient mean matching solutions.
Can utilize GPU training
- Flexible
Can impute pandas dataframes
Handles categorical data automatically
Fits into a sklearn pipeline
User can customize every aspect of the imputation process
- Production Ready
Can impute new, unseen datasets quickly
Kernels are efficiently compressed during saving and loading
Data can be imputed in place to save memory
Can build models on non-missing data