
Welcome to miceforest’s documentation!
miceforest
imputes missing data using LightGBM in an iterative method known as Multiple Imputation by Chained Equations (MICE). It was designed to be:
- Fast
Uses lightgbm as a backend
Has efficient mean matching solutions.
Can utilize GPU training
- Flexible
Can impute pandas dataframes and numpy arrays
Handles categorical data automatically
Fits into a sklearn pipeline
User can customize every aspect of the imputation process
- Production Ready
Can impute new, unseen datasets very quickly
Kernels are efficiently compressed during saving and loading
Data can be imputed in place to save memory
Can build models on non-missing data
There are very extensive beginner and advanced tutorials on the github readme. Below is a table of contents for the topics covered:
Contents: