miceforest logo.

Welcome to miceforest’s documentation!

miceforest imputes missing data using LightGBM in an iterative method known as Multiple Imputation by Chained Equations (MICE). It was designed to be:

  • Fast
    • Uses lightgbm as a backend

    • Has efficient mean matching solutions.

    • Can utilize GPU training

  • Flexible
    • Can impute pandas dataframes and numpy arrays

    • Handles categorical data automatically

    • Fits into a sklearn pipeline

    • User can customize every aspect of the imputation process

  • Production Ready
    • Can impute new, unseen datasets very quickly

    • Kernels are efficiently compressed during saving and loading

    • Data can be imputed in place to save memory

    • Can build models on non-missing data

There are very extensive beginner and advanced tutorials on the github readme. Below is a table of contents for the topics covered: