ImputedData Class

class miceforest.imputed_data.ImputedData(impute_data: DataFrame, datasets: List[int], variable_schema: List[str] | Dict[str, List[str]] | None = None, save_all_iterations_data: bool = True, copy_data: bool = True, random_seed_array: ndarray | None = None)

Bases: object

complete_data(dataset: int = 0, iteration: int = -1, inplace: bool = False, variables: List[str] | None = None)

Return the dataset with its missing values imputed.

Parameters:
  • dataset (int) – The dataset to complete.

  • iteration (int) – Impute data with values obtained at this iteration. If -1, the most recent iteration of each variable is used, even if that differs between variables. Otherwise, the requested iteration must have been saved in the imputed values.

  • inplace (bool) – Whether to complete the data in place. If True, self.working_data is imputed and nothing is returned, which is useful for very large datasets. If False, a copy of the data is returned with missing values imputed.

  • variables (None or list[str]) – The variables to impute. If None, all variables with imputed values are completed.

Returns:

The completed data, with values imputed for the specified variables, or None if inplace is True.
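To make the iteration and inplace semantics concrete, here is a minimal pure-Python sketch over a hypothetical toy layout: a dict of column lists (None marks a missing entry) plus a dict of saved imputations keyed by (dataset, variable) and iteration. complete_data_sketch and this layout are illustrative assumptions, not miceforest's internals or API; on a real ImputedData object you would call complete_data() directly.

```python
import copy
from typing import Dict, List, Optional, Tuple

# Hypothetical toy layout, NOT miceforest's internal storage:
# data maps column name -> list of values, with None marking a missing entry.
Data = Dict[str, List[Optional[float]]]
# imputations[(dataset, variable)][iteration] -> {row index: imputed value}
Imputations = Dict[Tuple[int, str], Dict[int, Dict[int, float]]]


def complete_data_sketch(
    data: Data,
    imputations: Imputations,
    dataset: int = 0,
    iteration: int = -1,
    inplace: bool = False,
    variables: Optional[List[str]] = None,
) -> Optional[Data]:
    """Fill missing entries of `data` with one dataset's imputed values."""
    target = data if inplace else copy.deepcopy(data)
    if variables is None:
        # Default: every variable that has imputations for this dataset.
        variables = [var for (ds, var) in imputations if ds == dataset]
    for var in variables:
        saved = imputations[(dataset, var)]
        # -1 picks the latest saved iteration per variable, which may differ between variables.
        it = max(saved) if iteration == -1 else iteration
        if it not in saved:
            raise ValueError(f"iteration {it} was not saved for {var!r}")
        for row, value in saved[it].items():
            target[var][row] = value
    return None if inplace else target


data = {"x": [1.0, None, 3.0], "y": [None, 5.0, 6.0]}
imps = {
    (0, "x"): {1: {1: 2.0}, 2: {1: 2.5}},  # x was imputed at iterations 1 and 2
    (0, "y"): {1: {0: 4.0}},               # y only has iteration 1 saved
}
completed = complete_data_sketch(data, imps)
print(completed)  # {'x': [1.0, 2.5, 3.0], 'y': [4.0, 5.0, 6.0]}
```

Because x and y have different latest iterations, iteration=-1 mixes iteration 2 for x with iteration 1 for y, which is exactly the "even if that differs between variables" case. The default copy leaves data untouched; passing inplace=True would mutate it and return None.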

iteration_count(dataset: slice | int = slice(None, None, None), variable: slice | str = slice(None, None, None))

Get the iteration count for the specified datasets and variables. If the iteration count is not consistent across the selected datasets or variables, an error is raised. By default, all datasets and variables are selected.

This is to ensure the process is in a consistent state when the iteration count is needed.

Parameters:
  • dataset (slice or int) – The dataset(s) to check the iteration count for. By default (slice(None)), all datasets are checked and must share the same iteration count; otherwise an error is raised.

  • variable (slice or str) – The variable(s) to check the iteration count for. By default (slice(None)), all variables are checked and must share the same iteration count; otherwise an error is raised.

Returns:

The iteration count, as an integer.
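The consistency check described above can be sketched in plain Python. The toy counts mapping, _select, and iteration_count_sketch are hypothetical; in particular, how miceforest orders variables when a slice is applied is an assumption here (this sketch simply sorts them).

```python
from typing import Dict, Iterable, List, Tuple, Union

# Hypothetical toy store, NOT miceforest internals:
# (dataset, variable) -> number of iterations run for that pair.
Counts = Dict[Tuple[int, str], int]


def _select(keys: Iterable, selector: Union[slice, int, str]) -> List:
    # A slice indexes into the sorted keys (slice(None) keeps all); a scalar picks one key.
    ordered = sorted(keys)
    return ordered[selector] if isinstance(selector, slice) else [selector]


def iteration_count_sketch(
    counts: Counts,
    dataset: Union[slice, int] = slice(None),
    variable: Union[slice, str] = slice(None),
) -> int:
    """Return the iteration count shared by the selection, or raise ValueError."""
    datasets = _select({ds for ds, _ in counts}, dataset)
    variables = _select({var for _, var in counts}, variable)
    found = {counts[(ds, var)] for ds in datasets for var in variables}
    if len(found) != 1:
        raise ValueError(f"expected one iteration count, found {sorted(found)}")
    return found.pop()


counts = {(0, "x"): 3, (0, "y"): 3, (1, "x"): 3, (1, "y"): 2}
print(iteration_count_sketch(counts, dataset=0))     # 3
print(iteration_count_sketch(counts, variable="x"))  # 3
# iteration_count_sketch(counts) raises: dataset 1 has y at 2 iterations, the rest at 3.
```

Raising instead of returning, say, the minimum keeps callers from silently working with a half-updated process.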

plot_imputed_distributions(variables: List[str] | None = None, iteration: int = -1)

Plot the imputed value distributions. Red lines show the distribution of the original (observed) data; black lines show the distribution of the imputed values.

Parameters:
  • datasets (None, int, list[int])

  • variables (None, list[str]) – The variables to plot. If None, all numeric variables are plotted.

  • iteration (int) – The iteration to plot the distribution for. If -1 (the default), the latest iteration is plotted. save_all_iterations_data must be True when specifying any other iteration.

  • adj_args – Additional arguments passed to plt.subplots_adjust()
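The comparison this plot draws can be sketched without a plotting library: bin the observed values and each dataset's imputed values on shared bins, then compare the shapes. histogram and distributions_sketch are hypothetical helpers, and equal-width histograms stand in for whatever density estimate the real plot uses.

```python
from typing import Dict, List, Optional


def histogram(values: List[float], lo: float, hi: float, bins: int) -> List[float]:
    """Share of `values` in each of `bins` equal-width bins spanning [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # min() keeps v == hi in the last bin instead of overflowing.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return [c / len(values) for c in counts]


def distributions_sketch(
    column: List[Optional[float]],
    imputed_by_dataset: Dict[int, List[float]],
    bins: int = 4,
) -> Dict[object, List[float]]:
    """Binned distributions of the observed values and of each dataset's imputations."""
    observed = [v for v in column if v is not None]
    pooled = observed + [v for vals in imputed_by_dataset.values() for v in vals]
    lo, hi = min(pooled), max(pooled)
    if hi == lo:
        hi = lo + 1.0  # avoid zero-width bins when every value is identical
    result: Dict[object, List[float]] = {"original": histogram(observed, lo, hi, bins)}
    for ds, vals in imputed_by_dataset.items():
        result[ds] = histogram(vals, lo, hi, bins)
    return result


column = [1.0, None, 2.0, 4.0, None]      # two missing entries
imputed = {0: [1.5, 3.5], 1: [2.5, 2.5]}  # imputed values from two datasets
dists = distributions_sketch(column, imputed)
print(dists["original"])   # observed data (the red line)
print(dists[0], dists[1])  # imputed data per dataset (the black lines)
```

Sharing one set of bins across all series is what makes the red and black curves directly comparable.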

plot_mean_convergence(variables: List[str] | None = None)

Plots the mean and standard deviation of the imputed values at each iteration. Each line shows the mean imputed value for one dataset across iterations. The bars show the average standard deviation of the imputed values within datasets.

Parameters:
  • variables (Optional[List[str]], default=None) – The variables to plot. By default, all numeric, imputed variables are plotted.
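The two quantities this plot draws can be computed as follows for a single variable. The data layout and mean_convergence_sketch are hypothetical, and using the population standard deviation (statistics.pstdev) rather than the sample one is an assumption.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

# Hypothetical layout, NOT miceforest internals:
# imputations[dataset][iteration] -> imputed values of one variable.
Imputations = Dict[int, Dict[int, List[float]]]


def mean_convergence_sketch(
    imputations: Imputations,
) -> Tuple[Dict[int, List[float]], List[float]]:
    """Per-dataset mean of the imputations (lines) and mean within-dataset SD (bars)."""
    iterations = sorted(next(iter(imputations.values())))
    lines = {
        ds: [mean(by_iter[it]) for it in iterations]
        for ds, by_iter in imputations.items()
    }
    # Population SD within each dataset, averaged across datasets at each iteration.
    bars = [mean(pstdev(by_iter[it]) for by_iter in imputations.values()) for it in iterations]
    return lines, bars


imps = {
    0: {1: [1.0, 3.0], 2: [2.0, 2.0]},
    1: {1: [2.0, 4.0], 2: [2.0, 4.0]},
}
lines, bars = mean_convergence_sketch(imps)
print(lines)  # {0: [2.0, 2.0], 1: [3.0, 3.0]}
print(bars)   # [1.0, 0.5]
```

Flat lines and stable bars across iterations are the usual sign that the imputations have converged.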