ImputedData Class
- class miceforest.imputed_data.ImputedData(impute_data: DataFrame, datasets: List[int], variable_schema: List[str] | Dict[str, List[str]] | None = None, save_all_iterations_data: bool = True, copy_data: bool = True, random_seed_array: ndarray | None = None)
Bases: object
- complete_data(dataset: int = 0, iteration: int = -1, inplace: bool = False, variables: List[str] | None = None)
Return dataset with missing values imputed.
- Parameters:
dataset (int) – The dataset to complete.
iteration (int) – Impute data with values obtained at this iteration. If -1, returns the most up-to-date iterations, even if different between variables. If not -1, iteration must have been saved in imputed values.
inplace (bool) – Should the data be completed in place? If True, self.working_data is imputed, and nothing is returned. This is useful if the dataset is very large. If False, a copy of the data is returned, with missing values imputed.
- Return type:
The completed data, with values imputed for specified variables.
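The iteration and inplace semantics described above can be sketched in plain Python. This is a conceptual model only, not miceforest's implementation: the function `complete` and the dictionary layouts for `working_data` and `imputed` are hypothetical stand-ins chosen for illustration.

```python
import copy

# Hypothetical stand-ins, not miceforest internals:
#   working_data maps variable -> list of values (None marks a missing cell)
#   imputed maps variable -> {iteration: {row_index: imputed_value}}


def complete(working_data, imputed, iteration=-1, inplace=False, variables=None):
    """Fill missing cells with the values stored for `iteration`."""
    target = working_data if inplace else copy.deepcopy(working_data)
    for var in (variables if variables is not None else imputed):
        saved = imputed[var]
        # -1 selects the latest iteration saved for this particular variable,
        # so different variables may end up using different iterations.
        it = max(saved) if iteration == -1 else iteration
        if it not in saved:
            raise ValueError(f"iteration {it} was not saved for {var!r}")
        for row, value in saved[it].items():
            target[var][row] = value
    # Mirror the documented behaviour: nothing is returned when inplace=True.
    return None if inplace else target


data = {"age": [31, None, 45], "bmi": [None, 22.5, 27.0]}
imputed = {
    "age": {1: {1: 38}, 2: {1: 40}},
    "bmi": {1: {0: 24.1}},  # only one iteration saved for bmi
}

latest = complete(data, imputed)               # copy, newest values per variable
first = complete(data, imputed, iteration=1)   # copy, values from iteration 1
```

Requesting an iteration that was never stored (here, `iteration=2` for `bmi`) raises an error in this sketch, matching the requirement that a non-default iteration must have been saved.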
- iteration_count(dataset: slice | int = slice(None, None, None), variable: slice | str = slice(None, None, None))
Returns the iteration count for the specified variables and datasets. If the iteration count is not consistent across the provided datasets/variables, an error is raised. Providing None will use all datasets/variables.
This is to ensure the process is in a consistent state when the iteration count is needed.
- Parameters:
dataset (None or int) – The datasets to check the iteration count for. If None, all datasets are assumed (and assured) to have the same iteration count, otherwise error.
variable (str or None) – The variable to check the iteration count for. If None, all variables are assumed (and assured) to have the same iteration count, otherwise error.
- Return type:
An integer representing the iteration count.
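The consistency check described for iteration_count can be illustrated with a small standalone sketch. The function `consistent_iteration_count` and the `counts` mapping are hypothetical names used only for illustration; how miceforest stores and looks up its counts internally may differ.

```python
def consistent_iteration_count(counts, datasets=None, variables=None):
    """Return the single iteration count shared by the selected entries.

    counts maps (dataset, variable) -> number of completed iterations.
    None for datasets/variables means "use all of them".
    """
    selected = {
        n
        for (ds, var), n in counts.items()
        if (datasets is None or ds in datasets)
        and (variables is None or var in variables)
    }
    # Exactly one distinct value means the selection is in a consistent state.
    if len(selected) != 1:
        raise ValueError(f"inconsistent iteration counts: {sorted(selected)}")
    return selected.pop()


counts = {(0, "age"): 3, (0, "bmi"): 3, (1, "age"): 3, (1, "bmi"): 2}

count_ds0 = consistent_iteration_count(counts, datasets=[0])
count_age = consistent_iteration_count(counts, variables=["age"])
```

Calling the function with no filters on this example raises, because `bmi` in dataset 1 is one iteration behind the rest.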
- plot_imputed_distributions(variables: List[str] | None = None, iteration: int = -1)
Plot the imputed value distributions. Red lines are the distribution of the original data. Black lines are the distribution of the imputed values.
- Parameters:
datasets (None, int, list[int])
variables (None, list[str]) – The variables to plot. If None, all numeric variables are plotted.
iteration (int) – The iteration to plot the distribution for. If None, the latest iteration is plotted. save_all_iterations_data must be True if specifying an iteration.
adj_args – Additional arguments passed to plt.subplots_adjust()
- plot_mean_convergence(variables: List[str] | None = None)
Plots the average value and standard deviation of imputations over each iteration. The lines show the average imputation value for a dataset at each iteration. The bars show the average standard deviation of the imputation values within datasets.
- Parameters:
variables (Optional[List[str]], default=None) – The variables to plot. By default, all numeric, imputed variables are plotted.
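To make the plotted quantities concrete, here is a rough sketch of how the per-iteration means (the lines) and the average within-dataset standard deviation (the bars) could be computed for one variable. The function `convergence_summary` and the nested `imputations` layout are assumptions made for illustration, and the sketch reads "average standard deviation within datasets" as the mean of each dataset's sample standard deviation; the library's exact calculation may differ.

```python
from statistics import mean, stdev

# Hypothetical layout: imputations[dataset][iteration] -> list of the
# imputed values for a single variable at that iteration.


def convergence_summary(imputations):
    """Return {iteration: (per-dataset means, average within-dataset sd)}."""
    iterations = sorted(next(iter(imputations.values())))
    summary = {}
    for it in iterations:
        # One mean per dataset: these would be the plotted lines.
        per_dataset_means = {ds: mean(vals[it]) for ds, vals in imputations.items()}
        # Average of the per-dataset standard deviations: the plotted bars.
        avg_sd = mean(stdev(vals[it]) for vals in imputations.values())
        summary[it] = (per_dataset_means, avg_sd)
    return summary


imputations = {
    0: {1: [1.0, 3.0], 2: [2.0, 4.0]},
    1: {1: [2.0, 2.0], 2: [3.0, 5.0]},
}

summary = convergence_summary(imputations)
```

Lines that level off across iterations, together with stable bars, are the usual visual signal that the imputations have settled.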