miceforest.ImputedData
- class miceforest.ImputedData(impute_data, datasets=5, variable_schema=None, imputation_order='ascending', train_nonmissing=False, categorical_feature='auto', save_all_iterations=True, copy_data=True)[source]
Bases: object
Imputed Data
This class should not be instantiated directly. Instead, it is returned when ImputationKernel.impute_new_data() is called. For parameter arguments, see ImputationKernel documentation.
- __init__(impute_data, datasets=5, variable_schema=None, imputation_order='ascending', train_nonmissing=False, categorical_feature='auto', save_all_iterations=True, copy_data=True)[source]
Methods
__init__(impute_data[, datasets, ...])
complete_data([dataset, iteration, inplace, ...]) – Return dataset with missing values imputed.
dataset_count() – Return the number of datasets.
get_correlations(datasets, variables) – Return the correlations between datasets for the specified variables.
get_means(datasets[, variables]) – Return a dict containing the average imputation value for specified variables at each iteration.
iteration_count([datasets, variables]) – Grabs the iteration count for specified variables, datasets.
plot_correlations([datasets, variables]) – Plot the correlations between datasets.
plot_imputed_distributions([datasets, ...]) – Plot the imputed value distributions.
plot_mean_convergence([datasets, variables]) – Plots the average value of imputations over each iteration.
- complete_data(dataset=0, iteration=None, inplace=False, variables=None)[source]
Return dataset with missing values imputed.
- Parameters
dataset (int) – The dataset to complete.
iteration (int) – Impute data with values obtained at this iteration. If None, the most up-to-date iteration is used for each variable, even if it differs between variables. If not None, the iteration must have been saved in the imputed values.
inplace (bool) – Should the data be completed in place? If True, self.working_data is imputed, and nothing is returned. This is useful if the dataset is very large. If False, a copy of the data is returned, with missing values imputed.
- Returns
The completed data, with values imputed for the specified variables.
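As a usage sketch, the helper below shows how an ImputedData object is typically obtained and completed. It assumes a miceforest release whose signatures match this page and that both arguments are pandas DataFrames with the same columns. The helper name `impute_new_rows` and the choices of two datasets and two iterations are illustrative, not part of the library.

```python
def impute_new_rows(train_df, new_df, dataset=0):
    """Fit a kernel on train_df, then return one completed copy of new_df.

    Both arguments are assumed to be pandas DataFrames with the same columns.
    """
    # Imported lazily so the helper can be defined without miceforest installed.
    import miceforest as mf

    kernel = mf.ImputationKernel(train_df, datasets=2, random_state=0)
    kernel.mice(2)  # run two MICE iterations on the training data

    # impute_new_data() returns an ImputedData instance.
    imputed = kernel.impute_new_data(new_data=new_df)

    # inplace=False (the default) returns a completed copy of the data.
    return imputed.complete_data(dataset=dataset, inplace=False)
```

For very large data, passing inplace=True instead avoids the copy; in that case nothing is returned and the completed values live in the object's working_data.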
- dataset_count()[source]
Return the number of datasets. Datasets are defined by how many different sets of imputation values we have accumulated.
- get_correlations(datasets, variables)[source]
Return the correlations between datasets for the specified variables.
- Parameters
variables (list[str], list[int]) – The variables to return the correlations for.
- Returns
The correlations at each iteration for the specified variables.
- Return type
dict
- get_means(datasets, variables=None)[source]
Return a dict containing the average imputation value for specified variables at each iteration.
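To make "average imputation value at each iteration" concrete, here is a pure-Python sketch. It is not miceforest's implementation: the nested mapping `{variable: {iteration: [imputed values]}}` for a single dataset is a hypothetical stand-in for the stored imputations, and the function name is illustrative.

```python
from statistics import mean


def mean_by_iteration(imputations, variables=None):
    """Return {variable: {iteration: mean of imputed values}}.

    `imputations` is a hypothetical mapping of the form
    {variable: {iteration: [imputed values]}}; it is not a miceforest structure.
    """
    if variables is None:
        variables = list(imputations)  # None means "all variables"
    return {
        var: {it: mean(vals) for it, vals in imputations[var].items()}
        for var in variables
    }


# Toy imputed values for two variables over iterations 0 and 1.
toy = {
    "age": {0: [30.0, 40.0], 1: [32.0, 42.0]},
    "income": {0: [10.0, 20.0, 30.0], 1: [15.0, 25.0, 35.0]},
}
means = mean_by_iteration(toy, variables=["age"])
```

Tracking these per-iteration means is what plot_mean_convergence visualizes: if they stop drifting across iterations, the imputations have likely stabilized.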
- iteration_count(datasets=None, variables=None)[source]
Return the iteration count for the specified datasets and variables. If the iteration count is not consistent across the provided datasets/variables, an error is raised. Passing None uses all datasets/variables.
This is to ensure the process is in a consistent state when the iteration count is needed.
- Parameters
datasets (int or list[int]) – The datasets to check the iteration count for.
variables (int, str, list[int] or list[str]) – The variables to check the iteration count for. Variables can be specified by their names or indexes.
- Returns
An integer representing the iteration count.
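The consistency rule can be sketched in plain Python. This is an illustration of the described behavior, not the library's code: the `counts` mapping keyed by (dataset, variable) is hypothetical, the sketch accepts only lists (not a single int or str), and ValueError is an assumed error type.

```python
def consistent_iteration_count(counts, datasets=None, variables=None):
    """Return the shared iteration count, or raise if the counts differ.

    `counts` is a hypothetical mapping {(dataset, variable): iterations};
    None for `datasets` or `variables` means "use all of them".
    """
    selected = {
        n
        for (ds, var), n in counts.items()
        if (datasets is None or ds in datasets)
        and (variables is None or var in variables)
    }
    if not selected:
        raise ValueError("No matching datasets/variables.")
    if len(selected) != 1:
        raise ValueError(f"Inconsistent iteration counts: {sorted(selected)}")
    return selected.pop()


# Dataset 1 has run one iteration fewer for "income".
counts = {(0, "age"): 3, (0, "income"): 3, (1, "age"): 3, (1, "income"): 2}
agreed = consistent_iteration_count(counts, datasets=[0])
```

Restricting the query to dataset 0 succeeds because both of its variables agree, while querying all datasets raises, which mirrors the "consistent state" guarantee described above.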
- plot_correlations(datasets=None, variables=None, **adj_args)[source]
Plot the correlations between datasets. See get_correlations() for more details.
- Parameters
datasets (None or list[int]) – The datasets to plot.
variables (None, list) – The variables to plot.
adj_args – Additional arguments passed to plt.subplots_adjust()
- plot_imputed_distributions(datasets=None, variables=None, iteration=None, **adj_args)[source]
Plot the imputed value distributions. Red lines are the distribution of the original data. Black lines are the distribution of the imputed values.
- Parameters
datasets (None, int, list[int]) – The datasets to plot.
variables (None, str, int, list[str], or list[int]) – The variables to plot. If None, all numeric variables are plotted.
iteration (None, int) – The iteration to plot the distribution for. If None, the latest iteration is plotted. save_all_iterations must be True if specifying an iteration.
adj_args – Additional arguments passed to plt.subplots_adjust()