Refinement
Refinement module for Macrodata Refinement (MDR).
Overview
The refinement module provides functions and classes for refining macrodata through various
statistical and analytical methods. Key capabilities include:
- Removing outliers from data
- Imputing missing values
- Smoothing noisy data
- Applying a complete refinement pipeline
Core Components
RefinementConfig
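- class mdr.core.refinement.RefinementConfig(smoothing_factor, outlier_threshold, imputation_method, normalization_type)
Bases: object
Configuration for data refinement operations.
- Parameters:
smoothing_factor – Smoothing factor (see smooth_data)
outlier_threshold – Outlier threshold (see remove_outliers)
imputation_method – Imputation method (see impute_missing_values)
normalization_type – Normalization type, e.g. "minmax"
- __post_init__()
Validate the configuration parameters.
- Return type:
None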
The RefinementConfig class is used to configure the refinement process, specifying parameters
such as smoothing factor, outlier threshold, imputation method, and normalization type.
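To make the role of `__post_init__` concrete, here is a minimal sketch of a configuration dataclass that validates itself on construction. It is not the library's implementation: the class name `RefinementConfigSketch`, the accepted method names, and the exact range checks are assumptions chosen for illustration (only the `0 < factor <= 1` bound is taken from the `smooth_data` entry).

```python
from dataclasses import dataclass

# Assumed sets of accepted names; the real module may accept others.
ALLOWED_IMPUTATION = {"mean", "median", "linear"}
ALLOWED_NORMALIZATION = {"minmax", "zscore"}


@dataclass
class RefinementConfigSketch:
    """Hypothetical stand-in for mdr.core.refinement.RefinementConfig."""

    smoothing_factor: float
    outlier_threshold: float
    imputation_method: str
    normalization_type: str

    def __post_init__(self) -> None:
        # Runs automatically right after the generated __init__.
        if not 0 < self.smoothing_factor <= 1:
            raise ValueError("smoothing_factor must satisfy 0 < factor <= 1")
        if self.outlier_threshold <= 0:
            raise ValueError("outlier_threshold must be positive")
        if self.imputation_method not in ALLOWED_IMPUTATION:
            raise ValueError(f"unknown imputation_method: {self.imputation_method!r}")
        if self.normalization_type not in ALLOWED_NORMALIZATION:
            raise ValueError(f"unknown normalization_type: {self.normalization_type!r}")


config = RefinementConfigSketch(
    smoothing_factor=0.2,
    outlier_threshold=2.5,
    imputation_method="linear",
    normalization_type="minmax",
)
```

Validating in `__post_init__` means an invalid configuration fails at construction time rather than partway through a pipeline run.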
Data Refinement Functions
- mdr.core.refinement.smooth_data(data, factor)
Apply smoothing to the input data.
- Parameters:
data (numpy.ndarray) – Input data array to smooth
factor (float) – Smoothing factor (0 < factor <= 1)
- Returns:
Smoothed data array
- Return type:
numpy.ndarray
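The entry above does not say which smoothing method is used. One common technique that takes a single factor in the range `0 < factor <= 1` is simple exponential smoothing, sketched below as an assumption; `smooth_data_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def smooth_data_sketch(data, factor: float) -> np.ndarray:
    """Simple exponential smoothing (an assumed technique, for illustration)."""
    if not 0 < factor <= 1:
        raise ValueError("factor must satisfy 0 < factor <= 1")
    values = np.asarray(data, dtype=float)
    smoothed = np.empty_like(values)
    if values.size == 0:
        return smoothed
    # The first output equals the first input; each later output blends the
    # current sample with the previous smoothed value.
    smoothed[0] = values[0]
    for i in range(1, values.size):
        smoothed[i] = factor * values[i] + (1 - factor) * smoothed[i - 1]
    return smoothed


print(smooth_data_sketch([0.0, 10.0, 10.0], 0.5))  # values 0.0, 5.0, 7.5
```

Under this scheme a factor of 1 returns the input unchanged, and smaller factors smooth more heavily.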
- mdr.core.refinement.remove_outliers(data, threshold)
Remove outliers from the data using the specified threshold.
- Parameters:
data (numpy.ndarray) – Input data array
threshold (float) – Z-score threshold for outlier detection
- Returns:
Data array with outliers replaced by median values
- Return type:
numpy.ndarray
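The following sketch shows one way z-score detection with median replacement can work. It assumes the population standard deviation, ignores NaN entries when computing statistics, and uses the median of all non-missing values as the replacement; the library may differ on each point, and `remove_outliers_sketch` is a hypothetical name.

```python
import numpy as np


def remove_outliers_sketch(data, threshold: float) -> np.ndarray:
    """Replace values whose |z-score| exceeds threshold with the median."""
    values = np.asarray(data, dtype=float)
    result = values.copy()
    mean = np.nanmean(values)
    std = np.nanstd(values)  # population standard deviation (ddof=0)
    if not np.isfinite(std) or std == 0:
        return result  # constant or all-missing data: nothing to flag
    with np.errstate(invalid="ignore"):
        z_scores = np.abs((values - mean) / std)
        outliers = z_scores > threshold  # NaN entries compare as False
    result[outliers] = np.nanmedian(values)
    return result


sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0])
print(remove_outliers_sketch(sample, threshold=2.5))
```

One property of this approach is worth keeping in mind: with the population standard deviation, no value in an array of n samples can have a z-score above sqrt(n - 1), so very short arrays may never exceed a high threshold.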
- mdr.core.refinement.impute_missing_values(data, method='mean', window_size=3)
Impute missing values in the data.
- Parameters:
data (numpy.ndarray) – Input data array
method – Imputation method (default: 'mean')
window_size – Window size (default: 3)
- Returns:
Data array with missing values imputed
- Return type:
numpy.ndarray
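As an illustration of what imputation methods such as `'mean'` and `'linear'` typically do, the sketch below fills NaN entries either with the mean of the observed values or by linear interpolation between neighbours. The function name `impute_missing_values_sketch` is hypothetical, and `window_size` is left out because the source does not say how the library uses it.

```python
import numpy as np


def impute_missing_values_sketch(data, method: str = "mean") -> np.ndarray:
    """Fill NaN entries; 'mean' and 'linear' are assumed behaviours."""
    values = np.asarray(data, dtype=float)
    result = values.copy()
    missing = np.isnan(values)
    if not missing.any() or missing.all():
        return result  # nothing to fill, or nothing to fill from
    if method == "mean":
        result[missing] = np.nanmean(values)
    elif method == "linear":
        index = np.arange(values.size)
        # np.interp holds the nearest observed value at the array edges.
        result[missing] = np.interp(index[missing], index[~missing], values[~missing])
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return result


print(impute_missing_values_sketch([1.0, np.nan, 3.0], method="mean"))
print(impute_missing_values_sketch([1.0, np.nan, np.nan, 4.0], method="linear"))
```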
- mdr.core.refinement.refine_data(data, config)
Apply a complete refinement pipeline to the data.
- Parameters:
data (numpy.ndarray) – Input data array
config (RefinementConfig) – Refinement configuration
- Returns:
Refined data array
- Return type:
numpy.ndarray
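The source does not list the order in which the pipeline applies its steps. The sketch below only illustrates the general idea of chaining array-to-array steps; `run_pipeline`, `fill_with_mean`, and `minmax_normalize` are hypothetical helpers, and the step order shown is an assumption.

```python
from typing import Callable, Sequence

import numpy as np

Step = Callable[[np.ndarray], np.ndarray]


def fill_with_mean(data: np.ndarray) -> np.ndarray:
    """Replace NaN entries with the mean of the observed values."""
    filled = data.copy()
    filled[np.isnan(filled)] = np.nanmean(data)
    return filled


def minmax_normalize(data: np.ndarray) -> np.ndarray:
    """Rescale values to the range [0, 1]."""
    low, high = np.nanmin(data), np.nanmax(data)
    if high == low:
        return np.zeros_like(data)
    return (data - low) / (high - low)


def run_pipeline(data, steps: Sequence[Step]) -> np.ndarray:
    """Apply each step in order, feeding the output of one into the next."""
    result = np.asarray(data, dtype=float)
    for step in steps:
        result = step(result)
    return result


refined = run_pipeline([1.0, np.nan, 3.0], [fill_with_mean, minmax_normalize])
print(refined)  # values 0.0, 0.5, 1.0
```

Keeping every step as a function from array to array makes the order easy to change and each step easy to test in isolation.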
- mdr.core.refinement.apply_refinement_pipeline(data_dict, config)
Apply refinement pipeline to a dictionary of data arrays.
- Parameters:
data_dict (Dict[str, numpy.ndarray]) – Dictionary mapping variable names to data arrays
config (RefinementConfig) – Refinement configuration
- Returns:
Dictionary with refined data arrays
- Return type:
Dict[str, numpy.ndarray]
Usage Examples
Basic refinement of a single data array:
import numpy as np
from mdr.core.refinement import RefinementConfig, refine_data

# Create sample data with outliers and missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 100.0])

# Configure refinement
config = RefinementConfig(
    smoothing_factor=0.2,
    outlier_threshold=2.5,
    imputation_method="linear",
    normalization_type="minmax",
)

# Refine the data
refined_data = refine_data(data, config)
print("Original data:", data)
print("Refined data:", refined_data)
Refinement of multiple variables:
import numpy as np
from mdr.core.refinement import RefinementConfig, apply_refinement_pipeline

# Create a dictionary of data variables
data_dict = {
    "temperature": np.array([20.5, 21.3, np.nan, 21.7, 45.0]),
    "pressure": np.array([101.3, 101.4, 80.0, np.nan, np.nan]),
}

# Configure refinement
config = RefinementConfig(
    smoothing_factor=0.2,
    outlier_threshold=2.5,
    imputation_method="linear",
    normalization_type="minmax",
)

# Apply refinement to all variables
refined_dict = apply_refinement_pipeline(data_dict, config)