Refinement

Refinement module for Macrodata Refinement (MDR).

This module provides functions and classes for refining macrodata through various statistical and analytical methods.

Overview

The refinement module's key capabilities include:

  • Removing outliers from data

  • Imputing missing values

  • Smoothing noisy data

  • Applying a complete refinement pipeline

Core Components

RefinementConfig

class mdr.core.refinement.RefinementConfig(smoothing_factor, outlier_threshold, imputation_method, normalization_type)

Configuration for data refinement operations.

Parameters:
  • smoothing_factor (float)

  • outlier_threshold (float)

  • imputation_method (str)

  • normalization_type (str)

__post_init__()

Validate the configuration parameters.

Return type:

None

RefinementConfig holds the settings for a refinement run: the smoothing factor, outlier threshold, imputation method, and normalization type. The parameters are validated by __post_init__.
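As a rough illustration of the kind of checks __post_init__ might perform, the sketch below defines a stand-in dataclass. SketchConfig and its validation rules are assumptions inferred from the parameter descriptions on this page (the documented smoothing factor range and the listed imputation methods); they are not the library's actual implementation.

```python
from dataclasses import dataclass

# Imputation methods listed in the impute_missing_values documentation.
ALLOWED_IMPUTATION = ("mean", "median", "linear", "forward")


@dataclass
class SketchConfig:
    """Hypothetical stand-in for RefinementConfig (not the real class)."""

    smoothing_factor: float
    outlier_threshold: float
    imputation_method: str
    normalization_type: str

    def __post_init__(self) -> None:
        # Assumed rule: same range as the `factor` argument of smooth_data.
        if not 0 < self.smoothing_factor <= 1:
            raise ValueError("smoothing_factor must satisfy 0 < factor <= 1")
        # Assumed rule: a Z-score threshold is only meaningful when positive.
        if self.outlier_threshold <= 0:
            raise ValueError("outlier_threshold must be positive")
        if self.imputation_method not in ALLOWED_IMPUTATION:
            raise ValueError(
                f"unknown imputation_method: {self.imputation_method!r}"
            )
        # normalization_type is left unchecked here because its valid
        # values are not documented on this page.
```

Because the checks run inside __post_init__, an invalid value raises as soon as the object is constructed rather than later in the pipeline.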

Data Refinement Functions

mdr.core.refinement.smooth_data(data, factor)

Apply smoothing to the input data.

Parameters:
  • data (numpy.ndarray) – Input data array to smooth

  • factor (float) – Smoothing factor (0 < factor <= 1)

Returns:

Smoothed data array

Return type:

numpy.ndarray
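This page does not say which smoothing algorithm smooth_data uses. One common technique that takes a factor in the range 0 < factor <= 1 is exponential smoothing, sketched below under that assumption. The function name exponential_smooth is hypothetical and the code is not taken from mdr.

```python
import numpy as np


def exponential_smooth(data: np.ndarray, factor: float) -> np.ndarray:
    """Illustrative exponential smoothing of a 1-D array.

    Not the library's smooth_data; the algorithm is an assumption.
    """
    if not 0 < factor <= 1:
        raise ValueError("factor must satisfy 0 < factor <= 1")
    data = np.asarray(data, dtype=float)
    smoothed = np.empty_like(data)
    if data.size == 0:
        return smoothed
    smoothed[0] = data[0]
    for i in range(1, data.size):
        # Blend the new observation with the previous smoothed value.
        smoothed[i] = factor * data[i] + (1 - factor) * smoothed[i - 1]
    return smoothed


# Values: 1.0, 1.5, 2.25, 3.125
print(exponential_smooth(np.array([1.0, 2.0, 3.0, 4.0]), 0.5))
```

In this scheme a factor of 1 returns the input unchanged, while values closer to 0 weight the history more heavily and smooth more aggressively.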

mdr.core.refinement.remove_outliers(data, threshold)

Detect outliers using a Z-score threshold and replace them with the median value.

Parameters:
  • data (numpy.ndarray) – Input data array

  • threshold (float) – Z-score threshold for outlier detection

Returns:

Data array with outliers replaced by median values

Return type:

numpy.ndarray
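The description above (a Z-score threshold, with outliers replaced by the median) can be sketched roughly as follows. This is a minimal illustration, not the library's code: the name replace_outliers_with_median, the use of the population standard deviation, and taking the median over all non-NaN values are assumptions.

```python
import numpy as np


def replace_outliers_with_median(data: np.ndarray, threshold: float) -> np.ndarray:
    """Illustrative Z-score filter; not the library's remove_outliers."""
    data = np.asarray(data, dtype=float)
    result = data.copy()
    mean = np.nanmean(data)
    std = np.nanstd(data)
    if not np.isfinite(std) or std == 0:
        # All values identical (or all NaN): nothing can be flagged.
        return result
    z_scores = np.abs(data - mean) / std
    # NaN entries compare as False, so missing values are left untouched.
    result[z_scores > threshold] = np.nanmedian(data)
    return result


readings = np.array([10.0, 11.0, 9.0, 10.0, 12.0, 10.0, 11.0, 9.0, 10.0, 100.0])
# The final value (100.0) has a Z-score of about 3.0 and is replaced
# by the median of the readings, 10.0.
print(replace_outliers_with_median(readings, threshold=2.5))
```

Note that with a plain Z-score the largest attainable score in a sample of n points is (n - 1) / sqrt(n), so very short arrays may never exceed a threshold such as 2.5.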

mdr.core.refinement.impute_missing_values(data, method='mean', window_size=3)

Impute missing values in the data.

Parameters:
  • data (numpy.ndarray) – Input data array with potential NaN values

  • method (str) – Imputation method ('mean', 'median', 'linear', 'forward')

  • window_size (int) – Size of the window for local imputation methods

Returns:

Data array with missing values imputed

Return type:

numpy.ndarray
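To make the four method names concrete, here is a minimal sketch of how each could be implemented with NumPy on a 1-D array. It is an assumption-laden illustration, not the mdr implementation: impute_sketch is a hypothetical name, the statistics are computed over the whole array, and window_size is ignored because this page does not say how the window is applied.

```python
import numpy as np


def impute_sketch(data: np.ndarray, method: str = "mean") -> np.ndarray:
    """Illustrative imputation; not the library's impute_missing_values."""
    result = np.asarray(data, dtype=float).copy()
    missing = np.isnan(result)
    if not missing.any() or missing.all():
        return result  # nothing to fill, or nothing to fill from
    if method == "mean":
        result[missing] = np.nanmean(result)
    elif method == "median":
        result[missing] = np.nanmedian(result)
    elif method == "linear":
        idx = np.arange(result.size)
        # np.interp holds the edge values constant outside the known range.
        result[missing] = np.interp(idx[missing], idx[~missing], result[~missing])
    elif method == "forward":
        # Index of the most recent non-missing value at each position.
        last = np.maximum.accumulate(np.where(missing, -1, np.arange(result.size)))
        # Leading NaNs (last == -1) have no earlier value and stay NaN.
        fillable = missing & (last >= 0)
        result[fillable] = result[last[fillable]]
    else:
        raise ValueError(f"unknown method: {method!r}")
    return result


sample = np.array([1.0, 2.0, np.nan, 4.0, 100.0])
# The gap at index 2 lies midway between 2.0 and 4.0, so it becomes 3.0.
print(impute_sketch(sample, "linear"))
```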

mdr.core.refinement.refine_data(data, config)

Apply a complete refinement pipeline to the data.

Parameters:
  • data (numpy.ndarray) – Input data array

  • config (RefinementConfig) – Refinement configuration

Returns:

Refined data array

Return type:

numpy.ndarray
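refine_data takes a RefinementConfig, whose fields include normalization_type, but this page does not describe the normalization step itself. Assuming "minmax" (the value used in the usage examples) means conventional min-max scaling to the range [0, 1], that step might look like the sketch below. minmax_normalize is a hypothetical helper, not part of mdr.

```python
import numpy as np


def minmax_normalize(data: np.ndarray) -> np.ndarray:
    """Scale values linearly to [0, 1]; illustrative only."""
    data = np.asarray(data, dtype=float)
    lo, hi = np.nanmin(data), np.nanmax(data)
    if hi == lo:
        # Constant input: return zeros rather than divide by zero.
        return np.zeros_like(data)
    return (data - lo) / (hi - lo)


# Values: 0.0, 0.5, 1.0
print(minmax_normalize(np.array([2.0, 4.0, 6.0])))
```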

mdr.core.refinement.apply_refinement_pipeline(data_dict, config)

Apply refinement pipeline to a dictionary of data arrays.

Parameters:
  • data_dict (Dict[str, numpy.ndarray]) – Dictionary mapping variable names to data arrays

  • config (RefinementConfig) – Refinement configuration

Returns:

Dictionary with refined data arrays

Return type:

Dict[str, numpy.ndarray]
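Conceptually, applying a pipeline to a dictionary of arrays amounts to running the same per-array step on every value while preserving the keys. The sketch below shows that pattern with a stand-in refinement step; apply_to_each and fill_with_mean are hypothetical names used only for illustration and do not reflect how mdr implements this function.

```python
from typing import Callable, Dict

import numpy as np


def apply_to_each(
    data_dict: Dict[str, np.ndarray],
    refine: Callable[[np.ndarray], np.ndarray],
) -> Dict[str, np.ndarray]:
    """Apply `refine` to every array, keeping the variable names as keys."""
    return {name: refine(values) for name, values in data_dict.items()}


def fill_with_mean(values: np.ndarray) -> np.ndarray:
    """Stand-in refinement step: fill NaNs with the mean of the other values."""
    return np.where(np.isnan(values), np.nanmean(values), values)


refined = apply_to_each(
    {"temperature": np.array([20.0, np.nan, 22.0])},
    fill_with_mean,
)
# The missing temperature is filled with 21.0, the mean of 20.0 and 22.0.
print(refined["temperature"])
```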

Usage Examples

Basic refinement of a single data array:

import numpy as np
from mdr.core.refinement import RefinementConfig, refine_data

# Create sample data with outliers and missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 100.0])

# Configure refinement
config = RefinementConfig(
    smoothing_factor=0.2,
    outlier_threshold=2.5,
    imputation_method="linear",
    normalization_type="minmax"
)

# Refine the data
refined_data = refine_data(data, config)

print("Original data:", data)
print("Refined data:", refined_data)

Refinement of multiple variables:

import numpy as np
from mdr.core.refinement import RefinementConfig, apply_refinement_pipeline

# Create a dictionary of data variables
data_dict = {
    "temperature": np.array([20.5, 21.3, np.nan, 21.7, 45.0]),
    "pressure": np.array([101.3, 101.4, 80.0, np.nan, np.nan])
}

# Configure refinement
config = RefinementConfig(
    smoothing_factor=0.2,
    outlier_threshold=2.5,
    imputation_method="linear",
    normalization_type="minmax"
)

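# Apply refinement to all variables
refined_dict = apply_refinement_pipeline(data_dict, config)

# Inspect the refined array for each variable
for name, values in refined_dict.items():
    print(f"{name}:", values)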