Usage Guide
This guide demonstrates how to use the Macrodata Refinement (MDR) package for data processing, validation, and transformation.
Basic Workflow
A typical MDR workflow involves the following steps:
Loading data
Validating data quality
Refining data (removing outliers, imputing missing values, smoothing)
Transforming data
Visualizing results
Saving processed data
Quick Example
Here’s a quick example that demonstrates these steps:
import numpy as np
from mdr.core.refinement import RefinementConfig, refine_data
from mdr.core.validation import validate_data
from mdr.visualization.plots import plot_refinement_comparison
import matplotlib.pyplot as plt
# 1. Create sample data with outliers and missing values
data = np.array([1.0, 2.0, np.nan, 4.0, 100.0])
# 2. Validate the data
validation_result = validate_data(
{"sample": data},
checks=["missing", "outliers"],
params={
"missing": {"threshold": 0.1},
"outliers": {"threshold": 2.5}
}
)
# Print validation results
for var_name, result in validation_result.items():
if result.is_valid:
print(f"{var_name} passed validation")
else:
print(f"{var_name} failed validation:")
for msg in result.error_messages:
print(f" - {msg}")
# 3. Configure refinement
config = RefinementConfig(
smoothing_factor=0.2,
outlier_threshold=2.5,
imputation_method="linear",
normalization_type="minmax"
)
# 4. Refine the data
refined_data = refine_data(data, config)
# 5. Visualize the results
fig, axes = plot_refinement_comparison(data, refined_data)
plt.tight_layout()
plt.show()
print("Original data:", data)
print("Refined data:", refined_data)
Data Refinement
Data refinement is the core functionality of MDR. It includes outlier removal, missing value imputation, and data smoothing.
Creating a Refinement Configuration
First, create a configuration object that specifies how the refinement should be performed:
from mdr.core.refinement import RefinementConfig
config = RefinementConfig(
smoothing_factor=0.2, # Smoothing intensity (0-1)
outlier_threshold=2.5, # Z-score threshold for outliers
imputation_method="linear", # Method for filling missing values
normalization_type="minmax" # Type of normalization to apply
)
Applying Refinement
You can refine a single data array:
from mdr.core.refinement import refine_data
refined_data = refine_data(data, config)
Or refine multiple variables at once:
from mdr.core.refinement import apply_refinement_pipeline
data_dict = {
"temperature": np.array([20.5, 21.3, np.nan, 21.7, 45.0]),
"pressure": np.array([101.3, 101.4, 80.0, np.nan, np.nan])
}
refined_dict = apply_refinement_pipeline(data_dict, config)
Data Validation
MDR provides tools to validate data quality before refinement.
Available Validation Checks
Range: Check if values are within expected ranges
Missing: Check the percentage of missing values
Outliers: Identify statistical outliers
Consistency: Check for internal consistency between variables
Validation Example
from mdr.core.validation import validate_data
validation_results = validate_data(
data_dict,
checks=["range", "missing", "outliers"],
params={
"range": {
"min_value": 0.0,
"max_value": 100.0
},
"missing": {
"threshold": 0.1 # Allow up to 10% missing values
},
"outliers": {
"threshold": 2.5, # Z-score threshold for outliers
"method": "zscore"
}
}
)
Data Transformation
After refining your data, you may need to transform it for further analysis.
Available Transformations
Normalize: Scale data to a standard range
Scale: Apply linear scaling
Log: Apply logarithmic transformation
Power: Apply power transformation
Transformation Example
from mdr.core.transformation import transform_data
transformations = [
{"type": "normalize", "method": "minmax"},
{"type": "scale", "factor": 2.0, "offset": 1.0}
]
transformed_data = transform_data(refined_data, transformations)
Visualization
MDR provides various visualization tools to help understand your data before and after processing.
Time Series Plots
from mdr.visualization.plots import plot_time_series
fig, ax = plot_time_series(data_dict, time_values)
plt.show()
Refinement Comparison
from mdr.visualization.plots import plot_refinement_comparison
fig, axes = plot_refinement_comparison(original_data, refined_data)
plt.show()
Validation Results
from mdr.visualization.plots import plot_validation_results
fig, axes = plot_validation_results(validation_results)
plt.show()
Command-Line Interface
MDR provides a command-line interface for common operations:
Refining Data
mdr refine input.csv output.csv --smoothing-factor 0.2 --outlier-threshold 3.0
Validating Data
mdr validate input.csv --output-file validation_results.json
Converting File Formats
mdr convert input.csv output.parquet
Advanced Usage
For more advanced usage examples, please refer to the Examples section, which includes:
Working with multiple data sources
Custom validation strategies
Integration with other analysis workflows
API server deployment