Data Readers
Data readers for Macrodata Refinement (MDR).
This module provides functions and classes for reading macrodata from various file formats.
- class mdr.io.readers.DataSource(value)
Bases: Enum
Types of data sources.
- FILE = 1
- DATABASE = 2
- API = 3
- MEMORY = 4
- class mdr.io.readers.DataReader(source_type=DataSource.FILE)
Bases: ABC
Abstract base class for data readers.
- Parameters:
source_type (DataSource)
- __init__(source_type=DataSource.FILE)
Initialize the data reader.
- Parameters:
source_type (DataSource) – Type of data source
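To illustrate how an abstract base class of this shape is typically used, here is a minimal self-contained sketch. It does not import mdr: the `DataSource` and `DataReader` definitions below are stand-ins that mirror the documented signatures, and the abstract `read` method and the `InMemoryReader` subclass are assumptions made for illustration only.

```python
from abc import ABC, abstractmethod
from enum import Enum


class DataSource(Enum):
    """Stand-in mirroring the documented enum values."""
    FILE = 1
    DATABASE = 2
    API = 3
    MEMORY = 4


class DataReader(ABC):
    """Stand-in for the documented abstract base class."""

    def __init__(self, source_type=DataSource.FILE):
        self.source_type = source_type

    @abstractmethod
    def read(self, source, **options):
        """Assumed abstract method; the real interface may differ."""


class InMemoryReader(DataReader):
    """Hypothetical subclass that returns a copy of the mapping it is given."""

    def __init__(self):
        super().__init__(source_type=DataSource.MEMORY)

    def read(self, source, **options):
        # Shallow copy so the caller's mapping is not replaced or re-keyed.
        return dict(source)


reader = InMemoryReader()
data = reader.read({"temperature": [20.5, 21.0]})
```

Because `read` is declared abstract in this sketch, instantiating `DataReader` directly raises `TypeError`; only concrete subclasses can be created.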
- class mdr.io.readers.FileReader(encoding='utf-8')
Bases: DataReader
Base class for file-based data readers.
- Parameters:
encoding (str)
- class mdr.io.readers.CSVReader(delimiter=',', quotechar='"', encoding='utf-8')
Bases: FileReader
Reader for CSV files.
- read(source, header=True, index_col=None, na_values=None, parse_dates=False, **options)
Read data from a CSV file.
- Parameters:
- Returns:
Dictionary mapping column names to data arrays
- Return type:
dict
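As a rough picture of the documented return value (a dictionary mapping column names to data arrays), the following sketch uses only the standard-library `csv` module. The helper name `read_csv_columns` is hypothetical and is not part of mdr. It returns plain lists of strings rather than arrays and ignores `index_col`, `na_values`, and `parse_dates`.

```python
import csv
import io


def read_csv_columns(text_stream, delimiter=",", quotechar='"', header=True):
    """Hypothetical helper: parse CSV text into {column name: list of strings}."""
    rows = list(csv.reader(text_stream, delimiter=delimiter, quotechar=quotechar))
    if not rows:
        return {}
    if header:
        names, body = rows[0], rows[1:]
    else:
        # Without a header row, fall back to positional names.
        names, body = [f"col{i}" for i in range(len(rows[0]))], rows
    columns = {name: [] for name in names}
    for row in body:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


sample = io.StringIO("temperature,pressure\n20.5,1013\n21.0,1009\n")
columns = read_csv_columns(sample)
```

Values stay as strings in this sketch; converting them to numeric arrays is left out because the source does not say how the real reader infers types.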
- class mdr.io.readers.JSONReader(encoding='utf-8')
Bases: FileReader
Reader for JSON files.
- Parameters:
encoding (str)
- class mdr.io.readers.ExcelReader(encoding='utf-8')
Bases: FileReader
Reader for Excel files.
- Parameters:
encoding (str)
- class mdr.io.readers.ParquetReader(encoding='utf-8')
Bases: FileReader
Reader for Parquet files.
- Parameters:
encoding (str)
- class mdr.io.readers.HDF5Reader(encoding='utf-8')
Bases: FileReader
Reader for HDF5 files.
- Parameters:
encoding (str)
- mdr.io.readers.get_reader(file_type, **options)
Get a reader for the specified file type.
- Parameters:
file_type (str) – Type of file (‘csv’, ‘json’, ‘excel’, ‘parquet’, ‘hdf5’)
**options – Additional options for the reader
- Returns:
Appropriate DataReader instance
- Return type:
DataReader
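One common way to build a factory such as `get_reader` is a lookup table from file-type strings to reader classes. The sketch below shows that pattern under stated assumptions and is not mdr's actual implementation: the classes are empty stand-ins, `get_reader_sketch` is a hypothetical name, and the error behaviour for unknown types is a guess.

```python
class FileReader:
    """Stand-in base class; accepts the documented encoding option."""

    def __init__(self, encoding="utf-8", **options):
        self.encoding = encoding
        self.options = options


class CSVReader(FileReader):
    """Stand-in for the CSV reader."""


class JSONReader(FileReader):
    """Stand-in for the JSON reader."""


class ExcelReader(FileReader):
    """Stand-in for the Excel reader."""


class ParquetReader(FileReader):
    """Stand-in for the Parquet reader."""


class HDF5Reader(FileReader):
    """Stand-in for the HDF5 reader."""


# Keys follow the file_type strings listed in the documentation.
_READERS = {
    "csv": CSVReader,
    "json": JSONReader,
    "excel": ExcelReader,
    "parquet": ParquetReader,
    "hdf5": HDF5Reader,
}


def get_reader_sketch(file_type, **options):
    """Hypothetical factory: look up the reader class and instantiate it."""
    try:
        reader_cls = _READERS[file_type.lower()]
    except KeyError:
        supported = ", ".join(sorted(_READERS))
        raise ValueError(
            f"Unsupported file type {file_type!r}; expected one of: {supported}"
        ) from None
    return reader_cls(**options)


reader = get_reader_sketch("csv", encoding="latin-1")
```

A table-driven factory keeps the set of supported formats in one place, so adding a format means adding one dictionary entry rather than another branch.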
Overview
The readers module provides functions for reading data from various file formats
into NumPy arrays or dictionaries of arrays. These functions handle data loading,
parsing, and initial preprocessing to prepare data for the MDR refinement pipeline.
Supported File Formats
The module supports reading data from the following formats:
CSV: Comma-separated values files
JSON: JavaScript Object Notation files
Excel: Microsoft Excel workbooks (.xlsx, .xls)
Parquet: Apache Parquet columnar storage files
HDF5: Hierarchical Data Format version 5 files
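When the format is not known in advance, a caller might infer it from the file extension before choosing a reader. The following sketch is an assumption about how that could be done and is not part of the documented API: `guess_file_type` is a hypothetical helper, and only the `.xlsx` and `.xls` extensions come from the list above, while the others are conventional guesses.

```python
from pathlib import Path

# Extension-to-format table. Only .xlsx and .xls are stated in the docs;
# the remaining extensions are conventional assumptions.
_EXTENSIONS = {
    ".csv": "csv",
    ".json": "json",
    ".xlsx": "excel",
    ".xls": "excel",
    ".parquet": "parquet",
    ".h5": "hdf5",
    ".hdf5": "hdf5",
}


def guess_file_type(filepath):
    """Hypothetical helper: map a path's extension to a format name."""
    suffix = Path(filepath).suffix.lower()
    if suffix not in _EXTENSIONS:
        raise ValueError(f"Cannot infer file type from extension {suffix!r}")
    return _EXTENSIONS[suffix]


file_type = guess_file_type("path/to/data.xlsx")
```

The returned strings match the `file_type` values that `get_reader` is documented to accept ('csv', 'json', 'excel', 'parquet', 'hdf5').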
Core Functions
- mdr.io.readers.read_csv(filepath, delimiter=',', header=True, **options)
Read data from a CSV file.
- mdr.io.readers.read_json(filepath, orient='columns', **options)
Read data from a JSON file.
- mdr.io.readers.read_excel(filepath, sheet_name=0, **options)
Read data from an Excel file.
- mdr.io.readers.read_parquet(filepath, columns=None, **options)
Read data from a Parquet file.
- mdr.io.readers.read_hdf5(filepath, key, **options)
Read data from an HDF5 file.
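The `orient` parameter of `read_json` is not described on this page. Assuming it follows the pandas convention, where 'columns' means `{column: {row label: value}}` and 'records' means a list of row objects, the sketch below shows how each layout could be normalised into a dictionary of columns. `columns_from_json` is a hypothetical helper built on the standard-library `json` module, not an mdr function.

```python
import json


def columns_from_json(text, orient="columns"):
    """Hypothetical helper: normalise JSON text into {column: list of values}."""
    payload = json.loads(text)
    if orient == "columns":
        # Assumed layout: {"col": {"row label": value, ...}, ...}
        return {name: list(cells.values()) for name, cells in payload.items()}
    if orient == "records":
        # Assumed layout: [{"col": value, ...}, ...]
        columns = {}
        for record in payload:
            for name, value in record.items():
                columns.setdefault(name, []).append(value)
        return columns
    raise ValueError(f"Unsupported orient: {orient!r}")


by_column = columns_from_json('{"temperature": {"0": 20.5, "1": 21.0}}')
by_record = columns_from_json(
    '[{"temperature": 20.5, "pressure": 1013}, {"temperature": 21.0, "pressure": 1009}]',
    orient="records",
)
```

Both layouts end up in the same column-oriented shape, which is the form the usage examples on this page iterate over.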
Usage Examples
Reading from a CSV file:
from mdr.io.readers import read_csv
# Read data from a CSV file
data_dict = read_csv("path/to/data.csv")
# Print the variable names and shapes
for var_name, values in data_dict.items():
    print(f"{var_name}: {values.shape}")
Reading from an Excel file with multiple sheets:
from mdr.io.readers import read_excel
# Read data from specific sheets
data_dict = read_excel(
    "path/to/data.xlsx",
    sheets=["Temperature", "Pressure"],
    column_mapping={
        "Temperature": {"Temp (C)": "temperature"},
        "Pressure": {"Press (hPa)": "pressure"},
    },
)
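The `column_mapping` argument in the example above appears to rename spreadsheet headers to variable names, one mapping per sheet. The page does not document its exact behaviour, so the following is only a sketch of what such a rename step could look like for a single sheet. `rename_columns` is a hypothetical helper, and the assumption that unmapped columns are kept under their original names may not match mdr.

```python
def rename_columns(columns, mapping):
    """Hypothetical helper: return a new dict with keys renamed per mapping.

    Columns without an entry in mapping keep their original name (assumed).
    """
    return {mapping.get(name, name): values for name, values in columns.items()}


sheet = {"Temp (C)": [20.5, 21.0], "Station": ["A", "B"]}
renamed = rename_columns(sheet, {"Temp (C)": "temperature"})
```

The helper builds a new dictionary rather than mutating its input, so the original sheet data remains available under its spreadsheet headers.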