Commit fe30fa35 authored by ChengshuoShen

add_hdf5_reader_skill

parent fc99988f
# Fusion HDF5 Skill
This skill teaches how to read and write fusion experiment data files, named `{shot_no}.hdf5`, using the `jddb.processor` module.
---
## Quick Start
### 1. Create FileRepo
`FileRepo` defines the data directory. **First detect the storage layout, then create a `FileRepo` with the correct template.**
#### Auto-detect Storage Layout
Given a root directory, scan its structure to determine the template pattern:
```python
import os
from jddb.file_repo import FileRepo

def detect_file_repo(root_path: str) -> FileRepo:
    """
    Auto-detect storage layout and create FileRepo.

    Supported layouts:
        - Flat: HDF5 files directly in root
        - $shot_2$00: folders like 1051500, 1051600 (every 100 shots, suffix 00)
        - $shot_2$XX: folders like 10515XX, 10516XX (every 100 shots, suffix XX)
        - $shot_1$0:  folders like 1051500, 1051510 (every 10 shots, suffix 0)
        - $shot_1$X:  folders like 105150X, 105151X (every 10 shots, suffix X)
    """
    root_path = root_path.rstrip('/\\')
    items = os.listdir(root_path)
    # Flat layout: HDF5 files directly in root
    if any(f.endswith('.hdf5') for f in items):
        return FileRepo(root_path + "\\")
    # Otherwise inspect subdirectories (potential shot folders)
    subdirs = [d for d in items if os.path.isdir(os.path.join(root_path, d))]
    if not subdirs:
        raise ValueError(f"No HDF5 files or subdirectories found in {root_path}")
    # Analyze the folder naming pattern from a sample folder
    sample_dir = subdirs[0]
    if sample_dir.isdigit():
        folder_num = int(sample_dir)
        # Detect numeric suffix pattern (00, 0)
        if sample_dir.endswith('00') and folder_num % 100 == 0:
            # Every 100 shots, suffix 00
            return FileRepo(root_path + "\\$shot_2$00\\")
        elif sample_dir.endswith('0') and folder_num % 10 == 0:
            # Every 10 shots, suffix 0
            return FileRepo(root_path + "\\$shot_1$0\\")
        else:
            # Unknown numeric pattern, fall back to flat layout
            return FileRepo(root_path + "\\")
    elif sample_dir.endswith('XX'):
        # Pattern like 10515XX
        return FileRepo(root_path + "\\$shot_2$XX\\")
    elif sample_dir.endswith('X'):
        # Pattern like 105150X
        return FileRepo(root_path + "\\$shot_1$X\\")
    else:
        raise ValueError(f"Unknown folder pattern: {sample_dir}")

# Usage:
file_repo = detect_file_repo("E:\\DXAI\\data\\paper")
print(f"Detected: {file_repo.base_path}")
# Output: Detected: E:\DXAI\data\paper\$shot_2$00\
```
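The detection logic above depends on `jddb`, but it can be exercised standalone by returning the template suffix as a plain string instead of a `FileRepo`. A minimal sketch (the `detect_template` helper is hypothetical, written here only to demonstrate the branching against a throwaway directory tree):

```python
import os
import tempfile

def detect_template(root_path: str) -> str:
    """Return the grouping-folder template for a data root ('' = flat layout)."""
    items = os.listdir(root_path)
    if any(f.endswith('.hdf5') for f in items):
        return ''  # flat layout: HDF5 files directly in root
    subdirs = [d for d in items if os.path.isdir(os.path.join(root_path, d))]
    if not subdirs:
        raise ValueError(f"No HDF5 files or subdirectories found in {root_path}")
    sample = subdirs[0]
    if sample.isdigit():
        num = int(sample)
        if sample.endswith('00') and num % 100 == 0:
            return '$shot_2$00'
        if sample.endswith('0') and num % 10 == 0:
            return '$shot_1$0'
        return ''  # unknown numeric pattern, fall back to flat
    if sample.endswith('XX'):
        return '$shot_2$XX'
    if sample.endswith('X'):
        return '$shot_1$X'
    raise ValueError(f"Unknown folder pattern: {sample}")

# Exercise against a temporary tree containing one shot folder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, '1051500'))
    print(detect_template(root))  # -> $shot_2$00
```

This keeps the branching testable without any data files on disk; swapping the string return back to `FileRepo(root_path + "\\" + template + "\\")` recovers the original behavior.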
#### Template Pattern Reference
| Folder Examples | Pattern | Meaning |
|----------------|---------|---------|
| `1051500.hdf5, 1051501.hdf5` (flat) | `root\` | No grouping |
| `1051500\, 1051600\` | `$shot_2$00` | shot // 100, suffix 00 |
| `10515XX\, 10516XX\` | `$shot_2$XX` | shot // 100, suffix XX |
| `1051500\, 1051510\` | `$shot_1$0` | shot // 10, suffix 0 |
| `105150X\, 105151X\` | `$shot_1$X` | shot // 10, suffix X |
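The mapping in the table can be made concrete with a small helper that expands a `$shot_N$SUFFIX` template for one shot number. This `expand_template` function is hypothetical (not part of `jddb`) and only illustrates the table's arithmetic:

```python
def expand_template(template: str, shot_no: int) -> str:
    """Expand a $shot_N$SUFFIX folder template for one shot number.

    $shot_2$00 drops the last 2 digits of the shot number and appends '00';
    $shot_1$X drops the last digit and appends 'X'; '' means flat layout.
    """
    if '$shot_' not in template:
        return ''  # flat layout: no grouping folder
    n = int(template[6])               # number of trailing digits to drop
    suffix = template.split('$')[-1]   # literal suffix, e.g. '00' or 'XX'
    return str(shot_no)[:-n] + suffix

print(expand_template('$shot_2$00', 1051537))  # -> 1051500
print(expand_template('$shot_1$X', 1051537))   # -> 105153X
```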
### 2. Get Shot List
```python
# Scan all shots in FileRepo
shot_list = file_repo.get_all_shots()
# Or specify manually
shot_list = [1051500, 1051501, 1051505]
```
### 3. Read Data Using Shot
```python
from jddb.processor import Shot
shot = Shot(shot_no=1051500, file_repo=file_repo)
# Get all signal tags
print(shot.tags)
# Get a signal
signal = shot.get_signal('\\ip')
print(signal.data) # numpy array
print(signal.attributes) # {'SampleRate': ..., 'StartTime': ...}
print(signal.time) # time axis
# Get labels (meta info)
print(shot.labels)
```
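The `time` axis is presumably derived from the `SampleRate` and `StartTime` attributes. A standalone sketch of that relationship (an assumption about the convention, not `jddb`'s actual code):

```python
import numpy as np

def time_axis(n_points: int, sample_rate: float, start_time: float) -> np.ndarray:
    """Time stamp of sample i is start_time + i / sample_rate."""
    return start_time + np.arange(n_points) / sample_rate

t = time_axis(1000, sample_rate=1000, start_time=0.0)
print(t[0], t[1], t[-1])  # 0.0 0.001 0.999
```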
### 4. Write Data Using Shot
```python
from jddb.processor import Shot, Signal
import numpy as np
shot = Shot(1051500, file_repo)
# Create new signal
new_signal = Signal(
    data=np.random.randn(1000),
    attributes={"SampleRate": 1000, "StartTime": 0},
)
# Add or update signal
shot.update_signal('new_tag', new_signal)
# Save to original FileRepo
shot.save()
# Or save to different FileRepo
output_repo = FileRepo("E:\\output\\")
shot.save(output_repo)
```
### 5. Batch Processing Using ShotSet
```python
from jddb.processor import ShotSet
# Create ShotSet (auto-scan or specify shot_list)
shot_set = ShotSet(file_repo)
# or: shot_set = ShotSet(file_repo, shot_list=[1051500, 1051501])
# Iterate shots
for shot_no in shot_set.shot_list:
    shot = shot_set.get_shot(shot_no)
    signal = shot.get_signal('\\ip')
    print(f"Shot {shot_no}: {len(signal.data)} points")
# Apply processor to all shots
from jddb.processor.basic_processors import ResamplingProcessor
shot_set.process(
    processor=ResamplingProcessor(sample_rate=1000),
    input_tags=['\\ip'],
    output_tags=['\\ip_resampled'],
)
```
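`ResamplingProcessor` presumably interpolates each signal onto a new uniform time base. A self-contained sketch of that idea with `numpy` (an illustration of the technique, not `jddb`'s actual implementation):

```python
import numpy as np

def resample(data, old_rate, new_rate, start_time=0.0):
    """Linearly interpolate a uniformly sampled signal onto a new sample rate."""
    duration = len(data) / old_rate
    t_old = start_time + np.arange(len(data)) / old_rate
    t_new = start_time + np.arange(int(duration * new_rate)) / new_rate
    return np.interp(t_new, t_old, data), t_new

# Downsample 1 s of a 5 Hz sine from 10 kHz to 1 kHz
data = np.sin(2 * np.pi * 5 * np.arange(10000) / 10000)
down, t = resample(data, old_rate=10000, new_rate=1000)
print(len(down))  # -> 1000
```

In the real pipeline the processor would also rewrite the `SampleRate` attribute of the output signal so the stored time axis stays consistent.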
---
## Key Classes
| Class | Description |
|-------|-------------|
| `FileRepo` | Manages HDF5 file paths |
| `Shot` | Single shot (HDF5 file), collection of Signals |
| `Signal` | Single signal with data + attributes |
| `ShotSet` | Collection of Shots, supports batch processing |
---
## Dependencies
- `jddb.file_repo.FileRepo`
- `jddb.processor.Shot`
- `jddb.processor.Signal`
- `jddb.processor.ShotSet`