Commit fe30fa35 authored by ChengshuoShen

add_hdf5_reader_skill

parent fc99988f
# Fusion HDF5 Skill
This skill teaches how to read and write fusion experiment data files, named `{shot_no}.hdf5`, using the `jddb.processor` module.
---
## Quick Start
### 1. Create FileRepo
`FileRepo` defines the data directory. **First detect the storage layout, then create a `FileRepo` with the correct template.**
#### Auto-detect Storage Layout
Given a root directory, scan its structure to determine the template pattern:
```python
import os
from jddb.file_repo import FileRepo

def detect_file_repo(root_path: str) -> FileRepo:
    """
    Auto-detect storage layout and create FileRepo.

    Supported layouts:
        - Flat: HDF5 files directly in root
        - $shot_2$00: folders like 1051500, 1051600 (every 100 shots, suffix 00)
        - $shot_2$XX: folders like 10515XX, 10516XX (every 100 shots, suffix XX)
        - $shot_1$0:  folders like 1051500, 1051510 (every 10 shots, suffix 0)
        - $shot_1$X:  folders like 105150X, 105151X (every 10 shots, suffix X)
    """
    root_path = root_path.rstrip('/\\')
    items = os.listdir(root_path)
    # Flat layout: HDF5 files directly in root
    if any(f.endswith('.hdf5') for f in items):
        return FileRepo(root_path + "\\")
    # Otherwise inspect subdirectories (potential shot folders)
    subdirs = [d for d in items if os.path.isdir(os.path.join(root_path, d))]
    if not subdirs:
        raise ValueError(f"No HDF5 files or subdirectories found in {root_path}")
    # Analyze the folder naming pattern from a sample folder
    sample_dir = subdirs[0]
    if sample_dir.isdigit():
        folder_num = int(sample_dir)
        # Detect numeric suffix pattern (00, 0)
        if sample_dir.endswith('00') and folder_num % 100 == 0:
            # Every 100 shots, suffix 00
            return FileRepo(root_path + "\\$shot_2$00\\")
        elif sample_dir.endswith('0') and folder_num % 10 == 0:
            # Every 10 shots, suffix 0
            return FileRepo(root_path + "\\$shot_1$0\\")
        else:
            # Unknown numeric pattern, fall back to flat layout
            return FileRepo(root_path + "\\")
    elif sample_dir.endswith('XX'):
        # Pattern like 10515XX
        return FileRepo(root_path + "\\$shot_2$XX\\")
    elif sample_dir.endswith('X'):
        # Pattern like 105150X
        return FileRepo(root_path + "\\$shot_1$X\\")
    else:
        raise ValueError(f"Unknown folder pattern: {sample_dir}")

# Usage:
file_repo = detect_file_repo("E:\\DXAI\\data\\paper")
print(f"Detected: {file_repo.base_path}")
# Output: Detected: E:\DXAI\data\paper\$shot_2$00\
```
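The detection logic above depends on `jddb`, but it can be exercised standalone by returning the template suffix as a plain string instead of a `FileRepo`. A minimal sketch (the `detect_template` helper is hypothetical, written here only to demonstrate the branching against a throwaway directory tree):

```python
import os
import tempfile

def detect_template(root_path: str) -> str:
    """Return the grouping-folder template for a data root ('' = flat layout)."""
    items = os.listdir(root_path)
    if any(f.endswith('.hdf5') for f in items):
        return ''  # flat layout: HDF5 files directly in root
    subdirs = [d for d in items if os.path.isdir(os.path.join(root_path, d))]
    if not subdirs:
        raise ValueError(f"No HDF5 files or subdirectories found in {root_path}")
    sample = subdirs[0]
    if sample.isdigit():
        num = int(sample)
        if sample.endswith('00') and num % 100 == 0:
            return '$shot_2$00'
        if sample.endswith('0') and num % 10 == 0:
            return '$shot_1$0'
        return ''  # unknown numeric pattern, fall back to flat
    if sample.endswith('XX'):
        return '$shot_2$XX'
    if sample.endswith('X'):
        return '$shot_1$X'
    raise ValueError(f"Unknown folder pattern: {sample}")

# Exercise against a temporary tree containing one shot folder
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, '1051500'))
    print(detect_template(root))  # -> $shot_2$00
```

This keeps the branching testable without any data files on disk; swapping the string return back to `FileRepo(root_path + "\\" + template + "\\")` recovers the original behavior.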
#### Template Pattern Reference
| Folder Examples | Pattern | Meaning |
|----------------|---------|---------|
| `1051500.hdf5, 1051501.hdf5` (flat) | `root\` | No grouping |
| `1051500\, 1051600\` | `$shot_2$00` | shot // 100, suffix 00 |
| `10515XX\, 10516XX\` | `$shot_2$XX` | shot // 100, suffix XX |
| `1051500\, 1051510\` | `$shot_1$0` | shot // 10, suffix 0 |
| `105150X\, 105151X\` | `$shot_1$X` | shot // 10, suffix X |
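The mapping in the table can be made concrete with a small helper that expands a `$shot_N$SUFFIX` template for one shot number. This `expand_template` function is hypothetical (not part of `jddb`) and only illustrates the table's arithmetic:

```python
def expand_template(template: str, shot_no: int) -> str:
    """Expand a $shot_N$SUFFIX folder template for one shot number.

    $shot_2$00 drops the last 2 digits of the shot number and appends '00';
    $shot_1$X drops the last digit and appends 'X'; '' means flat layout.
    """
    if '$shot_' not in template:
        return ''  # flat layout: no grouping folder
    n = int(template[6])               # number of trailing digits to drop
    suffix = template.split('$')[-1]   # literal suffix, e.g. '00' or 'XX'
    return str(shot_no)[:-n] + suffix

print(expand_template('$shot_2$00', 1051537))  # -> 1051500
print(expand_template('$shot_1$X', 1051537))   # -> 105153X
```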
### 2. Get Shot List
```python
# Scan all shots in FileRepo
shot_list = file_repo.get_all_shots()
# Or specify manually
shot_list = [1051500, 1051501, 1051505]
```
### 3. Read Data Using Shot
```python
from jddb.processor import Shot
shot = Shot(shot_no=1051500, file_repo=file_repo)
# Get all signal tags
print(shot.tags)
# Get a signal
signal = shot.get_signal('\\ip')
print(signal.data) # numpy array
print(signal.attributes) # {'SampleRate': ..., 'StartTime': ...}
print(signal.time) # time axis
# Get labels (meta info)
print(shot.labels)
```
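The `time` axis is presumably derived from the `SampleRate` and `StartTime` attributes. A standalone sketch of that relationship (an assumption about the convention, not `jddb`'s actual code):

```python
import numpy as np

def time_axis(n_points: int, sample_rate: float, start_time: float) -> np.ndarray:
    """Time stamp of sample i is start_time + i / sample_rate."""
    return start_time + np.arange(n_points) / sample_rate

t = time_axis(1000, sample_rate=1000, start_time=0.0)
print(t[0], t[1], t[-1])  # 0.0 0.001 0.999
```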
### 4. Write Data Using Shot
```python
from jddb.processor import Shot, Signal
import numpy as np
shot = Shot(1051500, file_repo)
# Create new signal
new_signal = Signal(
    data=np.random.randn(1000),
    attributes={"SampleRate": 1000, "StartTime": 0},
)
# Add or update signal
shot.update_signal('new_tag', new_signal)
# Save to original FileRepo
shot.save()
# Or save to different FileRepo
output_repo = FileRepo("E:\\output\\")
shot.save(output_repo)
```
### 5. Batch Processing Using ShotSet
```python
from jddb.processor import ShotSet
# Create ShotSet (auto-scan or specify shot_list)
shot_set = ShotSet(file_repo)
# or: shot_set = ShotSet(file_repo, shot_list=[1051500, 1051501])
# Iterate shots
for shot_no in shot_set.shot_list:
    shot = shot_set.get_shot(shot_no)
    signal = shot.get_signal('\\ip')
    print(f"Shot {shot_no}: {len(signal.data)} points")
# Apply processor to all shots
from jddb.processor.basic_processors import ResamplingProcessor
shot_set.process(
    processor=ResamplingProcessor(sample_rate=1000),
    input_tags=['\\ip'],
    output_tags=['\\ip_resampled'],
)
```
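`ResamplingProcessor` presumably interpolates each signal onto a new uniform time base. A self-contained sketch of that idea with `numpy` (an illustration of the technique, not `jddb`'s actual implementation):

```python
import numpy as np

def resample(data, old_rate, new_rate, start_time=0.0):
    """Linearly interpolate a uniformly sampled signal onto a new sample rate."""
    duration = len(data) / old_rate
    t_old = start_time + np.arange(len(data)) / old_rate
    t_new = start_time + np.arange(int(duration * new_rate)) / new_rate
    return np.interp(t_new, t_old, data), t_new

# Downsample 1 s of a 5 Hz sine from 10 kHz to 1 kHz
data = np.sin(2 * np.pi * 5 * np.arange(10000) / 10000)
down, t = resample(data, old_rate=10000, new_rate=1000)
print(len(down))  # -> 1000
```

In the real pipeline the processor would also rewrite the `SampleRate` attribute of the output signal so the stored time axis stays consistent.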
---
## Key Classes
| Class | Description |
|-------|-------------|
| `FileRepo` | Manages HDF5 file paths |
| `Shot` | Single shot (HDF5 file), collection of Signals |
| `Signal` | Single signal with data + attributes |
| `ShotSet` | Collection of Shots, supports batch processing |
---
## Dependencies
- `jddb.file_repo.FileRepo`
- `jddb.processor.Shot`
- `jddb.processor.Signal`
- `jddb.processor.ShotSet`