Examples¶
Ready-to-run code examples and configuration files for common Dataphy workflows.
Configuration Examples¶
Basic Episode Augmentation¶
Simple augmentation config for getting started:
basic_aug.yaml
version: 1
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      magnitude: 0.1
    - name: cutout
      holes: 1
      size_range: [8, 16]
  background:
    adapter: none
seed: 42
Robotics-Optimized Augmentation¶
Configuration optimized for robotics datasets:
robotics_aug.yaml
version: 1
pipeline:
  # Synchronize augmentations across all camera views
  sync_views: true
  steps:
    # Preserve spatial context for robot-object relationships
    - name: random_crop_pad
      keep_ratio_min: 0.88
    # Small spatial shifts to simulate minor camera movements
    - name: random_translate
      px: 8
    # Lighting variations for different environments
    - name: color_jitter
      magnitude: 0.15
    # Subtle texture effects for sensor realism
    - name: random_conv
      kernel_variance: 0.035
    # Small occlusion patches to simulate partial blocking
    - name: cutout
      holes: 1
      size_range: [8, 16]
  background:
    adapter: none
seed: 42
Python Scripts¶
Dataset Exploration Script¶
explore_dataset.py
#!/usr/bin/env python3
"""
Explore a robotics dataset and print detailed information.
"""
from pathlib import Path
from dataphy.dataset.registry import create_dataset_loader
def explore_dataset(dataset_path: str):
    """Explore and print dataset information."""
    # Load dataset (auto-detect format)
    loader = create_dataset_loader(dataset_path)
    # Get dataset info
    info = loader.get_dataset_info()
    print(f"Dataset Overview:")
    print(f"  Format: {info.format}")
    print(f"  Episodes: {info.total_episodes}")
    print(f"  Total timesteps: {info.total_timesteps:,}")
    # List episodes
    episodes = loader.get_episode_ids()
    print(f"\nEpisodes: {len(episodes)}")
    for i, episode_id in enumerate(episodes[:5]):
        print(f"  {i}: {episode_id}")
    if len(episodes) > 5:
        print(f"  ... and {len(episodes) - 5} more")
    # Examine first episode
    if episodes:
        episode_data = loader.get_episode(episodes[0])
        print(f"\nFirst Episode ({episodes[0]}):")
        print(f"  Timesteps: {len(episode_data)}")
        # Show first timestep structure
        if episode_data:
            timestep = episode_data[0]
            print(f"  Action shape: {timestep['action'].shape}")
            print(f"  Observation keys: {list(timestep['observation'].keys())}")
            # Show image cameras if available
            obs = timestep['observation']
            if 'images' in obs:
                cameras = list(obs['images'].keys())
                print(f"  Cameras: {cameras}")
                for cam in cameras:
                    shape = obs['images'][cam].shape
                    print(f"    {cam}: {shape}")
if __name__ == "__main__":
    import sys
    if len(sys.argv) != 2:
        print("Usage: python explore_dataset.py <dataset_path>")
        sys.exit(1)
    dataset_path = sys.argv[1]
    explore_dataset(dataset_path)
Batch Augmentation Script¶
batch_augment.py
#!/usr/bin/env python3
"""
Apply augmentations to multiple episodes in batch.
"""
from pathlib import Path
from dataphy.dataset.registry import create_dataset_loader, DatasetFormat
from dataphy.dataset.episode_augmentor import EpisodeAugmentor
def batch_augment_episodes(
    dataset_path: str,
    config_file: str,
    episode_indices: list,
    cameras: list = None
):
    """Augment multiple episodes in batch."""
    # Setup
    loader = create_dataset_loader(dataset_path, DatasetFormat.LEROBOT)
    augmentor = EpisodeAugmentor(loader)
    # Get available episodes
    episodes = augmentor.list_episodes()
    print(f"Dataset has {len(episodes)} episodes")
    # Validate episode indices
    valid_indices = []
    for idx in episode_indices:
        if 0 <= idx < len(episodes):
            valid_indices.append(idx)
        else:
            print(f"Skipping invalid episode index: {idx}")
    print(f"Will augment {len(valid_indices)} episodes")
    if cameras:
        print(f"Target cameras: {cameras}")
    else:
        print(f"Target cameras: ALL")
    # Process each episode
    for i, episode_idx in enumerate(valid_indices):
        episode_name = episodes[episode_idx]
        print(f"\n[{i+1}/{len(valid_indices)}] Augmenting {episode_name}")
        try:
            # Check available cameras
            available_cameras = augmentor.get_available_cameras(episode_name)
            print(f"  Available cameras: {available_cameras}")
            # Validate requested cameras
            if cameras:
                invalid_cameras = [c for c in cameras if c not in available_cameras]
                if invalid_cameras:
                    print(f"  Invalid cameras for this episode: {invalid_cameras}")
                    continue
            # Apply augmentation
            augmentor.augment_episode(
                episode_id=episode_idx,
                config_file=config_file,
                camera_streams=cameras,
                preserve_original=True
            )
            print(f"  Completed successfully")
        except Exception as e:
            print(f"  Error: {e}")
            continue
    # Show final status
    backups = augmentor.list_backups()
    print(f"\nFinal Status:")
    print(f"  Episodes augmented: {len([idx for idx in valid_indices if episodes[idx] in backups])}")
    print(f"  Total backups: {len(backups)}")
if __name__ == "__main__":
    # Configuration
    dataset_path = "./dataset"
    config_file = "./examples/full_augmentation_pipeline.yaml"
    # Episodes to augment (indices)
    target_episodes = [0, 1, 2, 5, 10]
    # Specific cameras (None = all cameras)
    target_cameras = ["observation.images.webcam"]
    # target_cameras = None  # For all cameras
    batch_augment_episodes(
        dataset_path=dataset_path,
        config_file=config_file,
        episode_indices=target_episodes,
        cameras=target_cameras
    )
CLI Usage Examples¶
Complete Dataset Workflow¶
dataset_workflow.sh
#!/bin/bash
# Complete dataset workflow from fetch to visualization
DATASET_NAME="giraffe_cleaning"
REPO_ID="carpit680/giraffe_clean_desk2"
CONFIG_FILE="examples/full_augmentation_pipeline.yaml"
echo "Starting complete dataset workflow"
# Step 1: Fetch dataset
echo "\n📥 Step 1: Fetching dataset..."
dataphy dataset fetch \
  --format lerobot \
  --repo-id $REPO_ID \
  --output ./datasets/$DATASET_NAME
# Step 2: Explore dataset
echo "\nStep 2: Exploring dataset..."
dataphy dataset load \
  --dataset-path ./datasets/$DATASET_NAME \
  --info
# Step 3: List episodes
echo "\nStep 3: Listing episodes..."
dataphy augment dataset \
  --dataset-path ./datasets/$DATASET_NAME \
  --list-episodes
# Step 4: Augment first few episodes
echo "\nStep 4: Augmenting episodes..."
for episode in 0 1 2; do
  echo "  Augmenting episode $episode"
  dataphy augment dataset \
    --dataset-path ./datasets/$DATASET_NAME \
    --config $CONFIG_FILE \
    --episode $episode
done
# Step 5: Visualize results
echo "\nStep 5: Visualizing results..."
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/$DATASET_NAME \
  --episode 0
echo "\nWorkflow completed!"
Augmentation Parameter Testing¶
test_augmentations.sh
#!/bin/bash
# Test different augmentation parameters
DATASET_PATH="./datasets/test_dataset"
EPISODE=0
echo "🧪 Testing different augmentation parameters"
# Test 1: Gentle augmentation
echo "\n🔹 Test 1: Gentle augmentation"
cat > gentle_aug.yaml << EOF
version: 1
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      magnitude: 0.05
    - name: cutout
      holes: 1
      size_range: [4, 8]
  background:
    adapter: none
seed: 42
EOF
dataphy augment dataset \
  --dataset-path $DATASET_PATH \
  --config gentle_aug.yaml \
  --episode $EPISODE
# Test 2: Aggressive augmentation
echo "\n🔹 Test 2: Aggressive augmentation"
cat > aggressive_aug.yaml << EOF
version: 1
pipeline:
  sync_views: true
  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.75
    - name: color_jitter
      magnitude: 0.25
    - name: cutout
      holes: 2
      size_range: [16, 32]
  background:
    adapter: none
seed: 42
EOF
# Restore first, then test
dataphy augment dataset \
  --dataset-path $DATASET_PATH \
  --restore episode_000000
dataphy augment dataset \
  --dataset-path $DATASET_PATH \
  --config aggressive_aug.yaml \
  --episode $EPISODE
echo "\nParameter testing completed"
Usage Instructions¶
Running Python Examples¶
- Ensure Dataphy is installed:
- Run exploration script:
- Run batch augmentation:
Running CLI Examples¶
- Make scripts executable:
- Run complete workflow:
- Test augmentation parameters:
Next Steps¶
- API Reference: Explore all available functions
- Tutorials: Learn step-by-step workflows
- Experiment: Modify these examples for your specific use case