Basic Usage Tutorial

This tutorial covers the fundamental operations you can perform with the Dataphy SDK.

Dataset Operations

Fetching Datasets

Fetch a dataset by specifying its format and source. The examples below pull LeRobot-format datasets from the Hugging Face Hub:

# From Hugging Face Hub (LeRobot format)
dataphy dataset fetch \
  --format lerobot \
  --repo-id carpit680/giraffe_clean_desk2 \
  --output ./datasets/giraffe

# Pin a specific split and revision
dataphy dataset fetch \
  --format lerobot \
  --repo-id lerobot/svla_so100_sorting \
  --output ./datasets/sorting \
  --split train \
  --revision main

Inspecting Datasets

Get comprehensive dataset information:

# Remote dataset info
dataphy dataset info --format lerobot --repo-id carpit680/giraffe_clean_desk2

# Local dataset info
dataphy dataset load --dataset-path ./datasets/giraffe --info

Example output:

Dataset Information:
  Format: LeRobot
  Total episodes: 41
  Total timesteps: 15,632
  Cameras: ['observation.images.laptop', 'observation.images.webcam']
  Action space: 7D continuous
  Observation space: Multi-modal (images + state)

Navigate through episodes and timesteps:

# List all episodes
dataphy dataset load --dataset-path ./datasets/giraffe --list-episodes

# Examine specific episode by index (0-based)
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0

# Examine specific episode by name
dataphy dataset load --dataset-path ./datasets/giraffe --episode episode_000005

# Examine specific timestep
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0 --timestep 100
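The commands above address an episode either by 0-based index or by name. The names used in this tutorial (episode_000000, episode_000005) suggest a zero-padded six-digit pattern. Assuming that pattern holds for your dataset, which this tutorial does not guarantee, the mapping between the two forms can be sketched as follows. The helper names are hypothetical and not part of the SDK.

```python
import re

# Assumption: episode names are "episode_" followed by a six-digit,
# zero-padded, 0-based index, as in the examples above.
_EPISODE_NAME = re.compile(r"^episode_(\d{6})$")


def episode_name(index: int) -> str:
    """Return the assumed episode name for a 0-based index."""
    if index < 0:
        raise ValueError(f"Episode index must be non-negative, got {index}")
    return f"episode_{index:06d}"


def episode_index(name: str) -> int:
    """Return the 0-based index encoded in an assumed episode name."""
    match = _EPISODE_NAME.match(name)
    if match is None:
        raise ValueError(f"Unrecognized episode name: {name!r}")
    return int(match.group(1))


print(episode_name(5))                  # episode_000005
print(episode_index("episode_000005"))  # 5
```

If your dataset uses a different naming scheme, prefer the names reported by --list-episodes over this convention.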

Visualization

2D Interactive Visualization

Launch the Rerun (rerun.io) viewer to explore a dataset interactively:

# Visualize entire dataset
dataphy dataset visualize --format lerobot --dataset-path ./datasets/giraffe

# Visualize specific episode
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --episode episode_000000

# Visualize timestep range
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --timestep-range 0,100

# Focus on specific camera
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --camera observation.images.laptop

Visualization Features

The 2D viewer provides:

Multi-camera feeds: See all camera angles simultaneously
Robot state visualization: Joint positions, end-effector pose
Action visualization: Planned vs executed actions
Timestep scrubbing: Navigate through episodes frame by frame
Data inspection: Click on any element to see raw data

Programmatic Usage

Python API

Use the SDK programmatically in your Python code:

from dataphy.dataset.registry import create_dataset_loader, DatasetFormat
from dataphy.dataset.episode_augmentor import EpisodeAugmentor

# Create dataset loader
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)

# Get dataset information
info = loader.get_dataset_info()
print(f"Dataset has {info.total_episodes} episodes")

# List episodes
episodes = loader.get_episode_ids()
print(f"Episodes: {episodes[:5]}...")  # First 5

# Load specific episode
episode_data = loader.get_episode("episode_000000")
print(f"Episode has {len(episode_data)} timesteps")

# Load specific timestep
timestep = loader.get_timestep("episode_000000", 0)
print(f"Actions: {timestep['action']}")
print(f"Observations keys: {list(timestep['observation'].keys())}")
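Building on the calls shown above, a common first step is to check how many timesteps each episode contains. The sketch below relies only on get_episode_ids() and get_episode() as used in this tutorial, and assumes that len() of an episode equals its number of timesteps, as the example above implies. FakeLoader is a hypothetical in-memory stand-in so the snippet runs without a dataset; with the SDK installed you would pass the loader returned by create_dataset_loader instead.

```python
from typing import Dict, List, Protocol, Sequence


class EpisodeLoader(Protocol):
    """The two loader methods from the tutorial that this sketch relies on."""

    def get_episode_ids(self) -> Sequence[str]: ...
    def get_episode(self, episode_id: str) -> Sequence[object]: ...


def episode_lengths(loader: EpisodeLoader, limit: int = 5) -> Dict[str, int]:
    """Map each of the first `limit` episode IDs to its number of timesteps."""
    lengths = {}
    for episode_id in list(loader.get_episode_ids())[:limit]:
        lengths[episode_id] = len(loader.get_episode(episode_id))
    return lengths


class FakeLoader:
    """Hypothetical stand-in for a real loader; not part of the SDK."""

    def __init__(self, episodes: Dict[str, List[dict]]):
        self._episodes = episodes

    def get_episode_ids(self) -> List[str]:
        return list(self._episodes)

    def get_episode(self, episode_id: str) -> List[dict]:
        return self._episodes[episode_id]


demo = FakeLoader({
    "episode_000000": [{"action": [0.0] * 7}] * 3,
    "episode_000001": [{"action": [0.0] * 7}] * 2,
})
print(episode_lengths(demo))
# {'episode_000000': 3, 'episode_000001': 2}
```

The limit parameter keeps the scan cheap on large datasets, since each get_episode() call may load a full episode.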

Working with Episodes

# Reuses `loader` and the EpisodeAugmentor import from the Python API example
# Create episode augmentor
augmentor = EpisodeAugmentor(loader)

# List available cameras for an episode
cameras = augmentor.get_available_cameras("episode_000000")
print(f"Available cameras: {cameras}")

# Get backup status
backups = augmentor.list_backups()
print(f"Episodes with backups: {backups}")

Advanced Features

Format Auto-Detection

When no format is passed, the SDK detects it from the dataset directory:

# Auto-detect format (recommended)
loader = create_dataset_loader("./datasets/giraffe")

# Explicit format specification
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)
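This tutorial does not describe how the SDK performs detection. Purely to illustrate the idea, here is one plausible heuristic that inspects the directory layout. It is not the SDK's implementation. The markers it checks are assumptions: a meta/info.json file for LeRobot datasets, and at least one image file for image folders. The function name and return values are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed set of image extensions for the image-folder check.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def guess_format(dataset_path: str) -> Optional[str]:
    """Guess a dataset format from directory contents, or return None."""
    root = Path(dataset_path)
    if not root.is_dir():
        return None
    # Assumed LeRobot marker: a metadata file at meta/info.json.
    if (root / "meta" / "info.json").is_file():
        return "lerobot"
    # Assumed image-folder marker: at least one image file below the root.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            return "image_folder"
    return None


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "meta").mkdir()
    (Path(tmp) / "meta" / "info.json").write_text("{}")
    print(guess_format(tmp))  # lerobot
```

A heuristic like this can misclassify unusual layouts, which is one reason to pass the format explicitly when you already know it.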

Supported Formats

LeRobot: Robotics datasets from Hugging Face Hub
Image Folder: Standard image directory structures

Error Handling

The CLI reports common mistakes with descriptive error messages:

# Invalid dataset path
dataphy dataset load --dataset-path ./nonexistent
# Error: Dataset directory not found: ./nonexistent

# Invalid episode
dataphy dataset load --dataset-path ./datasets/giraffe --episode 999
# Error: Episode index 999 out of range. Available: 0-40

# Invalid format
dataphy dataset fetch --format invalid --repo-id test
# Error: Unknown format 'invalid'. Supported: ['lerobot']
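When driving the SDK from your own scripts, it can help to validate inputs before calling it. This tutorial does not document which exceptions the Python API raises, so the sketch below does its own checks and reuses the wording of the CLI messages above. The function names and the choice of FileNotFoundError and IndexError are assumptions for illustration.

```python
from pathlib import Path


def check_dataset_dir(dataset_path: str) -> Path:
    """Return the dataset path, or raise if it is not an existing directory."""
    path = Path(dataset_path)
    if not path.is_dir():
        raise FileNotFoundError(f"Dataset directory not found: {dataset_path}")
    return path


def check_episode_index(index: int, total_episodes: int) -> int:
    """Return the index, or raise if it falls outside 0..total_episodes-1."""
    if not 0 <= index < total_episodes:
        raise IndexError(
            f"Episode index {index} out of range. "
            f"Available: 0-{total_episodes - 1}"
        )
    return index


try:
    check_episode_index(999, total_episodes=41)
except IndexError as err:
    print(f"Error: {err}")
# Error: Episode index 999 out of range. Available: 0-40
```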

Tips and Best Practices

Performance Tips

Use episode indices: --episode 0 is faster than --episode episode_000000
Cache datasets locally: Avoid re-downloading for repeated use
Specify timestep ranges: For large episodes, use --timestep-range to limit loading

Dataset Organization

my-datasets/
├── giraffe_cleaning/     # One task
├── sorting_objects/      # Another task
├── navigation_indoor/    # Different domain
└── configs/              # Augmentation configs
    ├── gentle.yaml
    ├── aggressive.yaml
    └── robotics.yaml

Common Workflows

Exploration: fetch → info → visualize
Development: load → augment → visualize
Production: fetch → augment → export to training format
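As an illustration, the Exploration workflow can be written as an ordered list of the CLI commands from earlier sections. This sketch only assembles and prints them. The function name is hypothetical, and the augment and export steps of the other workflows are omitted because their commands are not covered in this tutorial.

```python
import shlex
from typing import List


def exploration_commands(repo_id: str, dataset_path: str) -> List[List[str]]:
    """Commands for the Exploration workflow, in order: fetch, info, visualize."""
    return [
        ["dataphy", "dataset", "fetch", "--format", "lerobot",
         "--repo-id", repo_id, "--output", dataset_path],
        ["dataphy", "dataset", "info", "--format", "lerobot",
         "--repo-id", repo_id],
        ["dataphy", "dataset", "visualize", "--format", "lerobot",
         "--dataset-path", dataset_path],
    ]


for cmd in exploration_commands("carpit680/giraffe_clean_desk2", "./datasets/giraffe"):
    # Print each step; to execute it instead, use subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Running the steps with check=True stops the chain at the first failing command, so a failed fetch never reaches the viewer.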

Next Steps

Episode Augmentation: Learn advanced augmentation techniques
API Reference: Explore all available functions and classes
Examples: See complete code examples