Basic Usage Tutorial
This tutorial covers the fundamental operations you can perform with the Dataphy SDK.
Dataset Operations
Fetching Datasets
The Dataphy SDK supports multiple data sources and formats:
# From Hugging Face Hub (LeRobot format)
dataphy dataset fetch \
--format lerobot \
--repo-id carpit680/giraffe_clean_desk2 \
--output ./datasets/giraffe
# With specific parameters
dataphy dataset fetch \
--format lerobot \
--repo-id lerobot/svla_so100_sorting \
--output ./datasets/sorting \
--split train \
--revision main
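When fetches are scripted from Python, it can help to assemble the command as an argument list rather than a shell string. The following is a minimal sketch that uses only the flags shown above; the helper name build_fetch_command is hypothetical and not part of the SDK.

```python
import shlex
from typing import List, Optional


def build_fetch_command(
    repo_id: str,
    output: str,
    split: Optional[str] = None,
    revision: Optional[str] = None,
) -> List[str]:
    """Build the argument list for `dataphy dataset fetch` (LeRobot format).

    Hypothetical helper: it only mirrors the flags shown in the tutorial.
    """
    cmd = [
        "dataphy", "dataset", "fetch",
        "--format", "lerobot",
        "--repo-id", repo_id,
        "--output", output,
    ]
    # Optional flags are appended only when a value is given.
    if split is not None:
        cmd += ["--split", split]
    if revision is not None:
        cmd += ["--revision", revision]
    return cmd


cmd = build_fetch_command(
    "lerobot/svla_so100_sorting",
    "./datasets/sorting",
    split="train",
    revision="main",
)
# Print the command in copy-pasteable shell form.
print(shlex.join(cmd))
```

To execute the command instead of printing it, the list can be passed to subprocess.run(cmd, check=True) from the standard library.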
Inspecting Datasets
Get comprehensive dataset information:
# Remote dataset info
dataphy dataset info --format lerobot --repo-id carpit680/giraffe_clean_desk2
# Local dataset info
dataphy dataset load --dataset-path ./datasets/giraffe --info
Example output:
Dataset Information:
Format: LeRobot
Total episodes: 41
Total timesteps: 15,632
Cameras: ['observation.images.laptop', 'observation.images.webcam']
Action space: 7D continuous
Observation space: Multi-modal (images + state)
Episode Navigation
Navigate through episodes and timesteps:
# List all episodes
dataphy dataset load --dataset-path ./datasets/giraffe --list-episodes
# Examine specific episode by index (0-based)
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0
# Examine specific episode by name
dataphy dataset load --dataset-path ./datasets/giraffe --episode episode_000005
# Examine specific timestep
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0 --timestep 100
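The examples above address episodes either by 0-based index or by name. The sketch below converts between the two, assuming that index N corresponds to the name episode_ followed by N zero-padded to six digits. That mapping is inferred from the names shown in this tutorial, not confirmed by the SDK, and both helper names are hypothetical.

```python
import re

# Assumed naming pattern, inferred from names such as "episode_000005".
_EPISODE_NAME = re.compile(r"episode_(\d{6})")


def episode_name(index: int) -> str:
    """Return the assumed episode name for a 0-based index."""
    if index < 0:
        raise ValueError("episode index must be non-negative")
    return f"episode_{index:06d}"


def episode_index(name: str) -> int:
    """Return the 0-based index encoded in an episode name."""
    match = _EPISODE_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"not an episode name: {name!r}")
    return int(match.group(1))


print(episode_name(5))
print(episode_index("episode_000005"))
```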
Visualization
2D Interactive Visualization
Launch the Rerun (rerun.io) viewer to explore a dataset interactively:
# Visualize entire dataset
dataphy dataset visualize --format lerobot --dataset-path ./datasets/giraffe
# Visualize specific episode
dataphy dataset visualize \
--format lerobot \
--dataset-path ./datasets/giraffe \
--episode episode_000000
# Visualize timestep range
dataphy dataset visualize \
--format lerobot \
--dataset-path ./datasets/giraffe \
--timestep-range 0,100
# Focus on specific camera
dataphy dataset visualize \
--format lerobot \
--dataset-path ./datasets/giraffe \
--camera observation.images.laptop
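For long episodes, one option is to view the data one window at a time. The sketch below generates values in the start,end form used by --timestep-range above. Whether the end value is inclusive is not stated in this tutorial, so the windows here simply share their boundary values; the helper name timestep_windows is hypothetical.

```python
from typing import Iterator


def timestep_windows(total_timesteps: int, window: int) -> Iterator[str]:
    """Yield "start,end" strings that cover 0..total_timesteps in fixed-size windows."""
    if total_timesteps < 0 or window <= 0:
        raise ValueError("total_timesteps must be >= 0 and window must be > 0")
    for start in range(0, total_timesteps, window):
        # The last window is clipped to the episode length.
        end = min(start + window, total_timesteps)
        yield f"{start},{end}"


for value in timestep_windows(250, 100):
    print("--timestep-range", value)
```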
Visualization Features
The 2D viewer provides:
- Multi-camera feeds: See all camera angles simultaneously
- Robot state visualization: Joint positions, end-effector pose
- Action visualization: Planned vs executed actions
- Timestep scrubbing: Navigate through episodes frame by frame
- Data inspection: Click on any element to see raw data
Programmatic Usage
Python API
Use the SDK programmatically in your Python code:
from dataphy.dataset.registry import create_dataset_loader, DatasetFormat
from dataphy.dataset.episode_augmentor import EpisodeAugmentor
# Create dataset loader
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)
# Get dataset information
info = loader.get_dataset_info()
print(f"Dataset has {info.total_episodes} episodes")
# List episodes
episodes = loader.get_episode_ids()
print(f"Episodes: {episodes[:5]}...") # First 5
# Load specific episode
episode_data = loader.get_episode("episode_000000")
print(f"Episode has {len(episode_data)} timesteps")
# Load specific timestep
timestep = loader.get_timestep("episode_000000", 0)
print(f"Actions: {timestep['action']}")
print(f"Observations keys: {list(timestep['observation'].keys())}")
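The loader calls above can be combined to summarize a whole dataset. The following sketch relies only on get_episode_ids and get_episode as used in this tutorial, and assumes that len() of an episode is its timestep count, as in the example above. StubLoader is a hypothetical stand-in so the snippet runs without a dataset on disk; in practice the loader returned by create_dataset_loader would take its place.

```python
from typing import Dict, List, Optional


def episode_lengths(loader, limit: Optional[int] = None) -> Dict[str, int]:
    """Map each episode ID to its number of timesteps."""
    episode_ids = loader.get_episode_ids()
    if limit is not None:
        # Only inspect the first `limit` episodes.
        episode_ids = episode_ids[:limit]
    return {
        episode_id: len(loader.get_episode(episode_id))
        for episode_id in episode_ids
    }


class StubLoader:
    """Hypothetical stand-in that models only the two methods used above."""

    def __init__(self, lengths: Dict[str, int]) -> None:
        self._lengths = lengths

    def get_episode_ids(self) -> List[str]:
        return list(self._lengths)

    def get_episode(self, episode_id: str) -> List[dict]:
        # One placeholder record per timestep.
        return [{"action": [0.0] * 7} for _ in range(self._lengths[episode_id])]


lengths = episode_lengths(StubLoader({"episode_000000": 3, "episode_000001": 5}))
print(lengths)
print(f"Total timesteps: {sum(lengths.values())}")
```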
Working with Episodes
# Create episode augmentor (reuses loader and the EpisodeAugmentor import from the Python API example above)
augmentor = EpisodeAugmentor(loader)
# List available cameras for an episode
cameras = augmentor.get_available_cameras("episode_000000")
print(f"Available cameras: {cameras}")
# Get backup status
backups = augmentor.list_backups()
print(f"Episodes with backups: {backups}")
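Before working with a particular camera, it can be useful to check which episodes actually contain it. The sketch below uses only get_available_cameras and assumes it returns a collection of camera name strings. StubAugmentor is a hypothetical stand-in for EpisodeAugmentor so the snippet runs on its own.

```python
from typing import Dict, Iterable, List


def episodes_missing_camera(
    augmentor, episode_ids: Iterable[str], camera: str
) -> List[str]:
    """Return the episode IDs whose camera list does not include `camera`."""
    return [
        episode_id
        for episode_id in episode_ids
        if camera not in augmentor.get_available_cameras(episode_id)
    ]


class StubAugmentor:
    """Hypothetical stand-in; only get_available_cameras is modeled."""

    def __init__(self, cameras: Dict[str, List[str]]) -> None:
        self._cameras = cameras

    def get_available_cameras(self, episode_id: str) -> List[str]:
        return self._cameras[episode_id]


stub = StubAugmentor(
    {
        "episode_000000": ["observation.images.laptop", "observation.images.webcam"],
        "episode_000001": ["observation.images.laptop"],
    }
)
missing = episodes_missing_camera(
    stub, ["episode_000000", "episode_000001"], "observation.images.webcam"
)
print(missing)
```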
Advanced Features
Format Auto-Detection
When no format is passed, create_dataset_loader detects the dataset format automatically:
# Auto-detect format (recommended)
loader = create_dataset_loader("./datasets/giraffe")
# Explicit format specification
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)
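To give a rough idea of what format detection can involve, here is an illustrative directory check. It is not the SDK's actual detection logic: the marker file meta/info.json for LeRobot datasets, the image suffix list, and the function name guess_format are all assumptions made for this sketch.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed set of image file suffixes for the image-folder case.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def guess_format(dataset_path: str) -> Optional[str]:
    """Guess a dataset format from directory contents (illustrative only)."""
    root = Path(dataset_path)
    if not root.is_dir():
        return None
    # Assumed marker: a meta/info.json file indicates a LeRobot dataset.
    if (root / "meta" / "info.json").is_file():
        return "lerobot"
    # Assumed marker: any image file below the root indicates an image folder.
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            return "image_folder"
    return None


with tempfile.TemporaryDirectory() as tmp:
    meta_dir = Path(tmp) / "meta"
    meta_dir.mkdir()
    (meta_dir / "info.json").write_text("{}")
    print(guess_format(tmp))
```

Passing the format explicitly, as in the second call above, avoids relying on any such heuristic.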
Supported Formats
- LeRobot: Robotics datasets from Hugging Face Hub
- Image Folder: Standard image directory structures
Error Handling
The CLI reports errors with messages that name the invalid value and, where applicable, the valid alternatives:
# Invalid dataset path
dataphy dataset load --dataset-path ./nonexistent
# Error: Dataset directory not found: ./nonexistent
# Invalid episode
dataphy dataset load --dataset-path ./datasets/giraffe --episode 999
# Error: Episode index 999 out of range. Available: 0-40
# Invalid format
dataphy dataset fetch --format invalid --repo-id test
# Error: Unknown format 'invalid'. Supported: ['lerobot']
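In scripts, the same kind of check can be made before calling the CLI. This is a minimal sketch of a range check whose message is modeled on the out-of-range error shown above; check_episode_index is a hypothetical helper, not an SDK function.

```python
def check_episode_index(index: int, total_episodes: int) -> int:
    """Return `index` if it is a valid 0-based episode index, else raise."""
    if total_episodes <= 0:
        raise ValueError("dataset has no episodes")
    if not 0 <= index < total_episodes:
        raise IndexError(
            f"Episode index {index} out of range. "
            f"Available: 0-{total_episodes - 1}"
        )
    return index


try:
    # 41 episodes, as in the example dataset above.
    check_episode_index(999, 41)
except IndexError as exc:
    print(f"Error: {exc}")
```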
Tips and Best Practices
Performance Tips
- Use episode indices: --episode 0 is faster than --episode episode_000000
- Cache datasets locally: Avoid re-downloading for repeated use
- Specify timestep ranges: For large episodes, use --timestep-range to limit loading
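The local caching tip can be applied in a script by checking the output directory before fetching. The sketch below treats any non-empty directory as already fetched, which is an assumption: it does not verify that an earlier download completed. The helper name needs_fetch is hypothetical.

```python
import tempfile
from pathlib import Path


def needs_fetch(output_dir: str) -> bool:
    """Return True if the directory is missing or empty."""
    path = Path(output_dir)
    return not (path.is_dir() and any(path.iterdir()))


with tempfile.TemporaryDirectory() as tmp:
    # An empty directory still needs a fetch.
    print(needs_fetch(tmp))
    # Once it contains a file, the fetch is skipped.
    (Path(tmp) / "placeholder.txt").write_text("data")
    print(needs_fetch(tmp))
```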
Dataset Organization
my-datasets/
├── giraffe_cleaning/ # One task
├── sorting_objects/ # Another task
├── navigation_indoor/ # Different domain
└── configs/ # Augmentation configs
├── gentle.yaml
├── aggressive.yaml
└── robotics.yaml
Common Workflows
- Exploration: fetch → info → visualize
- Development: load → augment → visualize
- Production: fetch → augment → export to training format
Next Steps
- Episode Augmentation: Learn advanced augmentation techniques
- API Reference: Explore all available functions and classes
- Examples: See complete code examples