Basic Usage Tutorial

This tutorial covers the fundamental operations you can perform with the Dataphy SDK.

Dataset Operations

Fetching Datasets

The Dataphy SDK supports multiple data sources and formats:

# From Hugging Face Hub (LeRobot format)
dataphy dataset fetch \
  --format lerobot \
  --repo-id carpit680/giraffe_clean_desk2 \
  --output ./datasets/giraffe

# With specific parameters
dataphy dataset fetch \
  --format lerobot \
  --repo-id lerobot/svla_so100_sorting \
  --output ./datasets/sorting \
  --split train \
  --revision main

Inspecting Datasets

Get comprehensive dataset information:

# Remote dataset info
dataphy dataset info --format lerobot --repo-id carpit680/giraffe_clean_desk2

# Local dataset info
dataphy dataset load --dataset-path ./datasets/giraffe --info

Example output:

Dataset Information:
  Format: LeRobot
  Total episodes: 41
  Total timesteps: 15,632
  Cameras: ['observation.images.laptop', 'observation.images.webcam']
  Action space: 7D continuous
  Observation space: Multi-modal (images + state)

Episode Navigation

Navigate through episodes and timesteps:

# List all episodes
dataphy dataset load --dataset-path ./datasets/giraffe --list-episodes

# Examine specific episode by index (0-based)
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0

# Examine specific episode by name
dataphy dataset load --dataset-path ./datasets/giraffe --episode episode_000005

# Examine specific timestep
dataphy dataset load --dataset-path ./datasets/giraffe --episode 0 --timestep 100
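
The `--episode` flag above accepts either a 0-based index or an episode name. A minimal sketch of how such a selector could be mapped to an episode ID, assuming the zero-padded `episode_NNNNNN` naming seen in this tutorial. `resolve_episode` is a hypothetical helper written for illustration; it is not part of the Dataphy SDK and may not match the SDK's own resolution logic.

```python
from typing import Sequence


def resolve_episode(selector: str, episode_ids: Sequence[str]) -> str:
    """Map a 0-based index or an episode name to an episode ID.

    Hypothetical helper for illustration; not part of the Dataphy SDK.
    """
    if selector.isdigit():
        # Purely numeric selectors are treated as 0-based indices.
        index = int(selector)
        if index >= len(episode_ids):
            raise IndexError(
                f"Episode index {index} out of range. "
                f"Available: 0-{len(episode_ids) - 1}"
            )
        return episode_ids[index]
    # Anything else is treated as an episode name.
    if selector not in episode_ids:
        raise KeyError(f"Unknown episode name: {selector}")
    return selector


# Stand-in ID list with the same naming pattern as the examples above.
episode_ids = [f"episode_{i:06d}" for i in range(41)]
print(resolve_episode("0", episode_ids))               # episode_000000
print(resolve_episode("episode_000005", episode_ids))  # episode_000005
```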

Visualization

2D Interactive Visualization

Launch the Rerun (rerun.io) viewer to explore a dataset interactively:

# Visualize entire dataset
dataphy dataset visualize --format lerobot --dataset-path ./datasets/giraffe

# Visualize specific episode
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --episode episode_000000

# Visualize timestep range
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --timestep-range 0,100

# Focus on specific camera
dataphy dataset visualize \
  --format lerobot \
  --dataset-path ./datasets/giraffe \
  --camera observation.images.laptop
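
The `--timestep-range` value shown above is a comma-separated `start,end` pair. A minimal sketch of validating such a value in a script before handing it to the CLI. `parse_timestep_range` is a hypothetical helper, not an SDK function. The tutorial does not say whether the end bound is inclusive, so the sketch only checks that the bounds are non-negative and ordered.

```python
from typing import Tuple


def parse_timestep_range(value: str) -> Tuple[int, int]:
    """Parse a 'start,end' string such as '0,100' into two integers.

    Hypothetical helper for illustration; not part of the Dataphy SDK.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"Expected 'start,end', got {value!r}")
    start, end = int(parts[0]), int(parts[1])  # int() raises ValueError on non-numbers
    if start < 0 or end < start:
        raise ValueError(f"Invalid range: {value!r}")
    return start, end


print(parse_timestep_range("0,100"))  # (0, 100)
```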

Visualization Features

The 2D viewer provides:

  • Multi-camera feeds: See all camera angles simultaneously
  • Robot state visualization: Joint positions, end-effector pose
  • Action visualization: Planned vs executed actions
  • Timestep scrubbing: Navigate through episodes frame by frame
  • Data inspection: Click on any element to see raw data

Programmatic Usage

Python API

Use the SDK programmatically in your Python code:

from dataphy.dataset.registry import create_dataset_loader, DatasetFormat
from dataphy.dataset.episode_augmentor import EpisodeAugmentor

# Create dataset loader
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)

# Get dataset information
info = loader.get_dataset_info()
print(f"Dataset has {info.total_episodes} episodes")

# List episodes
episodes = loader.get_episode_ids()
print(f"Episodes: {episodes[:5]}...")  # First 5

# Load specific episode
episode_data = loader.get_episode("episode_000000")
print(f"Episode has {len(episode_data)} timesteps")

# Load specific timestep
timestep = loader.get_timestep("episode_000000", 0)
print(f"Actions: {timestep['action']}")
print(f"Observations keys: {list(timestep['observation'].keys())}")
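
The calls above can be combined to summarize a whole dataset. A minimal sketch that counts timesteps per episode, relying only on `get_episode_ids()` and `get_episode()` as used in this tutorial and assuming `len()` of an episode gives its timestep count. `timesteps_per_episode` and `FakeLoader` are hypothetical names; the stand-in loader exists only so the sketch runs without a dataset on disk.

```python
from typing import Dict


def timesteps_per_episode(loader) -> Dict[str, int]:
    """Return {episode_id: number_of_timesteps} for every episode.

    Assumes len(loader.get_episode(...)) is the timestep count,
    as in the tutorial's own example.
    """
    return {
        episode_id: len(loader.get_episode(episode_id))
        for episode_id in loader.get_episode_ids()
    }


class FakeLoader:
    """Stand-in with the two methods used above; not the real SDK loader."""

    def __init__(self, episodes):
        self._episodes = episodes

    def get_episode_ids(self):
        return list(self._episodes)

    def get_episode(self, episode_id):
        return self._episodes[episode_id]


fake = FakeLoader({"episode_000000": [0] * 3, "episode_000001": [0] * 5})
print(timesteps_per_episode(fake))  # {'episode_000000': 3, 'episode_000001': 5}
```

With a real dataset, pass the loader returned by `create_dataset_loader` in place of `FakeLoader`.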

Working with Episodes

# Create episode augmentor
augmentor = EpisodeAugmentor(loader)

# List available cameras for an episode
cameras = augmentor.get_available_cameras("episode_000000")
print(f"Available cameras: {cameras}")

# Get backup status
backups = augmentor.list_backups()
print(f"Episodes with backups: {backups}")

Advanced Features

Format Auto-Detection

The SDK automatically detects dataset formats:

# Auto-detect format (recommended)
loader = create_dataset_loader("./datasets/giraffe")

# Explicit format specification
loader = create_dataset_loader("./datasets/giraffe", DatasetFormat.LEROBOT)

Supported Formats

  • LeRobot: Robotics datasets from Hugging Face Hub

  • Image Folder: Standard image directory structures

Error Handling

The CLI reports common mistakes with descriptive error messages:

# Invalid dataset path
dataphy dataset load --dataset-path ./nonexistent
# Error: Dataset directory not found: ./nonexistent

# Invalid episode
dataphy dataset load --dataset-path ./datasets/giraffe --episode 999
# Error: Episode index 999 out of range. Available: 0-40

# Invalid format
dataphy dataset fetch --format invalid --repo-id test
# Error: Unknown format 'invalid'. Supported: ['lerobot']

Tips and Best Practices

Performance Tips

  1. Use episode indices: --episode 0 is faster than --episode episode_000000
  2. Cache datasets locally: Avoid re-downloading for repeated use
  3. Specify timestep ranges: For large episodes, use --timestep-range to limit loading
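
The caching tip above can be sketched as a check that runs before `dataphy dataset fetch`. `needs_fetch` is a hypothetical helper, not an SDK function, and it makes a simplifying assumption: a non-empty output directory is treated as a complete download, which an interrupted fetch would violate.

```python
import tempfile
from pathlib import Path


def needs_fetch(output_dir: str) -> bool:
    """Return True when the output directory is missing or empty.

    Hypothetical helper; assumes a non-empty directory is a complete dataset.
    """
    path = Path(output_dir)
    return not path.is_dir() or not any(path.iterdir())


# Demonstrate on a temporary directory instead of a real dataset path.
with tempfile.TemporaryDirectory() as tmp:
    print(needs_fetch(tmp))                    # True: directory is empty
    (Path(tmp) / "placeholder.txt").write_text("data")
    print(needs_fetch(tmp))                    # False: something is already there
```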

Dataset Organization

my-datasets/
├── giraffe_cleaning/     # One task
├── sorting_objects/      # Another task
├── navigation_indoor/    # Different domain
└── configs/              # Augmentation configs
    ├── gentle.yaml
    ├── aggressive.yaml
    └── robotics.yaml
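
A minimal sketch that creates the directory skeleton shown above with `pathlib`. `create_layout` is a hypothetical helper. It creates only the directories; the YAML config files are left out because their contents are not covered in this tutorial.

```python
import tempfile
from pathlib import Path

# Directory names taken from the layout above.
LAYOUT_DIRS = ("giraffe_cleaning", "sorting_objects", "navigation_indoor", "configs")


def create_layout(root: Path) -> None:
    """Create one directory per task plus a configs directory under root."""
    for name in LAYOUT_DIRS:
        (root / name).mkdir(parents=True, exist_ok=True)


# Demonstrate inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my-datasets"
    create_layout(root)
    print(sorted(p.name for p in root.iterdir()))
```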

Common Workflows

  1. Exploration: fetch → info → visualize
  2. Development: load → augment → visualize
  3. Production: fetch → augment → export to training format
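
The Exploration workflow listed above can be scripted by assembling the three CLI commands in order. A minimal sketch that uses only the subcommands and flags shown earlier in this tutorial. `exploration_commands` is a hypothetical helper, and the sketch prints the commands rather than running them.

```python
import shlex
from typing import List


def exploration_commands(repo_id: str, output_dir: str) -> List[List[str]]:
    """Build the fetch, info, and visualize commands for one LeRobot dataset."""
    return [
        ["dataphy", "dataset", "fetch", "--format", "lerobot",
         "--repo-id", repo_id, "--output", output_dir],
        ["dataphy", "dataset", "info", "--format", "lerobot",
         "--repo-id", repo_id],
        ["dataphy", "dataset", "visualize", "--format", "lerobot",
         "--dataset-path", output_dir],
    ]


for command in exploration_commands("carpit680/giraffe_clean_desk2", "./datasets/giraffe"):
    print(shlex.join(command))
```

To execute the steps instead of printing them, each list can be passed to `subprocess.run(command, check=True)`, which stops at the first failing step.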

Next Steps