Getting Started¶
This guide will help you install and start using the Dataphy SDK for robotics data management and augmentation.
Installation¶
Requirements¶
- Python 3.10 or higher
- Poetry (recommended) or pip
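Before installing, the two requirements above can be checked from a terminal. This is a minimal sketch that assumes the interpreter is available on your PATH as python3; adjust the name if your system uses a different one.

```shell
# Check that python3 exists and is at least version 3.10.
if command -v python3 >/dev/null 2>&1; then
  if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'; then
    py_status="ok"
  else
    py_status="too old"
  fi
else
  py_status="missing"
fi
echo "python3: $py_status"

# Poetry is recommended but optional, since pip is listed as an alternative.
if command -v poetry >/dev/null 2>&1; then
  poetry_status="found"
else
  poetry_status="not found"
fi
echo "poetry: $poetry_status"
```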
Install with Poetry (Recommended)¶
# Clone the repository into a dataphy-sdk directory
git clone https://github.com/dataphy/dataphy.git dataphy-sdk
cd dataphy-sdk
# Install base dependencies
poetry install
# Install with all extras for full functionality
poetry install --extras "torch aws hf parquet rerun"
# Upgrade rerun for visualization compatibility
poetry run dataphy-upgrade-rerun
# Verify installation
poetry run dataphy version
Install with pip¶
Quick Verification¶
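Test your installation. If you installed with Poetry and have not activated its virtual environment, prefix each command with poetry run: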
# Show available commands
dataphy --help
# List supported dataset formats
dataphy dataset list-formats
# Show version information
dataphy version
First Steps¶
1. Fetch a Dataset¶
Let's start by downloading a sample robotics dataset:
# Fetch a LeRobot dataset
dataphy dataset fetch \
--format lerobot \
--repo-id carpit680/giraffe_clean_desk2 \
--output ./my-dataset
2. Explore the Dataset¶
# Get dataset information
dataphy dataset info --format lerobot --repo-id carpit680/giraffe_clean_desk2
# Load and inspect locally
dataphy dataset load --dataset-path ./my-dataset --info
# List episodes
dataphy dataset load --dataset-path ./my-dataset --list-episodes
# Examine specific episode
dataphy dataset load --dataset-path ./my-dataset --episode 0 --timestep 10
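To inspect several episodes in a row, the single-episode command above can be wrapped in a small loop. The helper below, inspect_episodes, is a hypothetical name and not part of the SDK. It only prints the commands (a dry run), using the flags shown above, so it can be tried before dataphy is installed.

```shell
# Print one "dataphy dataset load" command per requested episode.
# Usage: inspect_episodes <dataset-path> <episode> [<episode> ...]
inspect_episodes() {
  dataset_path=$1
  shift
  for ep in "$@"; do
    echo "dataphy dataset load --dataset-path $dataset_path --episode $ep --timestep 10"
  done
}

# Dry run: show the commands for episodes 0, 1 and 2.
inspect_episodes ./my-dataset 0 1 2
```

Once the SDK is installed, piping the output to sh (inspect_episodes ./my-dataset 0 1 2 | sh) runs the commands one after another.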
3. Visualize in 2D¶
# Launch interactive 2D visualization
dataphy dataset visualize --format lerobot --dataset-path ./my-dataset
4. Apply Augmentations¶
Create a simple augmentation config:
aug.yaml
version: 1
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      magnitude: 0.1
    - name: cutout
      holes: 1
      size_range: [8, 16]
  background:
    adapter: none
seed: 42
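If you prefer to create the file from the terminal, a heredoc keeps the indentation intact. The nesting used here (background under pipeline, seed at the top level) is inferred from the key order in the config above and is an assumption. Compare it with the configs in the repository's examples/ directory if the CLI rejects it.

```shell
# Write aug.yaml in the current directory; the quoted 'EOF' disables shell expansion.
cat > aug.yaml <<'EOF'
version: 1
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      magnitude: 0.1
    - name: cutout
      holes: 1
      size_range: [8, 16]
  background:
    adapter: none
seed: 42
EOF

# YAML does not allow tabs for indentation, so warn if any crept in.
if grep -q "$(printf '\t')" aug.yaml; then
  echo "aug.yaml contains tab characters" >&2
else
  echo "aug.yaml written"
fi
```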
Apply augmentations:
# Augment a single episode in place (modifies the original; backups are kept)
dataphy augment dataset \
--dataset-path ./my-dataset \
--config aug.yaml \
--episode 0
# Create augmented dataset (preserves original)
dataphy augment dataset \
--dataset-path ./my-dataset \
--output-path ./augmented-dataset \
--config examples/dataset_augmentation_config.yaml \
--num-augmented 2
# Visualize results
dataphy dataset visualize --format lerobot --dataset-path ./augmented-dataset
Dataset Augmentation Deep Dive¶
Dataphy provides two augmentation approaches:
1. Episode Augmentation¶
- Purpose: Modify specific episodes in-place
- Use case: Quick experimentation, single episode fixes
- Result: Original dataset modified with backups
2. Full Dataset Augmentation¶
- Purpose: Create entirely new datasets with multiple versions
- Use case: Training data expansion, research experiments
- Result: New dataset with original + augmented episodes
# Compare dataset sizes
dataphy dataset load --dataset-path ./my-dataset --info
# Episodes: 25
dataphy dataset load --dataset-path ./augmented-dataset --info
# Episodes: 75 (25 original + 50 augmented)
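The episode counts above follow from simple arithmetic: each original episode is kept and gains a fixed number of augmented copies. The sketch below assumes --num-augmented is the number of augmented copies per original episode, which is inferred from the 25 to 75 example and not from a documented definition. The function name expected_episodes is illustrative.

```shell
# Expected size of the output dataset:
#   originals + originals * num_augmented
expected_episodes() {
  originals=$1
  num_augmented=$2
  echo $(( originals + originals * num_augmented ))
}

expected_episodes 25 2   # prints 75 (25 original + 50 augmented)
```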
LeRobot Version Compatibility¶
Dataphy automatically detects and handles different LeRobot dataset formats:
- v1.0: Early format with direct video structure
- v1.1: Intermediate format with chunked data
- v2.0: Modern format (e.g., carpit680/giraffe_clean_desk2)
- v2.1: Latest format (e.g., lerobot/svla_so100_sorting)
# The system automatically detects and adapts:
# Detected LeRobot version: v2.0
# Structure type: chunked_with_meta
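To check a local dataset's version by hand, one option is to read the metadata file directly. This sketch is not Dataphy's detection logic. It assumes the LeRobot v2.x layout, in which meta/info.json carries a codebase_version field, and reports unknown for any directory without that file. The function name lerobot_version is hypothetical.

```shell
# Print the codebase_version recorded in <dataset>/meta/info.json.
# Prints "unknown" if the file is absent, and nothing if the field is missing.
lerobot_version() {
  info="$1/meta/info.json"
  if [ -f "$info" ]; then
    sed -n 's/.*"codebase_version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$info" | head -n 1
  else
    echo "unknown"
  fi
}

# Demo against a throwaway directory that mimics the assumed layout.
demo=$(mktemp -d)
mkdir -p "$demo/meta"
printf '{\n  "codebase_version": "v2.0",\n  "total_episodes": 25\n}\n' > "$demo/meta/info.json"

lerobot_version "$demo"           # prints v2.0
lerobot_version "$demo/missing"   # prints unknown
```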
What's Next?¶
- Basic Usage Tutorial: Learn core dataset operations
- Episode Augmentation Guide: Master single episode augmentation
- Dataset Augmentation Tutorial: Create expanded datasets
- API Reference: Explore programmatic usage
- Configuration Files: Ready-to-use augmentation configs
- Python Examples: See ready-to-run code samples
Getting Help¶
- Documentation: You're reading it!
- GitHub Issues: Report bugs or request features
- Examples: Check the examples/ directory in the repository
Configuration¶
Environment Variables¶
The SDK respects these environment variables:
- DATAPHY_CACHE_DIR: Custom cache directory for datasets
- HF_TOKEN: Hugging Face token for private datasets
- RERUN_STRICT_MODE: Enable strict mode for visualization
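These variables can be set in a shell profile or at the top of a script. The values below are illustrative only: the cache path is an example, not a documented default, and the accepted values for RERUN_STRICT_MODE are not listed here, so 1 is an assumption.

```shell
# Use an existing DATAPHY_CACHE_DIR if set, otherwise fall back to an example path.
export DATAPHY_CACHE_DIR="${DATAPHY_CACHE_DIR:-$HOME/.cache/dataphy}"

# Avoid hard-coding the token; set HF_TOKEN elsewhere and only check it here.
if [ -z "${HF_TOKEN:-}" ]; then
  echo "HF_TOKEN is not set; private Hugging Face datasets may be unavailable" >&2
fi

# Assumed value: "1" to enable strict mode.
export RERUN_STRICT_MODE=1

echo "cache dir: $DATAPHY_CACHE_DIR"
```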
Config Files¶
You can create a global config file at ~/.dataphy/config.yaml: