Configuration Files¶
This page provides comprehensive examples of configuration files for different augmentation scenarios in the Dataphy SDK.
Episode Augmentation Configurations¶
These configurations are used with dataphy augment dataset for single episode modifications.
Basic Episode Configuration¶
version: 1
pipeline:
  sync_views: true # Synchronized across cameras
  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.88
    - name: color_jitter
      magnitude: 0.15
    - name: cutout
      holes: 1
      size_range: [8, 16]
background:
  adapter: none
seed: 42
Advanced Episode Configuration¶
version: 1
pipeline:
  sync_views: true
  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.85
      keep_ratio_max: 1.0
      padding_mode: reflect
    - name: random_translate
      px: 8
    - name: color_jitter
      magnitude: 0.2
    - name: random_conv
      kernel_variance: 0.04
    - name: gaussian_blur
      sigma: 0.5
      probability: 0.3
    - name: cutout
      holes: [1, 2]
      size_range: [8, 24]
      probability: 0.4
background:
  adapter: none
seed: 1337
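The episode configurations above place transform parameters directly beside `name`, while the dataset configurations later on this page use `type`, `params`, and `probability`. The following is a minimal sketch of converting the flat form into the nested one. The helper `normalize_step` is hypothetical and not part of the Dataphy SDK; it assumes the `type` is the CamelCase form of `name` and that a missing `probability` means the step always runs.

```python
def normalize_step(step: dict) -> dict:
    """Convert a flat episode-style step into the nested dataset-style form.

    Assumptions (hypothetical, not taken from the Dataphy SDK):
    - `type` is the CamelCase form of `name`
    - a missing `probability` means the step always runs (1.0)
    """
    flat = dict(step)  # copy so the caller's dict is left untouched
    name = flat.pop("name")
    probability = flat.pop("probability", 1.0)
    type_name = "".join(part.capitalize() for part in name.split("_"))
    return {
        "name": name,
        "type": type_name,
        "params": flat,  # every remaining key is treated as a parameter
        "probability": probability,
    }


blur = normalize_step({"name": "gaussian_blur", "sigma": 0.5, "probability": 0.3})
crop = normalize_step({"name": "random_crop_pad", "keep_ratio_min": 0.88})
print(blur)
print(crop)
```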
Dataset Augmentation Configurations¶
These configurations are used with dataphy augment dataset for creating new augmented datasets.
Research Configuration¶
Perfect for reproducible research experiments:
# Research Dataset Augmentation Configuration
# Optimized for reproducible experiments with moderate augmentation
pipeline:
  sync_views: true # Synchronized across all cameras
  steps:
    # Spatial augmentations for robustness
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
      probability: 0.8
    - name: random_translate
      type: RandomTranslate
      params:
        px: 8
      probability: 0.6
    # Photometric augmentations for lighting variations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.15
      probability: 0.85
    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.05
      probability: 0.25
# Global settings for reproducibility
settings:
  device: auto
  seed: 42
  deterministic: true
  output_quality: 95
  preserve_aspect_ratio: true
# Dataset configuration
dataset:
  preserve_original: true
  naming_scheme: "episode_{original_id}_aug_{aug_index:03d}"
  update_metadata: true
# Progress tracking
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  detailed_stats: true
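To illustrate why a fixed `seed` matters for reproducibility, here is a small sketch of how per-step `probability` values might be sampled. It is a simplified model, not the SDK's implementation: the function `sample_active_steps` is hypothetical and assumes each step fires independently with its `probability`, drawn from a single random generator seeded with the configured `seed`.

```python
import random


def sample_active_steps(steps, seed, num_augmented):
    """Return, for each augmented episode, the names of the steps that fire.

    Hypothetical model: every step is applied independently with its
    `probability`. A seed of None falls back to OS entropy, so repeated
    runs would generally differ.
    """
    rng = random.Random(seed)
    plans = []
    for _ in range(num_augmented):
        active = [s["name"] for s in steps if rng.random() < s["probability"]]
        plans.append(active)
    return plans


# Step names and probabilities taken from the research configuration.
research_steps = [
    {"name": "random_crop_pad", "probability": 0.8},
    {"name": "random_translate", "probability": 0.6},
    {"name": "color_jitter", "probability": 0.85},
    {"name": "random_conv", "probability": 0.25},
]

first = sample_active_steps(research_steps, seed=42, num_augmented=3)
second = sample_active_steps(research_steps, seed=42, num_augmented=3)
print(first == second)  # the same seed reproduces the same plan
```

With `seed: null`, as in the training configuration, the same call would typically yield a different plan on each run.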
Training Configuration¶
Designed for maximum data diversity in training scenarios:
# Training Dataset Augmentation Configuration
# Heavy augmentation for maximum data diversity
pipeline:
  sync_views: true
  steps:
    # Spatial augmentations
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.75
      probability: 0.9
    - name: random_translate
      type: RandomTranslate
      params:
        px: 12
      probability: 0.7
    # Photometric augmentations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.25
      probability: 0.95
    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.08
      probability: 0.4
    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [20, 60]
      probability: 0.3
    # Occlusion augmentations
    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [16, 48]
      probability: 0.4
# Performance settings
settings:
  device: auto
  batch_size: 4
  num_workers: 2
  seed: null # Random seed each run
  output_quality: 90
  cache_frames: true
  max_cache_size: "4GB"
# Dataset configuration
dataset:
  preserve_original: false # Augmented-only dataset
  naming_scheme: "aug_{aug_index:04d}_{original_id}"
  update_metadata: true
# Advanced settings
advanced:
  multiprocessing: true
  max_processes: 4
  frame_batch_size: 32
  video_codec: "avc1"
  validate_output: true
# Logging
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  max_retry_attempts: 3
  error_summary: true
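The `naming_scheme` values on this page look like Python format strings. Assuming the SDK expands them with `str.format` semantics (an assumption, not confirmed here), the sketch below shows what each scheme would produce. The episode ID `"000012"` and the helper `episode_name` are made up for illustration.

```python
# Naming schemes copied from the research, training, and validation configs.
SCHEMES = {
    "research": "episode_{original_id}_aug_{aug_index:03d}",
    "training": "aug_{aug_index:04d}_{original_id}",
    "validation": "val_{original_id}_aug_{aug_index:02d}",
}


def episode_name(scheme: str, original_id: str, aug_index: int) -> str:
    """Expand a naming scheme; `:03d` zero-pads the index to three digits."""
    return scheme.format(original_id=original_id, aug_index=aug_index)


for label, scheme in SCHEMES.items():
    print(label, episode_name(scheme, "000012", 7))
# research episode_000012_aug_007
# training aug_0007_000012
# validation val_000012_aug_07
```

Zero-padded indices keep augmented episodes in order when directory listings are sorted lexicographically.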
Lightweight Configuration¶
Minimal augmentation for preserving data fidelity:
# Light Dataset Augmentation Configuration
# Minimal augmentation preserving data fidelity
pipeline:
  sync_views: true
  steps:
    # Very subtle photometric changes
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.08
      probability: 0.6
    - name: random_translate
      type: RandomTranslate
      params:
        px: 3
      probability: 0.3
# High quality settings
settings:
  device: auto
  seed: 42
  output_quality: 98
  preserve_aspect_ratio: true
# Dataset settings
dataset:
  preserve_original: true
  update_metadata: true
# Minimal logging
logging:
  level: WARNING
  progress_bar: true
Camera-Specific Configuration¶
Optimized for specific camera characteristics:
# Webcam-Specific Augmentation Configuration
# Optimized for webcam feed characteristics
pipeline:
  sync_views: false # Different augmentation per camera
  steps:
    # Focus on lighting and noise (common webcam issues)
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.2
        contrast: 0.3 # Higher contrast variation for webcams
        saturation: 0.15
        hue: 0.04
      probability: 0.9
    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.03 # Simulate sensor noise
      probability: 0.6
    - name: cutout
      type: Cutout
      params:
        holes: 1
        size_range: [8, 24]
      probability: 0.3
    # Minimal spatial augmentation
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.9 # Preserve most of the frame
      probability: 0.4
settings:
  device: auto
  seed: 42
  output_quality: 92
dataset:
  preserve_original: true
  camera_selection: "specific"
  # This would be set in CLI: --cameras observation.images.webcam
logging:
  level: INFO
  progress_bar: true
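The difference between `sync_views: true` and `sync_views: false` can be pictured with a small sampling sketch. This is a hypothetical model rather than the SDK's code: it assumes that a synchronized pipeline shares one random draw across all cameras, while an unsynchronized one draws per camera. The function `sample_jitter` and the second camera key `observation.images.wrist` are invented for illustration.

```python
import random


def sample_jitter(cameras, magnitude, sync_views, seed):
    """Draw one brightness factor per camera.

    Hypothetical model of `sync_views`: when true, a single random draw is
    shared by all cameras; when false, each camera gets its own draw.
    """
    rng = random.Random(seed)
    if sync_views:
        shared = 1.0 + rng.uniform(-magnitude, magnitude)
        return {cam: shared for cam in cameras}
    return {cam: 1.0 + rng.uniform(-magnitude, magnitude) for cam in cameras}


# The wrist camera key is a made-up example of a second view.
cameras = ["observation.images.webcam", "observation.images.wrist"]

synced = sample_jitter(cameras, 0.2, sync_views=True, seed=42)
independent = sample_jitter(cameras, 0.2, sync_views=False, seed=42)
print(synced)
print(independent)
```

Synchronized sampling keeps lighting changes consistent across views of the same scene; independent sampling suits cases where each camera has its own sensor characteristics.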
Background Replacement Configurations¶
Advanced configurations with background replacement (requires additional setup):
Green Screen Configuration¶
# Background Replacement with Green Screen Detection
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
      probability: 0.8
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
      probability: 0.6
# Background replacement
background:
  adapter: green_aug
  params:
    background_dir: "./backgrounds"
    mask_threshold: 0.1
    blur_edges: true
  probability: 0.7
settings:
  device: auto
  seed: 42
Inpainting Configuration¶
# Background Replacement with Inpainting
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
        contrast: 0.1
      probability: 0.7
background:
  adapter: inpaint
  params:
    inpaint_model: "lama"
    mask_dilation: 5
    blend_edges: true
settings:
  device: cuda # Inpainting requires GPU
  seed: 42
Performance Optimization Configurations¶
High Performance Configuration¶
# Performance-Optimized Configuration
# For fast processing on powerful hardware
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
      probability: 0.8
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
      probability: 0.7
settings:
  device: cuda
  batch_size: 8
  num_workers: 4
  cache_frames: true
  max_cache_size: "8GB"
advanced:
  multiprocessing: true
  max_processes: 8
  frame_batch_size: 64
  low_memory_mode: false
dataset:
  preserve_original: true
logging:
  level: WARNING # Reduce logging overhead
  progress_bar: true
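Values such as `max_cache_size: "8GB"` are strings, so something has to turn them into a byte budget. The sketch below shows one plausible way to do that. The helper `parse_size` is hypothetical, and the choice of binary units (1 GB = 1024³ bytes) is an assumption; the SDK may interpret sizes differently.

```python
# Binary multipliers; whether the SDK uses 1024 or 1000 is an assumption.
UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size string such as "8GB" or "512MB" into bytes."""
    text = text.strip().upper()
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no suffix: treat the value as plain bytes


budget = parse_size("8GB")
frame_bytes = 640 * 480 * 3  # one uncompressed RGB frame, for illustration
print(budget, "bytes, room for", budget // frame_bytes, "cached frames")
```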
Memory Efficient Configuration¶
# Memory-Efficient Configuration
# For systems with limited RAM
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
      probability: 0.8
settings:
  device: cpu
  batch_size: 1
  num_workers: 1
  cache_frames: false
  cleanup_temp: true
advanced:
  low_memory_mode: true
  frame_batch_size: 8
  max_processes: 1
  multiprocessing: false
logging:
  level: INFO
  progress_bar: true
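A smaller `frame_batch_size` lowers peak memory because fewer frames are held at once. As a rough illustration of that idea (not the SDK's internals), the hypothetical generator `frame_batches` below splits a sequence of frames into chunks of at most `frame_batch_size`.

```python
from typing import Iterator, List, Sequence


def frame_batches(frames: Sequence, frame_batch_size: int) -> Iterator[List]:
    """Yield consecutive chunks of at most `frame_batch_size` frames."""
    if frame_batch_size < 1:
        raise ValueError("frame_batch_size must be at least 1")
    for start in range(0, len(frames), frame_batch_size):
        yield list(frames[start:start + frame_batch_size])


# Twenty placeholder frames processed eight at a time.
batches = list(frame_batches(range(20), frame_batch_size=8))
print([len(batch) for batch in batches])  # [8, 8, 4]
```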
Validation and Testing Configurations¶
Validation Configuration¶
# Validation Dataset Configuration
# Conservative augmentation for validation sets
pipeline:
  sync_views: true
  steps:
    # Only photometric augmentations, no spatial changes
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.05
        contrast: 0.05
        saturation: 0.03
        hue: 0.01
      probability: 0.5
settings:
  device: auto
  seed: 12345 # Different seed from training
  deterministic: true
  output_quality: 98
dataset:
  preserve_original: true
  naming_scheme: "val_{original_id}_aug_{aug_index:02d}"
logging:
  level: INFO
  progress_bar: true
Testing Configuration¶
# Testing Configuration
# For quick validation of augmentation pipeline
pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
      probability: 1.0 # Always apply for testing
settings:
  device: cpu # Use CPU for consistent testing
  seed: 999
  deterministic: true
dataset:
  preserve_original: true
  update_metadata: true
advanced:
  validate_output: true
  checksum_verification: false
logging:
  level: DEBUG
  progress_bar: true
  detailed_stats: true
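Before running a pipeline, it can help to sanity-check a parsed configuration. The sketch below works on a plain dictionary (as a YAML loader would return) and checks a few values. The function `validate_config` and the accepted ranges are assumptions chosen for illustration; the SDK's own validation rules may differ.

```python
def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; empty means no issue found.

    The accepted ranges below are illustrative assumptions, not SDK rules.
    """
    errors = []
    steps = config.get("pipeline", {}).get("steps", [])
    if not steps:
        errors.append("pipeline.steps is empty")
    for i, step in enumerate(steps):
        label = f"steps[{i}] ({step.get('name', '?')})"
        probability = step.get("probability", 1.0)
        if not 0.0 <= probability <= 1.0:
            errors.append(f"{label}: probability {probability} outside [0, 1]")
        params = step.get("params", {})
        size_range = params.get("size_range")
        if size_range is not None and size_range[0] > size_range[1]:
            errors.append(f"{label}: size_range {size_range} is not ascending")
        ratio = params.get("keep_ratio_min")
        if ratio is not None and not 0.0 < ratio <= 1.0:
            errors.append(f"{label}: keep_ratio_min {ratio} outside (0, 1]")
    quality = config.get("settings", {}).get("output_quality")
    if quality is not None and not 1 <= quality <= 100:
        errors.append(f"settings.output_quality {quality} outside [1, 100]")
    return errors


# The testing configuration from this page, written as a dictionary.
testing_config = {
    "pipeline": {
        "sync_views": True,
        "steps": [
            {"name": "color_jitter", "type": "ColorJitter",
             "params": {"brightness": 0.1}, "probability": 1.0},
        ],
    },
    "settings": {"device": "cpu", "seed": 999, "deterministic": True},
}
print(validate_config(testing_config))  # []
```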
Usage Examples¶
Using Configurations with CLI¶
# Research experiment
dataphy augment dataset \
  --dataset-path ./source_data \
  --output-path ./research_data \
  --config research_augmentation.yaml \
  --num-augmented 3

# Training dataset creation
dataphy augment dataset \
  --dataset-path ./original \
  --output-path ./training \
  --config training_augmentation.yaml \
  --num-augmented 5 \
  --no-preserve-original

# Webcam-only augmentation
dataphy augment dataset \
  --dataset-path ./data \
  --output-path ./webcam_aug \
  --config webcam_augmentation.yaml \
  --cameras observation.images.webcam \
  --num-augmented 2
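When several datasets need the same treatment, the commands above can be assembled from a script. The sketch below only builds and prints the argument list, using the flags shown in the examples; the helper `build_command` is hypothetical. Passing the list to `subprocess.run(cmd, check=True)` would execute it, assuming the `dataphy` CLI is installed.

```python
import shlex


def build_command(dataset_path, output_path, config, num_augmented,
                  cameras=None, preserve_original=True):
    """Assemble a `dataphy augment dataset` call from the flags shown above."""
    cmd = [
        "dataphy", "augment", "dataset",
        "--dataset-path", dataset_path,
        "--output-path", output_path,
        "--config", config,
        "--num-augmented", str(num_augmented),
    ]
    if cameras:
        cmd += ["--cameras", cameras]
    if not preserve_original:
        cmd.append("--no-preserve-original")
    return cmd


webcam_cmd = build_command(
    "./data", "./webcam_aug", "webcam_augmentation.yaml", 2,
    cameras="observation.images.webcam",
)
# Print a shell-safe rendering; this does not run the command.
print(shlex.join(webcam_cmd))
```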
Programmatic Usage¶
from dataphy.dataset.augmentor import DatasetAugmentor, AugmentationConfig
from dataphy.dataset.registry import create_dataset_loader

# Load configuration
config = AugmentationConfig(
    pipeline_config="research_augmentation.yaml",
    target="dataset",
    num_augmented_episodes=3,
    preserve_original=True,
    sync_views=True,
    random_seed=42
)

# Create augmentor and run
loader = create_dataset_loader("./source_data")
augmentor = DatasetAugmentor(loader)
results = augmentor.augment_full_dataset(config, "./output_data")
Configuration Tips¶
1. Choosing the Right Configuration¶
- Research: Use deterministic seeds and moderate augmentation
- Training: Use heavy augmentation without preserving originals
- Validation: Use minimal, photometric-only augmentation
- Testing: Use simple, consistent augmentation
2. Performance Optimization¶
- GPU Available: Set device: cuda and increase batch_size
- Limited Memory: Use low_memory_mode: true and cache_frames: false
- High CPU: Increase max_processes and num_workers
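The hardware tips above could be encoded as a small helper that proposes `settings` and `advanced` values. This is only a sketch: the function `suggest_settings` is hypothetical, and the 8 GB memory threshold and the specific batch sizes are arbitrary illustrative choices, not SDK recommendations.

```python
def suggest_settings(has_gpu: bool, ram_gb: float, cpu_count: int) -> dict:
    """Propose performance settings from a rough hardware description.

    Thresholds and values are illustrative assumptions, not SDK defaults.
    """
    settings = {"device": "cuda" if has_gpu else "cpu"}
    advanced = {}
    if ram_gb < 8:  # limited memory: minimize what is held at once
        settings.update(batch_size=1, num_workers=1, cache_frames=False)
        advanced.update(low_memory_mode=True, multiprocessing=False,
                        max_processes=1)
    else:  # enough memory: scale workers with the available cores
        settings.update(batch_size=8 if has_gpu else 4,
                        num_workers=max(1, cpu_count // 2),
                        cache_frames=True)
        advanced.update(low_memory_mode=False,
                        multiprocessing=cpu_count > 1,
                        max_processes=max(1, cpu_count))
    return {"settings": settings, "advanced": advanced}


print(suggest_settings(has_gpu=True, ram_gb=32, cpu_count=8))
print(suggest_settings(has_gpu=False, ram_gb=4, cpu_count=2))
```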
3. Quality vs Speed Tradeoffs¶
- High Quality: output_quality: 98, preserve_aspect_ratio: true
- Fast Processing: output_quality: 85, cleanup_temp: true
- Storage Efficient: Lower quality, shorter naming schemes
4. Error Handling¶
- Production: continue_on_error: true, max_retry_attempts: 3
- Development: continue_on_error: false, detailed logging
- Batch Processing: Enable validation and checksums
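The interaction between `continue_on_error` and `max_retry_attempts` can be sketched as a retry loop. This is a simplified model, not the SDK's implementation: the function `process_all` is hypothetical, and it assumes `max_retry_attempts` counts total attempts per episode rather than retries after the first failure.

```python
def process_all(episodes, process, continue_on_error=True, max_retry_attempts=3):
    """Process each episode, retrying failures up to `max_retry_attempts` times.

    Assumption: `max_retry_attempts` is the total number of attempts.
    """
    done, failed = [], {}
    for episode in episodes:
        last_error = RuntimeError("no attempts made")
        for _ in range(max_retry_attempts):
            try:
                process(episode)
                done.append(episode)
                break
            except Exception as exc:  # remember the error and try again
                last_error = exc
        else:  # every attempt failed
            if not continue_on_error:
                raise last_error
            failed[episode] = str(last_error)
    return done, failed


calls = {}


def flaky(episode):
    """Stand-in for real work: one episode recovers, one never does."""
    calls[episode] = calls.get(episode, 0) + 1
    if episode == "ep_bad":
        raise ValueError("corrupt video")
    if episode == "ep_flaky" and calls[episode] < 3:
        raise IOError("temporary read failure")


done, failed = process_all(["ep_ok", "ep_flaky", "ep_bad"], flaky)
print(done)    # ['ep_ok', 'ep_flaky']
print(failed)  # {'ep_bad': 'corrupt video'}
```

With `continue_on_error=False`, the same run would stop at `ep_bad` and re-raise its last error, which is usually what you want during development.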
Complete Configuration Reference¶
Comprehensive Configuration with All Parameters¶
For a complete reference showing all possible parameters, see the comprehensive configuration file.
This configuration demonstrates every available parameter across all transform types:
- 26 Transform Types: All spatial, photometric, noise, blur, artifact, occlusion, and depth transforms
- Cross-cutting Parameters: p, apply_to, sync_views, mask_protect, seed_policy for every transform
- Global Settings: Device selection, memory management, performance tuning
- Background Adapters: GreenAug, RoboEngine, Inpainting configurations
- Advanced Features: Multiprocessing, validation, logging, metrics
- Dataset Options: Naming schemes, compression, metadata handling
Use this as a reference to understand all available options, then create simplified configs for your specific use case.