Configuration Files

This page collects example configuration files for common augmentation scenarios in the Dataphy SDK.

Episode Augmentation Configurations

These configurations are used with dataphy augment dataset for single episode modifications.

Basic Episode Configuration

basic_episode_aug.yaml
version: 1
pipeline:
  sync_views: true # Synchronized across cameras

  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.88
    - name: color_jitter
      magnitude: 0.15
    - name: cutout
      holes: 1
      size_range: [8, 16]

  background:
    adapter: none

seed: 42

Advanced Episode Configuration

advanced_episode_aug.yaml
version: 1
pipeline:
  sync_views: true

  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.85
      keep_ratio_max: 1.0
      padding_mode: reflect
    - name: random_translate
      px: 8
    - name: color_jitter
      magnitude: 0.2
    - name: random_conv
      kernel_variance: 0.04
    - name: gaussian_blur
      sigma: 0.5
      probability: 0.3
    - name: cutout
      holes: [1, 2]
      size_range: [8, 24]
      probability: 0.4

  background:
    adapter: none

seed: 1337
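
Before running a configuration like the one above, it can help to sanity-check the step parameters. The sketch below is not part of the Dataphy SDK: `check_steps` is a hypothetical helper, the steps are written as plain Python dicts rather than loaded from YAML, and the rules it enforces (probabilities in [0, 1], a missing `probability` treated as 1.0, `keep_ratio_min` no larger than `keep_ratio_max`, `size_range` ordered as [min, max]) are assumptions about what sensible values look like.

```python
def check_steps(steps):
    """Return a list of human-readable problems found in a list of step dicts."""
    problems = []
    for i, step in enumerate(steps):
        name = step.get("name", f"<step {i}>")

        # Assumption: a step without an explicit probability always runs.
        p = step.get("probability", 1.0)
        if not 0.0 <= p <= 1.0:
            problems.append(f"{name}: probability {p} is outside [0, 1]")

        # Assumption: keep ratios are fractions of the frame, with min <= max.
        lo = step.get("keep_ratio_min")
        hi = step.get("keep_ratio_max", 1.0)
        if lo is not None and not 0.0 < lo <= hi <= 1.0:
            problems.append(f"{name}: keep_ratio range ({lo}, {hi}) is invalid")

        # Assumption: size_range is written as [min, max].
        size_range = step.get("size_range")
        if size_range is not None and size_range[0] > size_range[1]:
            problems.append(f"{name}: size_range {size_range} is not [min, max]")
    return problems


# A few steps copied from advanced_episode_aug.yaml, written as Python dicts.
steps = [
    {"name": "random_crop_pad", "keep_ratio_min": 0.85,
     "keep_ratio_max": 1.0, "padding_mode": "reflect"},
    {"name": "gaussian_blur", "sigma": 0.5, "probability": 0.3},
    {"name": "cutout", "holes": [1, 2], "size_range": [8, 24], "probability": 0.4},
]
print(check_steps(steps))
```

An empty list means none of the assumed rules were violated; it does not guarantee that the SDK accepts the file.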

Dataset Augmentation Configurations

These configurations are used with dataphy augment dataset for creating new augmented datasets.

Research Configuration

Suited to reproducible research experiments, with a fixed seed and moderate augmentation:

research_augmentation.yaml
# Research Dataset Augmentation Configuration
# Optimized for reproducible experiments with moderate augmentation

pipeline:
  sync_views: true # Synchronized across all cameras

  steps:
    # Spatial augmentations for robustness
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.8

    - name: random_translate
      type: RandomTranslate
      params:
        px: 8
        probability: 0.6

    # Photometric augmentations for lighting variations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.15
        probability: 0.85

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.05
        probability: 0.25

# Global settings for reproducibility
settings:
  device: auto
  seed: 42
  deterministic: true
  output_quality: 95
  preserve_aspect_ratio: true

# Dataset configuration
dataset:
  preserve_original: true
  naming_scheme: "episode_{original_id}_aug_{aug_index:03d}"
  update_metadata: true

# Progress tracking
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  detailed_stats: true
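
The `naming_scheme` value uses placeholders that match Python's `str.format` syntax. The sketch below shows the names such a template would produce, assuming the SDK fills it in the same way; `episode_names`, the string episode ID, and the zero-based `aug_index` are illustrative assumptions, not confirmed SDK behavior.

```python
scheme = "episode_{original_id}_aug_{aug_index:03d}"


def episode_names(original_id, num_augmented, scheme):
    """Expand a naming scheme for each augmented copy of one source episode."""
    # Assumption: aug_index starts at 0; the SDK may start at 1 instead.
    return [
        scheme.format(original_id=original_id, aug_index=i)
        for i in range(num_augmented)
    ]


# `:03d` zero-pads the index to three digits.
print(episode_names("000012", 3, scheme))
```

The `:03d` format spec keeps names sortable up to 999 augmented copies per episode; the training configuration's `:04d` extends that to 9999.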

Training Configuration

Designed for maximum data diversity in training scenarios:

training_augmentation.yaml
# Training Dataset Augmentation Configuration
# Heavy augmentation for maximum data diversity

pipeline:
  sync_views: true

  steps:
    # Spatial augmentations
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.75
        probability: 0.9

    - name: random_translate
      type: RandomTranslate
      params:
        px: 12
        probability: 0.7

    # Photometric augmentations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.25
        probability: 0.95

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.08
        probability: 0.4

    # Occlusion augmentations (two cutout passes with different hole sizes)
    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [20, 60]
        probability: 0.3

    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [16, 48]
        probability: 0.4

# Performance settings
settings:
  device: auto
  batch_size: 4
  num_workers: 2
  seed: null # Random seed each run
  output_quality: 90
  cache_frames: true
  max_cache_size: "4GB"

# Dataset configuration
dataset:
  preserve_original: false # Augmented-only dataset
  naming_scheme: "aug_{aug_index:04d}_{original_id}"
  update_metadata: true

# Advanced settings
advanced:
  multiprocessing: true
  max_processes: 4
  frame_batch_size: 32
  video_codec: "avc1"
  validate_output: true

# Logging
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  max_retry_attempts: 3
  error_summary: true

Lightweight Configuration

Minimal augmentation for preserving data fidelity:

light_augmentation.yaml
# Light Dataset Augmentation Configuration
# Minimal augmentation preserving data fidelity

pipeline:
  sync_views: true

  steps:
    # Very subtle photometric changes
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.08
        probability: 0.6

    - name: random_translate
      type: RandomTranslate
      params:
        px: 3
        probability: 0.3

# High quality settings
settings:
  device: auto
  seed: 42
  output_quality: 98
  preserve_aspect_ratio: true

# Dataset settings
dataset:
  preserve_original: true
  update_metadata: true

# Minimal logging
logging:
  level: WARNING
  progress_bar: true
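
To compare how heavy the training and lightweight configurations are, the per-step `probability` values can be combined arithmetically. This is a rough sketch that assumes each step is applied independently with its own probability, which the source does not confirm; `summarize` is a hypothetical helper.

```python
import math


def summarize(probabilities):
    """Return (expected number of applied steps, chance that no step applies)."""
    expected_steps = sum(probabilities)
    p_untouched = math.prod(1.0 - p for p in probabilities)
    return expected_steps, p_untouched


# Probabilities copied from training_augmentation.yaml and light_augmentation.yaml.
training = [0.9, 0.7, 0.95, 0.4, 0.3, 0.4]
light = [0.6, 0.3]

for name, probs in [("training", training), ("light", light)]:
    expected, untouched = summarize(probs)
    print(f"{name}: {expected:.2f} steps on average, {untouched:.2%} left unchanged")
```

Under that independence assumption, the lightweight configuration leaves roughly a quarter of samples untouched, while the training configuration almost never does.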

Camera-Specific Configuration

Optimized for specific camera characteristics:

webcam_augmentation.yaml
# Webcam-Specific Augmentation Configuration
# Optimized for webcam feed characteristics

pipeline:
  sync_views: false # Different augmentation per camera

  steps:
    # Focus on lighting and noise (common webcam issues)
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.2
        contrast: 0.3 # Higher contrast variation for webcams
        saturation: 0.15
        hue: 0.04
        probability: 0.9

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.03 # Simulate sensor noise
        probability: 0.6

    - name: cutout
      type: Cutout
      params:
        holes: 1
        size_range: [8, 24]
        probability: 0.3

    # Minimal spatial augmentation
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.9 # Preserve most of the frame
        probability: 0.4

settings:
  device: auto
  seed: 42
  output_quality: 92

dataset:
  preserve_original: true
  camera_selection: "specific"
  # The camera itself is selected on the command line: --cameras observation.images.webcam

logging:
  level: INFO
  progress_bar: true
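
The difference between `sync_views: true` and `sync_views: false` can be illustrated with a small sketch. It is not the SDK's implementation: it assumes synchronization means sampling one set of random parameters and reusing it for every camera, and both helper functions and the `observation.images.wrist` camera name are hypothetical.

```python
import random


def sample_jitter(rng, magnitude=0.15):
    """Draw one brightness offset in [-magnitude, magnitude]."""
    return round(rng.uniform(-magnitude, magnitude), 4)


def jitter_per_camera(cameras, seed, sync_views):
    """Return the jitter offset each camera would receive."""
    if sync_views:
        # One draw, shared by every view, so all cameras change together.
        shared = sample_jitter(random.Random(seed))
        return {cam: shared for cam in cameras}
    # One draw per camera from the same seeded stream: still reproducible,
    # but each view gets its own parameters.
    rng = random.Random(seed)
    return {cam: sample_jitter(rng) for cam in cameras}


cameras = ["observation.images.webcam", "observation.images.wrist"]
print(jitter_per_camera(cameras, seed=42, sync_views=True))
print(jitter_per_camera(cameras, seed=42, sync_views=False))
```

Synchronized parameters keep multi-camera episodes visually consistent; independent parameters make sense when, as here, only one camera's characteristics are being modeled.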

Background Replacement Configurations

Advanced configurations with background replacement (requires additional setup):

Green Screen Configuration

green_screen_augmentation.yaml
# Background Replacement with Green Screen Detection
pipeline:
  sync_views: true

  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
        probability: 0.8

    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.6

# Background replacement
background:
  adapter: green_aug
  params:
    background_dir: "./backgrounds"
    mask_threshold: 0.1
    blur_edges: true
    probability: 0.7

settings:
  device: auto
  seed: 42

Inpainting Configuration

inpaint_augmentation.yaml
# Background Replacement with Inpainting
pipeline:
  sync_views: true

  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
        contrast: 0.1
        probability: 0.7

background:
  adapter: inpaint
  params:
    inpaint_model: "lama"
    mask_dilation: 5
    blend_edges: true

settings:
  device: cuda # Inpainting requires GPU
  seed: 42

Performance Optimization Configurations

High Performance Configuration

performance_optimized.yaml
# Performance-Optimized Configuration
# For fast processing on powerful hardware

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
        probability: 0.8
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.7

settings:
  device: cuda
  batch_size: 8
  num_workers: 4
  cache_frames: true
  max_cache_size: "8GB"

advanced:
  multiprocessing: true
  max_processes: 8
  frame_batch_size: 64
  low_memory_mode: false

dataset:
  preserve_original: true

logging:
  level: WARNING # Reduce logging overhead
  progress_bar: true

Memory Efficient Configuration

memory_efficient.yaml
# Memory-Efficient Configuration
# For systems with limited RAM

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        probability: 0.8

settings:
  device: cpu
  batch_size: 1
  num_workers: 1
  cache_frames: false
  cleanup_temp: true

advanced:
  low_memory_mode: true
  frame_batch_size: 8
  max_processes: 1
  multiprocessing: false

logging:
  level: INFO
  progress_bar: true
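
A back-of-the-envelope estimate shows why `frame_batch_size` matters on memory-constrained systems. The sketch assumes uncompressed 640x480 RGB frames at one byte per channel; the resolution, the `batch_bytes` helper, and the idea that a whole batch is held in memory at once are assumptions, and real usage will also include intermediate buffers.

```python
def batch_bytes(frame_batch_size, width=640, height=480, channels=3, bytes_per_value=1):
    """Raw size in bytes of one batch of decoded frames."""
    return frame_batch_size * width * height * channels * bytes_per_value


# frame_batch_size values taken from the two configurations in this section.
for name, size in [("memory_efficient", 8), ("performance_optimized", 64)]:
    mib = batch_bytes(size) / 2**20
    print(f"{name}: frame_batch_size={size} -> about {mib:.1f} MiB per batch")
```

The estimate scales linearly with batch size and with pixel count, so higher-resolution cameras shift the trade-off toward smaller batches.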

Validation and Testing Configurations

Validation Configuration

validation_augmentation.yaml
# Validation Dataset Configuration
# Conservative augmentation for validation sets

pipeline:
  sync_views: true
  steps:
    # Only photometric augmentations, no spatial changes
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.05
        contrast: 0.05
        saturation: 0.03
        hue: 0.01
        probability: 0.5

settings:
  device: auto
  seed: 12345 # Different seed from training
  deterministic: true
  output_quality: 98

dataset:
  preserve_original: true
  naming_scheme: "val_{original_id}_aug_{aug_index:02d}"

logging:
  level: INFO
  progress_bar: true

Testing Configuration

test_augmentation.yaml
# Testing Configuration
# For quick validation of augmentation pipeline

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
        probability: 1.0 # Always apply for testing

settings:
  device: cpu # Use CPU for consistent testing
  seed: 999
  deterministic: true

dataset:
  preserve_original: true
  update_metadata: true

advanced:
  validate_output: true
  checksum_verification: false

logging:
  level: DEBUG
  progress_bar: true
  detailed_stats: true

Usage Examples

Using Configurations with CLI

# Research experiment
dataphy augment dataset \
  --dataset-path ./source_data \
  --output-path ./research_data \
  --config research_augmentation.yaml \
  --num-augmented 3

# Training dataset creation
dataphy augment dataset \
  --dataset-path ./original \
  --output-path ./training \
  --config training_augmentation.yaml \
  --num-augmented 5 \
  --no-preserve-original

# Webcam-only augmentation
dataphy augment dataset \
  --dataset-path ./data \
  --output-path ./webcam_aug \
  --config webcam_augmentation.yaml \
  --cameras observation.images.webcam \
  --num-augmented 2
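
To estimate how large each output dataset will be, the commands above can be reduced to simple arithmetic. The sketch assumes that `--num-augmented` is the number of augmented copies created per source episode and that preserving originals adds each source episode once; neither is confirmed here, and the 100-episode source dataset is a made-up figure.

```python
def output_episode_count(source_episodes, num_augmented, preserve_original):
    """Estimate the number of episodes in the output dataset."""
    augmented = source_episodes * num_augmented
    originals = source_episodes if preserve_original else 0
    return augmented + originals


source_episodes = 100  # hypothetical dataset size

print("research:", output_episode_count(source_episodes, 3, preserve_original=True))
print("training:", output_episode_count(source_episodes, 5, preserve_original=False))
print("webcam:  ", output_episode_count(source_episodes, 2, preserve_original=True))
```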

Programmatic Usage

from dataphy.dataset.augmentor import DatasetAugmentor, AugmentationConfig
from dataphy.dataset.registry import create_dataset_loader

# Build the augmentation config from a pipeline YAML file
config = AugmentationConfig(
    pipeline_config="research_augmentation.yaml",
    target="dataset",
    num_augmented_episodes=3,
    preserve_original=True,
    sync_views=True,
    random_seed=42
)

# Create augmentor and run
loader = create_dataset_loader("./source_data")
augmentor = DatasetAugmentor(loader)
results = augmentor.augment_full_dataset(config, "./output_data")

Configuration Tips

1. Choosing the Right Configuration

  • Research: Use deterministic seeds and moderate augmentation
  • Training: Use heavy augmentation without preserving originals
  • Validation: Use minimal, photometric-only augmentation
  • Testing: Use simple, consistent augmentation

2. Performance Optimization

  • GPU Available: Set device: cuda and increase batch_size
  • Limited Memory: Use low_memory_mode: true and cache_frames: false
  • High CPU: Increase max_processes and num_workers
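
The performance tips above can be turned into a small decision helper. This is only a sketch: `suggest_settings` is hypothetical, the 8 GB cutoff is an arbitrary assumption, and the returned values are borrowed from the high-performance and memory-efficient examples on this page rather than from any SDK recommendation.

```python
def suggest_settings(has_gpu, ram_gb, cpu_cores):
    """Map rough hardware facts to `settings` and `advanced` config sections."""
    settings = {"device": "cuda" if has_gpu else "cpu"}
    advanced = {}

    if ram_gb < 8:  # hypothetical cutoff for "limited memory"
        settings.update(batch_size=1, num_workers=1, cache_frames=False)
        advanced.update(low_memory_mode=True, multiprocessing=False, max_processes=1)
    else:
        settings.update(batch_size=8 if has_gpu else 4, cache_frames=True)
        advanced.update(
            low_memory_mode=False,
            multiprocessing=cpu_cores > 1,
            max_processes=max(1, cpu_cores),
        )
    return {"settings": settings, "advanced": advanced}


print(suggest_settings(has_gpu=True, ram_gb=32, cpu_cores=8))
print(suggest_settings(has_gpu=False, ram_gb=4, cpu_cores=2))
```

Treat the output as a starting point and adjust it after measuring throughput and memory use on real data.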

3. Quality vs Speed Tradeoffs

  • High Quality: output_quality: 98, preserve_aspect_ratio: true
  • Fast Processing: output_quality: 85, cleanup_temp: true
  • Storage Efficient: Lower quality, shorter naming schemes

4. Error Handling

  • Production: continue_on_error: true, max_retry_attempts: 3
  • Development: continue_on_error: false, detailed logging
  • Batch Processing: Enable validation and checksums

Complete Configuration Reference

Comprehensive Configuration with All Parameters

For a reference showing every available parameter, see the comprehensive configuration file:

comprehensive_augmentation_config.yaml
--8<-- "examples/comprehensive_augmentation_config.yaml"

This configuration demonstrates every available parameter across all transform types:

  • 26 Transform Types: All spatial, photometric, noise, blur, artifact, occlusion, and depth transforms
  • Cross-cutting Parameters: p, apply_to, sync_views, mask_protect, seed_policy for every transform
  • Global Settings: Device selection, memory management, performance tuning
  • Background Adapters: GreenAug, RoboEngine, Inpainting configurations
  • Advanced Features: Multiprocessing, validation, logging, metrics
  • Dataset Options: Naming schemes, compression, metadata handling

Use this as a reference to understand all available options, then create simplified configs for your specific use case.