Configuration Files

This page provides comprehensive examples of configuration files for different augmentation scenarios in the Dataphy SDK.

Episode Augmentation Configurations

These configurations are used with dataphy augment dataset for single-episode modifications.

Basic Episode Configuration

basic_episode_aug.yaml

version: 1
pipeline:
  sync_views: true # Synchronized across cameras

  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.88
    - name: color_jitter
      magnitude: 0.15
    - name: cutout
      holes: 1
      size_range: [8, 16]

  background:
    adapter: none

seed: 42
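The `random_crop_pad` step is described only by its parameters. The sketch below shows one plausible reading. It assumes that `keep_ratio_min` is the smallest fraction of each image side that the crop keeps, that `keep_ratio_max` defaults to 1.0, and that the crop is later padded back to the original size. `sample_crop_box` is a hypothetical helper, not part of the Dataphy API:

```python
import random

def sample_crop_box(width, height, keep_ratio_min, keep_ratio_max=1.0, rng=None):
    """Return (left, top, crop_w, crop_h) for a crop keeping a random fraction of each side.

    Hypothetical helper: assumes the ratio applies per side, not to the area.
    """
    rng = rng or random.Random()
    ratio = rng.uniform(keep_ratio_min, keep_ratio_max)
    crop_w = max(1, round(width * ratio))
    crop_h = max(1, round(height * ratio))
    # Place the crop uniformly at random inside the frame.
    left = rng.randint(0, width - crop_w)
    top = rng.randint(0, height - crop_h)
    return left, top, crop_w, crop_h

# keep_ratio_min and seed taken from basic_episode_aug.yaml
box = sample_crop_box(640, 480, keep_ratio_min=0.88, rng=random.Random(42))
print(box)
```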

Advanced Episode Configuration

advanced_episode_aug.yaml

version: 1
pipeline:
  sync_views: true

  steps:
    - name: random_crop_pad
      keep_ratio_min: 0.85
      keep_ratio_max: 1.0
      padding_mode: reflect
    - name: random_translate
      px: 8
    - name: color_jitter
      magnitude: 0.2
    - name: random_conv
      kernel_variance: 0.04
    - name: gaussian_blur
      sigma: 0.5
      probability: 0.3
    - name: cutout
      holes: [1, 2]
      size_range: [8, 24]
      probability: 0.4

  background:
    adapter: none

seed: 1337
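The advanced configuration adds a per-step `probability` and gives `holes` as a two-element list. The following sketch shows one possible interpretation. It assumes that each step fires independently with its `probability`, which defaults to 1.0 when absent. It also assumes that a `[low, high]` list means an integer drawn inclusively from that range. The step dicts mirror the YAML above, but `plan_steps` and `resolve_int` are hypothetical:

```python
import random

# A subset of the steps from advanced_episode_aug.yaml, as parsed dicts.
STEPS = [
    {"name": "random_crop_pad", "keep_ratio_min": 0.85, "keep_ratio_max": 1.0},
    {"name": "gaussian_blur", "sigma": 0.5, "probability": 0.3},
    {"name": "cutout", "holes": [1, 2], "size_range": [8, 24], "probability": 0.4},
]

def resolve_int(value, rng):
    """A scalar is used as-is; a [low, high] list is sampled inclusively."""
    if isinstance(value, list):
        low, high = value
        return rng.randint(low, high)
    return value

def plan_steps(steps, rng):
    """Return the steps that fire for one sample, with cutout hole counts resolved."""
    plan = []
    for step in steps:
        # Steps without an explicit probability are assumed to always apply.
        if rng.random() < step.get("probability", 1.0):
            entry = {"name": step["name"]}
            if "holes" in step:
                entry["holes"] = resolve_int(step["holes"], rng)
            plan.append(entry)
    return plan

print(plan_steps(STEPS, random.Random(1337)))
```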

Dataset Augmentation Configurations

These configurations are used with dataphy augment dataset to create new augmented datasets.

Research Configuration

Suited to reproducible research experiments:

research_augmentation.yaml

# Research Dataset Augmentation Configuration
# Optimized for reproducible experiments with moderate augmentation

pipeline:
  sync_views: true # Synchronized across all cameras

  steps:
    # Spatial augmentations for robustness
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.8

    - name: random_translate
      type: RandomTranslate
      params:
        px: 8
        probability: 0.6

    # Photometric augmentations for lighting variations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.15
        probability: 0.85

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.05
        probability: 0.25

# Global settings for reproducibility
settings:
  device: auto
  seed: 42
  deterministic: true
  output_quality: 95
  preserve_aspect_ratio: true

# Dataset configuration
dataset:
  preserve_original: true
  naming_scheme: "episode_{original_id}_aug_{aug_index:03d}"
  update_metadata: true

# Progress tracking
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  detailed_stats: true
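The `naming_scheme` placeholders look like Python format fields. The page does not say how the SDK expands them. Assuming it uses `str.format` semantics, `{aug_index:03d}` zero-pads the augmentation index to three digits. `episode_name` is a hypothetical helper:

```python
def episode_name(scheme, original_id, aug_index):
    """Expand a naming_scheme template, assuming str.format semantics."""
    return scheme.format(original_id=original_id, aug_index=aug_index)

research = "episode_{original_id}_aug_{aug_index:03d}"
for i in range(3):
    print(episode_name(research, original_id=7, aug_index=i))
# episode_7_aug_000
# episode_7_aug_001
# episode_7_aug_002
```

Under the same assumption, the training scheme `aug_{aug_index:04d}_{original_id}` would produce names such as `aug_0002_7`.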

Training Configuration

Designed for maximum data diversity in training scenarios:

training_augmentation.yaml

# Training Dataset Augmentation Configuration
# Heavy augmentation for maximum data diversity

pipeline:
  sync_views: true

  steps:
    # Spatial augmentations
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.75
        probability: 0.9

    - name: random_translate
      type: RandomTranslate
      params:
        px: 12
        probability: 0.7

    # Photometric augmentations
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.25
        probability: 0.95

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.08
        probability: 0.4

    # Occlusion augmentations (two cutout passes with different hole sizes)
    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [20, 60]
        probability: 0.3

    - name: cutout
      type: Cutout
      params:
        holes: 2
        size_range: [16, 48]
        probability: 0.4

# Performance settings
settings:
  device: auto
  batch_size: 4
  num_workers: 2
  seed: null # Random seed each run
  output_quality: 90
  cache_frames: true
  max_cache_size: "4GB"

# Dataset configuration
dataset:
  preserve_original: false # Augmented-only dataset
  naming_scheme: "aug_{aug_index:04d}_{original_id}"
  update_metadata: true

# Advanced settings
advanced:
  multiprocessing: true
  max_processes: 4
  frame_batch_size: 32
  video_codec: "avc1"
  validate_output: true

# Logging
logging:
  level: INFO
  progress_bar: true
  continue_on_error: true
  max_retry_attempts: 3
  error_summary: true
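With `seed: null`, every run draws fresh randomness, so two runs of the training configuration differ. If you might later need to reproduce a particular run, one option is to draw a concrete seed up front and record it. The sketch below shows that pattern with a hypothetical `resolve_seed` helper. It is not a Dataphy function, and it uses Python's `random` module rather than whatever generator the SDK uses internally:

```python
import logging
import random
import secrets

log = logging.getLogger("augment")

def resolve_seed(seed):
    """Return the configured seed, or draw a fresh 32-bit one and log it when seed is None."""
    if seed is None:
        seed = secrets.randbits(32)
        log.info("seed: null in config; this run used seed=%d", seed)
    return seed

run_seed = resolve_seed(None)                   # training_augmentation.yaml sets seed: null
rng = random.Random(run_seed)
replay = random.Random(resolve_seed(run_seed))  # pass the logged value back in to replay
print(rng.random() == replay.random())          # True
```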

Lightweight Configuration

Minimal augmentation that preserves data fidelity:

light_augmentation.yaml

# Light Dataset Augmentation Configuration
# Minimal augmentation preserving data fidelity

pipeline:
  sync_views: true

  steps:
    # Very subtle photometric changes
    - name: color_jitter
      type: ColorJitter
      params:
        magnitude: 0.08
        probability: 0.6

    - name: random_translate
      type: RandomTranslate
      params:
        px: 3
        probability: 0.3

# High quality settings
settings:
  device: auto
  seed: 42
  output_quality: 98
  preserve_aspect_ratio: true

# Dataset settings
dataset:
  preserve_original: true
  update_metadata: true

# Minimal logging
logging:
  level: WARNING
  progress_bar: true
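Several configurations describe photometric strength with a single `magnitude`. The sketch below illustrates one common interpretation. It assumes that `magnitude` bounds a multiplicative brightness factor drawn from [1 - magnitude, 1 + magnitude] and that pixels are 8-bit RGB. The webcam configuration's separate `brightness`, `contrast`, `saturation`, and `hue` keys suggest the real step controls each channel separately, which this sketch does not model. `jitter_brightness` is hypothetical:

```python
import random

def jitter_brightness(pixels, magnitude, rng):
    """Scale 8-bit RGB pixels by one random factor in [1 - magnitude, 1 + magnitude]."""
    factor = rng.uniform(1.0 - magnitude, 1.0 + magnitude)
    # Clamp to the valid 0..255 range after scaling.
    return [tuple(min(255, max(0, round(c * factor))) for c in px) for px in pixels]

frame = [(100, 150, 200), (0, 0, 0), (255, 255, 255)]
# magnitude 0.08 as in light_augmentation.yaml
light = jitter_brightness(frame, magnitude=0.08, rng=random.Random(42))
print(light)
```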

Camera-Specific Configuration

Optimized for specific camera characteristics:

webcam_augmentation.yaml

# Webcam-Specific Augmentation Configuration
# Optimized for webcam feed characteristics

pipeline:
  sync_views: false # Different augmentation per camera

  steps:
    # Focus on lighting and noise (common webcam issues)
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.2
        contrast: 0.3 # Higher contrast variation for webcams
        saturation: 0.15
        hue: 0.04
        probability: 0.9

    - name: random_conv
      type: RandomConv
      params:
        kernel_variance: 0.03 # Simulate sensor noise
        probability: 0.6

    - name: cutout
      type: Cutout
      params:
        holes: 1
        size_range: [8, 24]
        probability: 0.3

    # Minimal spatial augmentation
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.9 # Preserve most of the frame
        probability: 0.4

settings:
  device: auto
  seed: 42
  output_quality: 92

dataset:
  preserve_original: true
  camera_selection: "specific"
  # Pass the camera on the CLI: --cameras observation.images.webcam

logging:
  level: INFO
  progress_bar: true
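The webcam configuration sets `sync_views: false`, while every other example sets it to `true`. The sketch below illustrates the difference. It assumes that synchronized views share one set of sampled parameters per step, while unsynchronized views each draw their own. Only `observation.images.webcam` comes from this page. The other camera keys and the contrast factor range are made-up placeholders:

```python
import random

CAMERAS = ["observation.images.webcam", "observation.images.wrist", "observation.images.top"]

def sample_params(rng):
    """Hypothetical per-step parameters: a crop keep ratio and a contrast factor."""
    return {"keep_ratio": rng.uniform(0.9, 1.0), "contrast": rng.uniform(0.7, 1.3)}

def params_per_camera(cameras, sync_views, rng):
    if sync_views:
        shared = sample_params(rng)
        # Every camera gets the same parameters, so views stay mutually consistent.
        return {cam: shared for cam in cameras}
    # Independent draws: each camera is augmented differently.
    return {cam: sample_params(rng) for cam in cameras}

synced = params_per_camera(CAMERAS, sync_views=True, rng=random.Random(42))
independent = params_per_camera(CAMERAS, sync_views=False, rng=random.Random(42))
```

Shared parameters keep multi-camera observations geometrically and photometrically aligned. Independent draws add diversity, but the views may no longer agree with each other.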

Background Replacement Configurations

Advanced configurations with background replacement (requires additional setup):

Green Screen Configuration

green_screen_augmentation.yaml

# Background Replacement with Green Screen Detection
pipeline:
  sync_views: true

  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
        probability: 0.8

    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.6

# Background replacement
background:
  adapter: green_aug
  params:
    background_dir: "./backgrounds"
    mask_threshold: 0.1
    blur_edges: true
    probability: 0.7

settings:
  device: auto
  seed: 42
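This page does not explain the `green_aug` adapter's parameters. As a rough illustration of chroma keying in general, and not of the adapter's actual algorithm, the sketch below treats a pixel as background when its green channel exceeds both red and blue by more than `mask_threshold` on a 0 to 1 scale. It then swaps in the corresponding background pixel. The greenness measure is an assumption, and `blur_edges` and `probability` are not modeled:

```python
def green_mask(pixels, threshold):
    """True where a pixel (RGB floats in 0..1) looks like green screen."""
    return [g - max(r, b) > threshold for r, g, b in pixels]

def composite(foreground, background, threshold):
    """Replace green-screen pixels with the matching background pixel."""
    mask = green_mask(foreground, threshold)
    return [bg if is_bg else fg for fg, bg, is_bg in zip(foreground, background, mask)]

fg = [(0.1, 0.9, 0.1), (0.8, 0.2, 0.2), (0.4, 0.45, 0.4)]
bg = [(0.0, 0.0, 1.0)] * 3
print(composite(fg, bg, threshold=0.1))
# [(0.0, 0.0, 1.0), (0.8, 0.2, 0.2), (0.4, 0.45, 0.4)]
```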

Inpainting Configuration

inpaint_augmentation.yaml

# Background Replacement with Inpainting
pipeline:
  sync_views: true

  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
        contrast: 0.1
        probability: 0.7

background:
  adapter: inpaint
  params:
    inpaint_model: "lama"
    mask_dilation: 5
    blend_edges: true

settings:
  device: cuda # Inpainting requires GPU
  seed: 42
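`mask_dilation: 5` presumably grows the region to be inpainted by a few pixels so that object edges are fully covered. The plain-Python sketch below shows binary dilation with a square neighborhood. It assumes the value is a radius in pixels. The SDK may use a different kernel shape or treat the value as a kernel size:

```python
def dilate(mask, radius):
    """Binary dilation: a pixel becomes True if any pixel within `radius` (Chebyshev distance) is True."""
    h, w = len(mask), len(mask[0])
    out = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Mark the square neighborhood, clipped to the image bounds.
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = True
    return out

mask = [[False] * 7 for _ in range(7)]
mask[3][3] = True
grown = dilate(mask, radius=1)
print(sum(map(sum, grown)))  # 9: the center pixel plus its 8 neighbors
```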

Performance Optimization Configurations

High Performance Configuration

performance_optimized.yaml

# Performance-Optimized Configuration
# For fast processing on powerful hardware

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        contrast: 0.15
        probability: 0.8
    - name: random_crop_pad
      type: RandomCropPad
      params:
        keep_ratio_min: 0.85
        probability: 0.7

settings:
  device: cuda
  batch_size: 8
  num_workers: 4
  cache_frames: true
  max_cache_size: "8GB"

advanced:
  multiprocessing: true
  max_processes: 8
  frame_batch_size: 64
  low_memory_mode: false

dataset:
  preserve_original: true

logging:
  level: WARNING # Reduce logging overhead
  progress_bar: true
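`max_cache_size` is given as a string such as `"8GB"`. The page does not say whether `GB` means 10^9 or 2^30 bytes. The sketch below parses such strings as binary (1024-based) units, which is an assumption; check it against the SDK if the exact limit matters. `parse_size` is hypothetical:

```python
import re

UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(text):
    """Convert strings like '8GB' or '512 MB' to bytes, treating units as powers of 1024."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * UNITS[unit])

print(parse_size("8GB"))  # 8589934592
```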

Memory Efficient Configuration

memory_efficient.yaml

# Memory-Efficient Configuration
# For systems with limited RAM

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.15
        probability: 0.8

settings:
  device: cpu
  batch_size: 1
  num_workers: 1
  cache_frames: false
  cleanup_temp: true

advanced:
  low_memory_mode: true
  frame_batch_size: 8
  max_processes: 1
  multiprocessing: false

logging:
  level: INFO
  progress_bar: true
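`frame_batch_size` sets how many frames are processed together, which likely dominates peak memory per worker. The back-of-the-envelope sketch below assumes that decoded frames are held as uncompressed 8-bit RGB arrays. The 640x480 resolution is only an example, and you would multiply by `max_processes` if each worker holds its own batch:

```python
def batch_bytes(frame_batch_size, width, height, channels=3, bytes_per_value=1):
    """Raw size of one batch of decoded frames; ignores copies made by the transforms."""
    return frame_batch_size * width * height * channels * bytes_per_value

# frame_batch_size values from memory_efficient (8), training (32) and performance_optimized (64).
for size in (8, 32, 64):
    print(size, batch_bytes(size, 640, 480))
# 8 7372800
# 32 29491200
# 64 58982400
```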

Validation and Testing Configurations

Validation Configuration

validation_augmentation.yaml

# Validation Dataset Configuration
# Conservative augmentation for validation sets

pipeline:
  sync_views: true
  steps:
    # Only photometric augmentations, no spatial changes
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.05
        contrast: 0.05
        saturation: 0.03
        hue: 0.01
        probability: 0.5

settings:
  device: auto
  seed: 12345 # Different seed from training
  deterministic: true
  output_quality: 98

dataset:
  preserve_original: true
  naming_scheme: "val_{original_id}_aug_{aug_index:02d}"

logging:
  level: INFO
  progress_bar: true
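The validation configuration pairs `deterministic: true` with its own seed. This page does not document how the SDK seeds individual episodes. One common pattern derives a per-episode seed from the base seed by hashing. With that approach, each augmented copy is reproducible, and changing the base seed changes every stream. The sketch below shows that pattern with hypothetical `derive_seed` and `episode_rng` helpers:

```python
import hashlib
import random

def derive_seed(base_seed, episode_id, aug_index):
    """Deterministically derive a 64-bit seed for one augmented copy of one episode."""
    key = f"{base_seed}:{episode_id}:{aug_index}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def episode_rng(base_seed, episode_id, aug_index):
    return random.Random(derive_seed(base_seed, episode_id, aug_index))

# Same inputs give the same stream; a different base seed (12345 vs 42) gives a different one.
a = episode_rng(12345, episode_id=3, aug_index=0).random()
b = episode_rng(12345, episode_id=3, aug_index=0).random()
c = episode_rng(42, episode_id=3, aug_index=0).random()
print(a == b, a == c)  # True False
```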

Testing Configuration

test_augmentation.yaml

# Testing Configuration
# For quick validation of augmentation pipeline

pipeline:
  sync_views: true
  steps:
    - name: color_jitter
      type: ColorJitter
      params:
        brightness: 0.1
        probability: 1.0 # Always apply for testing

settings:
  device: cpu # Use CPU for consistent testing
  seed: 999
  deterministic: true

dataset:
  preserve_original: true
  update_metadata: true

advanced:
  validate_output: true
  checksum_verification: false

logging:
  level: DEBUG
  progress_bar: true
  detailed_stats: true
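A quick test run is also a good moment to catch malformed configs before starting a long job. The following sketch is a minimal pre-flight check over the dataset-style `steps` list, where each step has `name`, `type`, and `params`. Its rules are illustrative assumptions: `probability` must lie in [0, 1] and `keep_ratio_min` in (0, 1]. The config is shown as an already-parsed Python dict because loading YAML requires a third-party parser such as PyYAML. `validate_steps` is hypothetical:

```python
def validate_steps(steps):
    """Return a list of human-readable problems found in a pipeline `steps` list."""
    problems = []
    for i, step in enumerate(steps):
        for key in ("name", "type", "params"):
            if key not in step:
                problems.append(f"step {i}: missing '{key}'")
        params = step.get("params", {})
        p = params.get("probability", 1.0)
        if not 0.0 <= p <= 1.0:
            problems.append(f"step {i} ({step.get('name')}): probability {p} outside [0, 1]")
        ratio = params.get("keep_ratio_min")
        if ratio is not None and not 0.0 < ratio <= 1.0:
            problems.append(f"step {i} ({step.get('name')}): keep_ratio_min {ratio} outside (0, 1]")
    return problems

# The steps from test_augmentation.yaml, as a parsed dict.
test_steps = [
    {"name": "color_jitter", "type": "ColorJitter",
     "params": {"brightness": 0.1, "probability": 1.0}},
]
print(validate_steps(test_steps))  # []
```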

Usage Examples

Using Configurations with CLI

# Research experiment
dataphy augment dataset \
  --dataset-path ./source_data \
  --output-path ./research_data \
  --config research_augmentation.yaml \
  --num-augmented 3

# Training dataset creation
dataphy augment dataset \
  --dataset-path ./original \
  --output-path ./training \
  --config training_augmentation.yaml \
  --num-augmented 5 \
  --no-preserve-original

# Webcam-only augmentation
dataphy augment dataset \
  --dataset-path ./data \
  --output-path ./webcam_aug \
  --config webcam_augmentation.yaml \
  --cameras observation.images.webcam \
  --num-augmented 2
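When several datasets need the same treatment, you can generate these commands from a script. The sketch below uses Python's `subprocess` and reuses only the flags shown above. By default it does a dry run that prints the command. When `dry_run=False`, it assumes `dataphy` is on your `PATH`. `augment_command` and `run` are hypothetical helpers:

```python
import shlex
import subprocess

def augment_command(dataset_path, output_path, config, num_augmented,
                    cameras=None, preserve_original=True):
    """Build the argv for `dataphy augment dataset` from the flags used on this page."""
    cmd = ["dataphy", "augment", "dataset",
           "--dataset-path", dataset_path,
           "--output-path", output_path,
           "--config", config,
           "--num-augmented", str(num_augmented)]
    if cameras:
        cmd += ["--cameras", cameras]
    if not preserve_original:
        cmd.append("--no-preserve-original")
    return cmd

def run(cmd, dry_run=True):
    if dry_run:
        print(shlex.join(cmd))
        return None
    # check=True raises CalledProcessError if the CLI exits with a non-zero status.
    return subprocess.run(cmd, check=True)

run(augment_command("./original", "./training", "training_augmentation.yaml", 5,
                    preserve_original=False))
```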

Programmatic Usage

from dataphy.dataset.augmentor import DatasetAugmentor, AugmentationConfig
from dataphy.dataset.registry import create_dataset_loader

# Load configuration
config = AugmentationConfig(
    pipeline_config="research_augmentation.yaml",
    target="dataset",
    num_augmented_episodes=3,
    preserve_original=True,
    sync_views=True,
    random_seed=42
)

# Create augmentor and run
loader = create_dataset_loader("./source_data")
augmentor = DatasetAugmentor(loader)
results = augmentor.augment_full_dataset(config, "./output_data")

Configuration Tips

1. Choosing the Right Configuration

Research: Use a fixed seed with deterministic: true and moderate augmentation
Training: Use heavy augmentation without preserving originals
Validation: Use minimal, photometric-only augmentation
Testing: Use simple, consistent augmentation

2. Performance Optimization

GPU Available: Set device: cuda and increase batch_size
Limited Memory: Use low_memory_mode: true and cache_frames: false
Many CPU Cores: Increase max_processes and num_workers
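Instead of hard-coding `num_workers` and `max_processes`, you could derive them from the machine. The sketch below uses a hypothetical `suggest_parallelism` helper. Its policy is an assumption, not SDK guidance: leave one core free and cap the values at those used in `performance_optimized.yaml`:

```python
import os

def suggest_parallelism(cpu_count=None, max_processes_cap=8, num_workers_cap=4):
    """Suggest num_workers, max_processes and multiprocessing from the CPU count."""
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    usable = max(1, cores - 1)  # keep one core free for the main process and I/O
    return {
        "max_processes": min(usable, max_processes_cap),
        "num_workers": min(usable, num_workers_cap),
        "multiprocessing": usable > 1,
    }

print(suggest_parallelism())
```

On a single-core machine this returns the same values as `memory_efficient.yaml`: one process, one worker, and multiprocessing off.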

3. Quality vs Speed Tradeoffs

High Quality: output_quality: 98, preserve_aspect_ratio: true
Fast Processing: output_quality: 85, cleanup_temp: true
Storage Efficient: Lower output_quality, shorter naming schemes

4. Error Handling

Production: continue_on_error: true, max_retry_attempts: 3
Development: continue_on_error: false, level: DEBUG, detailed_stats: true
Batch Processing: Enable validate_output and checksum_verification
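The error-handling tips combine `continue_on_error` and `max_retry_attempts`. The sketch below shows how those two settings could interact. It assumes that `max_retry_attempts` counts total attempts per episode. It also assumes that `continue_on_error` decides whether a final failure is recorded or re-raised. `process_episodes` and the `augment_one` callable are hypothetical:

```python
def process_episodes(episode_ids, augment_one, max_retry_attempts=3, continue_on_error=True):
    """Run augment_one(episode_id) with retries; return (succeeded, failed) lists."""
    attempts = max(1, max_retry_attempts)
    succeeded, failed = [], []
    for episode_id in episode_ids:
        for attempt in range(1, attempts + 1):
            try:
                augment_one(episode_id)
                succeeded.append(episode_id)
                break
            except Exception as exc:
                if attempt == attempts:
                    if not continue_on_error:
                        raise  # development mode: stop at the first hard failure
                    failed.append((episode_id, repr(exc)))

    return succeeded, failed

calls = {}

def flaky(episode_id):
    """Simulated augmentation: episode 2 fails twice then succeeds, episode 5 always fails."""
    calls[episode_id] = calls.get(episode_id, 0) + 1
    if episode_id == 2 and calls[episode_id] < 3:
        raise RuntimeError("transient decode error")
    if episode_id == 5:
        raise RuntimeError("corrupt video")

ok, errors = process_episodes([1, 2, 5], flaky)
print(ok)      # [1, 2]
print(errors)  # [(5, "RuntimeError('corrupt video')")]
```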

Complete Configuration Reference

Comprehensive Configuration with All Parameters

For a reference that lists every available parameter, see the comprehensive configuration file:

comprehensive_augmentation_config.yaml

Source: examples/comprehensive_augmentation_config.yaml

This configuration demonstrates every available parameter across all transform types:

26 Transform Types: All spatial, photometric, noise, blur, artifact, occlusion, and depth transforms
Cross-cutting Parameters: p, apply_to, sync_views, mask_protect, seed_policy for every transform
Global Settings: Device selection, memory management, performance tuning
Background Adapters: GreenAug, RoboEngine, Inpainting configurations
Advanced Features: Multiprocessing, validation, logging, metrics
Dataset Options: Naming schemes, compression, metadata handling

Use this as a reference to understand all available options, then create simplified configs for your specific use case.
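One way to keep simplified configs small is to list only the keys you change and merge them over a fuller base config. This page does not say whether the SDK merges configs itself. The sketch below does the merge in your own tooling on already-parsed dicts, using a hypothetical `deep_merge` helper. Lists such as `steps` are replaced as a whole, not merged element by element:

```python
import copy

def deep_merge(base, override):
    """Return a new dict: nested dicts are merged recursively, other values are replaced."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged

base = {"settings": {"device": "auto", "seed": 42, "output_quality": 95},
        "logging": {"level": "INFO", "progress_bar": True}}
override = {"settings": {"device": "cuda"}, "logging": {"level": "WARNING"}}
print(deep_merge(base, override))
```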