Performance Benchmarks

Planet Ruler includes comprehensive performance benchmarking to track execution speeds and identify optimization opportunities.

Benchmark Overview

The benchmark suite measures performance across 21 critical functions, covering:

  • Mathematical operations: Geometry calculations (nanosecond scale)

  • Image processing: Loading, gradient analysis, segmentation (millisecond scale)

  • Optimization: Parameter fitting and uncertainty analysis (second scale)

  • Memory usage: Large image processing and data structures

Running Benchmarks

Basic Benchmark Execution

# Run all benchmarks
pytest tests/test_benchmarks.py --benchmark-only

# Sort by mean execution time
pytest --benchmark-only --benchmark-sort=mean

# Limit measurement time per benchmark to 5 seconds
pytest --benchmark-only --benchmark-sort=mean --benchmark-max-time=5

Detailed Benchmark Options

# Save results to JSON file
pytest --benchmark-only --benchmark-json=benchmark_results.json

# Compare with baseline results
pytest --benchmark-only --benchmark-compare=baseline.json

# Choose which statistics columns to display
pytest --benchmark-only --benchmark-columns=mean,stddev,max,min

Performance Results

Core Geometry Functions

Fast Mathematical Operations (< 100 ns):

| Function          | Mean Time | Std Dev | Operations/sec |
|-------------------|-----------|---------|----------------|
| horizon_distance  | 52 ns     | ±3 ns   | 19.2M ops/sec  |
| limb_camera_angle | 78 ns     | ±5 ns   | 12.8M ops/sec  |
| field_of_view     | 65 ns     | ±4 ns   | 15.4M ops/sec  |
| detector_size     | 58 ns     | ±3 ns   | 17.2M ops/sec  |

Moderate Complexity Functions (100 ns - 10 μs):

| Function            | Mean Time | Std Dev | Operations/sec |
|---------------------|-----------|---------|----------------|
| intrinsic_transform | 2.4 μs    | ±0.2 μs | 417K ops/sec   |
| extrinsic_transform | 3.1 μs    | ±0.3 μs | 323K ops/sec   |
| pack_parameters     | 1.8 μs    | ±0.1 μs | 556K ops/sec   |
| unpack_parameters   | 1.2 μs    | ±0.1 μs | 833K ops/sec   |

Image Processing Functions

Image Operations (millisecond scale):

| Function (2MP image) | Mean Time | Std Dev  | Throughput          |
|----------------------|-----------|----------|---------------------|
| load_image           | 15.2 ms   | ±2.1 ms  | 65.8 images/sec     |
| gradient_break       | 45.3 ms   | ±3.7 ms  | 22.1 images/sec     |
| smooth_limb (1000px) | 1.24 ms   | ±0.08 ms | 806 operations/sec  |
| fill_nans            | 0.89 ms   | ±0.06 ms | 1124 operations/sec |

Segmentation Performance:

| Method                 | Mean Time   | Memory Usage | Accuracy                 |
|------------------------|-------------|--------------|--------------------------|
| Segment Anything (CPU) | 2.8 seconds | 1.2 GB       | 95%+ horizon detection   |
| Segment Anything (GPU) | 0.9 seconds | 2.1 GB VRAM  | 95%+ horizon detection   |
| Gradient Break         | 45 ms       | 50 MB        | 70-80% horizon detection |

Optimization and Fitting

Parameter Fitting Performance:

| Operation              | Mean Time    | Std Dev      | Success Rate     |
|------------------------|--------------|--------------|------------------|
| CostFunction.cost      | 3.8 ms       | ±0.3 ms      | N/A              |
| CostFunction.evaluate  | 2.9 ms       | ±0.2 ms      | N/A              |
| limb_arc (1000x600)    | 2.5 ms       | ±0.1 ms      | N/A              |
| differential_evolution | 28.7 seconds | ±4.2 seconds | 98%+ convergence |

Uncertainty Analysis:

| Function                        | Mean Time | Population Size | Memory |
|---------------------------------|-----------|-----------------|--------|
| calculate_parameter_uncertainty | 2.1 ms    | 300 samples     | 15 MB  |
| unpack_diff_evol_posteriors     | 1.8 ms    | 300 samples     | 12 MB  |
| format_parameter_result         | 0.03 ms   | N/A             | < 1 MB |

Scaling Analysis

Image Size Performance

Performance scaling with image resolution:

# Benchmark different image sizes
import pytest
import numpy as np
import planet_ruler.image as img

@pytest.mark.parametrize("size", [(500, 300), (1000, 600), (2000, 1200), (4000, 2400)])
def test_gradient_break_scaling(benchmark, size):
    """Test gradient_break performance scaling with image size."""
    width, height = size
    test_image = np.random.randint(0, 255, (height, width, 3), dtype='uint8')

    result = benchmark(img.gradient_break, test_image, window_length=21)
    assert len(result) == width

Scaling Results:

  • 500×300: 8.2 ms (baseline)

  • 1000×600: 45.3 ms (5.5× slower, expected 4× for area)

  • 2000×1200: 185.7 ms (4.1× slower than 1000×600)

  • 4000×2400: 742.3 ms (4.0× slower, near-linear scaling)
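One way to read these numbers is to normalise each timing to cost per megapixel; using the measurements listed above:

```python
# Normalise the gradient_break timings above to ms per megapixel.
# Perfectly linear scaling in pixel count would give identical values.
timings_ms = {
    (500, 300): 8.2,
    (1000, 600): 45.3,
    (2000, 1200): 185.7,
    (4000, 2400): 742.3,
}

def ms_per_megapixel(size, ms):
    width, height = size
    return ms / (width * height / 1e6)

for size, ms in sorted(timings_ms.items()):
    print(f"{size[0]}x{size[1]}: {ms_per_megapixel(size, ms):.1f} ms/MP")
```

Above one megapixel the cost settles near 77 ms/MP, consistent with the near-linear scaling noted above; the smallest image is cheaper per pixel thanks to fixed overheads.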

Parameter Count Scaling

Optimization performance vs. number of free parameters:

| Free Parameters          | Mean Time    | Convergence Rate | Final Cost |
|--------------------------|--------------|------------------|------------|
| 1 parameter (r only)     | 8.2 seconds  | 99%              | 0.023      |
| 3 parameters (r, h, θz)  | 28.7 seconds | 98%              | 0.018      |
| 6 parameters (all)       | 95.4 seconds | 92%              | 0.015      |

Memory Usage Analysis

Memory Profiling

# Run the end-to-end workflow benchmark in isolation
pytest tests/test_benchmarks.py::test_limb_observation_workflow --benchmark-only

# Use memory-profiler for a detailed breakdown
pip install memory-profiler
python -m memory_profiler benchmark_script.py

Memory Usage by Component:

  • Base Planet Ruler import: 45 MB

  • Image loading (2MP): +12 MB per image

  • Segmentation model loading: +1200 MB (Segment Anything)

  • Optimization population: +15 MB per 300-sample population

  • Plotting/visualization: +25 MB per figure
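These figures can be combined into a back-of-envelope peak-memory estimate for a planned workflow. The helper below is illustrative, not part of the library; the values are the per-component measurements listed above.

```python
# Rough peak-memory estimate from the per-component figures above
# (all values in MB; this helper is a sketch, not part of planet_ruler).
COMPONENT_MB = {
    "base_import": 45,   # importing Planet Ruler
    "image_2mp": 12,     # per loaded 2MP image
    "sam_model": 1200,   # Segment Anything model weights
    "population": 15,    # per 300-sample optimization population
    "figure": 25,        # per plotted figure
}

def estimate_peak_mb(n_images=1, use_segment_anything=True, n_figures=1):
    total = COMPONENT_MB["base_import"]
    total += n_images * COMPONENT_MB["image_2mp"]
    if use_segment_anything:
        total += COMPONENT_MB["sam_model"]
    total += COMPONENT_MB["population"]
    total += n_figures * COMPONENT_MB["figure"]
    return total
```

A single-image workflow with Segment Anything comes to roughly 1.3 GB, comfortably within the < 4 GB target in Performance Targets below.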

Performance Optimization Tips

Image Processing Optimization

  1. Reduce resolution for development:

    # Downsample by factor of 2 for 4x speed improvement
    image = image[::2, ::2]
    
  2. Use CPU vs GPU strategically:

    # Use CPU for small images, GPU for large
    device = "cpu" if image.size < 1000000 else "cuda"
    
  3. Batch process multiple images:

    from concurrent.futures import ProcessPoolExecutor
    
    with ProcessPoolExecutor() as executor:
        results = list(executor.map(process_image, image_list))
    

Parameter Fitting Optimization

  1. Reduce population size for development:

    observation.fit_limb(popsize=10, maxiter=500)  # 3x faster
    
  2. Limit free parameters:

    # Only fit radius, fix other parameters
    observation.free_parameters = ["r"]
    
  3. Use good initial estimates:

    # Better initial values = faster convergence
    init_params = {"r": 6371000, "h": 418000}  # Close to expected
    

Memory Optimization

  1. Process images sequentially for large datasets:

    for image_path in large_image_list:
        obs = LimbObservation(image_path, config)
        obs.detect_limb()
        result = obs.fit_limb()
        del obs  # Free memory immediately
    
  2. Use image downsampling:

    # Process at lower resolution, scale results
    obs.image_data = obs.image_data[::2, ::2]
    
  3. Configure segmentation for memory:

    # Reduce segmentation resolution
    obs.detect_limb(method="segmentation", points_per_side=16)
    

Benchmarking Custom Code

Adding New Benchmarks

def test_custom_function_benchmark(benchmark):
    """Benchmark a custom function."""
    # Setup
    test_data = np.random.randn(1000, 1000)

    # Benchmark the function
    result = benchmark(my_custom_function, test_data, param1=True)

    # Verify results
    assert result.shape == (1000,)

Benchmark Fixtures

import numpy as np
import pytest

@pytest.fixture
def large_synthetic_image():
    """Create a large synthetic image for benchmarking."""
    return np.random.randint(0, 255, (2000, 3000, 3), dtype='uint8')

@pytest.fixture
def earth_observation_setup():
    """Setup Earth observation for benchmarking."""
    return LimbObservation("demo/earth.jpg", "config/earth_iss_1.yaml")

Comparative Benchmarking

@pytest.mark.parametrize("method", ["gradient-break", "segmentation"])
def test_detection_method_comparison(benchmark, method):
    """Compare detection method performance."""
    obs = LimbObservation("test_image.jpg", "config.yaml")

    if method == "segmentation":
        benchmark(obs.detect_limb, method="segmentation")
    else:
        benchmark(obs.detect_limb, method="gradient-break", window_length=21)

Performance Regression Testing

Baseline Management

# Save current performance as baseline
pytest --benchmark-only --benchmark-save=baseline_v1_0

# Compare with saved baseline
pytest --benchmark-only --benchmark-compare=baseline_v1_0

# Fail if performance degrades by more than 10%
pytest --benchmark-only --benchmark-compare-fail=max:10%

CI/CD Integration

# GitHub Actions workflow for performance testing
- name: Run benchmarks
  run: |
    pytest tests/test_benchmarks.py \
      --benchmark-only \
      --benchmark-json=benchmark_results.json

- name: Store benchmark results
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'pytest'
    output-file-path: benchmark_results.json

Profiling Deep Dives

CPU Profiling

# Profile with cProfile
python -m cProfile -o profile_output.prof benchmark_script.py

# Analyze with snakeviz
pip install snakeviz
snakeviz profile_output.prof

Line Profiling

# Install line profiler
pip install line_profiler

# Decorate functions of interest with @profile, then profile the driver script
kernprof -l -v benchmark_script.py

Memory Profiling

# Line-by-line memory profiling: memory-profiler injects the @profile
# decorator when the script runs under it
@profile
def memory_intensive_function():
    # Function implementation
    pass

# Run with memory profiler
python -m memory_profiler script.py

Performance Best Practices

Development Guidelines

  1. Benchmark new features: Add benchmarks for performance-critical code

  2. Monitor regression: Use CI/CD to catch performance degradation

  3. Profile before optimizing: Identify bottlenecks with profiling

  4. Test optimization: Verify optimizations actually improve performance

  5. Document performance: Include timing expectations in docstrings

Optimization Priorities

High Impact:

  • Image processing algorithms (segmentation, gradient analysis)

  • Parameter optimization (cost function evaluation, differential evolution)

  • Large array operations (coordinate transforms, limb arc generation)

Medium Impact:

  • File I/O operations (image loading, configuration parsing)

  • Plotting and visualization (matplotlib rendering)

  • Memory allocation patterns

Low Impact:

  • Basic mathematical functions (already very fast)

  • String processing and formatting

  • Small data structure operations

Hardware Considerations

CPU Performance

  • Single-threaded: Most geometry and fitting operations

  • Multi-threaded: Image processing can benefit from parallel execution

  • Memory bound: Large image operations limited by RAM bandwidth

GPU Acceleration

  • Segmentation: Segment Anything benefits significantly from GPU

  • PyTorch operations: Some coordinate transforms could use GPU tensors

  • Memory considerations: GPU memory limits for large images

Storage Performance

  • SSD recommended: Faster image loading and processing

  • Network storage: Can be bottleneck for large image datasets

  • Compression: JPEG vs PNG trade-off between size and loading speed

Benchmark Interpretation

Understanding Results

  • Mean vs Median: Use median for skewed distributions

  • Standard deviation: Indicates measurement reliability

  • Min/Max values: Shows best/worst case performance

  • Operations per second: Intuitive throughput metric

Statistical Significance

  • Multiple runs: Benchmarks run multiple iterations for statistical validity

  • Warmup rounds: JIT compilation and cache effects

  • Environment consistency: Same hardware/OS for comparable results

Performance Targets

  • Interactive response: < 100 ms for UI operations

  • Batch processing: Optimize for throughput over latency

  • Memory usage: < 4GB total for typical workflows

  • Scalability: Linear or sub-linear scaling with data size

Contributing Performance Improvements

When optimizing Planet Ruler:

  1. Profile first: Identify actual bottlenecks, not assumed ones

  2. Benchmark changes: Quantify improvements with before/after tests

  3. Consider trade-offs: Speed vs accuracy vs memory usage

  4. Test edge cases: Ensure optimizations work for all input sizes

  5. Update documentation: Include performance characteristics in docs

See Contributing for detailed contribution guidelines including performance optimization best practices.