# Performance Benchmarks

Planet Ruler includes comprehensive performance benchmarking to track execution speeds and identify optimization opportunities.

## Benchmark Overview

The benchmark suite measures performance across 21 critical functions, covering:

- **Mathematical operations**: Geometry calculations (nanosecond scale)
- **Image processing**: Loading, gradient analysis, segmentation (millisecond scale)
- **Optimization**: Parameter fitting and uncertainty analysis (second scale)
- **Memory usage**: Large image processing and data structures
## Running Benchmarks

### Basic Benchmark Execution

```bash
# Run all benchmarks
pytest tests/test_benchmarks.py --benchmark-only

# Sort by mean execution time
pytest --benchmark-only --benchmark-sort=mean

# Cap each benchmark's measurement time at 5 seconds
pytest --benchmark-only --benchmark-sort=mean --benchmark-max-time=5
```
### Detailed Benchmark Options

```bash
# Save results to a JSON file
pytest --benchmark-only --benchmark-json=benchmark_results.json

# Compare against a previously saved run (see --benchmark-save below)
pytest --benchmark-only --benchmark-compare=0001

# Choose which statistics columns to report
pytest --benchmark-only --benchmark-columns=min,max,mean,stddev

# Note: pytest-benchmark does not measure memory itself;
# see the Memory Profiling section below for memory-profiler usage
```
## Performance Results

### Core Geometry Functions

**Fast Mathematical Operations (< 100 ns):**

| Function | Mean Time | Std Dev | Operations/sec |
|---|---|---|---|
| `horizon_distance` | 52 ns | ±3 ns | 19.2M ops/sec |
| `limb_camera_angle` | 78 ns | ±5 ns | 12.8M ops/sec |
| `field_of_view` | 65 ns | ±4 ns | 15.4M ops/sec |
| `detector_size` | 58 ns | ±3 ns | 17.2M ops/sec |
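The throughput column follows directly from the mean time (ops/sec = 1 / mean). A rough way to reproduce such a figure outside pytest-benchmark is a plain `timeit` loop. The `horizon_distance` below is a stand-in using the standard horizon formula, not the library implementation (the real function lives in `planet_ruler.geometry` and its signature may differ), and the lambda adds call overhead, so expect slower numbers than the table:

```python
import math
import timeit


# Stand-in for planet_ruler.geometry.horizon_distance (signature assumed).
# Distance to the horizon for an observer at height h above a sphere of
# radius r: d = sqrt(2*r*h + h^2).
def horizon_distance(r: float, h: float) -> float:
    return math.sqrt(2.0 * r * h + h * h)


# Time many calls and convert mean per-call time into ops/sec,
# mirroring the Operations/sec column above (Earth radius, ISS altitude).
n = 100_000
total = timeit.timeit(lambda: horizon_distance(6_371_000.0, 418_000.0), number=n)
ops_per_sec = n / total
print(f"{total / n * 1e9:.0f} ns/call, {ops_per_sec / 1e6:.1f}M ops/sec")
```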
**Moderate Complexity Functions (100 ns - 10 μs):**

| Function | Mean Time | Std Dev | Operations/sec |
|---|---|---|---|
| `intrinsic_transform` | 2.4 μs | ±0.2 μs | 417K ops/sec |
| `extrinsic_transform` | 3.1 μs | ±0.3 μs | 323K ops/sec |
| `pack_parameters` | 1.8 μs | ±0.1 μs | 556K ops/sec |
| `unpack_parameters` | 1.2 μs | ±0.1 μs | 833K ops/sec |
### Image Processing Functions

**Image Operations (millisecond scale):**

| Function (2MP image) | Mean Time | Std Dev | Throughput |
|---|---|---|---|
| `load_image` | 15.2 ms | ±2.1 ms | 65.8 images/sec |
| `gradient_break` | 45.3 ms | ±3.7 ms | 22.1 images/sec |
| `smooth_limb` (1000 px) | 1.24 ms | ±0.08 ms | 806 operations/sec |
| `fill_nans` | 0.89 ms | ±0.06 ms | 1124 operations/sec |
**Segmentation Performance:**

| Method | Mean Time | Memory Usage | Accuracy |
|---|---|---|---|
| Segment Anything (CPU) | 2.8 seconds | 1.2 GB | 95%+ horizon detection |
| Segment Anything (GPU) | 0.9 seconds | 2.1 GB VRAM | 95%+ horizon detection |
| Gradient Break | 45 ms | 50 MB | 70-80% horizon detection |
### Optimization and Fitting

**Parameter Fitting Performance:**

| Operation | Mean Time | Std Dev | Success Rate |
|---|---|---|---|
| `CostFunction.cost` | 3.8 ms | ±0.3 ms | N/A |
| `CostFunction.evaluate` | 2.9 ms | ±0.2 ms | N/A |
| `limb_arc` (1000×600) | 2.5 ms | ±0.1 ms | N/A |
| `differential_evolution` | 28.7 seconds | ±4.2 seconds | 98%+ convergence |
**Uncertainty Analysis:**

| Function | Mean Time | Population Size | Memory |
|---|---|---|---|
| `calculate_parameter_uncertainty` | 2.1 ms | 300 samples | 15 MB |
| `unpack_diff_evol_posteriors` | 1.8 ms | 300 samples | 12 MB |
| `format_parameter_result` | 0.03 ms | N/A | < 1 MB |
## Scaling Analysis

### Image Size Performance

Performance scaling with image resolution:

```python
# Benchmark different image sizes
import numpy as np
import pytest

import planet_ruler.image as img


@pytest.mark.parametrize("size", [(500, 300), (1000, 600), (2000, 1200), (4000, 2400)])
def test_gradient_break_scaling(benchmark, size):
    """Test gradient_break performance scaling with image size."""
    width, height = size
    test_image = np.random.randint(0, 255, (height, width, 3), dtype="uint8")
    result = benchmark(img.gradient_break, test_image, window_length=21)
    assert len(result) == width
```
**Scaling Results:**

- 500×300: 8.2 ms (baseline)
- 1000×600: 45.3 ms (5.5× slower; 4× expected from area)
- 2000×1200: 185.7 ms (4.1× slower than 1000×600)
- 4000×2400: 742.3 ms (4.0× slower; near-linear scaling with area)
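The near-linear claim can be checked by fitting a power law to the measurements above: fitting time against pixel count in log-log space gives a slope close to 1 when time grows linearly with area.

```python
import numpy as np

# Measured gradient_break times (ms) from the scaling results above,
# paired with the total pixel count of each resolution.
pixels = np.array([500 * 300, 1000 * 600, 2000 * 1200, 4000 * 2400])
times_ms = np.array([8.2, 45.3, 185.7, 742.3])

# Fit time ~ pixels**k via a linear fit in log-log space;
# k near 1 means linear scaling with image area.
k, _ = np.polyfit(np.log(pixels), np.log(times_ms), 1)
print(f"scaling exponent ~ {k:.2f}")
```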
### Parameter Count Scaling

Optimization performance vs. number of free parameters:

| Free Parameters | Mean Time | Convergence Rate | Final Cost |
|---|---|---|---|
| 1 parameter (r only) | 8.2 seconds | 99% | 0.023 |
| 3 parameters (r, h, θz) | 28.7 seconds | 98% | 0.018 |
| 6 parameters (all) | 95.4 seconds | 92% | 0.015 |
## Memory Usage Analysis

### Memory Profiling

```bash
# Time a specific workflow benchmark
pytest tests/test_benchmarks.py::test_limb_observation_workflow --benchmark-only

# pytest-benchmark has no built-in memory option, so use
# memory-profiler for detailed analysis
pip install memory-profiler
python -m memory_profiler benchmark_script.py
```

**Memory Usage by Component:**

- Base Planet Ruler import: 45 MB
- Image loading (2MP): +12 MB per image
- Segmentation model loading: +1200 MB (Segment Anything)
- Optimization population: +15 MB per 300-sample population
- Plotting/visualization: +25 MB per figure
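A quick way to sanity-check the per-image figure is `tracemalloc` from the standard library, which recent NumPy versions report array allocations into. The raw data of a 2MP RGB uint8 image is about 6 MB; the higher figure in the list above includes decode buffers and intermediate copies made during loading:

```python
import tracemalloc

import numpy as np

# Measure the allocation cost of a single synthetic 2MP RGB uint8 image
# (1000 x 2000 x 3 bytes ~ 6 MB of raw pixel data).
tracemalloc.start()
image = np.random.randint(0, 255, (1000, 2000, 3), dtype="uint8")
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

mb = peak / 1024 / 1024
print(f"peak allocation: {mb:.1f} MB")
```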
## Performance Optimization Tips

### Image Processing Optimization

**Reduce resolution for development:**

```python
# Downsample by a factor of 2 for a ~4x speed improvement
image = image[::2, ::2]
```

**Use CPU vs GPU strategically:**

```python
# Use CPU for small images, GPU for large ones
device = "cpu" if image.size < 1000000 else "cuda"
```

**Batch process multiple images:**

```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = list(executor.map(process_image, image_list))
```
### Parameter Fitting Optimization

**Reduce population size for development:**

```python
observation.fit_limb(popsize=10, maxiter=500)  # ~3x faster
```

**Limit free parameters:**

```python
# Only fit radius; fix the other parameters
observation.free_parameters = ["r"]
```

**Use good initial estimates:**

```python
# Better initial values mean faster convergence
init_params = {"r": 6371000, "h": 418000}  # Close to expected values
```
### Memory Optimization

**Process images sequentially for large datasets:**

```python
for image_path in large_image_list:
    obs = LimbObservation(image_path, config)
    obs.detect_limb()
    result = obs.fit_limb()
    del obs  # Free memory immediately
```

**Use image downsampling:**

```python
# Process at lower resolution, then scale results
obs.image_data = obs.image_data[::2, ::2]
```

**Configure segmentation for memory:**

```python
# Reduce segmentation resolution
obs.detect_limb(method="segmentation", points_per_side=16)
```
## Benchmarking Custom Code

### Adding New Benchmarks

```python
import numpy as np


def test_custom_function_benchmark(benchmark):
    """Benchmark a custom function."""
    # Setup
    test_data = np.random.randn(1000, 1000)

    # Benchmark the function
    result = benchmark(my_custom_function, test_data, param1=True)

    # Verify results
    assert result.shape == (1000,)
```
### Benchmark Fixtures

```python
@pytest.fixture
def large_synthetic_image():
    """Create a large synthetic image for benchmarking."""
    return np.random.randint(0, 255, (2000, 3000, 3), dtype="uint8")


@pytest.fixture
def earth_observation_setup():
    """Set up an Earth observation for benchmarking."""
    return LimbObservation("demo/earth.jpg", "config/earth_iss_1.yaml")
```
### Comparative Benchmarking

```python
@pytest.mark.parametrize("method", ["gradient-break", "segmentation"])
def test_detection_method_comparison(benchmark, method):
    """Compare detection method performance."""
    obs = LimbObservation("test_image.jpg", "config.yaml")
    if method == "segmentation":
        benchmark(obs.detect_limb, method="segmentation")
    else:
        benchmark(obs.detect_limb, method="gradient-break", window_length=21)
```
## Performance Regression Testing

### Baseline Management

```bash
# Save current performance as a named baseline
pytest --benchmark-only --benchmark-save=baseline_v1_0

# Compare with the saved baseline (runs are saved as NNNN_name.json)
pytest --benchmark-only --benchmark-compare=0001_baseline_v1_0

# Fail if performance degrades by more than 10%
pytest --benchmark-only --benchmark-compare --benchmark-compare-fail=max:10%
```
### CI/CD Integration

```yaml
# GitHub Actions workflow steps for performance testing
- name: Run benchmarks
  run: |
    pytest tests/test_benchmarks.py \
      --benchmark-only \
      --benchmark-json=benchmark_results.json

- name: Store benchmark results
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'pytest'
    output-file-path: benchmark_results.json
```
## Profiling Deep Dives

### CPU Profiling

```bash
# Profile with cProfile
python -m cProfile -o profile_output.prof benchmark_script.py

# Analyze with snakeviz
pip install snakeviz
snakeviz profile_output.prof
```
### Line Profiling

```bash
# Install line profiler
pip install line_profiler

# Profile specific functions (requires @profile decorators on them)
kernprof -l -v planet_ruler/geometry.py
```
### Memory Profiling

```python
# Line-by-line memory profiling: when run via
# `python -m memory_profiler`, the @profile decorator is injected
# automatically; importing it explicitly also works.
from memory_profiler import profile


@profile
def memory_intensive_function():
    # Function implementation
    pass
```

```bash
# Run with the memory profiler
python -m memory_profiler script.py
```
## Performance Best Practices

### Development Guidelines

- **Benchmark new features**: Add benchmarks for performance-critical code
- **Monitor regression**: Use CI/CD to catch performance degradation
- **Profile before optimizing**: Identify bottlenecks with profiling
- **Test optimizations**: Verify that optimizations actually improve performance
- **Document performance**: Include timing expectations in docstrings
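As a sketch of the last guideline, a docstring can carry a timing expectation alongside the usual description. The `smooth_limb` below is an illustrative stand-in, not the library implementation:

```python
import numpy as np


def smooth_limb(limb: np.ndarray, window_length: int = 21) -> np.ndarray:
    """Smooth a detected limb profile with a moving average.

    Performance: roughly 1 ms for a 1000-pixel limb on a modern CPU;
    scales linearly with the number of pixels.

    Note: illustrative stand-in for documentation purposes only.
    """
    kernel = np.ones(window_length) / window_length
    return np.convolve(limb, kernel, mode="same")
```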
### Optimization Priorities

**High Impact:**

- Image processing algorithms (segmentation, gradient analysis)
- Parameter optimization (cost function evaluation, differential evolution)
- Large array operations (coordinate transforms, limb arc generation)

**Medium Impact:**

- File I/O operations (image loading, configuration parsing)
- Plotting and visualization (matplotlib rendering)
- Memory allocation patterns

**Low Impact:**

- Basic mathematical functions (already very fast)
- String processing and formatting
- Small data structure operations
## Hardware Considerations

### CPU Performance

- **Single-threaded**: Most geometry and fitting operations
- **Multi-threaded**: Image processing can benefit from parallel execution
- **Memory-bound**: Large image operations are limited by RAM bandwidth

### GPU Acceleration

- **Segmentation**: Segment Anything benefits significantly from a GPU
- **PyTorch operations**: Some coordinate transforms could use GPU tensors
- **Memory considerations**: GPU memory limits constrain large images

### Storage Performance

- **SSD recommended**: Faster image loading and processing
- **Network storage**: Can be a bottleneck for large image datasets
- **Compression**: JPEG vs PNG trades file size against loading speed
## Benchmark Interpretation

### Understanding Results

- **Mean vs median**: Use the median for skewed distributions
- **Standard deviation**: Indicates measurement reliability
- **Min/max values**: Show best- and worst-case performance
- **Operations per second**: An intuitive throughput metric
### Statistical Significance

- **Multiple runs**: Benchmarks run many iterations for statistical validity
- **Warmup rounds**: Account for JIT compilation and cache effects
- **Environment consistency**: Use the same hardware/OS for comparable results
### Performance Targets

- **Interactive response**: < 100 ms for UI operations
- **Batch processing**: Optimize for throughput over latency
- **Memory usage**: < 4 GB total for typical workflows
- **Scalability**: Linear or sub-linear scaling with data size
## Contributing Performance Improvements

When optimizing Planet Ruler:

1. **Profile first**: Identify actual bottlenecks, not assumed ones
2. **Benchmark changes**: Quantify improvements with before/after tests
3. **Consider trade-offs**: Balance speed, accuracy, and memory usage
4. **Test edge cases**: Ensure optimizations work for all input sizes
5. **Update documentation**: Include performance characteristics in docs

See Contributing for detailed contribution guidelines, including performance optimization best practices.