# Performance Benchmarks

Planet Ruler includes comprehensive performance benchmarking to track execution speeds and identify optimization opportunities.

## Benchmark Overview

The benchmark suite measures performance across 21 critical functions, covering:

- **Mathematical operations**: Geometry calculations (nanosecond scale)
- **Image processing**: Loading, gradient analysis, segmentation (millisecond scale)
- **Optimization**: Parameter fitting and uncertainty analysis (second scale)
- **Memory usage**: Large image processing and data structures
## Running Benchmarks

### Basic Benchmark Execution

```bash
# Run all benchmarks
pytest tests/test_benchmarks.py --benchmark-only

# Sort by mean execution time
pytest --benchmark-only --benchmark-sort=mean

# Cap each benchmark's measurement time at 5 seconds
pytest --benchmark-only --benchmark-sort=mean --benchmark-max-time=5
```
### Detailed Benchmark Options

```bash
# Save results to a JSON file
pytest --benchmark-only --benchmark-json=benchmark_results.json

# Compare against a previously saved run (see --benchmark-save below)
pytest --benchmark-only --benchmark-compare=0001

# Choose which statistics columns to report
pytest --benchmark-only --benchmark-columns=min,max,mean,stddev

# Note: pytest-benchmark does not measure memory itself;
# see the Memory Profiling section below for memory-profiler usage
```
## Performance Results

### Core Geometry Functions

**Fast Mathematical Operations (< 100 ns):**

| Function | Mean Time | Std Dev | Operations/sec |
|---|---|---|---|
| `horizon_distance` | 52 ns | ±3 ns | 19.2M ops/sec |
| `limb_camera_angle` | 78 ns | ±5 ns | 12.8M ops/sec |
| `field_of_view` | 65 ns | ±4 ns | 15.4M ops/sec |
| `detector_size` | 58 ns | ±3 ns | 17.2M ops/sec |
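The throughput column follows directly from the mean time (ops/sec = 1 / mean). A rough way to reproduce such a figure outside pytest-benchmark is a plain `timeit` loop. The `horizon_distance` below is a stand-in using the standard horizon formula, not the library implementation (the real function lives in `planet_ruler.geometry` and its signature may differ), and the lambda adds call overhead, so expect slower numbers than the table:

```python
import math
import timeit


# Stand-in for planet_ruler.geometry.horizon_distance (signature assumed).
# Distance to the horizon for an observer at height h above a sphere of
# radius r: d = sqrt(2*r*h + h^2).
def horizon_distance(r: float, h: float) -> float:
    return math.sqrt(2.0 * r * h + h * h)


# Time many calls and convert mean per-call time into ops/sec,
# mirroring the Operations/sec column above (Earth radius, ISS altitude).
n = 100_000
total = timeit.timeit(lambda: horizon_distance(6_371_000.0, 418_000.0), number=n)
ops_per_sec = n / total
print(f"{total / n * 1e9:.0f} ns/call, {ops_per_sec / 1e6:.1f}M ops/sec")
```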
**Moderate Complexity Functions (100 ns - 10 μs):**

| Function | Mean Time | Std Dev | Operations/sec |
|---|---|---|---|
| `intrinsic_transform` | 2.4 μs | ±0.2 μs | 417K ops/sec |
| `extrinsic_transform` | 3.1 μs | ±0.3 μs | 323K ops/sec |
| `pack_parameters` | 1.8 μs | ±0.1 μs | 556K ops/sec |
| `unpack_parameters` | 1.2 μs | ±0.1 μs | 833K ops/sec |
### Image Processing Functions

**Image Operations (millisecond scale):**

| Function (2MP image) | Mean Time | Std Dev | Throughput |
|---|---|---|---|
| `load_image` | 15.2 ms | ±2.1 ms | 65.8 images/sec |
| `gradient_break` | 45.3 ms | ±3.7 ms | 22.1 images/sec |
| `smooth_limb` (1000 px) | 1.24 ms | ±0.08 ms | 806 operations/sec |
| `fill_nans` | 0.89 ms | ±0.06 ms | 1124 operations/sec |
**Segmentation Performance:**

| Method | Mean Time | Memory Usage | Accuracy |
|---|---|---|---|
| Segment Anything (CPU) | 2.8 seconds | 1.2 GB | 95%+ horizon detection |
| Segment Anything (GPU) | 0.9 seconds | 2.1 GB VRAM | 95%+ horizon detection |
| Gradient Break | 45 ms | 50 MB | 70-80% horizon detection |
### Optimization and Fitting

**Parameter Fitting Performance:**

| Operation | Mean Time | Std Dev | Success Rate |
|---|---|---|---|
| `CostFunction.cost` | 3.8 ms | ±0.3 ms | N/A |
| `CostFunction.evaluate` | 2.9 ms | ±0.2 ms | N/A |
| `limb_arc` (1000×600) | 2.5 ms | ±0.1 ms | N/A |
| `differential_evolution` | 28.7 seconds | ±4.2 seconds | 98%+ convergence |
**Uncertainty Analysis:**

| Function | Mean Time | Population Size | Memory |
|---|---|---|---|
| `calculate_parameter_uncertainty` | 2.1 ms | 300 samples | 15 MB |
| `unpack_diff_evol_posteriors` | 1.8 ms | 300 samples | 12 MB |
| `format_parameter_result` | 0.03 ms | N/A | < 1 MB |
## Scaling Analysis

### Image Size Performance

Performance scaling with image resolution:

```python
# Benchmark different image sizes
import numpy as np
import pytest

import planet_ruler.image as img


@pytest.mark.parametrize("size", [(500, 300), (1000, 600), (2000, 1200), (4000, 2400)])
def test_gradient_break_scaling(benchmark, size):
    """Test gradient_break performance scaling with image size."""
    width, height = size
    test_image = np.random.randint(0, 255, (height, width, 3), dtype="uint8")
    result = benchmark(img.gradient_break, test_image, window_length=21)
    assert len(result) == width
```
**Scaling Results:**

- 500×300: 8.2 ms (baseline)
- 1000×600: 45.3 ms (5.5× slower; 4× expected from area)
- 2000×1200: 185.7 ms (4.1× slower than 1000×600)
- 4000×2400: 742.3 ms (4.0× slower; near-linear scaling with area)
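The near-linear claim can be checked by fitting a power law to the measurements above: fitting time against pixel count in log-log space gives a slope close to 1 when time grows linearly with area.

```python
import numpy as np

# Measured gradient_break times (ms) from the scaling results above,
# paired with the total pixel count of each resolution.
pixels = np.array([500 * 300, 1000 * 600, 2000 * 1200, 4000 * 2400])
times_ms = np.array([8.2, 45.3, 185.7, 742.3])

# Fit time ~ pixels**k via a linear fit in log-log space;
# k near 1 means linear scaling with image area.
k, _ = np.polyfit(np.log(pixels), np.log(times_ms), 1)
print(f"scaling exponent ~ {k:.2f}")
```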
### Parameter Count Scaling

Optimization performance vs. number of free parameters:

| Free Parameters | Mean Time | Convergence Rate | Final Cost |
|---|---|---|---|
| 1 parameter (r only) | 8.2 seconds | 99% | 0.023 |
| 3 parameters (r, h, θz) | 28.7 seconds | 98% | 0.018 |
| 6 parameters (all) | 95.4 seconds | 92% | 0.015 |
## Memory Usage Analysis

### Memory Profiling

```bash
# Time a specific workflow benchmark
pytest tests/test_benchmarks.py::test_limb_observation_workflow --benchmark-only

# pytest-benchmark has no built-in memory option, so use
# memory-profiler for detailed analysis
pip install memory-profiler
python -m memory_profiler benchmark_script.py
```

**Memory Usage by Component:**

- Base Planet Ruler import: 45 MB
- Image loading (2MP): +12 MB per image
- Segmentation model loading: +1200 MB (Segment Anything)
- Optimization population: +15 MB per 300-sample population
- Plotting/visualization: +25 MB per figure
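A quick way to sanity-check the per-image figure is `tracemalloc` from the standard library, which recent NumPy versions report array allocations into. The raw data of a 2MP RGB uint8 image is about 6 MB; the higher figure in the list above includes decode buffers and intermediate copies made during loading:

```python
import tracemalloc

import numpy as np

# Measure the allocation cost of a single synthetic 2MP RGB uint8 image
# (1000 x 2000 x 3 bytes ~ 6 MB of raw pixel data).
tracemalloc.start()
image = np.random.randint(0, 255, (1000, 2000, 3), dtype="uint8")
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

mb = peak / 1024 / 1024
print(f"peak allocation: {mb:.1f} MB")
```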
## Performance Optimization Tips

### Image Processing Optimization

**Reduce resolution for development:**

```python
# Downsample by a factor of 2 for a ~4x speed improvement
image = image[::2, ::2]
```

**Use CPU vs GPU strategically:**

```python
# Use CPU for small images, GPU for large ones
device = "cpu" if image.size < 1000000 else "cuda"
```

**Batch process multiple images:**

```python
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    results = list(executor.map(process_image, image_list))
```
### Parameter Fitting Optimization

**Reduce population size for development:**

```python
observation.fit_limb(popsize=10, maxiter=500)  # ~3x faster
```

**Limit free parameters:**

```python
# Only fit radius; fix the other parameters
observation.free_parameters = ["r"]
```

**Use good initial estimates:**

```python
# Better initial values mean faster convergence
init_params = {"r": 6371000, "h": 418000}  # Close to expected values
```
### Memory Optimization

**Process images sequentially for large datasets:**

```python
for image_path in large_image_list:
    obs = LimbObservation(image_path, config)
    obs.detect_limb()
    result = obs.fit_limb()
    del obs  # Free memory immediately
```

**Use image downsampling:**

```python
# Process at lower resolution, then scale results
obs.image_data = obs.image_data[::2, ::2]
```

**Configure segmentation for memory:**

```python
# Reduce segmentation resolution
obs.detect_limb(method="segmentation", points_per_side=16)
```
## Benchmarking Custom Code

### Adding New Benchmarks

```python
import numpy as np


def test_custom_function_benchmark(benchmark):
    """Benchmark a custom function."""
    # Setup
    test_data = np.random.randn(1000, 1000)

    # Benchmark the function
    result = benchmark(my_custom_function, test_data, param1=True)

    # Verify results
    assert result.shape == (1000,)
```
### Benchmark Fixtures

```python
@pytest.fixture
def large_synthetic_image():
    """Create a large synthetic image for benchmarking."""
    return np.random.randint(0, 255, (2000, 3000, 3), dtype="uint8")


@pytest.fixture
def earth_observation_setup():
    """Set up an Earth observation for benchmarking."""
    return LimbObservation("demo/earth.jpg", "config/earth_iss_1.yaml")
```
### Comparative Benchmarking

```python
@pytest.mark.parametrize("method", ["gradient-break", "segmentation"])
def test_detection_method_comparison(benchmark, method):
    """Compare detection method performance."""
    obs = LimbObservation("test_image.jpg", "config.yaml")
    if method == "segmentation":
        benchmark(obs.detect_limb, method="segmentation")
    else:
        benchmark(obs.detect_limb, method="gradient-break", window_length=21)
```
## Performance Regression Testing

### Baseline Management

```bash
# Save current performance as a named baseline
pytest --benchmark-only --benchmark-save=baseline_v1_0

# Compare with the saved baseline (runs are saved as NNNN_name.json)
pytest --benchmark-only --benchmark-compare=0001_baseline_v1_0

# Fail if performance degrades by more than 10%
pytest --benchmark-only --benchmark-compare --benchmark-compare-fail=max:10%
```
### CI/CD Integration

```yaml
# GitHub Actions workflow steps for performance testing
- name: Run benchmarks
  run: |
    pytest tests/test_benchmarks.py \
      --benchmark-only \
      --benchmark-json=benchmark_results.json

- name: Store benchmark results
  uses: benchmark-action/github-action-benchmark@v1
  with:
    tool: 'pytest'
    output-file-path: benchmark_results.json
```
## Profiling Deep Dives

### CPU Profiling

```bash
# Profile with cProfile
python -m cProfile -o profile_output.prof benchmark_script.py

# Analyze with snakeviz
pip install snakeviz
snakeviz profile_output.prof
```
### Line Profiling

```bash
# Install line profiler
pip install line_profiler

# Profile specific functions (requires @profile decorators on them)
kernprof -l -v planet_ruler/geometry.py
```
### Memory Profiling

```python
# Line-by-line memory profiling: when run via
# `python -m memory_profiler`, the @profile decorator is injected
# automatically; importing it explicitly also works.
from memory_profiler import profile


@profile
def memory_intensive_function():
    # Function implementation
    pass
```

```bash
# Run with the memory profiler
python -m memory_profiler script.py
```
## Performance Best Practices

### Development Guidelines

- **Benchmark new features**: Add benchmarks for performance-critical code
- **Monitor regression**: Use CI/CD to catch performance degradation
- **Profile before optimizing**: Identify bottlenecks with profiling
- **Test optimizations**: Verify that optimizations actually improve performance
- **Document performance**: Include timing expectations in docstrings
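As a sketch of the last guideline, a docstring can carry a timing expectation alongside the usual description. The `smooth_limb` below is an illustrative stand-in, not the library implementation:

```python
import numpy as np


def smooth_limb(limb: np.ndarray, window_length: int = 21) -> np.ndarray:
    """Smooth a detected limb profile with a moving average.

    Performance: roughly 1 ms for a 1000-pixel limb on a modern CPU;
    scales linearly with the number of pixels.

    Note: illustrative stand-in for documentation purposes only.
    """
    kernel = np.ones(window_length) / window_length
    return np.convolve(limb, kernel, mode="same")
```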
### Optimization Priorities

**High Impact:**

- Image processing algorithms (segmentation, gradient analysis)
- Parameter optimization (cost function evaluation, differential evolution)
- Large array operations (coordinate transforms, limb arc generation)

**Medium Impact:**

- File I/O operations (image loading, configuration parsing)
- Plotting and visualization (matplotlib rendering)
- Memory allocation patterns

**Low Impact:**

- Basic mathematical functions (already very fast)
- String processing and formatting
- Small data structure operations
## Hardware Considerations

### CPU Performance

- **Single-threaded**: Most geometry and fitting operations
- **Multi-threaded**: Image processing can benefit from parallel execution
- **Memory-bound**: Large image operations are limited by RAM bandwidth

### GPU Acceleration

- **Segmentation**: Segment Anything benefits significantly from a GPU
- **PyTorch operations**: Some coordinate transforms could use GPU tensors
- **Memory considerations**: GPU memory limits constrain large images

### Storage Performance

- **SSD recommended**: Faster image loading and processing
- **Network storage**: Can be a bottleneck for large image datasets
- **Compression**: JPEG vs PNG trades file size against loading speed
## Benchmark Interpretation

### Understanding Results

- **Mean vs median**: Use the median for skewed distributions
- **Standard deviation**: Indicates measurement reliability
- **Min/max values**: Show best- and worst-case performance
- **Operations per second**: An intuitive throughput metric
### Statistical Significance

- **Multiple runs**: Benchmarks run many iterations for statistical validity
- **Warmup rounds**: Account for JIT compilation and cache effects
- **Environment consistency**: Use the same hardware/OS for comparable results
### Performance Targets

- **Interactive response**: < 100 ms for UI operations
- **Batch processing**: Optimize for throughput over latency
- **Memory usage**: < 4 GB total for typical workflows
- **Scalability**: Linear or sub-linear scaling with data size
## Contributing Performance Improvements

When optimizing Planet Ruler:

1. **Profile first**: Identify actual bottlenecks, not assumed ones
2. **Benchmark changes**: Quantify improvements with before/after tests
3. **Consider trade-offs**: Balance speed, accuracy, and memory usage
4. **Test edge cases**: Ensure optimizations work for all input sizes
5. **Update documentation**: Include performance characteristics in docs

See Contributing for detailed contribution guidelines, including performance optimization best practices.