cheat/adr/003-search-parallelization.md

# ADR-003: No Parallelization for Search Operations

Date: 2025-01-22

## Status

Accepted

## Context

We investigated optimizing cheat's search performance through parallelization. Initial assumptions suggested that I/O operations (reading multiple cheatsheet files) would be the primary bottleneck, making parallel processing beneficial.

Performance benchmarks were implemented to measure search operations, and a parallel search implementation using goroutines was created and tested.

## Decision

We will **not** implement parallel search. The sequential implementation will remain unchanged.

## Rationale

### Performance Profile Analysis

CPU profiling revealed that search performance is dominated by:
- **Process creation overhead** (~30% in `os/exec.(*Cmd).Run`)
- **System calls** (~30% in `syscall.Syscall6`)
- **Process management** (fork, exec, pipe setup)

The actual search logic (regex matching, file reading) was negligible in the profile, indicating our optimization efforts were targeting the wrong bottleneck.

### Benchmark Results

Parallel implementation showed minimal improvements:
- Simple search: 17ms → 15.3ms (10% improvement)
- Regex search: 15ms → 14.9ms (minimal improvement)
- Colorized search: 19.5ms → 16.8ms (14% improvement)
- Complex regex: 20ms → 15.3ms (24% improvement)

The best case saved only ~5ms in absolute terms.

### Cost-Benefit Analysis

**Costs of parallelization:**
- Added complexity with goroutines, channels, and synchronization
- Increased maintenance burden
- More difficult debugging and testing
- Potential race conditions

**Benefits:**
- 5-15% performance improvement (5ms in real terms)
- Imperceptible to users in interactive use

### User Experience Perspective

For a command-line tool:
- Current 15-20ms response time is excellent
- Users cannot perceive 5ms differences
- Sub-50ms is considered "instant" in HCI research

## Consequences

### Positive
- Simpler, more maintainable codebase
- Easier to debug and reason about
- No synchronization bugs or race conditions
- Focus remains on code clarity

### Negative
- Missed opportunity for ~5ms performance gain
- Search remains single-threaded

### Neutral
- Performance remains excellent for intended use case
- Follows Go philosophy of preferring simplicity

## Alternatives Considered

### 1. Keep Parallel Implementation
**Rejected**: Complexity outweighs negligible performance gains.

### 2. Optimize Process Startup
**Rejected**: Process creation overhead is inherent to CLI tools and cannot be avoided without fundamental architecture changes.

### 3. Future Optimizations
If performance becomes critical, consider:
- **Long-running daemon**: Eliminate process startup overhead entirely
- **Shell function**: Reduce fork/exec overhead
- **Compiled-in cheatsheets**: Eliminate file I/O

However, these would fundamentally change the tool's architecture and usage model.

## Notes

This decision reinforces important principles:
1. Always profile before optimizing
2. Consider the full execution context
3. Measure what matters to users
4. Complexity has a real cost

The parallelization attempt was valuable as a learning exercise and definitively answered whether this optimization path was worthwhile.

## References

- Benchmark implementation: cmd/cheat/search_bench_test.go
- Reverted parallel implementation: see git history (commit 82eb918)