Scaling Search with PEM-BiHS
Efficient search algorithms form the backbone of solving complex optimization and planning problems. As the scale of these problems grows, so do the demands on computational power and memory. Addressing these challenges, the paper “On Parallel External-Memory Bidirectional Search” introduces PEM-BiHS (Parallel External-Memory Bidirectional Heuristic Search). This framework makes search algorithms more efficient by combining bidirectional heuristic search (BiHS) with parallel processing and external memory, enabling scalability far beyond traditional approaches.
This post breaks down the foundations of PEM-BiHS, its core innovations, and experimental findings, explaining key concepts and their relevance. We also discuss practical applications, including how PEM-BiHS principles can inspire solutions for orchestrating workflows in multi-agent systems involving 100+ LLM agents.
Understanding the Foundations
Unidirectional vs. Bidirectional Search
Unidirectional Search (UniHS): Expands nodes in a single direction, typically from a start state to a goal state.
Algorithms like A* guide exploration using an evaluation function f = g + h, where:
- g: The path cost from the start to the current node.
- h: The heuristic estimate of the cost from the current node to the goal.
Example: Imagine solving the 15-puzzle: you start with a shuffled grid of numbered tiles and aim to move them back to the solved configuration (e.g., numbers ordered left to right, top to bottom). A* would evaluate potential moves, expanding paths that minimize f = g + h. Here, g is the number of moves so far, and h could be the Manhattan Distance (the sum of the vertical and horizontal distances each tile is from its goal position).
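To make this concrete, here is a minimal Python sketch of the Manhattan Distance heuristic. The tuple-based board encoding (row-major order, with 0 marking the blank) is an assumption chosen for brevity, not anything prescribed by the puzzle itself.

```python
def manhattan_distance(state, goal):
    """Sum of the vertical and horizontal distances of each tile
    from its goal position.

    `state` and `goal` are tuples of 16 ints in row-major order,
    with 0 marking the blank -- an encoding assumed for this sketch.
    """
    size = 4
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:  # the blank does not contribute to h
            continue
        goal_idx = goal.index(tile)
        total += abs(idx // size - goal_idx // size) \
               + abs(idx % size - goal_idx % size)
    return total

goal = tuple(list(range(1, 16)) + [0])
# Two tiles (14 and 15) are swapped one column away from home:
start = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 14, 0)
print(manhattan_distance(start, goal))  # → 2
```

In A*, this h would be added to g (moves made so far) to rank candidate states by f.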
Limitation: UniHS explores a large portion of the state space, especially for problems with distant goals or large graphs. The number of explored nodes grows exponentially with search depth, making it inefficient for complex problems.
Bidirectional Search (BiHS): Runs two searches simultaneously:
- Forward search from the start state.
- Backward search from the goal state.
The searches meet in the middle, reducing the overall search space.
Example: In the 15-puzzle, one search starts from the shuffled state, and the other begins with the solved configuration. When their paths overlap, the solution is found.
Key Advantage: BiHS can explore far fewer nodes than UniHS. Each frontier only needs to reach roughly half the solution depth, and because the number of nodes grows exponentially with depth, halving the depth can yield exponential savings.
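The meet-in-the-middle idea can be sketched with a bidirectional breadth-first search on an unweighted graph. This is a simplification: BiHS proper (e.g., BAE*) also uses heuristics and handles weighted edges. The `neighbors` callback and the toy line graph are assumptions for illustration only.

```python
from collections import deque

def bidirectional_bfs(start, goal, neighbors):
    """Meet-in-the-middle search on an unweighted graph.

    Grows a frontier from each side and stops as soon as a newly
    generated node appears in the opposite side's distance map;
    returns the combined shortest-path length, or None if no path.
    """
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}
    frontier_f, frontier_b = deque([start]), deque([goal])
    while frontier_f and frontier_b:
        # Expand the smaller frontier first -- a common optimization.
        if len(frontier_f) <= len(frontier_b):
            frontier, dist, other = frontier_f, dist_f, dist_b
        else:
            frontier, dist, other = frontier_b, dist_b, dist_f
        for _ in range(len(frontier)):
            state = frontier.popleft()
            for nxt in neighbors(state):
                if nxt in other:                       # frontiers met
                    return dist[state] + 1 + other[nxt]
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    frontier.append(nxt)
    return None

# Toy line graph 0-1-2-3-4-5: the shortest path from 0 to 5 has length 5.
adj = lambda v: [u for u in (v - 1, v + 1) if 0 <= u <= 5]
print(bidirectional_bfs(0, 5, adj))  # → 5
```

Each frontier only has to cover about half the distance before the two distance maps intersect, which is exactly the source of BiHS's savings.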
Challenges in Scaling Search
Despite its promise, BiHS faces practical constraints:
1. Memory Limits: Large search problems generate vast state spaces that exceed RAM capacity.
2. Heuristic Sensitivity: BiHS performance depends on consistent, admissible heuristics. Poorly tuned heuristics can hinder its efficiency.
3. Redundancy: Duplicate detection and overlapping exploration between forward and backward searches can waste resources.
4. Scalability Bottlenecks: Adding computational threads or storage doesn’t always translate into linear performance gains due to I/O bottlenecks and workload imbalances.
These challenges motivated the development of PEM-BiHS, designed to overcome these limitations.
What is PEM-BiHS?
PEM-BiHS merges Bidirectional Heuristic Search (BiHS) with Parallel External-Memory (PEM) techniques. This combination creates a robust framework for solving large-scale problems efficiently. Its key innovation is leveraging external memory and parallelism to manage large state spaces while ensuring computational efficiency.
Core Components of PEM-BiHS
1. Bucket-Based State Management:
- What it does: Groups nodes into buckets based on their (g, h) values.
- Why it matters: Buckets allow efficient storage and retrieval of states from external memory while facilitating parallel processing.
- Example: In the 24-puzzle (an extended version of the 15-puzzle with a 5x5 grid), buckets might group states with similar move costs or heuristic estimates.
2. Delayed Duplicate Detection (DDD):
- What it does: Defers duplicate checks until nodes are retrieved from buckets.
- Why it matters: Reduces unnecessary I/O operations during node generation, speeding up the overall search process.
3. Dynamic Load Balancing:
- What it does: Dynamically adjusts bucket sizes to ensure even workload distribution across threads.
- Why it matters: Prevents bottlenecks caused by uneven processing of buckets.
4. PEM-BAE*:
- What it is: A bidirectional algorithm optimized for external memory and parallelism.
- Why it’s novel: Extends the state-of-the-art BAE* algorithm to handle large-scale problems with limited computational resources.
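To illustrate how buckets and delayed duplicate detection fit together, here is a toy in-memory sketch. The paper's actual implementation stores buckets as files on external storage and processes them with multiple threads; the dict below merely stands in for that machinery.

```python
from collections import defaultdict

class BucketStore:
    """Toy model of bucket-based state management with delayed
    duplicate detection (DDD). A real PEM search writes each
    (g, h) bucket to its own file; a dict stands in for disk here.
    """
    def __init__(self):
        self.buckets = defaultdict(list)  # (g, h) -> generated states
        self.closed = set()               # states already expanded

    def generate(self, state, g, h):
        # No duplicate check at generation time: appending is a cheap
        # sequential write, which is the whole point of DDD.
        self.buckets[(g, h)].append(state)

    def pop_bucket(self, g, h):
        # Dedup is deferred until the whole bucket is read back:
        # one linear pass replaces many random lookups during search.
        raw = self.buckets.pop((g, h), [])
        fresh = [s for s in dict.fromkeys(raw) if s not in self.closed]
        self.closed.update(fresh)
        return fresh

store = BucketStore()
for s in ["a", "b", "a", "c", "b"]:   # duplicates appended freely
    store.generate(s, g=1, h=3)
print(store.pop_bucket(1, 3))         # → ['a', 'b', 'c']
```

Because all states in a bucket share the same (g, h), a thread can claim an entire bucket and deduplicate it in one sequential pass, which is what makes this layout friendly to both disks and parallelism.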
Experimental Insights
The PEM-BiHS framework was rigorously tested on challenging benchmarks like the 15-puzzle, 24-puzzle, and 4-peg Towers of Hanoi. These benchmarks are known for their vast state spaces and high computational complexity.
Benchmarks Explained
15-Puzzle and 24-Puzzle:
- What they are: Sliding-tile puzzles where the goal is to rearrange tiles to reach a solved configuration.
- Challenge: The state space grows exponentially with the number of tiles. The 24-puzzle, for instance, has roughly 10^25 reachable states.
4-Peg Towers of Hanoi (TOH4):
- What it is: A variation of the classic problem where disks must be moved between pegs following specific rules.
- Challenge: Introducing a fourth peg increases the complexity significantly, with large numbers of duplicate states to manage.
Key Results
1. Performance with Strong Heuristics: PEM-BiHS challenges the long-held belief that BiHS struggles with strong heuristics. Results show it outperforms UniHS in heuristic-intensive scenarios.
2. Scalability: Successfully solved problems requiring up to 4TB of memory, far exceeding the capacity of traditional RAM-based algorithms.
3. Efficient Parallelism: Runtime improved significantly with up to 16 threads, though gains diminished due to I/O bottlenecks.
4. Reduction in Node Expansion: PEM-BiHS expanded far fewer nodes than comparable methods, conserving computational resources and speeding up solutions.
Core Learnings
1. BiHS with Strong Heuristics Works: The framework demonstrates that BiHS can handle strong, admissible heuristics effectively, breaking a long-standing misconception.
2. External Memory Unlocks Scalability: Efficient use of external memory allows the exploration of state spaces beyond RAM limits.
3. Parallelism is Effective When Balanced: Adaptive workload distribution is critical to prevent bottlenecks and maximize thread utilization.
4. Buckets and Delayed Detection are Key: Grouping states by heuristic values and deferring duplicate detection streamline both memory and processing.
Applications: AI Workflow Orchestration
Scenario: In systems with 100+ agents, tasks like query parsing, retrieval, summarization, and validation need efficient orchestration.
PEM-BiHS Approach: Agents are treated as nodes, with dependencies as edges.
- Forward search identifies candidate workflows starting from the query.
- Backward search validates workflows against the required output.
- Buckets group agents by function (e.g., retrieval or summarization), enabling parallel execution.
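As a rough sketch of this idea, the snippet below buckets a hypothetical agent graph by role and releases each "wave" of dependency-free agents together, mirroring how PEM-BiHS groups states by (g, h) for parallel processing. All agent names, roles, and dependencies are invented for illustration.

```python
from collections import defaultdict

# Hypothetical agent registry: name -> (role, dependencies).
agents = {
    "parse":      ("parsing",       []),
    "retrieve_1": ("retrieval",     ["parse"]),
    "retrieve_2": ("retrieval",     ["parse"]),
    "summarize":  ("summarization", ["retrieve_1", "retrieve_2"]),
    "validate":   ("validation",    ["summarize"]),
}

def bucket_schedule(agents):
    """Release agents in dependency order, bucketing each wave of
    ready agents by role so same-role agents can run in parallel."""
    done, waves = set(), []
    while len(done) < len(agents):
        ready = [a for a, (_, deps) in agents.items()
                 if a not in done and all(d in done for d in deps)]
        buckets = defaultdict(list)
        for a in ready:
            buckets[agents[a][0]].append(a)
        waves.append(dict(buckets))
        done.update(ready)
    return waves

for wave in bucket_schedule(agents):
    print(wave)
```

Running this prints four waves: parsing, then both retrieval agents together, then summarization, then validation. The two retrieval agents land in the same bucket and could be dispatched to separate workers, just as same-bucket states are in PEM-BiHS.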
Wrapping up
PEM-BiHS sets a new standard for scalable search algorithms. By combining bidirectional search, parallelism, and external memory, it tackles challenges that traditional approaches cannot overcome. Beyond solving puzzles and combinatorial problems, PEM-BiHS principles offer a roadmap for orchestrating large-scale AI workflows, ensuring scalability without sacrificing efficiency.
This demonstrates how blending established techniques with innovative adaptations can redefine possibilities in search and optimization.