FarragoPlannerProofs

From Eigenpedia

Jump to: navigation, search

This page sketches out a framework for multiple planners to cooperate in optimizing a query. When would this be useful?

  • Grid-based planning. The search required for optimization can be very CPU-intensive, so it's a natural candidate for grid-based computing. A master planner might carve up the search space in some way and dispatch sub-searches to slave planners. The resulting parallelism could be used to implement batch optimization of offline queries across distributed grids, or to allow for better response time in interactive query processing across centralized utility-computing grids, without sacrificing thoroughness.
  • Hybrid planners. A heuristic planner might be used during the first phase of optimization to find a feasible but non-optimal plan. The second phase would involve cost-based optimization, using the heuristic output as a starting point.

To enable these use-cases, we could introduce the notion of a RelOptProof. This is a sequence of rules to be applied to particular RelNodes in a plan A, taking it to plan B as a result. Assuming rules are deterministic, this allows one planner to follow the steps taken by another planner, and to indepently verify that they produce the claimed result. The verification aspect would be important in untrusted grid environments (VOLCANO@Home); the master would send out requests, and the slaves would respond with optimized plans accompanied by proofs.

In the hybrid case, the proof is important because it allows the cost-based planner to recapitulate the transformations applied by the heuristic planner (rather than merely starting from the partially optimized plan). As a result, it can build up the necessary equivalence class mappings and then carry out its work based on these.

The data structure for RelOptProof would contain:

  • a start plan
  • a sequence of steps
  • for each step, the rule to apply, along with bindings to the nodes of the previous step
  • for each step, the intermediate plan produced as a result
  • (the final step would imply the expected final plan)

Some refinement is needed here to add the notion of registering an asserted equivalence (e.g. what SwapJoinRule does).


One way to obtain a proof would be to add a method to the RelOptPlanner interface allowing clients to ask for a proof after planning completes. This would require the planner to keep its own audit trail. An alternative is to write an implementation of RelOptListener which can construct a proof automatically based on the events it receives from a planner it is attached to, freeing planner implementations from this burden. The listener approach may not work well with some planners since it would have to maintain a lot of alternative proofs and only keep one based on the planner's final plan choices, whereas the planner itself might be able to represent this more compactly, discard obsolete information, etc.


Here's a problem: what about rules which are deterministic but make their decisions based on information which is local, or on relational expression metadata queries for which different planners might give different answers?

Personal tools