Wednesday, April 21, 2021
On Cost Estimation for the Recursive Relational Algebra

Abstract

Recursion is becoming a key construct in analytic systems, thanks to the increasing popularity of data structures such as graphs and growth in data over the internet. This resurgence has seen different optimization techniques proposed for recursive queries. Recursive queries are particularly useful for retrieving nodes reachable along deep paths in a graph. Their evaluation involves an iterative application of a function or operation until some condition is satisfied. Cost models remain an essential component of a query optimizer, most important for estimating the cost of query plans and quality plans selection by the optimizer. For recursive queries, however, cost estimation is far from trivial and has received less attention.

One of the challenges encountered in costing a recursive query operator or plan include determining the convergence rate of the recursive. Many systems ignore convergence rate in the data statistics, implementation algorithm and other factors that determine a good cost estimation for recursive query execution. The lack of cost estimation framework support for recursive queries and a validation framework in general for cost model are the main motivation for this work.

In this thesis, we propose a cost estimation technique for recursive terms of the extended relational algebra. This technique uses data statistics and information about the maximum iterative steps needed for recursive evaluation to converge, to estimate the cost of query plans and select an estimated cheapest query plan, in terms of computing resources usage e.g. memory footprint, CPU and I/O, and evaluation time. We also present a cost validation framework where we define a set of metrics and standard specifications for cost model, and the conditions for query plan optimality. These set of metrics and specifications are then used for assessing the efficacy and consistency of plan-selection function of a cost model and they can also serve as a guide for developing advanced cost models.

We evaluate the effectiveness of our cost estimation technique on a set of recursive graph queries on both generated and real datasets of significant size. Experiments show that our cost estimation technique improves the performance of recursive query evaluation on popular relational database engines.
Mis à jour le 16 April 2021