The sqltoast library is an Apache 2 licensed C++ library designed for both
fast and fully thread-safe parsing of SQL into a collection of structs that
represent the abstract syntax tree (thus SQL to AST, or "sqltoast").
There is one primary function that the library publishes: sqltoast::parse().
This function comes in two flavors; one which accepts just the string to be parsed and one which accepts the string to be parsed along with a struct representing parsing options and instructions for sqltoast.
The return value of sqltoast::parse() is a reference to a struct
sqltoast::parse_result_t that describes the results of the parse:
#include <iostream>
#include <sqltoast/sqltoast.h>
int main(int argc, char *argv[]) {
std::string input;
if (argc == 2) {
input.assign(args[1]);
} else
std::cout << "Supply a SQL string to parse." << std::endl;
return 1;
}
sqltoast::parse_result_t res;
sqltoast::parse_input_t subject(input.cbegin(), input.cend());
res = sqltoast::parse(subject);
if (res.code == sqltoast::PARSE_OK)
std::cout << "Success!" << std::endl;
else if (res.code == sqltoast::PARSE_SYNTAX_ERROR)
std::cout << "Syntax error: " << res.error << std::endl;
}An important attribute of the sqltoast::parse_result_t struct is the
statements field, which is of type
std::vector<std::unique_ptr<sqltoast::statement_t>>. For each valid SQL
statement parsed by sqltoast::parse(), there will be a
std::unique_ptr<sqltoast::statement_t> subclassed struct in the statements
field.
The example program included in the
sqltoaster/ directory can be a good guide to lean about the
collection of sqltoast::statement_t structs.
The example program contains "printer functions" for each type of
sqltoast::statement_t struct that is part of the sqltoast library. The
example program outputs in a variety of formats, including YAML, so you can,
for example, see a YAML document that describes the parsed
sqltoast::statement_t type. Reading through the printer functions can give
you a good idea of the structure of these sqltoast::statement_t types and how
to utilize the parsed information.
After building the sqltoaster binary, let's ask it to parse a SELECT
statement and output the parsed information as a YAML document.
sqltoaster --yaml "SELECT a FROM t1"
statements:
- type: SELECT
selected_columns:
- column-reference[a]
referenced_tables:
- t1
(took 15790 nanoseconds)
By examining the sqltoaster::print::to_yaml() function in the sqltoaster
program, we can see how to read information about a particular
sqltoast::statement_t struct that is contained in the
sqltoast::parse_result_t struct's statements attribute.
NOTE: I've trimmed the code below and slightly modified it (removing some indenting-related fluff) for brevity.
First, let's take a look at the sqltoaster::print::to_yaml() function that
receives a sqltoast::statement_t object. This is our "top-level" YAML-output
function for SQL statements.
void to_yaml(printer_t& ptr, std::ostream& out, const sqltoast::statement_t& stmt) {
switch (stmt.type) {
// ...
case sqltoast::STATEMENT_TYPE_SELECT:
out << "type: SELECT";
{
const sqltoast::select_statement_t& sub =
static_cast<const sqltoast::select_statement_t&>(stmt);
to_yaml(ptr, out, sub);
}
break;
// ...
}
}The above demonstrates an important pattern deployed in the sqltoast library
and a fundamental technique used by the example sqltoaster program to sift
through the components of the parse tree.
For performance reasons, sqltoast is built with no run-time type
identification (RTTI). All type identification must be handled at compile time,
and due to this, users of the sqltoast library (in this case, the example
sqltoaster program) use a "type switch" approach to determine what class of
statement the supplied sqltoast::statement_t reference is and then use a
static_cast<> to convert the sqltoast::statement_t& into the appropriate
derived type.
Once you've static_cast<>'d to a reference of the derived type, you would
operate on that derived type reference, as shown in this code that examines the
elements of the sqltoast::select_statement_t sub-type and outputs those
elements into the YAML document.
void to_yaml(printer_t& ptr, std::ostream& out, const sqltoast::select_statement_t& stmt) {
if (stmt.distinct)
out << "distinct: true";
out << "selected_columns:";
for (const sqltoast::derived_column_t& dc : stmt.selected_columns)
out << "- " << dc;
out << "referenced_tables:";
for (const std::unique_ptr<sqltoast::table_reference_t>& tr : stmt.referenced_tables)
out << "- " << *tr;
if (stmt.where_condition)
out << "where:" << *stmt.where_condition;
if (! stmt.group_by_columns.empty()) {
out << "group_by:";
for (const sqltoast::grouping_column_reference_t& gcr : stmt.group_by_columns)
out << "- " << gcr;
}
if (stmt.having_condition)
out << "having:" << *stmt.having_condition;
}This source repository contains two directories with C++ code: the
libsqltoast directory contains the source code and headers
for the sqltoast library; the sqltoaster directory contains
source files for an example program that utilizes the sqltoast library.
Finally, there is a tests directory which contains various test files and test infrastructure.
sqltoast uses cmake as its primary build toolchain.
NOTE: If you add a new source or header file or dependency, you will need
to edit the CMakeLists.txt file in either the libsqltoast
directory or the sqltoaster directory.
To build the sqltoast library and sqltoaster example executable, first,
create a directory to use for building:
BUILD_DIR="$(pwd)/_build"
mkdir -p $BUILD_DIR
Then change directory into your build directory and run cmake:
cd $BUILD_DIR
cmake ..this will regenerate the Makefiles as needed and you may then build the
library and sqltoaster binary with a simple:
makeAfter building the library and sqltoaster binary, you can run tests by
changing into the tests/grammar directory and executing the
Python test runner, like so:
cd tests/grammar
python runner.py run
which will run all the grammar tests and output something like this:
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast$ cd tests/grammar/
jaypipes@uberbox:~/src/github.com/jaypipes/sqltoast/tests/grammar$ python runner.py run
Running ansi-92/alter-table ... OK
Running ansi-92/case-expressions ... OK
Running ansi-92/column-definitions ... OK
Running ansi-92/create-schema ... OK
Running ansi-92/create-table ... OK
Running ansi-92/create-view ... OK
Running ansi-92/datetime-expressions ... OK
Running ansi-92/delete ... OK
Running ansi-92/drop-schema ... OK
Running ansi-92/drop-table ... OK
Running ansi-92/drop-view ... OK
Running ansi-92/grant ... OK
Running ansi-92/identifiers ... OK
Running ansi-92/joins ... OK
Running ansi-92/insert-select ... OK
Running ansi-92/insert-values ... OK
Running ansi-92/interval-expressions ... OK
Running ansi-92/numeric-expressions ... OK
Running ansi-92/predicates ... OK
Running ansi-92/row-value-constructors ... OK
Running ansi-92/string-expressions ... OK
Running ansi-92/table-references ... OK
Running ansi-92/update ... OK