
painting by Hieronymus Bosch
There always comes a time when your custom-made, fast and pretty parser breaks because your toy language needs some important enhancement. When adding a few more symbols to the regular expressions is no longer enough, you can either:
We chose (1) for zafu’s Ruby expressions (rubyless), but we had to take the second route for the query builder.
We now have a Ragel-based parser that generates nice s-expressions from pseudo-SQL code. For example:
objects where event_at > REF_DATE + custom_a months
becomes:
[:query, [:filter, [:relation, "objects"], [:>, [:field, "event_at"], [:+, [:field, "REF_DATE"], [:field, "custom_a"]]]]]
If you indent this mess, you get something a Lisp coder would kill for:
[:query,
  [:filter,
    [:relation, "objects"],
    [:>,
      [:field, "event_at"],
      [:+,
        [:field, "REF_DATE"],
        [:field, "custom_a"]
      ]
    ]
  ]
]
This is quite fun: it reminds me of my old calculator with Reverse Polish notation.
Generating proper SQL from this is now simply a matter of processing this tree.
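To give an idea of what "processing this tree" can look like, here is a minimal sketch of a recursive walker over the example s-expression. The `to_sql` method, the `BINARY_OPERATORS` list and the `SELECT *` output shape are assumptions made for illustration, not the query builder's actual code. The sketch also skips quoting, escaping and everything else a real SQL generator needs.

```ruby
# Hypothetical tree walker: node types follow the example tree shown earlier,
# everything else (method name, operator list, SQL shape) is assumed.
BINARY_OPERATORS = %i[> < >= <= + -].freeze

def to_sql(node)
  type, *args = node
  case type
  when :query
    # A query simply wraps a single child node.
    to_sql(args[0])
  when :filter
    relation, condition = args
    "SELECT * FROM #{to_sql(relation)} WHERE #{to_sql(condition)}"
  when :relation, :field
    # Leaves carry their name as a plain string (no quoting in this sketch).
    args[0]
  when *BINARY_OPERATORS
    left, right = args
    "(#{to_sql(left)} #{type} #{to_sql(right)})"
  else
    raise ArgumentError, "unknown node type: #{type.inspect}"
  end
end

tree = [:query,
  [:filter,
    [:relation, "objects"],
    [:>,
      [:field, "event_at"],
      [:+, [:field, "REF_DATE"], [:field, "custom_a"]]]]]

puts to_sql(tree)
# prints: SELECT * FROM objects WHERE (event_at > (REF_DATE + custom_a))
```

Because every node is just an array whose first element names its type, supporting a new construct means adding one more `when` branch.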
I did some testing with the Ragel parser, comparing the generated Ruby code with the C extension. Both parsers do exactly the same work with the same actions. The Ruby-specific part of both parsers (the actions) should compile to the same Ruby code, so the only difference between the two is the Ragel-generated code.
This test finally boils down to 1300 lines of Ruby vs 1300 lines of C...

| suite | ruby | C |
|---|---|---|
| basic | 1.35 | 0.03 |
| errors | 0.74 | 0.02 |
| filters | 1.29 | 0.03 |
| total | 3.38 | 0.08 |
These results make it clear that, even though managing a C extension complicates deployment, it is really worth it in terms of speed (the C parser is roughly 40 times faster overall) and probably also in terms of memory usage.
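For readers who want the per-suite ratios, the short sketch below divides the Ruby timing by the C timing for each row of the table. The `TIMINGS` hash and the `speedup` helper are names made up for this example; the numbers are copied from the table as reported.

```ruby
# Timings copied from the table (units as reported in the source).
TIMINGS = {
  "basic"   => { ruby: 1.35, c: 0.03 },
  "errors"  => { ruby: 0.74, c: 0.02 },
  "filters" => { ruby: 1.29, c: 0.03 },
  "total"   => { ruby: 3.38, c: 0.08 }
}.freeze

# Hypothetical helper: how many times faster the C parser is.
def speedup(ruby_time, c_time)
  ruby_time / c_time
end

TIMINGS.each do |suite, t|
  puts format("%-8s %5.1fx", suite, speedup(t[:ruby], t[:c]))
end
```

The ratios land between roughly 37 and 45. Since the C timings only have one significant digit, these figures should be read as orders of magnitude rather than precise measurements.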
Gaspard Bucher