zeek/auxil/spicy/doc/programming/debugging.rst


.. _debugging:

=========
Debugging
=========

It can be challenging to track down the specifics of what a parser is
doing (or not doing) because often there's no directly observable
effect. To make that easier, Spicy comes with debugging support that
helps during parser development.

Generally, debugging support requires running ``spicyc`` or
``spicy-driver`` with option ``-d``; that enables generating debug
versions of the generated C++ code. In addition, the option ``-X
<tag>`` may enable additional, more expensive debug instrumentation,
as discussed below. Any use of ``-X`` implicitly turns on ``-d``.

Debug Hooks
===========

The simplest way to learn more about what's going on is to add hooks
with ``print`` statements to your grammar. That's rather disruptive
though, and hence there are also special ``%debug`` unit hooks which
only get compiled into the resulting code if ``spicy-driver`` is run
with debugging enabled (``-d``):

.. spicy-code:: debug-hooks.spicy

    module Test;

    public type test = unit {
        a: /1234/ %debug {
            print self.a;
        }

        b: /567890/;

        on b %debug { print self.b; }
    };


.. spicy-output:: debug-hooks.spicy 1
    :exec: printf "1234567890" | spicy-driver -d %INPUT
    :show-with: debugging.spicy

.. spicy-output:: debug-hooks.spicy 2
    :exec: printf "1234567890" | spicy-driver %INPUT
    :show-with: debugging.spicy

Debug Streams
=============

A second form of debugging support uses runtime *debug streams* that
instrument the generated parsers to log activity as they are parsing
their input. If you run ``spicy-driver`` with ``-d``, you can set the
environment variable ``HILTI_DEBUG`` to a set of debug stream names to
select the desired information (see below for the list). Execution
will then print debug information to standard error::

    > echo "GET /index.html HTTP/1.0" | HILTI_DEBUG=spicy spicy-driver -d http-request.spicy
    [spicy] Request::RequestLine
    [spicy]   method = GET
    [spicy]   anon_2 =
    [spicy]   uri = /index.html
    [spicy]   anon_3 =
    [spicy]   Request::Version
    [spicy]     anon = HTTP/
    [spicy]     number = 1.0
    [spicy]   version = [$anon=b"HTTP/", $number=b"1.0"]
    [spicy]   anon_4 = \n
    GET, /index.html, 1.0

The available debug streams include:

``spicy``
    Logs unit fields and variables as they receive values. This is
    often the most helpful output as it shows rather concisely what
    the parser is doing, and in particular how far it gets in cases
    where it doesn't parse something correctly.

``spicy-verbose``
    Logs various internals about the parsing process, including the
    grammar rules currently being parsed, the current input, and lexer
    tokens.

    This stream is primarily intended for debugging the Spicy compiler
    itself, but it can be helpful also in particular for understanding
    the data that remains to be parsed.

.. _hilti-trace:

``hilti-trace``
    This is a HILTI-level debug stream that records every HILTI
    instruction being executed. To use this, you need to run
    ``spicy-driver`` with ``-X trace``.

    This stream is primarily intended for debugging the Spicy
    compiler itself.

``hilti-flow``
    This is a HILTI-level debug level recording flow information like
    function calls. To use this, you need to run ``spicy-driver`` with
    ``-X flow``.

    This stream is primarily intended for debugging the Spicy compiler
    itself, although it may also be helpful to understand the internal
    control flow when writing a grammar.

Multiple streams can be enabled by separating them with colons.

Exceptions
==========

When encountering runtime errors, Spicy by default triggers C++
exceptions that bubble up back to the host application. If not handled
there, execution will stop. For debugging, you can also let the Spicy
runtime system ``abort()`` with a core dump, instead of throwing an
exception, by running ``spicy-driver`` with ``--abort-on-exceptions``.
That especially helps inside a debugger.

If in addition you specify ``--show-backtraces`` as well, it will
print a stack trace before aborting (assuming support for that is
available on your platform).

Inspecting Generated Code
=========================

Using ``spicyc`` you can inspect the code that's being generated for a
given Spicy grammar:

- ``spicyc -p`` outputs the intermediary HILTI code. The code tends to
  be pretty intuitively readable. Even if you don't know all the
  specifics of HILTI, much of the code is rather close to Spicy
  itself. (Per :ref:`above <hilti-trace>`, you can trace the generated
  HILTI code as it executes by activating the ``hilti-trace`` debug
  stream).

- ``spicyc -c`` outputs the final C++ code. If you add ``-L``, the
  output will also include additional code generated by HILTI's
  linker (which enables cross-module functionality).

- When JITing a grammar with ``spicyc -j``, running with ``-D
  dump-code`` will record all generated intermediary code (HILTI code,
  C++ code, object files) into files ``dbg.*`` inside the current
  directory.

Skipping Validation
===================

When working on the Spicy code, it can be helpful to disable internal
validation of generated HILTI code with ``-V``. That way, one can
often still see the HILTI code even if it's malformed. Note, however,
that Spicy may end up crashing if broken HILTI code gets passed into
later stages.

Interpreting Stack Traces
=========================

When looking at stack traces taken during Spicy parsing (e.g., from
``--show-backtraces``, or inside a debugger or profiler), it's
important to understand that the stack trace will show internal HILTI
functions that are being executed, not the Spicy grammar rules. While
HILTI functions are generated from the Spicy grammar, and bear some
resemblance to its structure, the mapping is not always
straightforward.

In addition, stack traces may be hard to interpret because the
generated code runs inside *fibers*, which are a form of lightweight
threads. Fibers are managed by HILTI's runtime system and are used to
implement the non-blocking parsing. As a result of the fibers, stack
traces don't show the full call chain up into the host application,
but stop prematurely at the point where execution switched from the
host application into a fiber.

Here's an example stack trace taken during an execution of
``spicy-driver`` with a small Spicy grammar printing some parsed
value, as shown by ``lldb``::

    frame #0:  void hilti::rt::print<hilti::rt::ValueReference<__hlt::Test::Foo>>
    frame #1:  __hlt::Test::__hook_Foo___on_0x25_done_7fba
    frame #2:  __hlt::Test::__hook_Foo___on_0x25_done
    frame #3:  __hlt::Test::Foo::__on_0x25_done
    frame #4:  __hlt::Test::Foo::__parse_Test__Foo_stage2
    frame #5:  __hlt::Test::Foo::__parse_stage1
    frame #6:  __hlt::Test::Foo::parse3
    frame #7:  hlt::Test::Foo::parse3
    frame #8:  hilti::rt::detail::Callback::Callback<hlt::Test::Foo::parse3>
    frame #9:  hilti::rt::detail::Callback::Callback<hlt::Test::Foo::parse3>
    frame #10: libhilti.dylib`hilti::rt::detail::Callback::operator
    frame #11: libhilti.dylib`__fiber_run_trampoline
    frame #12: libhilti.dylib`fiber_asm_invoke at fiber_asm_aarch64_apcs.S:83

``__fiber_asm_invoke`` is the original entry point where control
transferred into the fiber. Note that the stack trace doesn't show
anything further beyond that point: no function of ``spicy-driver``
shows up.

While this example is relatively clean, depending on the OS and
toolchain, sometimes there's additional noise in the stack trace just
beyond that ``fiber_asm_invoke`` entry. In particular, profiling tools
trying to aggregate multiple stack traces may get confused by the
fibers, and can end up breaking up activity that belongs together.
Sometimes this can be solved through some custom pre-processing of the
stack traces, removing/ignoring anything beyond the fiber entry point
(``hilti::rt::detail::Callback::operator`` is a good cut-off point).