.. _guidelines:

==========
Guidelines
==========

This section collects guidelines for writing Spicy parsers in the form
of best practices, useful patterns, and pitfalls to avoid. The content
is compiled from growing experience with real-world parsers by the
broader Spicy community.  If you have anything to add here, please
:ref:`let us know <feedback>`.

.. note::

    For now this section focuses on Spicy *performance*. We plan to
    extend it further with common patterns and idioms for structuring
    parsers. Contributions welcome.

.. _performance:

Spicy Performance
=================

This section provides advice on how to optimize CPU and memory usage
if you find your Spicy parser to consume more resources than you would
like.

As a general note upfront, keep in mind that it can be tricky to
estimate the performance impact of a particular piece of Spicy code.
Some seemingly simple Spicy constructs can turn into substantial
amounts of C++ code that may be expensive both at runtime and to
compile. On the other hand, some of Spicy's most powerful features,
like look-ahead parsing or failure recovery, add only relatively
little additional complexity to the code. On top of that, overall
performance often depends on how features are combined, or layered
across several units.

Hence, it's useful to remember the following four rules on Spicy
performance:

    1. Do as much work as needed in your Spicy analyzer, but not more.
       For example, what you can do in Zeek, do there.

    2. If you're in doubt about the performance impact of some Spicy
       code, do a benchmark.

    3. Rule 2: If you think you know the performance impact of some
       Spicy code, do a benchmark!

    4. In practice, it might not matter all that much anyway!

To explain Rule 4: When you are writing a parser for a network
protocol in Spicy, the resulting performance impact depends directly on the
amount of traffic that parser will be processing. For example, in a
typical Zeek setup, where your network carries a mix of various
protocols, your parser will see only a subset of the overall traffic.
And if you aren't going for the handful of most common high-volume
protocols (e.g., HTTP, DNS, TLS), then your parser will very likely
end up processing only a tiny fraction of the overall traffic. At that
point, its runtime performance is going to be dwarfed by everything
else Zeek is doing. Hence, take the following with a grain of salt.
Usually it's best to get your parser working first, then benchmark to
see if you need to improve performance, and finally find the primary
bottlenecks if needed.

.. _performance_runtime:

Runtime Performance
-------------------

Among other things, runtime performance is affected by:

    - additional C++ code that is generated by the Spicy compiler
    - use of expensive Spicy constructs
    - poorly managed objects hogging memory (both on stack and heap)

Below are some strategies to make your parser as efficient as possible at runtime.


.. _guidelines_runtime_public_units:

.. rubric:: Avoid declaring public units

The Spicy compiler generates additional code for ``public`` units,
thereby increasing both compile and execution times. Defining units as
``public`` is only required for top-level units where parsing starts
(i.e., with Zeek: the units that you enter into your EVT file). The
attribute can be omitted otherwise.
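
As a minimal sketch of that split, assuming a hypothetical module
``MyProto`` whose unit and field names are made up for illustration:
only the entry point carries ``public``, while the unit it
instantiates does not.

.. code::

    module MyProto;

    # Entry point where parsing starts (e.g., referenced from an EVT
    # file), so it needs to be public.
    public type Messages = unit {
        : Message[];
    };

    # Only instantiated from within Messages, so ``public`` is omitted.
    type Message = unit {
        length:  uint16;
        payload: bytes &size=self.length;
    };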

.. _guidelines_runtime_anonymous_fields:

.. rubric:: Turn unreferenced vectors into anonymous fields

Implement vectors that don't need to be referenced from your code as
:ref:`anonymous fields <anonymous_fields>` (i.e., do not provide a
name for the field). Normal, named fields store the whole vector,
accumulating all parsed elements over the lifetime of the defining
unit, whereas anonymous vector fields forgo storage, since nobody
could access the elements anyway.

The most common pitfall here is top-level units that parse a sequence
of PDUs:

.. code::

    public type MyPDUs = unit {
        msgs: Message[]; # DANGEROUS:   accumulates all messages until end of session
        : Message[];     # RECOMMENDED: anonymous field, no accumulation
    };

.. _guidelines_runtime_skip:

.. rubric::  Skip unused fields

Don't parse data that is not needed. Parsing data into a field will
cause a copy of that data into a dedicated memory location. Instead,
use the :ref:`skip keyword <skip>` to discard data that isn't of
interest, which leads the compiler to generate optimized code for many
field types, including in particular ``bytes``, literals, and
generally fields of fixed size. Examples:

.. code::

    public type Message = unit {
        unused1: uint32;            # LESS EFFICIENT: parse and store
        unused2: skip uint32;       # RECOMMENDED:    skip over 4 bytes

        unused3: bytes &eod;        # LESS EFFICIENT: extract and store remaining data
        unused4: skip bytes &eod;   # RECOMMENDED:    skip over remaining data
    };

.. _guidelines_runtime_strings:

.. rubric::  Avoid Spicy strings and string manipulation

Use Spicy :ref:`strings <type_string>` sparingly in your analyzer, and
stick to :ref:`bytes <type_bytes>` instead where you can. Typically,
you would convert ``bytes`` to strings as late as possible, just when
you need them, for example when passing them to functions expecting a
string, or preparing them for presentation to the user. When passing
data to Zeek events in your EVT files, ``bytes`` will be automatically
converted to Zeek strings, retaining their original byte-level
representation. That means that you don't need to convert them into a
string yourself at all unless you want to take character encodings
into account for the conversion through :spicy:method:`bytes::decode`.

As a corollary, avoid Spicy string manipulation. Always
manipulate/concatenate ``bytes`` and convert only the final result to
Spicy strings, probably in a ``%done`` hook or an EVT event. In
particular, format strings come with a cost to compute. In other
words, avoid use of ``%s`` to generate strings from ``bytes``.
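
The following sketch contrasts the two approaches. The unit and its
fields are hypothetical, and ``print`` merely stands in for whatever
finally consumes the string; ``decode()`` is called with its default
arguments here.

.. code::

    type Header = unit {
        name:  bytes &until=b":";
        value: bytes &until=b"\r\n";

        on %done {
            # LESS EFFICIENT: the format string renders both fields into a new string
            print "%s=%s" % (self.name, self.value);

            # RECOMMENDED: concatenate as bytes, decode only the final result
            print (self.name + b"=" + self.value).decode();
        }
    };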

.. _guidelines_runtime_temporary_vars:

.. rubric::  Don't use temporary variables of expensive types just for readability

Don't use temporary variables just to improve readability. In
particular, values of type ``string`` and ``bytes`` (which are
implemented as C++ strings under the hood and can be expensive to use)
need to be allocated and destroyed. This may introduce noticeable
overhead because there is no guarantee that the C++ compiler can
optimize away the temporary in the code generated by Spicy. To improve
readability, comments are the tool of choice. If in doubt about the
impact of a temporary, benchmark.
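
For illustration, a small sketch with a hypothetical ``Message`` unit;
whether the temporary actually costs anything in a given case is
exactly what a benchmark would need to show.

.. code::

    type Message = unit {
        payload: bytes &eod;

        on %done {
            # LESS EFFICIENT: the local exists only to give the payload a nicer name
            local body = self.payload;
            if ( body.starts_with(b"PING") )
                print "ping";

            # RECOMMENDED: operate on the field directly; the payload is the message body
            if ( self.payload.starts_with(b"PING") )
                print "ping";
        }
    };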

.. _guidelines_runtime_unnecessary_hooks:

.. rubric::  Remove unnecessary hooks

Multiple hook handlers with different priorities can be defined in
various places like inside a unit or, with Zeek, in EVT files.
However, hooks (either unit or field) should be avoided when not
needed. Using hooks comes with a performance cost because of
additional code generated by the compiler, which executes during
parsing. While ``%init`` hooks are often used for initializing unit
variables, we can often eliminate them by providing default values in
the variable definition:

.. code::

    public type Message = unit {

        on %init {
            self.A1 = 23;     # LESS EFFICIENT: explicit initialization through hook
        }

        var A1: uint32;
        var A2: uint32 = 23;  # RECOMMENDED: implicit initialization through default value
    };

.. _guidelines_runtime_recursion:

.. rubric::  Avoid recursion

The Spicy compiler allows declaration of recursive units with runtime
conditions dictating when the recursion terminates. However, recursion
introduces additional overhead compared to unrolled linear/iterative
code performing similar functionality: it extends the lifetimes of
units, their data, and their associated hooks, and it leaves less
potential for the compiler to optimize the additional internal
machinery around the recursive calls.
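
As a sketch, consider a hypothetical list of entries that ends with an
entry of kind zero. Both variants below consume the same input; the
unit and field names are assumptions made for this example, and the
iterative variant stops at the terminating entry.

.. code::

    # LESS EFFICIENT: each entry nests inside the previous one, so all of
    # them stay alive until the innermost one has finished parsing
    type NestedEntry = unit {
        kind: uint8;
        next: NestedEntry if ( self.kind != 0 );
    };

    # RECOMMENDED: iterate over a flat sequence instead
    type Entry = unit {
        kind: uint8;
    };

    type Entries = unit {
        : Entry[] &until=($$.kind == 0);
    };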

.. _guidelines_runtime_inline_units:

.. rubric::  Inline small nested units

The Spicy compiler is not smart enough yet to inline nested units.
Since declaration of units incurs additional cost to maintain their
associated state and hooks, it is advisable to manually inline small
units where performance is critical.
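
A minimal sketch of such manual inlining, using hypothetical unit and
field names; both header variants parse the same four bytes:

.. code::

    # LESS EFFICIENT: a separate unit just for two bytes
    type Version = unit {
        major: uint8;
        minor: uint8;
    };

    type HeaderNested = unit {
        version: Version;
        length:  uint16;
    };

    # RECOMMENDED where performance is critical: same layout, single unit
    type HeaderInlined = unit {
        version_major: uint8;
        version_minor: uint8;
        length:        uint16;
    };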

.. _guidelines_runtime_event_generation:

.. rubric::  Minimize event generation

With Zeek, minimize the number of generated events. Each instance of
an event comes with overhead as the parsed data needs to be converted
from Spicy's data model into Zeek's data model, which can involve heap
allocations even for simple types. It's the number of event instances
generated at runtime here that matters, *not* the number of event
types defined in the EVT files (although the latter may increase
compilation times).

.. _guidelines_runtime_aggregate_data:

.. rubric::  Aggregate data to be forwarded into other analyzers

When passing chunks of data back into Zeek through
``zeek::protocol_data_in``, it can be more efficient to aggregate
multiple chunks inside a temporary variable first, instead of
forwarding each chunk individually. This is because each chunk
forwarded to Zeek will go through its analyzer pipeline individually,
which incurs additional overhead.

.. _guidelines_runtime_global_constants:

.. rubric::  Move fixed local values into global constants

Inside functions and hooks, local variables are created and destroyed
every time the corresponding code executes. For non-trivial types,
that can lead to noticeable overhead. If the locals aren't modified,
consider moving them to global constants instead. (For some particularly
expensive, immutable types, Spicy already performs this optimization
internally; for example, for regular expressions.)
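
As an illustration, here is a sketch with a hypothetical ``Banner``
unit and a made-up constant ``ExpectedPrefix``. Whether the difference
is measurable for a value this small is something only a benchmark can
tell.

.. code::

    # Created once for the whole module.
    const ExpectedPrefix = b"SSH-2.0-";

    type Banner = unit {
        line: bytes &until=b"\r\n";

        on %done {
            # LESS EFFICIENT: the local is created and destroyed on every execution
            local prefix = b"SSH-2.0-";
            if ( ! self.line.starts_with(prefix) )
                print "unexpected banner";

            # RECOMMENDED: refer to the global constant instead
            if ( ! self.line.starts_with(ExpectedPrefix) )
                print "unexpected banner";
        }
    };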

.. _guidelines_runtime_small_byte_fields:

.. rubric::  Avoid small sized byte fields

Avoid using the ``bytes`` type for fields that could be handled using
integer types. As ``bytes`` will always allocate a C++ string under the
hood, using integer types can improve performance.
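
For example, a four-byte magic number can be parsed either way. The
unit below is a hypothetical sketch showing both alternatives side by
side; the integer variant assumes that the value can be interpreted in
the field's byte order.

.. code::

    type Header = unit {
        magic1: bytes &size=4;  # LESS EFFICIENT: stores the four bytes as a ``bytes`` value
        magic2: uint32;         # RECOMMENDED:    parses four bytes into an integer
    };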

.. _guidelines_runtime_cpp:

.. rubric::  Consider outsourcing into C++

Consider outsourcing complex and performance-critical calculations required for your parsing
into custom C++ code. In particular, decoding bytes into special
string representations or peculiar time conversions might be
significantly faster when implemented in C++ directly. See
:ref:`extending` for more on how to do that.

.. _guidelines_runtime_state_management:

.. rubric::  State-management in analyzers

Try to avoid using global variables, such as maps, to store analyzer
state as that can cause significant memory bloat over longer periods
if not managed correctly. Instead, prefer to retain analyzer state
through a :ref:`%context <unit_context>` inside your top-level units,
and then propagate that context down through unit arguments for other
code to populate it. When used with Zeek, Spicy ties the context state
to individual connections and tears it down automatically when the
connection state is removed, thereby preventing accidental state space
explosion. Note, however, that even state maintained inside a
``%context`` will need additional manual management if it can grow
unbounded for long-running connections (like state tables that
continuously accumulate new information with each PDU).
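
The following sketch shows the general pattern. All names
(``Context``, ``pending``, ``MaxPending``, the units and their fields)
are hypothetical, the eviction strategy is deliberately crude, and
depending on your Spicy version the unit parameter may need to be
declared ``inout``.

.. code::

    # Hypothetical upper bound for the number of tracked messages.
    const MaxPending = 1024;

    # Per-connection state shared by all units of a session.
    type Context = struct {
        pending: map<uint32, bytes>;
    };

    public type Messages = unit {
        %context = Context;

        # Propagate the context down through a unit argument.
        : Message(self.context())[];
    };

    type Message = unit(ctx: Context&) {
        id:      uint32;
        len:     uint16;
        payload: bytes &size=self.len;

        on %done {
            # Manual management: keep the table from growing without
            # bound on long-running connections.
            if ( |ctx.pending| >= MaxPending )
                ctx.pending.clear();

            ctx.pending[self.id] = self.payload;
        }
    };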

.. _performance_toolchain:

Compilation Performance
-----------------------

Depending on the complexity of the Spicy code, it may take a bit (and
sometimes quite a while) to compile your parsers. In the following, we
collect some recommendations to speed up the compile process.

.. note::

    When processing Spicy code, generally the bulk of the time
    tends to be spent on compiling the generated C++ code; often about
    80-90%. If you want to see a break-down of where Spicy spends its
    time, run the Spicy compiler with ``--report-times``. In the
    output at the end, ``jit`` refers to compiling generated C++ code.

.. _guidelines_compilation_precompile_headers:

.. rubric:: Precompile Headers

Make sure to run :ref:`spicy-precompile-headers
<parser-development-setup>` to speed up C++ compilation a little.

.. _guidelines_compilation_debug_builds:

.. rubric:: Faster Debug Builds

During development of new parsers, it helps quite a bit to build
non-optimized debug versions by adding ``--debug`` to the Spicy
compiler's command-line. This emits almost identical code, but then
compiles the generated code without ``-O2`` (i.e., not optimized),
which avoids some work the C++ compiler would otherwise do. The
produced HLTO will run (much) slower, so it is probably not
suitable for production.

.. danger::

    Do *not* run ``spicyc`` with ``--disable-optimizations`` as that
    will actually generate *more* C++ code to compile.

When building a Spicy parser as a Zeek analyzer with the default package
template, you can pass Spicy compilation flags via the ``SPICYZ_FLAGS`` CMake
variable. For example, to build a parser in debug mode, configure it with:

.. code-block:: sh

    $ cmake -DSPICYZ_FLAGS="--debug" <OTHER FLAGS>

For building with ``zkg``, you can add this flag to the CMake invocation in
``zkg.meta``'s ``build_command``; this change is for development and likely
should not be published.

.. _guidelines_compilation_ccache:

.. rubric:: Use a compiler cache to speed up repeated compilations

C++ compilation can become the dominant factor in compilation time for Spicy
parsers. If you repeatedly compile the same file (this might even be an
unchanged module in your Spicy project) it is worthwhile to cache the C++
compilation results to avoid doing this work again.

To configure a compiler cache, set its invocation in the environment
variable ``HILTI_CXX_COMPILER_LAUNCHER``, e.g., to use an installed `ccache
<https://ccache.dev/>`_:

.. code-block:: sh

    $ export HILTI_CXX_COMPILER_LAUNCHER=ccache

.. _guidelines_compilation_parallelism:

.. rubric:: Tweak compilation parallelism

When compiling generated C++ code, Spicy will by default spawn as many parallel
compiler processes as there are cores. This often works well enough, but can
cause issues when, e.g., (1) C++ compilation requires a lot of RAM so that concurrent
processes compete for it and end up swapping, or (2) multiple parsers are
built in parallel as part of a bigger build setup. If this is something you
observe, it might make sense to *reduce* the level of parallelism, e.g.,

.. code-block:: sh

    # Run at most 4 parallel C++ compilation jobs.
    $ export HILTI_JIT_PARALLELISM=4

Especially for case (1) it might make sense to check whether you can :ref:`switch to
a more efficient compiler<guidelines_compilation_switch_compiler>`.

.. _guidelines_compilation_switch_compiler:

.. rubric:: Consider switching to a more efficient compiler

*Compilation performance* of GCC and Clang can differ by a lot, e.g., GCC can
require 2-4GB of RAM to compile C++ files generated by Spicy while Clang might
only require 1-2GB. This can negatively impact performance if RAM becomes a
bottleneck and forces process memory into slower swap, see also
:ref:`guidelines_compilation_parallelism`. For this reason it can be worthwhile
to switch to Clang to speed up compilation, especially during development.

Spicy utilizes the same compiler for compiling generated C++ files that was
used for compiling Spicy itself. Binary packages are most often built with a
system compiler so going down this path requires a custom build of Spicy (or
Zeek if Spicy comes bundled with it). You can query the compiler Spicy would
use with ``spicy-config``, e.g.,

.. code-block:: sh

    $ spicy-config --cxx
    /usr/bin/c++

    # '/usr/bin/c++' corresponds to gcc-12.2.0-14 on this system.
    $ /usr/bin/c++ --version
    c++ (Debian 12.2.0-14) 12.2.0
    Copyright (C) 2022 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To build Spicy with Clang instead, configure its build with the following flags:

.. code-block:: sh

    $ ./configure --with-cxx-compiler=clang++ --with-c-compiler=clang --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>

To configure a Zeek build to use Clang, set the ``CC`` and ``CXX`` environment
variables when making a clean build:

.. code-block:: sh

    # Environment variables only have an effect for a clean build.
    $ rm -rf build

    $ CXX=clang++ CC=clang ./configure --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>

After building and installing you should see a changed C++ compiler with
``spicy-config --cxx`` for your custom-built ``spicy-config``, e.g.,

.. code-block:: sh

    $ spicy-config --cxx
    clang++

.. danger::

    While one can switch the compiler at runtime with the ``HILTI_CXX``
    environment variable, it is not the right tool to switch between GCC and
    Clang since the compilers can produce ABI-incompatible code. In the best
    case this will lead to linker failures; in the worst case, parsers might
    behave incorrectly at runtime.


.. _guidelines_compilation_imports:

.. rubric:: Reduce number of imports required

By reducing the number of imports, i.e., source files, compilation from
scratch will become faster. There is a tradeoff, though: splitting code
across multiple files may allow for some incremental compilation if
caching is used, and thus may speed up subsequent builds.