.. _guidelines:

==========
Guidelines
==========

This section collects guidelines for writing Spicy parsers in the form
of best practices, useful patterns, and pitfalls to avoid. The content
is compiled from growing experience with real-world parsers by the
broader Spicy community. If you have anything to add here, please
:ref:`let us know <feedback>`.

.. note::

    For now this section focuses on Spicy *performance*. We plan to
    extend it further with common patterns and idioms for structuring
    parsers. Contributions welcome.

.. _performance:

Spicy Performance
=================

This section provides advice on how to optimize CPU and memory usage
if you find that your Spicy parser consumes more resources than you
would like.

As a general note upfront, keep in mind that it can be tricky to
estimate the performance impact of a particular piece of Spicy code.
Some seemingly simple Spicy constructs can turn into substantial
amounts of C++ code that may be expensive both at runtime and to
compile. On the other hand, some of Spicy's most powerful features,
like look-ahead parsing or failure recovery, add only relatively
little additional complexity to the code. On top of that, overall
performance often depends on how features are combined, or layered
across several units.

Hence, it's useful to remember the following four rules on Spicy
performance:

1. Do as much work as needed in your Spicy analyzer, but not more.
   For example, what you can do in Zeek, do there.

2. If you're in doubt about the performance impact of some Spicy
   code, do a benchmark.

3. Rule 2: If you think you know the performance impact of some
   Spicy code, do a benchmark!

4. In practice, it might not matter all that much anyway!

To explain Rule 4: When you are writing a parser for a network
protocol in Spicy, the resulting performance impact depends directly
on the amount of traffic that parser will be processing. For example,
in a typical Zeek setup, where your network carries a mix of various
protocols, your parser will see only a subset of the overall traffic.
And if you aren't going for the handful of most common high-volume
protocols (e.g., HTTP, DNS, TLS), then your parser will very likely
end up processing only a tiny fraction of the overall traffic. At that
point, its runtime performance is going to be dwarfed by everything
else Zeek is doing. Hence, take the following with a grain of salt.
Usually it's best to get your parser working first, then benchmark to
see if you need to improve performance, and finally find the primary
bottlenecks if needed.

.. _performance_runtime:

Runtime Performance
-------------------

Among other things, runtime performance is affected by:

- additional C++ code that is generated by the Spicy compiler
- use of expensive Spicy constructs
- poorly managed objects hogging memory (both on stack and heap)

Below are some strategies for improving runtime performance.

.. _guidelines_runtime_public_units:

.. rubric:: Avoid declaring public units

The Spicy compiler generates additional code for ``public`` units,
thereby increasing both compile and execution times. Defining units as
``public`` is only required for top-level units where parsing starts
(i.e., with Zeek: the units that you enter into your EVT file). The
attribute can be omitted otherwise.

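As a minimal sketch of this split (the module, unit, and field names
are hypothetical and serve only as illustration), only the entry point
carries the attribute:

.. code::

    module Example;

    # Entry point where parsing starts (e.g., the unit referenced from
    # an EVT file), hence ``public``.
    public type Messages = unit {
        : Message[];
    };

    # Only instantiated from ``Messages``, so ``public`` can be omitted.
    type Message = unit {
        length: uint16;
        payload: bytes &size=self.length;
    };
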
.. _guidelines_runtime_anonymous_fields:

.. rubric:: Turn unreferenced vectors into anonymous fields

Implement vectors that don't need to be referenced from your code as
:ref:`anonymous fields <anonymous_fields>` (i.e., do not provide a
name for the field). Normal, named fields store the whole vector,
accumulating all parsed elements over the lifetime of the defining
unit, whereas anonymous vector fields forgo storage, since nobody
could access their elements anyway.

The most common pitfall here is top-level units that parse a sequence
of PDUs:

.. code::

    public type MyPDUs = unit {
        msgs: Message[];  # DANGEROUS: accumulates all messages until end of session
        : Message[];      # RECOMMENDED: anonymous field, no accumulation
    };

.. _guidelines_runtime_skip:

.. rubric:: Skip unused fields

Don't parse data that is not needed. Parsing data into a field will
cause a copy of that data into a dedicated memory location. Instead,
use the :ref:`skip keyword <skip>` to discard data that isn't of
interest, which leads the compiler to generate optimized code for many
field types, including in particular ``bytes``, literals, and
generally fields of fixed size. Examples:

.. code::

    public type Message = unit {
        unused1: uint32;           # LESS EFFICIENT: parse and store
        unused2: skip uint32;      # RECOMMENDED: skip over 4 bytes

        unused3: bytes &eod;       # LESS EFFICIENT: extract and store remaining data
        unused4: skip bytes &eod;  # RECOMMENDED: skip over remaining data
    };

.. _guidelines_runtime_strings:

.. rubric:: Avoid Spicy strings and string manipulation

Use Spicy :ref:`strings <type_string>` sparingly in your analyzer, and
stick to :ref:`bytes <type_bytes>` instead where you can. Typically,
you would convert ``bytes`` to strings as late as possible, just when
you need them, for example when passing them to functions expecting a
string, or preparing them for presentation to the user. When passing
data to Zeek events in your EVT files, ``bytes`` will be automatically
converted to Zeek strings, retaining their original byte-level
representation. That means that you don't need to convert them into a
string yourself at all unless you want to take character encodings
into account for the conversion through :spicy:method:`bytes::decode`.

As a corollary, avoid Spicy string manipulation. Always
manipulate/concatenate ``bytes`` and convert only the final result to
Spicy strings, probably in a ``%done`` hook or an EVT event. In
particular, format strings come with a cost to compute. In other
words, avoid use of ``%s`` to generate strings from ``bytes``.

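A small sketch of that advice, assuming a hypothetical ``Header`` unit
that parses a ``name:value`` line (the unit and its fields are made up
for illustration). The second variant stays on ``bytes`` until the
very end and decodes only once:

.. code::

    module Example;

    type Header = unit {
        name: bytes &until=b":";
        value: bytes &until=b"\n";

        on %done {
            # LESS EFFICIENT: the format string renders each ``bytes``
            # value into a string.
            print "%s=%s" % (self.name, self.value);

            # RECOMMENDED: concatenate as ``bytes``, convert the final
            # result once.
            print (self.name + b"=" + self.value).decode();
        }
    };

If the result is only passed on to a Zeek event, the ``decode()`` call
can usually be dropped as well, as described above.
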
.. _guidelines_runtime_temporary_vars:

.. rubric:: Don't use temporary variables of expensive types just for readability

Don't use temporary variables just to improve readability. In
particular, ``string`` and ``bytes`` values (which are implemented as
C++ strings under the hood and can be expensive to use) need to be
allocated and destroyed. This may introduce noticeable overhead, as
there is no guarantee that the C++ compiler will be able to optimize
away the temporary in the code generated by Spicy. To improve
readability, comments are the tool of choice. If in doubt about the
impact of a temporary, benchmark.

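To illustrate, here is a hedged sketch with a hypothetical ``Message``
unit. The ``prefix`` local exists only for readability and requires an
additional ``bytes`` value each time the hook runs:

.. code::

    module Example;

    type Message = unit {
        data: bytes &eod;

        on %done {
            # LESS EFFICIENT: temporary ``bytes`` value just to name the prefix
            local prefix = self.data.sub(0, 4);
            if ( prefix == b"HTTP" )
                print "looks like HTTP";

            # RECOMMENDED: check the first four bytes on the field directly;
            # the comment documents the intent instead of a variable name.
            if ( self.data.starts_with(b"HTTP") )
                print "looks like HTTP";
        }
    };
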
.. _guidelines_runtime_unnecessary_hooks:

.. rubric:: Remove unnecessary hooks

Multiple hook handlers with different priorities can be defined in
various places like inside a unit or, with Zeek, in EVT files.
However, hooks (either unit or field) should be avoided when not
needed. Using hooks comes with a performance cost because of
additional code generated by the compiler, which executes during
parsing. While ``%init`` hooks are often used for initializing unit
variables, we can often eliminate them by providing default values in
the variable definition:

.. code::

    public type Message = unit {

        on %init {
            self.A1 = 23;  # LESS EFFICIENT: explicit initialization through hook
        }

        var A1: uint32;
        var A2: uint32 = 23;  # RECOMMENDED: implicit initialization through default value
    };

.. _guidelines_runtime_recursion:

.. rubric:: Avoid recursion

The Spicy compiler allows declaration of recursive units with runtime
conditions dictating when the recursion terminates. However, recursion
introduces additional overhead compared to unrolled linear/iterative
code performing similar functionality. This is due to the extended
lifetimes of units, data, and their associated hooks, as well as less
potential for compiler optimization of the additional internal
machinery around the recursive calls.

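As an illustration, consider a hypothetical zero-terminated list of
8-bit integers. The following sketch contrasts a recursive formulation
with an iterative one; the unit and field names are made up for this
example, and the two variants differ slightly in what they store (with
``&until``, the terminating element is consumed but not added to the
vector):

.. code::

    module Example;

    # LESS EFFICIENT: every element instantiates another nested unit
    # that stays alive until the outermost unit is done.
    type RecursiveList = unit {
        value: uint8;
        rest: RecursiveList if ( self.value != 0 );
    };

    # RECOMMENDED: a single unit iterating over all elements.
    type IterativeList = unit {
        values: uint8[] &until=($$ == 0);
    };
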
.. _guidelines_runtime_inline_units:

.. rubric:: Inline small nested units

The Spicy compiler does not yet inline nested units. Since each unit
incurs additional cost to maintain its associated state and hooks, it
is advisable to manually inline small units where performance is
critical.

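A sketch of what manual inlining can look like, using a hypothetical
two-byte version field (all names here are illustrative):

.. code::

    module Example;

    # Small helper unit with its own state and hooks.
    type Version = unit {
        major: uint8;
        minor: uint8;
    };

    # LESS EFFICIENT: instantiates a nested unit for two bytes of data.
    type HeaderNested = unit {
        version: Version;
        length: uint16;
    };

    # RECOMMENDED where performance is critical: fields inlined manually.
    type HeaderInlined = unit {
        version_major: uint8;
        version_minor: uint8;
        length: uint16;
    };

The nested variant remains easier to reuse, so this trade-off is mainly
worth it for units that are instantiated very frequently.
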
.. _guidelines_runtime_event_generation:

.. rubric:: Minimize event generation

With Zeek, minimize the number of generated events. Each instance of
an event comes with overhead as the parsed data needs to be converted
from Spicy's data model into Zeek's data model, which can involve heap
allocations even for simple types. It's the number of event instances
generated at runtime here that matters, *not* the number of event
types defined in the EVT files (although the latter may increase
compilation times).

.. _guidelines_runtime_aggregate_data:

.. rubric:: Aggregate data to be forwarded into other analyzers

When passing chunks of data back into Zeek through
``zeek::protocol_data_in``, it can be more efficient to aggregate
multiple chunks inside a temporary variable first, instead of
forwarding each chunk individually. This is because each chunk
forwarded to Zeek will go through its analyzer pipeline individually,
which incurs additional overhead.

.. _guidelines_runtime_global_constants:

.. rubric:: Move fixed local values into global constants

Inside functions and hooks, local variables are created and destroyed
every time the corresponding code executes. For non-trivial types,
that can lead to noticeable overhead. If the locals aren't modified,
consider moving them to global constants instead. (For some
particularly expensive, immutable types, Spicy already performs this
optimization internally; for example, for regular expressions.)

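A minimal sketch of this pattern, assuming a hypothetical ``Request``
unit that checks a parsed method against a fixed set (the names
``KnownMethods`` and ``Request`` are made up for illustration):

.. code::

    module Example;

    # RECOMMENDED: constructed once, shared by all hook invocations.
    const KnownMethods = set(b"GET", b"POST", b"PUT");

    type Request = unit {
        method: bytes &until=b" ";

        on %done {
            # LESS EFFICIENT alternative: a local with the same content
            # would be rebuilt every time this hook executes:
            #
            #     local known = set(b"GET", b"POST", b"PUT");

            if ( self.method in KnownMethods )
                print "known method";
        }
    };
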
.. _guidelines_runtime_small_byte_fields:

.. rubric:: Avoid small-sized byte fields

Avoid using the ``bytes`` type for fields that could be handled using
integer types. As ``bytes`` always allocates a C++ string under the
hood, using integer types can improve performance.

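For example, a two-byte field can be declared either way; the sketch
below uses a hypothetical ``Record`` unit and shows both variants side
by side for comparison:

.. code::

    module Example;

    type Record = unit {
        kind_as_bytes: bytes &size=2;  # LESS EFFICIENT: allocates a C++ string for 2 bytes
        kind_as_int: uint16;           # RECOMMENDED: plain integer
    };
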
.. _guidelines_runtime_cpp:

.. rubric:: Consider outsourcing into C++

Consider outsourcing complex and performance-critical calculations
required for your parsing into custom C++ code. In particular, decoding
bytes into special string representations or peculiar time conversions
might be significantly faster when implemented in C++ directly. See
:ref:`extending` for more on how to do that.

.. _guidelines_runtime_state_management:

.. rubric:: State-management in analyzers

Try to avoid using global variables, such as maps, to store analyzer
state as that can cause significant memory bloat over longer periods
if not managed correctly. Instead, prefer to retain analyzer state
through a :ref:`%context <unit_context>` inside your top-level units,
and then propagate that context down through unit arguments for other
code to populate it. When used with Zeek, Spicy ties the context state
to individual connections, and it gets torn down automatically when
the connection state is removed, thereby preventing accidental state
space explosion. Note, however, that even state maintained inside a
``%context`` will need additional manual management if it can grow
unbounded for long-running connections (like state tables that
continuously accumulate new information with each PDU).

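The following sketch outlines one way to structure this, assuming a
hypothetical ``State`` struct that merely counts requests. All type and
field names are illustrative, and the exact parameter passing may need
adjusting for your Spicy version:

.. code::

    module Example;

    # Per-connection state, kept small and bounded.
    type State = struct {
        num_requests: uint64;
    };

    public type Requests = unit {
        %context = State;

        # Hand the context down to nested units through a unit argument.
        : Request(self.context())[];
    };

    type Request = unit(ctx: State&) {
        line: bytes &until=b"\n" {
            ctx.num_requests += 1;
        }
    };

Because the counter has a fixed size, this state cannot grow with the
number of PDUs. A map or vector stored in the context would need
explicit pruning to give the same guarantee.
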
.. _performance_toolchain:

Compilation Performance
-----------------------

Depending on the complexity of the Spicy code, it may take a bit (and
sometimes quite a while) to compile your parsers. In the following, we
collect some recommendations to speed up the compile process.

.. note::

    When processing Spicy code, generally the bulk of the time
    tends to be spent on compiling the generated C++ code; often about
    80-90%. If you want to see a break-down of where Spicy spends its
    time, run the Spicy compiler with ``--report-times``. In the
    output at the end, ``jit`` refers to compiling generated C++ code.

.. _guidelines_compilation_precompile_headers:

.. rubric:: Precompile Headers

Make sure to run
:ref:`spicy-precompile-headers <parser-development-setup>` to speed up
C++ compilation a little.

.. _guidelines_compilation_debug_builds:

.. rubric:: Faster Debug Builds

During development of new parsers, it helps quite a bit to build
non-optimized debug versions by adding ``--debug`` to the Spicy
compiler's command-line. This emits almost identical code, but then
compiles the generated code without ``-O2`` (i.e., not optimized),
which avoids some work the C++ compiler would otherwise do. The
produced HLTO will perform (much) less well, so it is probably not
useful for production.

.. danger::

    Do *not* run ``spicyc`` with ``--disable-optimizations`` as that
    will actually generate *more* C++ code to compile.

When building a Spicy parser as a Zeek analyzer with the default package
template, one can pass Spicy compilation flags via the ``SPICYZ_FLAGS`` CMake
variable, e.g., to build a parser in debug mode configure the parser with

.. code-block:: sh

    $ cmake -DSPICYZ_FLAGS="--debug" <OTHER FLAGS>

For building with ``zkg`` you can add this flag to the CMake invocation in
``zkg.meta``'s ``build_command``; this change is for development and likely
should not be published.

.. _guidelines_compilation_ccache:

.. rubric:: Use a compiler cache to speed up repeated compilations

C++ compilation can become the dominant factor in compilation time for Spicy
parsers. If you repeatedly compile the same file (this might even be an
unchanged module in your Spicy project) it is worthwhile to cache the C++
compilation results to avoid doing this work again.

To configure a compiler cache, set its invocation in the environment
variable ``HILTI_CXX_COMPILER_LAUNCHER``, e.g., to use an installed `ccache
<https://ccache.dev/>`_:

.. code-block:: sh

    $ export HILTI_CXX_COMPILER_LAUNCHER=ccache

.. _guidelines_compilation_parallelism:

.. rubric:: Tweak compilation parallelism

When compiling generated C++ code, by default Spicy will spawn as many parallel
compiler processes as there are cores. This often works well enough, but can
cause issues when, e.g., (1) C++ compilation requires a lot of RAM so concurrent
processes might compete for it and end up swapping, or (2) multiple parsers are
built in parallel as part of a bigger build setup. If this is something you
observe, it might make sense to *reduce* the level of parallelism, e.g.,

.. code-block:: sh

    # Run at most 4 parallel C++ compilation jobs.
    $ export HILTI_JIT_PARALLELISM=4

Especially for case (1) it might make sense to check whether you can :ref:`switch to
a more efficient compiler <guidelines_compilation_switch_compiler>`.

.. _guidelines_compilation_switch_compiler:

.. rubric:: Consider switching to a more efficient compiler

*Compilation performance* of GCC and Clang can differ by a lot, e.g., GCC can
require 2-4GB of RAM to compile C++ files generated by Spicy while Clang might
only require 1-2GB. This can negatively impact performance if RAM becomes a
bottleneck and forces process memory into slower swap, see also
:ref:`guidelines_compilation_parallelism`. For this reason it can be worthwhile
to switch to Clang to speed up compilation, especially during development.

Spicy utilizes the same compiler for compiling generated C++ files that was
used for compiling Spicy itself. Binary packages are most often built with a
system compiler so going down this path requires a custom build of Spicy (or
Zeek if Spicy comes bundled with it). You can query the compiler Spicy would
use with ``spicy-config``, e.g.,

.. code-block:: sh

    $ spicy-config --cxx
    /usr/bin/c++

    # '/usr/bin/c++' corresponds to gcc-12.2.0-14 on this system.
    $ /usr/bin/c++ --version
    c++ (Debian 12.2.0-14) 12.2.0
    Copyright (C) 2022 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

To build Spicy with Clang instead, configure its build with the following flags:

.. code-block:: sh

    $ ./configure --with-cxx-compiler=clang++ --with-c-compiler=clang --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>

To configure a Zeek build to use Clang, set the ``CC`` and ``CXX`` environment
variables when making a clean build:

.. code-block:: sh

    # Environment variables only have an effect for a clean build.
    $ rm -rf build

    $ CXX=clang++ CC=clang ./configure --prefix=<MY CUSTOM PREFIX> <OTHER FLAGS>

After building and installing, you should see a changed C++ compiler with
``spicy-config --cxx`` for your custom-built ``spicy-config``, e.g.,

.. code-block:: sh

    $ spicy-config --cxx
    clang++

.. danger::

    While one can switch the compiler at runtime with the ``HILTI_CXX``
    environment variable, it is not the right tool to switch between GCC and
    Clang since the compilers can produce ABI-incompatible code. In the best
    case this will lead to linker failures; in the worst case, parsers might
    behave incorrectly at runtime.

.. _guidelines_compilation_imports:

.. rubric:: Reduce number of imports required

Reducing the number of imports, i.e., source files, makes compilation
from scratch faster. There is a tradeoff, though, as multiple files may
allow for some incremental compilation if caching is used, and thus
may speed up subsequent builds.