zeek/auxil/spicy/doc/toolchain.rst
Patrick Kelley 8fd444092b initial
2025-05-07 15:35:15 -04:00

267 lines
11 KiB
ReStructuredText

.. _toolchain:
=========
Toolchain
=========
.. _spicy-build:
``spicy-build``
===============
``spicy-build`` is a shell frontend that compiles Spicy source code
into a standalone executable by running :ref:`spicyc` to generate the
necessary C++ code, then spawning the system compiler to compile and
link that.
.. spicy-output:: usage-spicy-build
:exec: spicy-build -h
.. _spicy-config:
``spicy-config``
================
``spicy-config`` reports information about Spicy's build &
installation options.
.. spicy-output:: usage-spicy-config
:exec: spicy-config -h
.. _spicyc:
``spicyc``
==========
``spicyc`` compiles Spicy code into C++ output, optionally also
executing it directly through JIT.
.. spicy-output:: usage-spicyc
:exec: spicyc -h
``spicyc`` also supports the following environment variables to
control the compilation process:
``SPICY_PATH``
Replaces the built-in search path for `*.spicy` source files.
``SPICY_CACHE``
Location for storing precompiled C++ headers. Default is ``~/.cache/spicy/<VERSION>``.
``HILTI_CXX``
Specifies the path to the C++ compiler to use.
``HILTI_CXX_COMPILER_LAUNCHER``
Specifies a command to prefix compiler invocations with during JIT.
This can e.g., be used to use a compiler cache like
`ccache <https://ccache.dev/>`_. If Spicy was configured with e.g.,
``--with-hilti-compiler-launcher=ccache`` (the equivalent CMake option
is ``HILTI_COMPILER_LAUNCHER``) ``ccache`` would automatically be used
during JIT. Setting this variable to an empty value disables use of
``ccache`` in that case.
``HILTI_CXX_FLAGS``
Specifies additional flags to pass during C++ compilation. This will be
added after all implicit arguments. Use ``HILTI_CXX_INCLUDE_DIRS`` to
specify additional include directories.
``HILTI_CXX_INCLUDE_DIRS``
Specifies additional, colon-separated C++ include directories to
search for header files. Directories passed via
``HILTI_CXX_INCLUDE_DIRS`` will be searched for headers before any
header search paths implicit in Spicy C++ compilation.
``HILTI_JIT_PARALLELISM``
Set to specify the maximum number of background compilation jobs to run
during JIT. Defaults to number of cores.
``HILTI_JIT_SEQUENTIAL``
Set to prevent spawning multiple concurrent C++ compiler instances.
This overrides any value set for ``HILTI_JIT_PARALLELISM`` and
effectively sets it to one.
``HILTI_OPTIMIZER_PASSES``
Colon-separated list of optimizer passes to activate. If unset uses the
default-enabled set.
``HILTI_PATH``
Replaces the built-in search path for `*.hlt` source files.
``HILTI_PRINT_SETTINGS``
Set to see summary of compilation options.
.. _spicy-driver:
``spicy-driver``
================
``spicy-driver`` is a standalone Spicy host application that compiles
and executes Spicy parsers on the fly, and then feeds them data for
parsing from standard input.
.. spicy-output:: usage-spicy-driver
:exec: spicy-driver -h
``spicy-driver`` supports the same environment variables as
:ref:`spicyc`.
Specifying the parser to use
----------------------------
If there's only single ``public`` unit in the Spicy source code,
``spicy-driver`` will automatically use that for parsing its input. If
there's more than one public unit, you need to tell ``spicy-driver``
which one to use through its ``--parser`` (or ``-p``) option. To see
the parsers that are available, use ``--list-parsers`` (or ``-l``).
In addition to the names shown by ``--list-parsers``, you can also
specify a parser through a port or MIME type if the corresponding unit
:ref:`defines them through properties <unit_meta_data>`. For example,
if a unit defines ``%port = 80/tcp``, you can use ``spicy-driver -p
80/tcp`` to select it. To specify a direction, add either ``%orig`` or
``%resp`` (e.g., ``-p 80/tcp%resp``); then only units with a port
tagged with an ``&originator`` or ``&responder`` attribute,
respectively, will be considered. If a unit defines ``%mime-type =
application/test``, you can select it through ``spicy-driver -p
application/test``.
.. versionadded:: 1.13 Verbose mode for ``list-parsers``
Internally, these port-based arguments for ``-p`` are alias names for
existing parsers. You can see all aliases by running ``spicy-driver``
with ``-ll`` (i.e., ``--list-parsers`` twice).
.. _spicy-driver-batch:
Batch input
-----------
``spicy-driver`` provides a batch input mode for processing multiple
interleaved input flows in parallel, mimicking how host applications
like Zeek would be employing Spicy parsers for processing many
sessions concurrently. The batch input must be prepared in a specific
format (see below) that provides embedded meta information about the
contained flows of input. If you have Zeek at hand, the easiest way to
generate such a batch is `a script coming with Zeek
<https://github.com/zeek/zeek/blob/master/scripts/policy/frameworks/spicy/record-spicy-batch.zeek>`_.
If you run Zeek with this script on a PCAP trace, it will record the
contained TCP and UDP sessions
into a Spicy batch file::
# zeek -b -r http/methods.trace policy/frameworks/spicy/record-spicy-batch
tracking [orig_h=128.2.6.136, orig_p=46562/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46563/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46564/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46565/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46566/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46567/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
[...]
tracking [orig_h=128.2.6.136, orig_p=46608/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46609/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
tracking [orig_h=128.2.6.136, orig_p=46610/tcp, resp_h=173.194.75.103, resp_p=80/tcp]
recorded 49 sessions total
output in batch.dat
You will now have a file ``batch.dat`` that you can use with
``spicy-driver -F batch.data ...``.
By default, the batch created by the Zeek script will select parsers for the
contained sessions through well-known ports. That means your units
need to have a ``%port`` property matching the responder port of the
sessions you want them to parse. So for the HTTP trace above, our
Spicy source code would need to provide a public unit with property
``%port = 80/tcp;``.
.. versionadded:: 1.13 ``--parser-alias``
Alternatively, you can run ``spicy-driver`` with ``--parser-alias
PORT=PARSER`` to tell it explicitly which parsers to use for
connections on a particular port. Here, ``PORT`` must be of the form
``<port>/<protocol>`` (e.g., ``80/tcp``), and ``PARSER`` is the name
of the parser to use (as shown by ``spicy-driver --list-parsers``). By
default, the parser will be applied to both directions of all
connections that are using that responder port. You can limit the
direction by appending either ``%orig`` or ``%resp`` to ``PORT``
(e.g., ``80/tcp%orig`` to attach the parser only to originator-side
flows). ``--parser-alias`` can be used multiple times to specify
further mappings.
In case you want to create batches yourself, we document the batch
format in the following. A batch needs to start with a line
``!spicy-batch v2<NL>``, followed by lines with commands of the form
``@<tag> <arguments><NL>``.
There are two types of input that the batch format can represent: (1)
individual, uni-directional flows; and (2) bi-directional connections
consisting in turn of one flow per side. The type is determined
through an initial command: ``@begin-flow`` starts a flow flow, and
``@begin-conn`` starts a connection. Either form introduces a unique,
free-form ID that subsequent commands will then refer to. The
following commands are supported:
``@begin-flow FID TYPE PARSER<NL>``
Initializes a new input flow for parsing, associating the unique
ID ``FID`` with it. ``TYPE`` must be either ``stream`` for
stream-based parsing (think: TCP), or ``block`` for parsing each
data block independent of others (think: UDP). ``PARSER`` is the
name of the Spicy parser to use for parsing this input flow,
given in the same form as with ``spicy-driver``'s ``--parser``
option (i.e., either as a unit name, a ``%port``, or a
``%mime-type``).
``@begin-conn CID TYPE ORIG_FID ORIG_PARSER RESP_FID RESP_PARSER<NL>``
Initializes a new input connection for parsing, associating the
unique connection ID ``CID`` with it. ``TYPE`` must be either
``stream`` for stream-based parsing (think: TCP), or ``block`` for
parsing each data block independent of others (think: UDP).
``ORIG_FID`` is separate unique ID for the originator-side flow,
and ``ORIG_PARSER`` is the name of the Spicy parser to use for
parsing that flow. ``RESP_FID`` and ``RESP_PARSER`` work
accordingly for the responder-side flow. The parsers can be given
in the same form as with ``spicy-driver``'s ``--parser`` option
(i.e., either as a unit name, a ``%port``, or a ``%mime-type``).
``@data FID SIZE<NL>``
A block of data for the input flow ``FID``. This command must be
followed directly by binary data of length ``SIZE``, plus a final
newline character. The data represents the next chunk of input for
the corresponding flow. ``@data`` can be used only inside
corresponding ``@begin-*`` and ``@end-*`` commands bracketing the
flow ID.
``@gap FID SIZE<NL>``
A gap of size ``SIZE``. This inserts a gap into the input stream
that will trigger a parse error once the parser reaches it. If the
parser supports error recovery, it will then attempt to continue
processing after the gap. ``@gap`` is similar to how a host
application like Zeek would report TCP reassembly gaps caused by
missing packets.
``@end-flow FID<NL>``
Finalizes parsing of the input flow associated with ``FID``,
releasing all state. This must come only after a corresponding
``@begin-flow`` command, and every ``@begin-flow`` must eventually
be followed by an ``@end-flow``.
``@end-conn CID<NL>``
Finalizes parsing the input connection associated with ``CID``,
releasing all state (including for its two flows). This must come
only after a corresponding ``@begin-conn`` command, and every
``@begin-conn`` must eventually be followed by an ``@end-end``.
.. _spicy-dump:
``spicy-dump``
==============
``spicy-dump`` is a standalone Spicy host application that compiles
and executes Spicy parsers on the fly, feeds them data for processing,
and then at the end prints out the parsed information in either a
readable, custom ASCII format, or as JSON (``--json`` or ``-J``). By
default, ``spicy-dump`` disables showing the output of Spicy ``print``
statements, ``--enable-print`` or ``-P`` reenables that.
.. spicy-output:: usage-spicy-dump
:exec: spicy-dump -h