330 lines
12 KiB
ReStructuredText
330 lines
12 KiB
ReStructuredText
|
|
.. _host_applications:
|
|
|
|
========================
|
|
Custom Host Applications
|
|
========================
|
|
|
|
Spicy provides a C++ API for integrating its parsers into custom host
|
|
applications. There are two different approaches to doing this:
|
|
|
|
1. If you want to integrate just one specific kind of parser, Spicy
|
|
can generate C++ prototypes for it that facilitate feeding data and
|
|
accessing parsing results.
|
|
|
|
2. If you want to write a generic host application that can support
|
|
arbitrary parsers, Spicy provides a dynamic runtime introspection
|
|
API for dynamically instantiating parsers and accessing results.
|
|
|
|
We discuss both approaches in the following.
|
|
|
|
.. note::
|
|
|
|
Internally, Spicy is a layer on top of an intermediary framework
|
|
called HILTI. It is the HILTI runtime library that implements most
|
|
of the functionality which we'll look at in this section, so you'll see
|
|
quite a bit of HILTI-side functionality. Spicy comes with a small
|
|
additional runtime library of its own that adds anythings that's
|
|
specific to the parsers it generates.
|
|
|
|
.. note::
|
|
|
|
The API for host applications isn't considered stable at this time
|
|
and specifics may change in future versions of HILTI/Spicy without
|
|
any migration/deprecation process.
|
|
|
|
.. _host_applications_specific:
|
|
|
|
Integrating a Specific Parser
|
|
=============================
|
|
|
|
We'll use our simple HTTP example from the :ref:`getting_started`
|
|
section as a running example for a parser we want to leverage from a
|
|
C++ application.
|
|
|
|
.. literalinclude:: examples/my_http.spicy
|
|
:lines: 4-
|
|
:caption: my_http.spicy
|
|
:language: spicy
|
|
|
|
First, we'll use :ref:`spicyc` to generate a C++ parser from the Spicy
|
|
source code::
|
|
|
|
# spicyc -x my_http my_http.spicy
|
|
|
|
The option ``-x`` (aka ``--output-c++-files``) tells ``spicyc`` that
|
|
we want it to generate C++ code for external compilation, rather than
|
|
directly turning the Spicy module into executable code. This generates
|
|
two C++ files that have their names prefixed with ``my_http_``::
|
|
|
|
# ls my_http_*.cc
|
|
my_http___linker__.cc my_http_MyHTTP.cc
|
|
|
|
We don't need to worry further what's in these files.
|
|
|
|
Next, ``spicyc`` can generate C++ prototypes for us that declare (1) a
|
|
set of parsing functions for feeding input into our parser, and (2) a
|
|
``struct`` type providing access to the parsed fields. That's done
|
|
through option ``-P`` (aka ``--output-prototypes``)::
|
|
|
|
# spicyc -P my_http my_http.spicy -o my_http.h
|
|
|
|
That'll leave the prototypes in ``my_http.h``. The content of that
|
|
generated header file tends to be a bit convoluted because it
|
|
(necessarily) also contains a bunch of Spicy internals. But stripped
|
|
down to the interesting parts, it looks like this for our example:
|
|
|
|
.. literalinclude:: examples/my_http-excerpt.h
|
|
|
|
You can see the ``struct`` definition corresponding to the public unit
|
|
type, as well as a set of parsing functions with three different
|
|
signatures:
|
|
|
|
``parse1``
|
|
The simplest form of parsing function receives a stream of input
|
|
data, along with an optional view into the stream to limit the
|
|
region to parse if desired and an optional context.
|
|
``parse1`` will internally instantiate an instance of the unit's
|
|
``struct``, and then feed the unit's parser with the data stream.
|
|
However, it won't provide access to what's being parsed as it
|
|
doesn't pass back the ``struct``.
|
|
|
|
``parse2``
|
|
The second form takes a pre-instantiated instance of the unit's
|
|
``struct`` type, which parsing will fill out. Once parsing
|
|
finishes, results can be accessed by inspecting the ``struct``
|
|
fields.
|
|
|
|
``parse3``
|
|
The third form takes a pre-instantiated instance of a generic,
|
|
type-erased unit type that the parsing will fill out. Accessing
|
|
the data requires use of HILTI's reflection API, which we will
|
|
discuss in :ref:`host_applications_generic`.
|
|
|
|
Spicy puts all these declarations into a namespace ``hlt_PREFIX``,
|
|
where ``PREFIX`` is the argument we specified to ``-P``. (If you leave
|
|
the ``PREFIX`` empty (``spicyc -P ''``), you get a namespace of just
|
|
``hlt::*``.)
|
|
|
|
Let's start by using ``parse1()``:
|
|
|
|
.. literalinclude:: examples/my_http-host-parse1.cc
|
|
:caption: my_http-host.cc
|
|
:lines: 10-36
|
|
:language: c++
|
|
|
|
This code first instantiates a stream from data giving on the command
|
|
line. It freezes the stream to indicate that no further data will
|
|
arrive later. Then it sends the stream into the ``parse1()`` function
|
|
for processing.
|
|
|
|
We can now use the standard C++ compiler to build all this into an
|
|
executable, leveraging ``spicy-config`` to add the necessary flags
|
|
for finding includes and libraries::
|
|
|
|
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http $'GET index.html HTTP/1.0\n'
|
|
GET, /index.html, 1.0
|
|
|
|
The output comes from the execution of the ``print`` statement inside
|
|
the Spicy grammar, demonstrating that the parsing proceeded as
|
|
expected.
|
|
|
|
.. note::
|
|
|
|
Above, when building the executable, we used ``clang++`` assuming
|
|
that that's the C++ compiler in use on the system. Generally, you
|
|
need to use the same compiler here as the one that Spicy itself
|
|
got build with, to ensure that libraries and C++ ABI match. To
|
|
ensure that you're using the the right compiler (e.g., if there
|
|
are multiple on the system, or if it's not in ``PATH``),
|
|
:ref:`spicy-config` can print out the full path to the expected
|
|
one through its ``--cxx`` option. You can even put that directly
|
|
into the build command line::
|
|
|
|
# $(spicy-config --cxx) -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
|
|
When using ``parse1()`` we don't get access to the parsed information.
|
|
If we want that, we can use ``parse2()`` instead and provide it with a
|
|
``struct`` to fill in:
|
|
|
|
.. literalinclude:: examples/my_http-host-parse2.cc
|
|
:caption: my_http-host.cc
|
|
:lines: 10-45
|
|
:emphasize-lines: 19-28
|
|
:language: c++
|
|
|
|
::
|
|
|
|
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http $'GET index.html HTTP/1.0\n'
|
|
GET, /index.html, 1.0
|
|
method : GET
|
|
uri : /index.html
|
|
version: 1.0
|
|
|
|
Another approach to retrieving field values goes through Spicy hooks
|
|
calling back into the host application. That's how Zeek's Spicy support
|
|
operates. Let's say we want to execute a custom C++ function every
|
|
time a ``RequestList`` has been parsed. By adding the following code
|
|
to ``my_http.spicy``, we (1) declare that function on the Spicy-side,
|
|
and (2) implement a Spicy hook that calls it:
|
|
|
|
.. literalinclude:: examples/my_http-host-callback.cc
|
|
:caption: my_http.spicy
|
|
:start-after: doc-start-callback-spicy
|
|
:end-before: doc-end-callback-spicy
|
|
:language: spicy
|
|
|
|
The ``&cxxname`` attribute for ``got_request_line`` indicates to Spicy
|
|
that this is a function implemented externally inside custom C++ code,
|
|
accessible through the given name. Now we need to implement that
|
|
function:
|
|
|
|
.. literalinclude:: examples/my_http-host-callback.cc
|
|
:caption: my_http-callback.cc
|
|
:start-after: doc-start-callback-cc
|
|
:end-before: doc-end-callback-cc
|
|
:language: c++
|
|
|
|
Finally, we compile it altogether like before, but now including our
|
|
additional custom C++ file::
|
|
|
|
# spicyc -x my_http my_http.spicy
|
|
# spicyc -P my_http my_http.spicy -o my_http.h
|
|
# clang++ -o my_http my_http-callback.cc my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http $'GET index.html HTTP/1.0\n'
|
|
In C++ land: GET, index.html, 1.0
|
|
GET, index.html, 1.0
|
|
|
|
Note that the C++ function signature needs to match what Spicy
|
|
expects, based on the Spicy-side prototype. If you are unsure how
|
|
Spicy arguments translate into C++ arguments, look at the C++
|
|
prototype that's included for the callback function in the output of
|
|
``-P``.
|
|
|
|
.. _host_applications_generic:
|
|
|
|
Supporting Arbitrary Parsers
|
|
============================
|
|
|
|
This approach is more complex, and we'll just briefly describe the
|
|
main pieces here. All of the tools coming with Spicy support arbitrary
|
|
parsers and can serve as further examples (e.g., :ref:`spicy-driver`,
|
|
:ref:`spicy-dump`, :ref:`zeek_plugin`). Indeed, they all
|
|
build on the same C++ library class ``spicy::rt::Driver`` that
|
|
provides a higher-level API to working with Spicy's parsers in a
|
|
generic fashion. We'll do the same in the following.
|
|
|
|
Retrieving Available Parsers
|
|
----------------------------
|
|
|
|
The first challenge for a generic host application is that it cannot
|
|
know what parsers are even available. Spicy's runtime library provides
|
|
an API to get a list of all parsers that are compiled into the current
|
|
process. Continuing to use the ``my_http.spicy`` example, this code
|
|
prints out our one available parser:
|
|
|
|
.. literalinclude:: examples/my_http-host-driver.cc
|
|
:caption: my_http-host.cc
|
|
:lines: 9-14,31-44,59-64
|
|
:language: c++
|
|
|
|
::
|
|
|
|
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http
|
|
Available parsers:
|
|
|
|
MyHTTP::RequestLine
|
|
|
|
Using the name of the parser (``MyHTTP::RequestLine``) we can
|
|
instantiate it from C++, and then feed it data:
|
|
|
|
.. literalinclude:: examples/my_http-host-driver.cc
|
|
:lines: 44-53
|
|
:language: c++
|
|
|
|
::
|
|
|
|
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http $'GET index.html HTTP/1.0\n'
|
|
GET, /index.html, 1.0
|
|
|
|
That's the output of the ``print`` statement once more.
|
|
|
|
``unit`` is of type ``spicy::rt::ParsedUnit``, which is a type-erased
|
|
class holding, in this case, an instance of
|
|
``_hlt::MyHTTP::RequestLine``. Internally, that instance went through
|
|
the ``parse3()`` function that we have encountered in the previous
|
|
section. To access the parsed fields, there's a visitor API to iterate
|
|
generically over HILTI types like this unit:
|
|
|
|
.. literalinclude:: examples/my_http-host-driver.cc
|
|
:lines: 15-30
|
|
:language: c++
|
|
|
|
Adding ``print(unit->value())`` after the call to ``processInput()``
|
|
then gives us this output:
|
|
|
|
::
|
|
|
|
# clang++ -o my_http my_http-host.cc my_http___linker__.cc my_http_MyHTTP.cc $(spicy-config --cxxflags --ldflags)
|
|
# ./my_http $'GET index.html HTTP/1.0\n'
|
|
GET, /index.html, 1.0
|
|
method: GET
|
|
uri: /index.html
|
|
version: number: 1.0
|
|
|
|
Our visitor code implements just what we need for our example. The
|
|
source code of ``spicy-dump`` shows a full implementation covering all
|
|
available types.
|
|
|
|
So far we have compiled the Spicy parsers statically into the
|
|
generated executable. The runtime API supports loading them
|
|
dynamically as well from pre-compiled ``HLTO`` files through the class
|
|
``hilti::rt::Library``. Here's the full example leveraging that,
|
|
taking the file to load from the command line:
|
|
|
|
.. literalinclude:: examples/my_http-host-driver-hlto.cc
|
|
:caption: my-driver.cc
|
|
:lines: 9-70
|
|
:emphasize-lines: 27-31
|
|
:language: c++
|
|
|
|
::
|
|
|
|
# clang++ -o my-driver my-driver.cc $(spicy-config --cxxflags --ldflags --dynamic-loading)
|
|
# spicyc -j -o my_http.hlto my_http.spicy
|
|
# printf "GET /index.html HTTP/1.0\n\n<dummy>" > data
|
|
# ./my-driver my_http.hlto MyHTTP::RequestLine "$(cat data)"
|
|
Available parsers:
|
|
|
|
MyHTTP::RequestLine
|
|
|
|
GET, /index.html, 1.0
|
|
method: GET
|
|
uri: /index.html
|
|
version: number: 1.0
|
|
|
|
.. note::
|
|
|
|
Note the addition of ``--dynamic-loading`` to the ``hilti-config``
|
|
command line. That's needed when the resulting binary will
|
|
dynamically load precompiled Spicy parsers because linker flags
|
|
need to be slightly adjusted in that case.
|
|
|
|
API Documentation
|
|
=================
|
|
|
|
We won't go further into details of the HILTI/Spicy runtime API here.
|
|
Please see :ref:`doxygen` for more on that, the namespaces
|
|
``hilti::rt`` and ``spicy::rt`` cover what's available to host
|
|
applications.
|
|
|
|
Our examples always passed the full input at once. You don't need to
|
|
do that, Spicy's parsers can process input incrementally as it comes
|
|
in, and return back to the caller to retrieve more. See the source of
|
|
:repo:`spicy::Driver::processInput() <spicy/runtime/src/driver.cc>`
|
|
for an example of how to implement that.
|