239 lines
9.3 KiB
ReStructuredText
239 lines
9.3 KiB
ReStructuredText
|
|
.. _getting_started:
|
|
|
|
Getting Started
|
|
===============
|
|
|
|
The following gives a short overview how to write and use Spicy
|
|
parsers. We won't use many of Spicy's features yet, but we we'll walk
|
|
through some basic code examples and demonstrate typical usage of the
|
|
Spicy toolchain.
|
|
|
|
Hello, World!
|
|
-------------
|
|
|
|
Here's a simple "Hello, world!" in Spicy:
|
|
|
|
.. literalinclude:: examples/hello.spicy
|
|
:lines: 4-
|
|
:language: spicy
|
|
|
|
Assuming that's stored in ``hello.spicy``, you can compile and execute
|
|
the code with Spicy's standalone compiler ``spicyc``::
|
|
|
|
# spicyc -j hello.spicy
|
|
Hello, world!
|
|
|
|
``spicyc -j`` compiles the source code into native code on the fly
|
|
using your system's C++ compiler, and then directly executes the
|
|
result. If you run ``spicyc -c hello.spicy``, you will see the C++
|
|
code that Spicy generates behind the scenes.
|
|
|
|
You can also precompile the code into an object file, and then load
|
|
that for immediate execution::
|
|
|
|
# spicyc -j -o hello.hlto hello.spicy
|
|
# spicyc -j hello.hlto
|
|
Hello, world!
|
|
|
|
To compile Spicy code into an actual executable on disk, use
|
|
``spicy-build``::
|
|
|
|
# spicy-build -o a.out hello.spicy
|
|
# ./a.out
|
|
Hello, world!
|
|
|
|
``spicy-build`` is a small shell script that wraps ``spicyc -x`` and
|
|
runs the resulting code through the system's C++ compiler to produce
|
|
an executable.
|
|
|
|
.. note::
|
|
|
|
.. image:: _static/hilti-logo.png
|
|
:align: left
|
|
:width: 50
|
|
|
|
Internally, Spicy employs another intermediary language called
|
|
*HILTI* that sits between the Spicy source code and the generated
|
|
C++ output. For more complex Spicy grammars, the HILTI code is
|
|
often far easier to comprehend than the final C++ code, in
|
|
particular once we do some actual parsing. To see that
|
|
intermediary HILTI code, execute ``spicy -p hello.spicy``. The
|
|
``.hlto`` extension comes from HILTI as well: It's a
|
|
HILTI-generated object file.
|
|
|
|
A Simple Parser
|
|
---------------
|
|
|
|
To actually parse some data, we now look at a small example dissecting
|
|
HTTP-style request lines, such as: ``GET /index.html HTTP/1.0``.
|
|
|
|
Generally, in Spicy you define parsers through types called "units"
|
|
that describe the syntax of a protocol. A set of units forms a
|
|
*grammar*. In practice, Spicy units typically correspond pretty
|
|
directly to protocol data units (PDUs) as protocol specifications tend
|
|
to define them. In addition to syntax, a Spicy unit type can also
|
|
specify semantic actions, called *hooks*, that will execute during
|
|
parsing as the corresponding pieces are extracted.
|
|
|
|
Here's an example of a Spicy script for parsing HTTP request lines:
|
|
|
|
.. literalinclude:: examples/my_http.spicy
|
|
:lines: 4-
|
|
:caption: my_http.spicy
|
|
:language: spicy
|
|
|
|
In this example, you can see a number of things that are typical for
|
|
Spicy code:
|
|
|
|
* A Spicy input script starts with a ``module`` statement defining
|
|
a namespace for the script's content.
|
|
|
|
* The layout of a piece of data is defined by creating a ``unit``
|
|
type. The type lists individual *fields* in the order they are
|
|
to be parsed. The example defines two such units:
|
|
``RequestLine`` and ``Version``.
|
|
|
|
* Each field inside a unit has a type and an optional name. The
|
|
type defines how that field will be parsed from raw input data.
|
|
In the example, all fields use regular expressions instead of
|
|
actual data types (``uint32`` would be an actual type), which
|
|
means that the generated parser will match these expressions
|
|
against the input stream. Assuming a match, the corresponding
|
|
value will then be recorded with type ``bytes``, which is
|
|
Spicy's type for binary data. Note how the regular expressions
|
|
can either be given directly as a field's type (as in
|
|
``Version``), or indirectly via globally defined constants (as
|
|
in ``RequestLine``).
|
|
|
|
* If a field has a name, it can later be referenced to access its
|
|
value. Consequently, in this example all fields with semantic
|
|
meanings have names, while those which are unlikely to be
|
|
relevant later do not (e.g., whitespace).
|
|
|
|
* A unit field can have another unit as its type; here that's the
|
|
case for the ``version`` field in ``RequestLine``; we say that
|
|
``Version`` is a *subunit* of ``RequestLine``. The meaning for
|
|
parsing is straight-forward: When parsing the top-level unit
|
|
reaches the field with the subunit, it switches to processing
|
|
that field according to the subunit's definition. Once the
|
|
subunit is fully parsed, the top-level unit's next field is
|
|
processed as normal from the remaining input data.
|
|
|
|
* We can specify code to be executed when a unit has been
|
|
completely parsed by implementing a hook called ``%done``.
|
|
Inside the hook's code body, statements can refer to the unit
|
|
instance currently being parsed through an implicitly defined
|
|
``self`` identifier. Through ``self``, they can then access any
|
|
fields already parsed by using a standard attribute notation
|
|
(``self.<field>``). As the access to ``version`` shows, this
|
|
also works for getting to fields nested inside subunits. In the
|
|
example, we tell the generated parser to print out three of the
|
|
parsed fields whenever a ``RequestLine`` has been fully parsed.
|
|
|
|
* The ``public`` keyword exposes the generated parser of a unit to
|
|
to external host applications wanting to deploy it. Only public
|
|
units can be used as the starting point for feeding input;
|
|
non-public subunits cannot be directly instantiated by host
|
|
applications.
|
|
|
|
Now let us see how we turn this into an actual parser that we can run.
|
|
Spicy comes with a tool called ``spicy-driver`` that acts as a
|
|
generic, standalone host application for Spicy parsers: It compiles
|
|
Spicy scripts into code and then feeds them its standard input as data
|
|
to parse. Internally, ``spicy-driver`` uses much of the same machinery
|
|
as ``spicyc``, but provides additional code kicking off the actual
|
|
parsing as well.
|
|
|
|
With the above Spicy script in a file ``my_http.spicy``, we can use
|
|
``spicy-driver`` on it like this::
|
|
|
|
# echo "GET /index.html HTTP/1.0" | spicy-driver my_http.spicy
|
|
GET, /index.html, 1.0
|
|
|
|
As you see, the ``print`` statement inside the ``%done`` hook wrote
|
|
out the three fields as we would expect (``print`` automatically
|
|
separates its arguments with commas). If we pass something into the
|
|
driver that's malformed according to our grammar, the parser will
|
|
complain::
|
|
|
|
# echo "GET XXX/1.0" | spicy-driver my_http.spicy
|
|
[fatal error] terminating with uncaught exception of type spicy::rt::ParseError: parse error: failed to match regular expression (my_http.spicy:7)
|
|
|
|
Using ``spicy-driver`` in this way relies on Spicy's support for
|
|
just-in-time compilation, just like ``spicyc -j``. In the background,
|
|
there's C++ code being generated and compiled without that we see it.
|
|
Just like in the earlier example, we can also either use ``spicyc`` to
|
|
precompile the C++ code into an object file that ``spicy-driver`` can
|
|
then load, or use ``spicy-build`` to give us an actual executable::
|
|
|
|
# spicyc -j -o my_http.hlto my_http.spicy
|
|
# echo "GET /index.html HTTP/1.0" | spicy-driver my_http.hlto
|
|
GET, /index.html, 1.0
|
|
|
|
::
|
|
|
|
# spicy-build -o a.out my_http.spicy
|
|
# echo "GET /index.html HTTP/1.0" | ./a.out
|
|
GET, /index.html, 1.0
|
|
|
|
Spicy also comes with another tool :ref:`spicy-dump <spicy-dump>` that
|
|
works similar to ``spicy-driver``, but prints out the parsed fields at
|
|
the end, either in a custom ASCII representation or as JSON::
|
|
|
|
# echo "GET /index.html HTTP/1.0" | spicy-dump my_http.hlto
|
|
MyHTTP::RequestLine {
|
|
method: GET
|
|
uri: /index.html
|
|
version: MyHTTP::Version {
|
|
number: 1.0
|
|
}
|
|
}
|
|
|
|
# echo "GET /index.html HTTP/1.0" | spicy-dump -J my_http.hlto
|
|
{"method":"GET","uri":"/index.html","version":{"number":"1.0"}}
|
|
|
|
If you want to see the actual parsing code that Spicy generates, use
|
|
``spicyc`` again: ``spicyc -c my_http.spicy`` will show the C++ code,
|
|
and ``spicyc -p my_http.spicy`` will show the intermediary HILTI code.
|
|
|
|
.. note::
|
|
|
|
See :zeek:`Zeek's Spicy documentation
|
|
<devel/spicy/getting-started.html>` for how to use the
|
|
`RequestLine` parser with Zeek.
|
|
|
|
|
|
Custom Host Application
|
|
-----------------------
|
|
|
|
Spicy parsers expose a C++ API that any application can leverage to
|
|
send them data for processing. The specifics of how to approach this
|
|
depend quite a bit on the particular needs of the application (Is it
|
|
just a single, static parser that's needed; or a set not known
|
|
upfront, and compiled dynamically? Just a single input stream, or
|
|
many? All data in one buffer, or coming in incrementally? How does the
|
|
application want to access the parsed information?). That said, the
|
|
most basic use case is quite straight-forward: feeding data into a
|
|
specific parser. Here's a small C++ program that parses input with our
|
|
``RequestLine`` parser:
|
|
|
|
.. literalinclude:: examples/my_http.cc
|
|
:caption: my_http.cc
|
|
:language: c++
|
|
|
|
.. code::
|
|
|
|
# spicy-build -S -o a.out my_http.cc my_http.spicy
|
|
# echo "GET /index.html HTTP/1.0" | ./a.out
|
|
GET, /index.html, 1.0
|
|
# echo 'Hello, World!' | ./a.out
|
|
parse error: failed to match regular expression (my_http.spicy:7)
|
|
|
|
We are using ``-S`` with ``spicy-build`` because we're providing our
|
|
own ``main`` function.
|
|
|
|
The code in ``my_http.cc`` is the core of what ``spicy-driver`` does
|
|
if we ignore the dynamic JIT compilation. See :ref:`host_applications`
|
|
for more.
|