Patrick Kelley 8fd444092b initial
2025-05-07 15:35:15 -04:00

Broker Benchmarks

Broker ships with benchmarking tools that allow developers and users to investigate system performance in various deployment and configuration setups.

Clustering: broker-cluster-benchmark

This is the primary benchmark suite that runs Broker in a full end-to-end deployment. Unlike real deployments, this tool allows all Broker endpoints to run in a single OS process.

Setup and Configuration

Running broker-cluster-benchmark requires a cluster configuration file using CAF's config syntax:

; comments start with a semicolon
foo = "bar"                   ; strings use double quotes
homepage = <https://zeek.org> ; URIs use angle brackets
list = [1, 2, 3]              ; lists use square brackets

The cluster config contains all participating Broker endpoints under nodes. Each node must have at least an id (URI) and topics (list of strings). The id is the network-wide identifier for peering. Use local:$name if a node does not accept incoming connections and tcp://$ip:$port otherwise.
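
To make the two ID forms concrete, here is a minimal Python sketch that classifies a node ID by its URI scheme. The helper name describe_node_id and the returned dictionary layout are assumptions made for illustration; they are not part of Broker or of broker-cluster-benchmark.

```python
from urllib.parse import urlsplit


def describe_node_id(node_id: str) -> dict:
    """Classify a node ID as local (no listening port) or tcp (host and port).

    Hypothetical helper for illustration only; not part of Broker.
    """
    parts = urlsplit(node_id)
    if parts.scheme == "local":
        # local:$name -> the node never accepts incoming connections
        return {"scheme": "local", "name": parts.path, "listens": False}
    if parts.scheme == "tcp":
        # tcp://$ip:$port -> the node listens on the given address
        return {
            "scheme": "tcp",
            "host": parts.hostname,
            "port": parts.port,
            "listens": True,
        }
    raise ValueError(f"unsupported node ID scheme: {node_id!r}")
```

With this sketch, local:earth is reported as a non-listening node named earth, while tcp://[::1]:8001 is reported as listening on host ::1, port 8001.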

Nodes that publish data must have a generator-file. Nodes that wait for data must set num-inputs. A minimal example file might look like this:

nodes {
  earth {
    id = <local:earth>
    peers = ["mars"]
    topics = ["/benchmark/events"]
    num-inputs = 100000
  }
  mars {
    id = <tcp://[::1]:8001>
    topics = ["/benchmark/events"]
    generator-file = "mars.dat"
    num-outputs = 100000
  }
}

This config file starts two nodes: earth and mars. On startup, mars opens port 8001 and waits for its peers to connect, while earth does not open any port since it has a local: ID. The peers entry for earth causes that node to connect to mars at tcp://[::1]:8001.

The generator file mars.dat contains previously recorded meta data from a live system. Setting num-outputs causes broker-cluster-benchmark to emit exactly that number of messages. If the generator file contains more than num-outputs entries, the node ignores the additional messages. If it contains fewer entries, the node loops through the file.
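
The truncate-or-loop behavior of num-outputs can be sketched in a few lines of Python. This is only a model of the rule described above, not the tool's implementation. The function name emitted_messages and the use of a plain list in place of a generator file are assumptions, and the handling of an empty generator file (returning nothing) is likewise a choice made for this sketch.

```python
from itertools import cycle, islice


def emitted_messages(entries, num_outputs):
    """Return exactly num_outputs messages drawn from entries.

    Extra entries are ignored; a short list is replayed from the start.
    Hypothetical model of the num-outputs rule; not Broker code.
    """
    if not entries:
        # Assumption of this sketch: nothing to replay, so emit nothing.
        return []
    return list(islice(cycle(entries), num_outputs))
```

For a three-entry input and num_outputs of 2, only the first two entries are emitted. With num_outputs of 7, the three entries are replayed twice, followed by the first entry again.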

Recording Meta Data

Setting the configuration parameter broker.recording-directory (or setting the environment variable BROKER_RECORDING_DIRECTORY) to a non-empty path causes Broker to record meta data such as subscriptions, peerings, and published data at this endpoint. The meta data takes about 2 MB for each 1M recorded messages, depending on the structure of the data.

Setting the configuration parameter broker.output-generator-file-cap (or setting the environment variable BROKER_OUTPUT_GENERATOR_FILE_CAP) to an unsigned integer limits recording to that many published messages.

To record data from a Zeek cluster, add a line like the following for each node in /usr/local/zeek/etc/node.cfg:

env_vars=BROKER_RECORDING_DIRECTORY=/your/desired/path/zeek-recording-<node>

Replace <node> with the specific node name so that nodes do not overwrite each other's data.

Generating Config Files from Recorded Meta Data

After recording meta data for all Broker nodes, the tool broker-cluster-benchmark can automatically generate a cluster configuration by analyzing the recorded files. The generated config file uses the directory names as node names and establishes the recorded peering relations.

The tool generates config files when passing the --mode=generate-config option by scanning all specified directories. For example, the following command prints a configuration for a recorded Broker session with two endpoints:

broker-cluster-benchmark --mode=generate-config recordings/server recordings/client

The tool assumes the directories server and client to contain the following files:

recordings/
├── client
│   ├── id.txt
│   ├── messages.dat
│   ├── peers.txt
│   └── topics.txt
└── server
    ├── id.txt
    ├── messages.dat
    ├── peers.txt
    └── topics.txt

The produced configuration will contain two nodes: client and server. All other fields and peering relations are automatically generated from the file contents. Note that the tool does a linear scan over all messages.dat files to compute the number of expected messages in the system, which may take some time.
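
Before generating a config, it can help to confirm that each recording directory has the expected layout. The following Python sketch is a standalone pre-flight check, not a feature of broker-cluster-benchmark. The names EXPECTED_FILES and missing_recording_files are assumptions introduced here; the four file names are taken from the directory listing above.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("id.txt", "messages.dat", "peers.txt", "topics.txt")


def missing_recording_files(directory):
    """Return the expected file names that are absent from one recording directory.

    Hypothetical pre-flight check; not part of broker-cluster-benchmark.
    """
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling missing_recording_files("recordings/server") returns an empty list when all four files are present and the names of the missing files otherwise.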

Running the Benchmark

The tool broker-cluster-benchmark expects at least -c $configFile. Passing -v also enables verbose output to get a glimpse into the program state at runtime. When running a configuration for the first time, we strongly recommend running in verbose mode:

broker-cluster-benchmark -c cluster.conf -v

Running in verbose mode prints various state messages to the console:

Peering tree (multiple roots are allowed):
mars, topics: ["/benchmark/events"]
└── earth, topics: ["/benchmark/events"]

mars starts listening at [::1]:8001
mars up and running
earth starts peering to [::1]:8001 (mars)
earth up and running
all nodes are up and running, run benchmark
earth waits for messages
mars starts publishing
... snip ...

Before the tool spins up all Broker endpoints, it makes sure that the configured topology is safe to deploy:

  • No loops allowed.
  • Each node must set the mandatory fields id and topics.
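
The "no loops" rule can be checked offline with a small union-find pass over the peering relations. The Python sketch below illustrates the idea only; it is not how broker-cluster-benchmark performs its own validation, and the function name find_loop_edge and the pair-based input format are assumptions. Peerings are treated as undirected, so listing the same peering twice also counts as a loop in this sketch.

```python
def find_loop_edge(peerings):
    """Return the first peering that closes a loop, or None if there is none.

    peerings is an iterable of (node, peer) name pairs, treated as undirected.
    Hypothetical check for illustration; not Broker's own validation code.
    """
    parent = {}

    def find(node):
        # Look up the representative of node's connected component.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in peerings:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            # Both ends are already connected, so this peering adds a loop.
            return (a, b)
        parent[root_a] = root_b
    return None
```

For the two-node example, find_loop_edge([("earth", "mars")]) returns None. Adding a hypothetical third node venus that peers with both mars and earth would make the last peering in the list close a loop, and the function would return that pair.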

Broker's source distribution includes a working setup to get started at tests/benchmark/cluster-example.zip.

Inspecting Generator Files

If you're unsure which topics appear in a generator file or how many messages it contains, you can run the tool in dump-stats mode:

broker-cluster-benchmark -c cluster.conf -v --mode=dump-stats

In this mode, the tool only prints a summary of all generator files and then exits. The output lists each generator file, the topics it contains, and the number of messages it produces:

mars.dat
├── entries: 1000
|   ├── data-entries: 1000
|   └── command-entries: 0
└── topics:
    └── /benchmark/events

Note that the tool has to linearly scan each generator file, which may take some time.