# Broker Benchmarks

Broker ships with benchmarking tools that allow developers and users to investigate system performance in various deployment and configuration setups.

## Clustering: `broker-cluster-benchmark`

This is the primary benchmark suite that runs Broker in a full end-to-end deployment. Unlike real deployments, this tool allows all Broker endpoints to run in a single OS process.

### Setup and Configuration

Running `broker-cluster-benchmark` requires a cluster configuration file using CAF's config syntax:

```sh
; comments start with a semicolon
foo = "bar" ; strings use double quotes
homepage = ; URIs use angle brackets
list = [1, 2, 3] ; lists use square brackets
```

The cluster config contains all participating Broker endpoints under `nodes`. Each node must have at least an `id` (URI) and `topics` (list of strings). The `id` is the network-wide identifier for peering. Use `local:$name` if a node does not accept incoming connections and `tcp://$ip:$port` otherwise. Nodes that publish data must have a `generator-file`. Nodes that wait for data must set `num-inputs`. A minimal example file might look like this:

```sh
nodes {
  earth {
    id =
    peers = ["mars"]
    topics = ["/benchmark/events"]
    num-inputs = 100000
  }
  mars {
    id =
    topics = ["/benchmark/events"]
    generator-file = "mars.dat"
    num-outputs = 100000
  }
}
```

This config file starts two nodes: `earth` and `mars`. On startup, `mars` opens port 8001 and waits for its peers to connect, while `earth` does not open any port since it has a `local:` ID. The `peers` entry for `earth` causes this node to connect to `mars` at `tcp://[::1]:8001`.

The generator file `mars.dat` contains previously recorded meta data from a live system. Setting `num-outputs` causes `broker-cluster-benchmark` to emit exactly that number of messages. The node ignores additional messages in the generator file if it contains more than `num-outputs` entries, or loops through the file if it contains fewer entries.

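The `num-outputs` rule (cut off a longer generator file, wrap around a shorter one) can be pictured with a small Python sketch. It only illustrates the semantics described above and is not Broker's implementation; the helper `emit_messages` and the list of strings standing in for generator file entries are hypothetical.

```python
from itertools import cycle, islice


def emit_messages(entries, num_outputs):
    """Return an iterator over num_outputs items taken from entries.

    Hypothetical helper: stops early if entries is longer than
    num_outputs and wraps around to the start if it is shorter.
    An empty input produces nothing.
    """
    # cycle() repeats the entries endlessly; islice() cuts the stream
    # off after num_outputs items.
    return islice(cycle(entries), num_outputs)


if __name__ == "__main__":
    recorded = ["msg-1", "msg-2", "msg-3"]
    print(list(emit_messages(recorded, 2)))  # fewer outputs than entries
    print(list(emit_messages(recorded, 5)))  # more outputs than entries
```

With three recorded entries, asking for two outputs drops the last entry, while asking for five replays the first two entries after the file is exhausted.
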
### Recording Meta Data

Setting the configuration parameter `broker.recording-directory` (or the environment variable `BROKER_RECORDING_DIRECTORY`) to a non-empty path causes Broker to record meta data such as subscriptions, peerings, and published data at this endpoint. The meta data takes up about 2 MB per 1M recorded messages, depending on the structure of the data.

Setting the configuration parameter `broker.output-generator-file-cap` (or the environment variable `BROKER_OUTPUT_GENERATOR_FILE_CAP`) to an unsigned integer limits recording to that many published messages.

To record data from a Zeek cluster, add a line like the following for each node in `/usr/local/zeek/etc/node.cfg`:

```
env_vars=BROKER_RECORDING_DIRECTORY=/your/desired/path/zeek-recording-
```

Complete the path with the specific node name so that nodes do not overwrite each other's data.

### Generating Config Files from Recorded Meta Data

After recording meta data for *all* Broker nodes, `broker-cluster-benchmark` can automatically generate a cluster configuration by analyzing the recorded files. The generated config file uses the directory names as node names and establishes the recorded peering relations.

The tool generates a config file when invoked with `--mode=generate-config`, scanning all specified directories. For example, the following command prints a configuration for a recorded Broker session with two endpoints:

```sh
broker-cluster-benchmark --mode=generate-config recordings/server recordings/client
```

The tool assumes the directories `server` and `client` to contain the following files:

```
recordings/
├── client
│   ├── id.txt
│   ├── messages.dat
│   ├── peers.txt
│   └── topics.txt
└── server
    ├── id.txt
    ├── messages.dat
    ├── peers.txt
    └── topics.txt
```

The produced configuration will contain two nodes: `client` and `server`. All other fields and peering relations are automatically generated from the file contents.

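Before generating a config, it can help to confirm that every recording directory holds the four files listed above. The following Python sketch does that under the assumption that those four names are all the tool needs; `missing_recording_files` is a hypothetical helper and not part of Broker.

```python
from pathlib import Path

# File names taken from the recording directory layout shown above.
EXPECTED_FILES = ("id.txt", "messages.dat", "peers.txt", "topics.txt")


def missing_recording_files(directory):
    """Return the expected file names that are absent from directory."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Example paths from the generate-config command above.
    for node_dir in ("recordings/server", "recordings/client"):
        missing = missing_recording_files(node_dir)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{node_dir}: {status}")
```
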
When generating a config, `broker-cluster-benchmark` linearly scans all `messages.dat` files to compute the number of expected messages in the system. This step may take some time.

### Running the Benchmark

The tool `broker-cluster-benchmark` expects at least `-c $configFile`. Passing `-v` additionally enables verbose output, which gives a glimpse into the program state at runtime. When running a configuration for the first time, we strongly recommend running in verbose mode:

```sh
broker-cluster-benchmark -c cluster.conf -v
```

Running in verbose mode prints various state messages to the console:

```sh
Peering tree (multiple roots are allowed):
mars, topics: ["/benchmark/events"]
└── earth, topics: ["/benchmark/events"]

mars starts listening at [::1]:8001
mars up and running
earth starts peering to [::1]:8001 (mars)
earth up and running
all nodes are up and running, run benchmark
earth waits for messages
mars starts publishing
... snip ...
```

Before the tool spins up all Broker endpoints, it makes sure that the configured topology is safe to deploy:

- No loops allowed.
- Each node must set the mandatory fields `id` and `topics`.

Broker's source distribution includes a working setup to get started at `tests/benchmark/cluster-example.zip`.

### Inspecting Generator Files

If you're unsure which topics appear in a generator file or how many messages it contains, you can run the tool in `dump-stats` mode:

```sh
broker-cluster-benchmark -c cluster.conf -v --mode=dump-stats
```

In this mode, the tool only prints statistics for all generator files and then exits. The output lists each generator file, the topics it contains, and the number of messages it produces:

```sh
mars.dat
├── entries: 1000
|   ├── data-entries: 1000
|   └── command-entries: 0
└── topics:
    └── /benchmark/events
```

Note that the tool has to linearly scan each generator file, which may take some time.
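To compare generator files across runs, the `dump-stats` output can be turned into a dictionary. The Python sketch below assumes the output looks exactly like the sample above (file names unindented, counters as `name: number`, topics listed under `topics:`); the real output may differ between Broker versions. `parse_dump_stats` is a hypothetical helper, not a Broker API.

```python
import re

# Whitespace plus the characters that draw the tree in the sample output.
TREE_CHARS = " \t|│├└─"
# Matches counter lines such as "data-entries: 1000".
COUNTER = re.compile(r"^([\w-]+):\s*(\d+)$")

SAMPLE = """\
mars.dat
├── entries: 1000
|   ├── data-entries: 1000
|   └── command-entries: 0
└── topics:
    └── /benchmark/events
"""


def parse_dump_stats(text):
    """Map each generator file name to its counters and topic list."""
    stats = {}
    current = None
    for line in text.splitlines():
        label = line.lstrip(TREE_CHARS).rstrip()
        if not label:
            continue
        if line[0] not in TREE_CHARS:
            # Unindented line: start of a new generator file.
            current = stats.setdefault(label, {"topics": []})
            continue
        if current is None:
            continue  # Ignore tree lines that precede any file name.
        match = COUNTER.match(label)
        if match:
            current[match.group(1)] = int(match.group(2))
        elif label != "topics:":
            # Anything else below a file is treated as a topic name.
            current["topics"].append(label)
    return stats


if __name__ == "__main__":
    for name, info in parse_dump_stats(SAMPLE).items():
        print(name, info)
```

A topic whose name happens to look like `name: number` would be misread as a counter, so treat this as a starting point rather than a robust parser.
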