This document provides a detailed description of the QA process. It is intended to be used by engineers reproducing the experimental setup for future tests of CometBFT.
The (first iteration of the) QA process as described in the RELEASES.md document was applied to version v0.34.x in order to have a set of results acting as benchmarking baseline. This baseline is then compared with results obtained in later versions.
Out of the testnet-based test cases described in the releases document, we focused on two: the 200 Node Test and the Rotating Nodes Test.
This section explains how the tests were carried out for reproducibility purposes.
Follow the README.md at the top of the testnet repository to configure Terraform and doctl.
Copy .testnets/testnet200.toml onto testnet.toml (do NOT commit this change).
Set VERSION_TAG in the Makefile to the git hash that is to be tested.
For a network in which all nodes run the same version, make sure VERSION2_WEIGHT is set to 0.
For a network that mixes two versions, set VERSION_TAG2 to the other version you want deployed in the network. Then, adjust the weight variables VERSION_WEIGHT and VERSION2_WEIGHT to configure the desired proportion of nodes running each of the two configured versions.
Follow the README.md to configure and start the 200 node testnet.
Run make terraform-destroy as soon as you are done with the tests (see step 9).
Check the COMETBFT_CONSENSUS_HEIGHT metric. All nodes should be increasing their heights.
ssh into the testnet-load-runner, then copy script script/200-node-loadscript.sh and run it from the load runner node.
Set ROTATE_CONNECTIONS and ROTATE_TX_RATE to values that will produce the desired transaction load.
Set ROTATE_TOTAL_TIME to 90 (seconds).
Run make retrieve-data to gather all relevant data from the testnet into the orchestrating machine.
Alternatively, run make retrieve-prometheus-data and make retrieve-blockstore separately. The end result will be the same.
make retrieve-blockstore accepts the following values in makefile variable RETRIEVE_TARGET_HOST:
any: (which is the default) picks up a full node and retrieves the blockstore from that node only.
all: retrieves the blockstore from all full nodes; this is extremely slow, and consumes plenty of bandwidth, so use it with care.
the name of a particular node (e.g., validator01): retrieves the blockstore from that node only.
Run zip -T on the prometheus.zip file and (one of) the blockstore.db.zip file(s) to check that the archives are intact.
Run make terraform-destroy, and do not forget to type yes! Otherwise you’re in trouble.

The method for extracting the results described here is highly manual (and exploratory) at this stage. The CometBFT team should improve it at every iteration to increase the amount of automation.
mkdir results
go run github.com/cometbft/cometbft/test/loadtime/cmd/report@f1aaa436d --database-type goleveldb --data-dir ./ > results/report.txt
go run github.com/cometbft/cometbft/test/loadtime/cmd/report@f1aaa436d --database-type goleveldb --data-dir ./ --csv results/raw.csv
File report.txt contains an unordered list of experiments with varying concurrent connections and transaction rates.
Create files report01.txt, report02.txt, report04.txt and, for each experiment in file report.txt, copy its related lines to the filename that matches the number of connections, for example
for cnum in 1 2 4; do echo "$cnum"; grep -B 2 -A 10 "Connections: $cnum" results/report.txt > "results/report0$cnum.txt"; done
Sort the experiments in report01.txt in ascending tx rate order. Likewise for report02.txt and report04.txt.
Alternatively, keep working with report.txt, and skip step 4.
Generate report_tabbed.txt by showing the contents of report01.txt, report02.txt, report04.txt side by side.
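The split of report.txt by number of connections can also be scripted. The following is a minimal Python sketch, not part of the QA tooling. It assumes that each experiment block starts at a line beginning with "Experiment ID: " and contains one "Connections: N" line; the function name split_by_connections, the sample IDs, and the "Rate:" lines in the sample are hypothetical.

```python
import re
from collections import defaultdict


def split_by_connections(report_text: str) -> dict[int, list[str]]:
    """Group experiment blocks by the number of connections they report."""
    # Assumption: every block starts at a line beginning with "Experiment ID: ".
    blocks = re.split(r"(?m)^(?=Experiment ID: )", report_text)
    grouped = defaultdict(list)
    for block in blocks:
        match = re.search(r"(?m)^\s*Connections: (\d+)\s*$", block)
        if match:  # skips any text that precedes the first experiment
            grouped[int(match.group(1))].append(block.strip())
    return dict(grouped)


# Hypothetical sample; a real report.txt holds more lines per experiment.
sample = (
    "Experiment ID: aaaa\n\n\tConnections: 1\n\tRate: 25\n\n"
    "Experiment ID: bbbb\n\n\tConnections: 2\n\tRate: 25\n\n"
    "Experiment ID: cccc\n\n\tConnections: 1\n\tRate: 50\n"
)
groups = split_by_connections(sample)
for conns, blocks in sorted(groups.items()):
    print(f"report{conns:02d}.txt would hold {len(blocks)} experiment(s)")
```

Each group can then be written to its own file and ordered by hand, as described above.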
Extract the raw latencies from raw.csv using the following bash loop. This creates a .csv file and a .dat file per experiment. The format of the .dat files is amenable to loading them as matrices in Octave.
Adjust the values in the for loops to the experiments you ran (see report.txt).
Adapt report*.txt to the files you produced in step 3.
uuids=($(cat report01.txt report02.txt report04.txt | grep '^Experiment ID: ' | awk '{ print $3 }'))
c=0  # bash arrays are 0-indexed
rm -f *.dat
for i in 01 02 04; do
  for j in 0025 0050 0100 0200; do
    echo "$i" "$j" "$c" "${uuids[$c]}"
    filename="c${i}_r${j}"
    grep "${uuids[$c]}" raw.csv > "${filename}.csv"
    tr , ' ' < "${filename}.csv" | awk '{ print $2, $3 }' >> "${filename}.dat"
    c=$((c + 1))
  done
done
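For readers who prefer to avoid shell indexing pitfalls, the same extraction can be sketched in Python. This is an illustrative sketch under stated assumptions, not the team's tooling: it assumes the experiment ID is the first column of raw.csv and that columns 2 and 3 hold integer values; the function name split_raw_latencies and the header names in the sample are hypothetical.

```python
import csv
import io


def split_raw_latencies(raw_csv: str, uuids: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Collect the (column 2, column 3) pairs of raw.csv for each experiment ID."""
    wanted = set(uuids)
    per_experiment = {uuid: [] for uuid in uuids}
    for row in csv.reader(io.StringIO(raw_csv)):
        # Assumption: column 1 is the experiment ID; other rows (e.g. a header) are skipped.
        if len(row) >= 3 and row[0] in wanted:
            per_experiment[row[0]].append((int(row[1]), int(row[2])))
    return per_experiment


# Hypothetical sample data; the header names are made up for illustration.
raw = (
    "experiment_id,block_time,duration_ns\n"
    "aaaa,1000000000,250000000\n"
    "bbbb,1500000000,300000000\n"
    "aaaa,2000000000,260000000\n"
)
data = split_raw_latencies(raw, ["aaaa", "bbbb"])
print(len(data["aaaa"]), len(data["bbbb"]))
```

Keying the result by experiment ID rather than by position removes the dependency on the order in which the IDs were collected.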
Load all .dat files generated in step 5 into matrices using this Octave code snippet:
conns = { "01"; "02"; "04" };
rates = { "0025"; "0050"; "0100"; "0200" };
for i = 1:length(conns)
  for j = 1:length(rates)
    filename = strcat("c", conns{i}, "_r", rates{j}, ".dat");
    load("-ascii", filename);  % creates a matrix named after the file, e.g. c01_r0025
  endfor
endfor
Set variable release to the current release undergoing QA
release = "v0.34.x";
Generate a plot with all (or some) experiments, where the X axis is the experiment time, and the Y axis is the latency of transactions. The following snippet plots all experiments.
legends = {};
hold off;
for i = 1:length(conns)
  for j = 1:length(rates)
    data_name = strcat("c", conns{i}, "_r", rates{j});
    l = strcat("c=", conns{i}, " r=", rates{j});
    m = eval(data_name);
    % X: seconds since the first sample, Y: latency in seconds
    plot((m(:,1) - min(m(:,1))) / 1e+9, m(:,2) / 1e+9, ".");
    hold on;
    legends(1, end+1) = l;
  endfor
endfor
legend(legends, "location", "northeastoutside");
xlabel("experiment time (s)");
ylabel("latency (s)");
t = sprintf("200-node testnet - %s", release);
title(t);
Consider adjusting the axis, in case you want to compare your results to the baseline, for instance
axis([0, 100, 0, 30], "tic");
Use Octave’s GUI menu to save the plot (e.g. as .png)
Repeat steps 9 and 10 to obtain as many plots as deemed necessary.
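The transformation applied to each experiment before plotting (shift the time column so it starts at zero, then divide both columns by 1e9) can be checked in isolation. Below is a small Python sketch of that arithmetic; the function name to_plot_points and the sample values are hypothetical, and the input unit is assumed to be nanoseconds, as the division by 1e9 in the Octave snippet suggests.

```python
def to_plot_points(samples: list[tuple[int, int]]) -> list[tuple[float, float]]:
    """Map (timestamp_ns, latency_ns) samples to (seconds since first sample, latency in seconds)."""
    if not samples:
        return []
    start = min(timestamp for timestamp, _ in samples)
    return [((timestamp - start) / 1e9, latency / 1e9) for timestamp, latency in samples]


# Two made-up samples, 1.5 s apart.
points = to_plot_points([(2_000_000_000, 500_000_000), (3_500_000_000, 250_000_000)])
print(points)
```

Normalizing against the earliest sample keeps the X axis comparable across experiments that started at different wall-clock times.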
Use the latency_throughput.py script to plot latency against throughput. This plot is useful to visualize the saturation point.
Use the latency_plotter.py script. This script generates a series of plots per experiment and configuration that may help with visualizing Latency vs Throughput variation.
systemd unit).
Use the prometheus_plotter.py script for the time window.

This section explains how the tests were carried out for reproducibility purposes.
Follow the README.md at the top of the testnet repository to configure Terraform and doctl.
Copy .testnet_rotating.toml onto testnet.toml (do NOT commit this change).
Set VERSION_TAG to the git hash that is to be tested.
Run make terraform-apply EPHEMERAL_SIZE=25
Run make terraform-destroy as soon as you are done with the tests.
Follow the README.md to configure and start the “stable” part of the rotating node testnet.
Check the tendermint_consensus_height metric. All nodes should be increasing their heights.
Run make runload ROTATE_CONNECTIONS=X ROTATE_TX_RATE=Y
X and Y should reflect a load below the saturation point (see, e.g., this paragraph for further info).
Run make rotate to start the script that creates the ephemeral nodes, and kills them when they are caught up.
make rotate script
make rotate
Run make retrieve-data to gather all relevant data from the testnet into the orchestrating machine.
Run zip -T on the prometheus.zip file and (one of) the blockstore.db.zip file(s) to check that the archives are intact.
Run make terraform-destroy.
Steps 8 to 10 are highly manual at the moment and will be improved in next iterations.
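The archive check performed with zip -T in the steps above lends itself to automation. The sketch below uses Python's standard zipfile module as a stand-in for zip -T; it is an assumption that this check is equivalent for the purposes of this process, and the helper name archive_is_intact is hypothetical.

```python
import os
import tempfile
import zipfile


def archive_is_intact(path: str) -> bool:
    """Return True when the file is a zip archive and all members pass their CRC check."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False


# Self-contained demonstration with a valid archive and a file that is not a zip.
with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.zip")
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("data.txt", "hello")
    bad = os.path.join(tmp, "bad.zip")
    with open(bad, "wb") as handle:
        handle.write(b"not a zip archive")
    good_ok = archive_is_intact(good)
    bad_ok = archive_is_intact(bad)
    print(good_ok, bad_ok)
```

In practice the function would be called on prometheus.zip and on each retrieved blockstore.db.zip before the testnet is destroyed.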
In order to obtain a latency plot, follow the instructions above for the 200 node experiment, but:
The results.txt file contains only one experiment.
Therefore, there is no need for any for loops.
As for Prometheus, the same method as for the 200 node experiment can be applied.