Hello,
I would like to share the results of performance tests I ran against a 16-node Aptos network. It would be great to get some feedback and learn about potential improvements.
First, let me describe the setup.
I’ve been using 17 single-region c2-standard-30 GCP machines (1 for the load generator) with:
- Local SSD disks.
- Internal connectivity with extended bandwidth (50 Gb/s).
The load generator is a simple application that calls a module entry function, which emits the received messages (of a certain size) as events. It submits these calls in a loop at a fixed rate from multiple threads using the Rust SDK, and it also reads all transactions back with paging and matches them against the writes to calculate latency/roundtrip times.
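To make the measurement approach concrete, here is a heavily simplified, single-threaded sketch of that write/read-matching loop. It is not the actual generator code and does not use the real Aptos Rust SDK API: submit_message and fetch_next_transactions_page are hypothetical stand-ins for the SDK submission call and the paged transaction read.

use std::collections::HashMap;
use std::time::{Duration, Instant};

// Hypothetical stand-ins for the real Rust SDK calls used by the load generator.
fn submit_message(id: u64, payload: &[u8]) {
    // sign and submit the entry-function transaction carrying `payload`
    let _ = (id, payload);
}

fn fetch_next_transactions_page() -> Vec<u64> {
    // paged read of committed transactions, returning the ids of observed messages
    Vec::new()
}

fn main() {
    let target_tps: u64 = 32;            // overall send rate
    let payload = vec![0u8; 20 * 1024];  // 20 KB message body
    let mut sent_at: HashMap<u64, Instant> = HashMap::new();
    let mut roundtrips_ms: Vec<f64> = Vec::new();

    for id in 0..target_tps * 600 {      // roughly a 10-minute run
        submit_message(id, &payload);
        sent_at.insert(id, Instant::now());

        // Match committed transactions against the recorded writes.
        for seen_id in fetch_next_transactions_page() {
            if let Some(t0) = sent_at.remove(&seen_id) {
                roundtrips_ms.push(t0.elapsed().as_secs_f64() * 1000.0);
            }
        }

        // Crude pacing towards the target rate (the real generator paces per thread).
        std::thread::sleep(Duration::from_micros(1_000_000 / target_tps));
    }

    roundtrips_ms.sort_by(|a, b| a.partial_cmp(b).unwrap());
    if let Some(p95) = roundtrips_ms.get(roundtrips_ms.len().saturating_sub(1) * 95 / 100) {
        println!("roundtrips.duration.histogram.ms.95%: {p95:.3}");
    }
}

The real generator submits from multiple threads and feeds the roundtrip times into histogram metrics (the values quoted below); the sketch leaves those parts out.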
At 500 TPS (~30 TPS per node) with empty messages, Aptos is able to keep up but has quite high latencies:
"global.roundtrips.duration.histogram.ms.95%": 7074.315049,
"global.roundtrips.duration.histogram.ms.99%": 7095.654735,
"global.roundtrips.duration.histogram.ms.max": 7114.140838,
"global.roundtrips.duration.histogram.ms.mean": 5725.701553571252,
"global.roundtrips.duration.histogram.ms.median": 6283.169803,
"global.roundtrips.duration.histogram.ms.min": 255.338921,
With 20 KB messages, for example, the highest rate at which I could get stable 10-minute runs with latencies below 1 second was 32 TPS (2 TPS per node):
{
"global.reads.meter.count": 19216,
"global.reads.meter.rate.mean": 31.526677491451355,
"global.roundtrips.duration.histogram.ms.95%": 797.162459,
"global.roundtrips.duration.histogram.ms.99%": 913.018313,
"global.roundtrips.duration.histogram.ms.max": 1034.077742,
"global.roundtrips.duration.histogram.ms.mean": 501.0518366966364,
"global.roundtrips.duration.histogram.ms.median": 503.522658,
"global.roundtrips.duration.histogram.ms.min": 198.304255,
"global.writes.duration.histogram.ms.95%": 37.613604,
"global.writes.duration.histogram.ms.99%": 40.774876,
"global.writes.duration.histogram.ms.max": 63.204721,
"global.writes.duration.histogram.ms.mean": 33.47809573232633,
"global.writes.duration.histogram.ms.median": 34.95765,
"global.writes.duration.histogram.ms.min": 24.037804,
"global.writes.failed.meter.count": 0,
"global.writes.failed.meter.rate.mean": 0.0,
"global.writes.started.meter.count": 19216,
"global.writes.started.meter.rate.mean": 31.481573635528235,
"global.writes.successful.meter.count": 19216,
"global.writes.successful.meter.rate.mean": 31.50827352535433
}
At higher rates, the latencies explode and keep growing for the whole duration of the run.
Please let me know if there is anything I could change to improve performance; the current results are quite disappointing compared to the advertised figures of hundreds of thousands of TPS.
For context, the same setup, and to some extent the same load generator, has worked fine against other blockchains and was able to sustain a few thousand TPS.
Sharing my validator config template for reference:
base:
  role: "validator"
  data_dir: "/etc/aptos-network/nodes/nodeN/data"
  waypoint:
    from_file: "/etc/aptos-network/waypoint.txt"
state_sync:
  data_streaming_service:
    max_concurrent_requests: 10
    max_concurrent_state_requests: 20
    max_request_retry: 10
consensus:
  safety_rules:
    service:
      type: "local"
    backend:
      type: "on_disk_storage"
      path: /etc/aptos-network/nodes/nodeN/data/secure-data.json
      namespace: ~
    initial_safety_rules_config:
      from_file:
        waypoint:
          from_file: /etc/aptos-network/waypoint.txt
        identity_blob_path: /etc/aptos-network/nodes/nodeN/keys/validator-identity.yaml
execution:
  genesis_file_location: "/etc/aptos-network/genesis.blob"
validator_network:
  listen_address: "/ip4/0.0.0.0/tcp/6180"
  discovery_method: "onchain"
  mutual_authentication: true
  identity:
    type: "from_file"
    path: /etc/aptos-network/nodes/nodeN/keys/validator-identity.yaml
full_node_networks:
  - network_id:
      private: "vfn"
    listen_address: "/ip4/0.0.0.0/tcp/6182"
    identity:
      type: "from_file"
      path: /etc/aptos-network/nodes/nodeN/keys/validator-full-node-identity.yaml
inspection_service:
  address: 0.0.0.0
  port: 9091
storage:
  backup_service_address: "0.0.0.0:6186"
api:
  enabled: true
  address: "0.0.0.0:8080"
mempool:
  capacity_per_user: 10000