16-node Aptos network performance results

Hello,

I would like to share the results of performance tests I ran against a 16-node Aptos network. It would be great to get some feedback and learn about potential improvements.

First, let me describe the setup.

I am using 17 c2-standard-30 GCP machines in a single region (16 for the nodes, 1 for the load generator), each with:

  • Local SSD disks.
  • Internal connectivity with extended bandwidth (50 Gb/s).

The load generator is a simple application that calls a module entry function, which emits each received message (of a configurable size) as an event. It submits these calls in a loop at a fixed target rate from multiple threads, using the Rust SDK. In parallel, it reads all transactions back with paging and matches them against its own writes to calculate roundtrip latencies.
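
For readers who want to reproduce the measurement, the matching step described above could look roughly like the following. This is a minimal Python sketch and not the actual Rust load generator: the function name `roundtrip_stats`, the use of transaction hashes as keys, and the nearest-rank percentile method are all assumptions made for illustration.

```python
import math
from statistics import mean, median


def roundtrip_stats(submitted_at, observed_at):
    """Match writes to reads by key and summarize roundtrip latency in ms.

    submitted_at: hypothetical dict of tx hash -> submit time (seconds)
    observed_at:  hypothetical dict of tx hash -> time the tx was read back (seconds)
    """
    # Only transactions seen on both sides contribute a roundtrip sample.
    samples = sorted(
        (observed_at[h] - submitted_at[h]) * 1000.0
        for h in submitted_at
        if h in observed_at
    )
    if not samples:
        return {}

    def percentile(p):
        # Nearest-rank percentile over the sorted samples.
        rank = max(1, math.ceil(p / 100.0 * len(samples)))
        return samples[rank - 1]

    return {
        "min": samples[0],
        "median": median(samples),
        "mean": mean(samples),
        "95%": percentile(95),
        "99%": percentile(99),
        "max": samples[-1],
    }


if __name__ == "__main__":
    writes = {"a": 0.0, "b": 1.0, "c": 2.0}
    reads = {"a": 0.25, "b": 1.5}  # "c" has not been read back yet
    print(roundtrip_stats(writes, reads))
```

Transactions that were submitted but never read back are silently skipped here, so a real harness would also want to report how many writes remained unmatched.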

At 500 TPS (~30 TPS per node) with empty messages, Aptos is able to keep up, but roundtrip latencies are quite high (values in ms):

  "global.roundtrips.duration.histogram.ms.95%": 7074.315049,
  "global.roundtrips.duration.histogram.ms.99%": 7095.654735,
  "global.roundtrips.duration.histogram.ms.max": 7114.140838,
  "global.roundtrips.duration.histogram.ms.mean": 5725.701553571252,
  "global.roundtrips.duration.histogram.ms.median": 6283.169803,
  "global.roundtrips.duration.histogram.ms.min": 255.338921,

With 20KB messages, for example, the highest rate at which I could get stable 10-minute runs with 99th-percentile latencies below 1 second was 32 TPS (2 TPS per node):

{
  "global.reads.meter.count": 19216,
  "global.reads.meter.rate.mean": 31.526677491451355,
  "global.roundtrips.duration.histogram.ms.95%": 797.162459,
  "global.roundtrips.duration.histogram.ms.99%": 913.018313,
  "global.roundtrips.duration.histogram.ms.max": 1034.077742,
  "global.roundtrips.duration.histogram.ms.mean": 501.0518366966364,
  "global.roundtrips.duration.histogram.ms.median": 503.522658,
  "global.roundtrips.duration.histogram.ms.min": 198.304255,
  "global.writes.duration.histogram.ms.95%": 37.613604,
  "global.writes.duration.histogram.ms.99%": 40.774876,
  "global.writes.duration.histogram.ms.max": 63.204721,
  "global.writes.duration.histogram.ms.mean": 33.47809573232633,
  "global.writes.duration.histogram.ms.median": 34.95765,
  "global.writes.duration.histogram.ms.min": 24.037804,
  "global.writes.failed.meter.count": 0,
  "global.writes.failed.meter.rate.mean": 0.0,
  "global.writes.started.meter.count": 19216,
  "global.writes.started.meter.rate.mean": 31.481573635528235,
  "global.writes.successful.meter.count": 19216,
  "global.writes.successful.meter.rate.mean": 31.50827352535433
}
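
To put the 20KB run in perspective, the raw payload rate can be estimated with simple arithmetic. The sketch below assumes 1 KB = 1000 bytes and counts only message payload; it ignores transaction overhead and the fan-out of consensus and state sync traffic between nodes, so it is a lower bound rather than a measurement. The helper name `payload_bandwidth_mbit` is hypothetical.

```python
def payload_bandwidth_mbit(tps, message_bytes):
    """Payload-only bandwidth in Mbit/s for a given rate and message size."""
    return tps * message_bytes * 8 / 1_000_000


# Figures from the post: 32 TPS, 20KB messages, 50 Gb/s internal links.
payload = payload_bandwidth_mbit(32, 20_000)
link_mbit = 50_000  # 50 Gb/s expressed in Mbit/s

print(f"payload: {payload} Mbit/s")
print(f"share of one 50 Gb/s link: {payload / link_mbit:.6%}")
```

Even if every message were replicated to all 16 nodes, the payload would occupy only a tiny fraction of the available link capacity, which suggests (but does not prove) that network bandwidth is not the limiting factor in this setup.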

At higher rates, the latencies explode and keep increasing for the duration of the run.
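
Steadily increasing latency is what one would expect whenever the offered rate exceeds what the network can sustain, because the backlog then grows every second. The toy model below illustrates this. It is a sketch under strong assumptions: a single queue, a fixed processing capacity, and made-up rates (the 32 TPS capacity is only borrowed from the stable run above, not a measured limit).

```python
def queueing_delay_s(offered_tps, capacity_tps, duration_s):
    """Wait time (seconds) seen by a transaction arriving at the end of each second."""
    backlog = 0.0
    delays = []
    for _ in range(duration_s):
        # Backlog grows by the excess of arrivals over processed transactions.
        backlog = max(0.0, backlog + offered_tps - capacity_tps)
        # A new arrival waits until the current backlog has been drained.
        delays.append(backlog / capacity_tps)
    return delays


if __name__ == "__main__":
    # Hypothetical: offering 40 TPS to a system that sustains 32 TPS.
    print(queueing_delay_s(40, 32, 4))
    # Below capacity the backlog never builds up.
    print(queueing_delay_s(30, 32, 4))
```

In this model the delay grows linearly for as long as the overload lasts, so longer runs at the same rate would show proportionally worse tail latencies.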

Please let me know if there is anything that could improve the performance, as the current results are quite disappointing given the promises of hundreds of thousands of TPS.

I can add that the same setup and (to some extent) the same load generator worked fine with other technologies/blockchains and were able to run at a few thousand TPS.

Sharing my validator config template for reference:

base:
  role: "validator"
  data_dir: "/etc/aptos-network/nodes/nodeN/data"
  waypoint:
    from_file: "/etc/aptos-network/waypoint.txt"

state_sync:
  data_streaming_service:
    max_concurrent_requests: 10
    max_concurrent_state_requests: 20
    max_request_retry: 10

consensus:
  safety_rules:
    service:
      type: "local"
    backend:
      type: "on_disk_storage"
      path: /etc/aptos-network/nodes/nodeN/data/secure-data.json
      namespace: ~
    initial_safety_rules_config:
      from_file:
        waypoint:
          from_file: /etc/aptos-network/waypoint.txt
        identity_blob_path: /etc/aptos-network/nodes/nodeN/keys/validator-identity.yaml

execution:
  genesis_file_location: "/etc/aptos-network/genesis.blob"

validator_network:
  listen_address: "/ip4/0.0.0.0/tcp/6180"
  discovery_method: "onchain"
  mutual_authentication: true
  identity:
    type: "from_file"
    path: /etc/aptos-network/nodes/nodeN/keys/validator-identity.yaml

full_node_networks:
  - network_id:
      private: "vfn"
    listen_address: "/ip4/0.0.0.0/tcp/6182"
    identity:
      type: "from_file"
      path: /etc/aptos-network/nodes/nodeN/keys/validator-full-node-identity.yaml

inspection_service:
  address: 0.0.0.0
  port: 9091

storage:
  backup_service_address: "0.0.0.0:6186"

api:
  enabled: true
  address: "0.0.0.0:8080"

mempool:
  capacity_per_user: 10000

The state sync parameters are copied from the Docker setup.

That was good, thanks.
So we are still far from 160k TPS!

I tried increasing the state sync parameters to the defaults, and it didn’t help.