ci: solidity test benchmarks #686
base: feat/solidity-tests
This PR adds automated regression checks for the JS Solidity test runner. The setup is similar to the existing RPC benchmark for Hardhat Node-style workloads.
Benchmark Suite
The benchmark runs the test harness of `forge-std`. This has the advantage of exercising `forge-std` itself, which we intend to use. The disadvantage is that `forge-std` is not representative of how users write Solidity tests. But this is not a concern as long as we're not using the benchmark to guide optimizations, only to catch regressions.

There are multiple benchmark scenarios. The first one is called `Total`, which measures the entire execution of the `forge-std` test harness. The purpose of this scenario is to make sure that there are no regressions in the parallel test suite executor.

In addition to test suites, individual tests within test suites are also parallelized. This means that if we want to catch regressions in features covered by a certain test suite, we have to execute that test suite in isolation. For this reason, in addition to the `Total` scenario, we execute the following test suites as well:

- `StdCheatsTest`: environment-related cheatcode coverage + fuzzing.
- `StdStorageTest`: storage-related cheatcode coverage + fuzzing.
- `StdMathTest`: math-related utilities; it's also very quick (~15ms on my machine), so regressions from FFI overhead should show up in it.
- `StdUtilsForkTest`: read-type programmatic fork tests.
- `StdCheatsForkTest`: write + fuzz tests for programmatic forking.

The rest of the test suites in `forge-std` test things implemented in Solidity or test file-system utilities, so they're less interesting for us.

Benchmark Metric
The metric for all scenarios is the wall clock time measured from JS in milliseconds. The measurement excludes loading files from disk as that's out of scope for EDR. If there is a 10% regression on any of the scenarios above, the benchmark job fails (same as the RPC benchmark).
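As a rough illustration of the metric, a wall-clock measurement and a 10% regression check could look like the following TypeScript sketch. The function names and structure are assumptions for illustration, not the actual benchmark code; only the millisecond unit and the 10% threshold come from the description above.

```typescript
// Hedged sketch: `measureScenarioMs` and `isRegression` are hypothetical
// names, not the real EDR benchmark API.

const REGRESSION_THRESHOLD = 1.1; // fail if current time exceeds baseline by >10%

// Measure the wall clock time of a scenario in milliseconds.
// Loading files from disk is assumed to happen before `run` is invoked,
// so it is excluded from the measurement, as described above.
async function measureScenarioMs(run: () => Promise<void>): Promise<number> {
  const start = process.hrtime.bigint();
  await run();
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // nanoseconds -> milliseconds
}

// Compare a scenario's time against the stored baseline result.
function isRegression(currentMs: number, baselineMs: number): boolean {
  return currentMs > baselineMs * REGRESSION_THRESHOLD;
}
```

A job using these helpers would time each scenario with `measureScenarioMs` and fail the run if `isRegression` is true for any of them.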
Baseline
The baseline is executing the same tests with `forge test`. Instructions to run the baseline are in the `crates/tools/js/benchmark/src/solidity-tests.ts` file. I tested manually that performance is identical to `forge test` on my machine, so we don't have to keep running the baseline.

Benchmark Jobs
There are two benchmark workflows in the `feat/solidity-tests` branch and three benchmark jobs:

- `EDR Benchmarks`:
  - `Run JS scenario runner benchmark for Hardhat Node style workload` stores results for `main` and compares PRs against those results.
  - `Run JS Solidity test runner benchmark` stores results for `feat/solidity-tests` and compares PRs targeting `feat/solidity-tests` against those results. Once `feat/solidity-tests` is merged to `main`, we'll switch to storing results for `main`.
- `EDR Benchmarks for feat/solidity-tests`:
  - `Run JS scenario runner benchmark for Hardhat Node style workload` stores results for `feat/solidity-tests` and compares PRs against those results. Once `feat/solidity-tests` is merged to `main`, we'll remove this job.

I reorganized the workflow files a bit to share the Rust compilation cache and to start caching the RPC cache for the Solidity test runner benchmarks.