ci: solidity test benchmarks #686
base: feat/solidity-tests
This PR adds automated regression checks for the JS Solidity test runner. The setup is similar to the existing RPC benchmark for Hardhat Node-style workloads.
Benchmark Suite
The benchmark runs the test harness of `forge-std`. This has the advantage of exercising `forge-std` itself, which we intend to use. The disadvantage is that `forge-std` is not representative of how users write Solidity tests. But this is not a concern as long as we're not using the benchmark to guide optimizations, only to catch regressions.

There are multiple benchmark scenarios. The first one is called `Total`, which measures the entire execution of the `forge-std` test harness. The purpose of this scenario is to make sure that there are no regressions in the parallel test suite executor.

In addition to test suites, individual tests within test suites are also parallelized. This means that if we want to catch regressions in features covered by a certain test suite, we have to execute that test suite in isolation. For this reason, in addition to the `Total` scenario, we execute the following test suites as well:

- `StdCheatsTest`: environment-related cheatcode coverage + fuzzing.
- `StdStorageTest`: storage-related cheatcode coverage + fuzzing.
- `StdMathTest`: math-related utilities; it's also very quick (~15ms on my machine), so regressions from FFI overhead should show up in it.
- `StdUtilsForkTest`: read-type programmatic fork tests.
- `StdCheatsForkTest`: write + fuzz tests for programmatic forking.

The rest of the test suites in `forge-std` test things implemented in Solidity or test file-system utilities, so they're less interesting for us.

Benchmark Metric
The metric for all scenarios is the wall clock time measured from JS in milliseconds. The measurement excludes loading files from disk as that's out of scope for EDR. If there is a 10% regression on any of the scenarios above, the benchmark job fails (same as the RPC benchmark).
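As a rough illustration of the metric, a wall-clock measurement and a 10% regression check could look like the following TypeScript sketch. The function names and structure are assumptions for illustration, not the actual benchmark code; only the millisecond unit and the 10% threshold come from the description above.

```typescript
// Hedged sketch: `measureScenarioMs` and `isRegression` are hypothetical
// names, not the real EDR benchmark API.

const REGRESSION_THRESHOLD = 1.1; // fail if current time exceeds baseline by >10%

// Measure the wall clock time of a scenario in milliseconds.
// Loading files from disk is assumed to happen before `run` is invoked,
// so it is excluded from the measurement, as described above.
async function measureScenarioMs(run: () => Promise<void>): Promise<number> {
  const start = process.hrtime.bigint();
  await run();
  const end = process.hrtime.bigint();
  return Number(end - start) / 1e6; // nanoseconds -> milliseconds
}

// Compare a scenario's time against the stored baseline result.
function isRegression(currentMs: number, baselineMs: number): boolean {
  return currentMs > baselineMs * REGRESSION_THRESHOLD;
}
```

A job using these helpers would time each scenario with `measureScenarioMs` and fail the run if `isRegression` is true for any of them.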
Baseline
The baseline is executing the same tests with `forge test`. Instructions to run the baseline are in the `crates/tools/js/benchmark/src/solidity-tests.ts` file. I tested manually that performance is identical to `forge test` on my machine, so we don't have to keep running the baseline.

Benchmark Jobs
There are two benchmark workflows in the `feat/solidity-tests` branch and three benchmark jobs:

- `EDR Benchmarks`:
  - `Run JS scenario runner benchmark for Hardhat Node style workload` stores results for `main` and compares PRs against those results.
  - `Run JS Solidity test runner benchmark` stores results for `feat/solidity-tests` and compares PRs targeting `feat/solidity-tests` against those results. Once `feat/solidity-tests` is merged to `main`, we'll switch to storing results for `main`.
- `EDR Benchmarks for feat/solidity-tests`:
  - `Run JS scenario runner benchmark for Hardhat Node style workload` stores results for `feat/solidity-tests` and compares PRs against those results. Once `feat/solidity-tests` is merged to `main`, we'll remove this job.

I reorganized the workflow files a bit to share the Rust compilation cache and to start caching the RPC cache for the Solidity test runner benchmarks.