Gradle tasks executing benchmarks succeed even if some benchmarks fail #186

Open
fzhinkin opened this issue Feb 6, 2024 · 0 comments
fzhinkin commented Feb 6, 2024

Currently, Gradle tasks executing benchmarks don't fail if some benchmarks fail. That might not be a problem when benchmarks are executed within the IDE, as the failure status is reported explicitly there. In other scenarios, however, failures may go unnoticed: the generated reports contain no hint of a failure, and the only way to find out that something went wrong is to inspect the logs.

For example, if benchmarks are executed in CI, then most likely nobody will check the logs until there's a failure. But since the benchmarking task succeeds in any case, and a report is still produced with every benchmark except the failed one, it may take a long time until somebody notices the failure.

Here's a reproducer: https://github.com/fzhinkin/kotlinx-benchmark-success-on-benchmark-failure

./gradlew benchmark
> Task :jvmBenchmark
...
<failure>

java.lang.RuntimeException
        at org.example.FaultyBenchmark.thisOneIsNoBetter(FaultyBenchmark.kt:14)
        at org.example.generated.FaultyBenchmark_thisOneIsNoBetter_jmhTest.thisOneIsNoBetter_thrpt_jmhStub(FaultyBenchmark_thisOneIsNoBetter_jmhTest.java:121)
        at org.example.generated.FaultyBenchmark_thisOneIsNoBetter_jmhTest.thisOneIsNoBetter_Throughput(FaultyBenchmark_thisOneIsNoBetter_jmhTest.java:83)
...
> Task :macosArm64Benchmark
...
… org.example.FaultyBenchmark.faulty
  EXCEPTION: kotlin.RuntimeException
0   macosArm64Benchmark.kexe            0x102b6fc73        kfun:org.example.FaultyBenchmark#faulty(){} + 99 
1   macosArm64Benchmark.kexe            0x102b71edb        kfun:kotlinx.benchmark.generated.org.example.FaultyBenchmark_Descriptor.$faulty$FUNCTION_REFERENCE$5.invoke#internal + 23 
...
BUILD SUCCESSFUL in 4m 25s

The build is successful, and the reports contain some results (there's one non-failing benchmark in the demo project), so without inspecting the logs it's hard to detect failures. Even with the logs at hand, one may conclude that everything is fine because the task succeeded.

I suggest making Gradle tasks fail if at least one benchmark fails.
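
Until the tasks behave that way, a CI job could scan the captured log for the failure markers visible in the output above. The following is only a rough sketch of such a workaround, not part of kotlinx-benchmark: the function name `log_has_benchmark_failures` and the file name `benchmark.log` are hypothetical, and the two markers (`<failure>` and `EXCEPTION:`) are simply the ones that appear in the JVM and macosArm64 excerpts here, so other targets may print something different.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the given log file contains
# one of the failure markers seen in the log excerpt from this issue.
# Assumption: "<failure>" starts a line in JVM output and "EXCEPTION:" appears
# in Kotlin/Native output; other targets may report failures differently.
log_has_benchmark_failures() {
    grep -q -e '^<failure>' -e 'EXCEPTION:' "$1"
}

# Usage sketch for a CI step (left as comments, not executed here):
#   ./gradlew benchmark 2>&1 | tee benchmark.log
#   if log_has_benchmark_failures benchmark.log; then
#       echo "At least one benchmark failed" >&2
#       exit 1
#   fi
```

Matching on log text is fragile, which is one more reason to have the Gradle tasks report the failure themselves.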
