-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid generating unsupported data types for JoinFuzzer with PQR #10662
Conversation
✅ Deploy Preview for meta-velox canceled.
|
This pull request was exported from Phabricator. Differential Revision: D60768414 |
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, about 80% of iterations are verified against Presto. Differential Revision: D60768414
ab2f06d
to
49bc4b3
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
49bc4b3
to
b5efb92
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
8f99711
to
00c69f7
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
00c69f7
to
c09905f
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…ncubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
c09905f
to
e4fac78
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori LGTM % minors. Thanks!
velox/vector/fuzzer/VectorFuzzer.h
Outdated
@@ -389,4 +389,7 @@ RowTypePtr randRowType( | |||
const std::vector<TypePtr>& scalarTypes, | |||
int maxDepth = 5); | |||
|
|||
// Default set of scalar types to be chosen from when generating random types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// for public comment
|
||
namespace facebook::velox::exec::test { | ||
|
||
enum RunnerType { kPrestoQueryRunner, kDuckQueryRunner }; | ||
|
||
/// Query runner that uses reference database, i.e. DuckDB, Presto, Spark. | ||
class ReferenceQueryRunner { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class ReferenceQueryRunner {
public:
enum class Type {
kPresto,
kDuckDB,
}
return RunnerType::kPrestoQueryRunner; | ||
} | ||
|
||
const std::vector<TypePtr>& supportedScalarTypes() const override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: move the nontrivial implementation to cc? thanks!
velox/exec/fuzzer/DuckQueryRunner.h
Outdated
/// casting from Bigint to Interval. | ||
/// | ||
/// TODO Investigate mismatches reported when comparing Varbinary. | ||
const std::vector<TypePtr>& supportedScalarTypes() const override { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
// The number of iterations that test cross product. | ||
size_t numCrossProduct{0}; | ||
|
||
void print(size_t totalIterations) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let make totalIterations a member?
size_t numIterations;
And add a toString() method for stats
LOG(INFO) << stats_.toString()?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I have tried making this change, but it turns out to be less readable then before. The original output is
W0814 12:39:44.775323 847942 JoinFuzzer.cpp:1481] ==============================> Done with iteration 0
I0814 12:39:44.775427 847942 JoinFuzzer.cpp:318] Total iterations tested: 1
I0814 12:39:44.775455 847942 JoinFuzzer.cpp:319] Total iterations verified against reference DB: 1 (100.00%)
I0814 12:39:44.775501 847942 JoinFuzzer.cpp:321] Total iterations testing cross product: 0 (0.00%)
After changing the code to LOG(INFO) << stats_.toString()
, the output becomes
W0814 12:36:05.199414 779680 JoinFuzzer.cpp:1485] ==============================> Done with iteration 3
I0814 12:36:05.199489 779680 JoinFuzzer.cpp:1491] Total iterations tested: 4
Total iterations verified against reference DB: 3 (75.00%)
Total iterations testing cross product: 0 (0.00%)
I think the original output is better, so I keep the code.
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
@@ -350,7 +372,8 @@ std::vector<TypePtr> JoinFuzzer::generateJoinKeyTypes(int32_t numKeys) { | |||
types.reserve(numKeys); | |||
for (auto i = 0; i < numKeys; ++i) { | |||
// Pick random scalar type. | |||
types.push_back(vectorFuzzer_.randType(0 /*maxDepth*/)); | |||
types.push_back(vectorFuzzer_.randType( | |||
referenceQueryRunner_->supportedScalarTypes(), 0 /*maxDepth*/)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/*maxDepth=*/0
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
@@ -373,7 +396,8 @@ std::vector<RowVectorPtr> JoinFuzzer::generateProbeInput( | |||
const auto numPayload = randInt(0, 3); | |||
for (auto i = 0; i < numPayload; ++i) { | |||
names.push_back(fmt::format("tp{}", i + keyNames.size())); | |||
types.push_back(vectorFuzzer_.randType(2 /*maxDepth*/)); | |||
types.push_back(vectorFuzzer_.randType( | |||
referenceQueryRunner_->supportedScalarTypes(), 2 /*maxDepth*/)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
@@ -406,7 +430,8 @@ std::vector<RowVectorPtr> JoinFuzzer::generateBuildInput( | |||
const auto numPayload = randInt(0, 3); | |||
for (auto i = 0; i < numPayload; ++i) { | |||
names.push_back(fmt::format("bp{}", i + buildKeys.size())); | |||
types.push_back(vectorFuzzer_.randType(2 /*maxDepth*/)); | |||
types.push_back(vectorFuzzer_.randType( | |||
referenceQueryRunner_->supportedScalarTypes(), 2 /*maxDepth*/)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dittos
@@ -933,6 +955,9 @@ RowVectorPtr JoinFuzzer::testCrossProduct( | |||
assertEqualResults( | |||
referenceResult.value(), plan.plan->outputType(), {expected}), | |||
"Velox and DuckDB results don't match"); | |||
|
|||
LOG(INFO) << "Result matches with referenc DB."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
e4fac78
to
2afd736
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
2afd736
to
44321d3
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
44321d3
to
307a742
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Differential Revision: D60768414
307a742
to
0f8365f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kagamiori thanks for the update % nits.
|
||
namespace facebook::velox::exec::test { | ||
|
||
/// Query runner that uses reference database, i.e. DuckDB, Presto, Spark. | ||
class ReferenceQueryRunner { | ||
public: | ||
enum RunnerType { kPrestoQueryRunner, kDuckQueryRunner }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just define it as enum class?
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
// The number of iterations that test cross product. | ||
size_t numCrossProduct{0}; | ||
|
||
void print() const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The common practice is to provide a toString method and let the control path to log out if need.
velox/exec/fuzzer/JoinFuzzer.cpp
Outdated
|
||
struct Stats { | ||
// The total number of iterations tested. | ||
size_t totalIterations{0}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/totalIterations/numIterations/
The same naming as the other two? Thanks!
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Reviewed By: xiaoxmeng Differential Revision: D60768414
0f8365f
to
cf6b3f0
Compare
This pull request was exported from Phabricator. Differential Revision: D60768414 |
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Reviewed By: xiaoxmeng Differential Revision: D60768414
cf6b3f0
to
26b156e
Compare
…bookincubator#10662) Summary: Pull Request resolved: facebookincubator#10662 When running JoinFuzzerTest with PrestoQueryRunner, currently only about 16-20% iterations are verified against Presto. The rest iterations are unverified due to unsupported data types. This diff makes JoinFuzzer to avoid generating unsupported types when running with PrestoQueryRunner. After this change, over 85% of iterations are verified against Presto. Reviewed By: xiaoxmeng Differential Revision: D60768414
This pull request was exported from Phabricator. Differential Revision: D60768414 |
26b156e
to
d9d92aa
Compare
This pull request has been merged in d23dde5. |
Conbench analyzed the 1 benchmark run on commit There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
Summary:
When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.
Differential Revision: D60768414