Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid generating unsupported data types for JoinFuzzer with PQR #10662

Closed
wants to merge 1 commit into from

Conversation

kagamiori
Copy link
Contributor

@kagamiori kagamiori commented Aug 5, 2024

Summary:
When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 5, 2024
Copy link

netlify bot commented Aug 5, 2024

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit d9d92aa
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66bece2569bb0b0008b461d4

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 6, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, about 80% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 6, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 6, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@kagamiori kagamiori force-pushed the export-D60768414 branch 2 times, most recently from 8f99711 to 00c69f7 Compare August 13, 2024 17:13
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 13, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 13, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 13, 2024
…ncubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@kagamiori kagamiori changed the title Avoid generating unsupported types for JoinFuzzer with PQR Avoid generating unsupported data types for JoinFuzzer with PQR Aug 13, 2024
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori LGTM % minors. Thanks!

@@ -389,4 +389,7 @@ RowTypePtr randRowType(
const std::vector<TypePtr>& scalarTypes,
int maxDepth = 5);

// Default set of scalar types to be chosen from when generating random types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// for public comment


namespace facebook::velox::exec::test {

enum RunnerType { kPrestoQueryRunner, kDuckQueryRunner };

/// Query runner that uses reference database, i.e. DuckDB, Presto, Spark.
class ReferenceQueryRunner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

class ReferenceQueryRunner {
  public:
   enum class Type {
     kPresto,
     kDuckDB,
   }

return RunnerType::kPrestoQueryRunner;
}

const std::vector<TypePtr>& supportedScalarTypes() const override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move the nontrivial implementation to cc? thanks!

/// casting from Bigint to Interval.
///
/// TODO Investigate mismatches reported when comparing Varbinary.
const std::vector<TypePtr>& supportedScalarTypes() const override {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// The number of iterations that test cross product.
size_t numCrossProduct{0};

void print(size_t totalIterations) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let make totalIterations a member?
size_t numIterations;

And add a toString() method for stats
LOG(INFO) << stats_.toString()?
Thanks!

Copy link
Contributor Author

@kagamiori kagamiori Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I have tried making this change, but it turns out to be less readable then before. The original output is

W0814 12:39:44.775323 847942 JoinFuzzer.cpp:1481] ==============================> Done with iteration 0
I0814 12:39:44.775427 847942 JoinFuzzer.cpp:318] Total iterations tested: 1
I0814 12:39:44.775455 847942 JoinFuzzer.cpp:319] Total iterations verified against reference DB: 1 (100.00%)
I0814 12:39:44.775501 847942 JoinFuzzer.cpp:321] Total iterations testing cross product: 0 (0.00%)

After changing the code to LOG(INFO) << stats_.toString(), the output becomes

W0814 12:36:05.199414 779680 JoinFuzzer.cpp:1485] ==============================> Done with iteration 3
I0814 12:36:05.199489 779680 JoinFuzzer.cpp:1491] Total iterations tested: 4
Total iterations verified against reference DB: 3 (75.00%)
Total iterations testing cross product: 0 (0.00%)

I think the original output is better, so I keep the code.

@@ -350,7 +372,8 @@ std::vector<TypePtr> JoinFuzzer::generateJoinKeyTypes(int32_t numKeys) {
types.reserve(numKeys);
for (auto i = 0; i < numKeys; ++i) {
// Pick random scalar type.
types.push_back(vectorFuzzer_.randType(0 /*maxDepth*/));
types.push_back(vectorFuzzer_.randType(
referenceQueryRunner_->supportedScalarTypes(), 0 /*maxDepth*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/*maxDepth=*/0

@@ -373,7 +396,8 @@ std::vector<RowVectorPtr> JoinFuzzer::generateProbeInput(
const auto numPayload = randInt(0, 3);
for (auto i = 0; i < numPayload; ++i) {
names.push_back(fmt::format("tp{}", i + keyNames.size()));
types.push_back(vectorFuzzer_.randType(2 /*maxDepth*/));
types.push_back(vectorFuzzer_.randType(
referenceQueryRunner_->supportedScalarTypes(), 2 /*maxDepth*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@@ -406,7 +430,8 @@ std::vector<RowVectorPtr> JoinFuzzer::generateBuildInput(
const auto numPayload = randInt(0, 3);
for (auto i = 0; i < numPayload; ++i) {
names.push_back(fmt::format("bp{}", i + buildKeys.size()));
types.push_back(vectorFuzzer_.randType(2 /*maxDepth*/));
types.push_back(vectorFuzzer_.randType(
referenceQueryRunner_->supportedScalarTypes(), 2 /*maxDepth*/));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dittos

velox/exec/fuzzer/JoinFuzzer.cpp Show resolved Hide resolved
@@ -933,6 +955,9 @@ RowVectorPtr JoinFuzzer::testCrossProduct(
assertEqualResults(
referenceResult.value(), plan.plan->outputType(), {expected}),
"Velox and DuckDB results don't match");

LOG(INFO) << "Result matches with referenc DB.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 14, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 14, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 14, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 14, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Differential Revision: D60768414
Copy link
Contributor

@xiaoxmeng xiaoxmeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kagamiori thanks for the update % nits.


namespace facebook::velox::exec::test {

/// Query runner that uses reference database, i.e. DuckDB, Presto, Spark.
class ReferenceQueryRunner {
public:
enum RunnerType { kPrestoQueryRunner, kDuckQueryRunner };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just define it as enum class?

// The number of iterations that test cross product.
size_t numCrossProduct{0};

void print() const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common practice is to provide a toString method and let the control path to log out if need.


struct Stats {
// The total number of iterations tested.
size_t totalIterations{0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/totalIterations/numIterations/

The same naming as the other two? Thanks!

@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 14, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Reviewed By: xiaoxmeng

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

kagamiori added a commit to kagamiori/velox that referenced this pull request Aug 15, 2024
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Reviewed By: xiaoxmeng

Differential Revision: D60768414
…bookincubator#10662)

Summary:
Pull Request resolved: facebookincubator#10662

When running JoinFuzzerTest with PrestoQueryRunner, currently only about
16-20% iterations are verified against Presto. The rest iterations are
unverified due to unsupported data types. This diff makes JoinFuzzer to
avoid generating unsupported types when running with PrestoQueryRunner.
After this change, over 85% of iterations are verified against Presto.

Reviewed By: xiaoxmeng

Differential Revision: D60768414
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D60768414

@facebook-github-bot
Copy link
Contributor

This pull request has been merged in d23dde5.

Copy link

Conbench analyzed the 1 benchmark run on commit d23dde5a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. fb-exported Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants