Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix register simple function SignatureMap overwrite #10811

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

jinchengchenghh
Copy link
Contributor

@jinchengchenghh jinchengchenghh commented Aug 22, 2024

Before this PR, the SignatureMap key is FunctionSignature and will compare all the fields of it. But fields std::unordered_map<std::string, SignatureVariable> variables and TypeSignature returnType is different, causing registering functions only differs with result type. It should only consider name and input types and not allow registering functions that differ only in result type. But for decimal registration, (short, short) -> short, (short, short) -> long exists, so I only relax the result type computation logic, allow the result type diffs.

This PR introduces SignatureHash and SignatureCmp to SignatureMap, which only consider the argument type base name, and check metadata's physical argument type and physical result type by kindEquals which will check the child data type to determine whether to overwrite.

Resolves: #10602

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Aug 22, 2024
Copy link

netlify bot commented Aug 22, 2024

Deploy Preview for meta-velox ready!

Name Link
🔨 Latest commit d613251
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/66e244ed3f445c00080e752e
😎 Deploy Preview https://deploy-preview-10811--meta-velox.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@jinchengchenghh
Copy link
Contributor Author

If remove the physical type match for result type, we will get this exception because decimal registered as (short, short) -> short and (short, short) -> long.

Function: resolveScalarFunctionType
File: /__w/velox/velox/velox/parse/TypeResolver.cpp
Line: 99
Stack trace:
Stack trace has been disabled. Use --velox_exception_user_stacktrace_enabled=true to enable it.
" thrown in the test body.
[  FAILED  ] DecimalArithmeticTest.add (6 ms)
[ RUN      ] DecimalArithmeticTest.subtract
unknown file: Failure
C++ exception with description "Exception: VeloxUserError
Error Source: USER
Error Code: INVALID_ARGUMENT
Reason: Scalar function signature is not supported: minus(DECIMAL(10, 3), DECIMAL(10, 3)). Supported signatures: (timestamp,timestamp) -> interval day to second, (date,interval year to month) -> date, (timestamp with time zone,interval year to month) -> timestamp with time zone, (real,real) -> real, (timestamp,interval year to month) -> timestamp, (interval day to second,interval day to second) -> interval day to second, (timestamp with time zone,timestamp with time zone) -> interval day to second, (decimal(i1,i5),decimal(i2,i6)) -> decimal(i3,i7), (tinyint,tinyint) -> tinyint, (integer,integer) -> integer, (double,double) -> double, (smallint,smallint) -> smallint, (timestamp,interval day to second) -> timestamp, (bigint,bigint) -> bigint, (date,interval day to second) -> date, (timestamp with time zone,interval day to second) -> timestamp with time zone.
Retriable: False

@jinchengchenghh
Copy link
Contributor Author

Can you help review this PR? Thanks! @mbasmanova

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jinchengchenghh I'm not sure what change does and I don't see a unit test.

Initially, I thought that we should use name + input-args as the key and not allow duplicates, but you reminded me that this doesn't work for decimal type. What's the trick to make this work?

@jinchengchenghh
Copy link
Contributor Author

In my thought, decimal should be considered as one type DecimalType, and should be registered one time. For example,

registerFunction<
      DecimalDivideFunction,
      Decimal<P3, S3>,
      Decimal<P1, S1>,
      Decimal<P2, S2>>({prefix + "divide"}, constraints);

But now, it's physical type is different, so we take it as different type. I couldn't think of a good idea to solve it.

Exists UT can cover this change, I change some UTs.

@mbasmanova

@jinchengchenghh
Copy link
Contributor Author

This PR change the SignatureMap key compare logic. Change from FunctionSignature to some fields of FunctionSignature. @mbasmanova

@@ -518,7 +519,7 @@ class SimpleFunctionMetadata : public ISimpleFunctionMetadata {
std::string helpMessage(const std::string& name) const final {
// return fmt::format("{}({})", name, signature_->toString());
std::string s{name};
s.append("(");
s.append("\nSignature argument types:\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this change? The comment on L520 needs updating.

Can we share an example of the new message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in a separate PR #10821, you notice it after then.

@@ -60,6 +60,9 @@ bool SimpleFunctionRegistry::registerFunctionInternal(
metadata->defaultNullBehavior(), otherMetadata.defaultNullBehavior());
VELOX_USER_CHECK_EQ(
metadata->isDeterministic(), otherMetadata.isDeterministic());
VELOX_USER_CHECK_EQ(
metadata->signature()->variableArity(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a separate change, right? Perhaps, it can be extracted into a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this Patch, signature is the key used for comparison, variableArity is always equal in that case, after this change, we don't allow to register func(int) and func(int...) at the same time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Would you extract this change into a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #10980

using SignatureMap = std::unordered_map<
FunctionSignature,
std::vector<std::unique_ptr<const FunctionEntry>>>;
std::vector<std::unique_ptr<const FunctionEntry>>,
SignatureHash,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is:

  • SignatureHash includes argument types, but not return type
  • SignatureCmd includes argument types, but in a limited way (map(k, v) compares equal to map(k1, v1) no matter what k, v, k1, v1 are.

This means that 2 entries may have different hashes, but compare the same. This is a violation of a core contract for hash/compare. We cannot do that.

Let me know if I missed something.

CC: @rui-mo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right.
I change the SignatureHash to only consider base name like SignatureCmd.

@mbasmanova
Copy link
Contributor

In my thought, decimal should be considered as one type DecimalType, and should be registered one time

@jinchengchenghh I agree. Let's figure out how to achieve that.

@mbasmanova
Copy link
Contributor

@jinchengchenghh I feel this needs some more thinking / designing.

A function has name, arguments and result type. Ideally, result type is a function of name + argument, but this is not the case for functions that use decimal types (either in argument, result type or both). I believe we need to fix that first. Like you mentioned earlier, for the purpose of function signatures we should have a single decimal(p, s) type and we should allow a single signature be backed by multiple implementations.

If we leave the decimal problem aside for a moment, we need a way to tell whether two signatures are distinguishable or not. For example, foo(int) and foo(int...) are not distinguishable. We don't want to allow both to be present in the registry at the same time. This is a standard problem in compilers. Wondering how they solve this. Might be good to look that up and see if we can implement similar solution.

// element type, so as Decimal, List, Struct.let the metadata
// physicalSignatureEquals which will consider the child data type to
// decide if overwrite.
if (l.argumentTypes()[i].baseName() != r.argumentTypes()[i].baseName()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this works. We cannot store 2 entries in a map if their keys compare as equal. Hence, we won't be able to store foo(array(bigint)) and foo(array(boolean)) at the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IN the signature, we use array(T) to represents array(bigint) and array(boolean), so the baseName is equal.
Suppose we have registered array(bigint), now we try to register array<boolean>, the functions size is 1, then the metadata->physicalSignatureEquals(otherMetadata) condition is false, so we can insert this FunctionEntry array(boolean).
https://github.com/facebookincubator/velox/blob/main/velox/expression/SimpleFunctionRegistry.cpp#L44
https://github.com/facebookincubator/velox/blob/main/velox/expression/SimpleFunctionRegistry.cpp#L49

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppose we have registered array(bigint), now we try to register array, the functions size is 1, then the metadata->physicalSignatureEquals(otherMetadata) condition is false, so we can insert this FunctionEntry array(boolean).

A hash map contract requires that all keys stores in the map compare not equal. We should not write code that don't honor this contract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so I have changed to both the SignatureMap and SignatureCmp only consider baseName. It is equal.

@jinchengchenghh
Copy link
Contributor Author

jinchengchenghh commented Sep 13, 2024

This PR considers name, input arguments and physical result type to distinguish a function.

For decimal, if we solve it, we could remove physical result type from function identifier.

@mbasmanova
Copy link
Contributor

This PR considers name, input arguments and physical result type to distinguish a function.

For this to be useful, we need to remove "physical result type". Otherwise, we allow to store foo(int) -> int and foo(int) -> bool at the same time.

@jinchengchenghh
Copy link
Contributor Author

Yes, I will consider how to solve decimal issue, if you have any good idea, please ping me, thanks! @mbasmanova

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Register simple function should not consider result type
3 participants