-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix register simple function SignatureMap overwrite #10811
base: main
Are you sure you want to change the base?
Conversation
✅ Deploy Preview for meta-velox ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
If remove the physical type match for result type, we will get this exception because decimal registered as (short, short) -> short and (short, short) -> long.
|
Can you help review this PR? Thanks! @mbasmanova |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jinchengchenghh I'm not sure what change does and I don't see a unit test.
Initially, I thought that we should use name + input-args as the key and not allow duplicates, but you reminded me that this doesn't work for decimal type. What's the trick to make this work?
In my thought, decimal should be considered as one type
But now, it's physical type is different, so we take it as different type. I couldn't think of a good idea to solve it. Exists UT can cover this change, I change some UTs. |
This PR change the SignatureMap key compare logic. Change from |
@@ -518,7 +519,7 @@ class SimpleFunctionMetadata : public ISimpleFunctionMetadata { | |||
std::string helpMessage(const std::string& name) const final { | |||
// return fmt::format("{}({})", name, signature_->toString()); | |||
std::string s{name}; | |||
s.append("("); | |||
s.append("\nSignature argument types:\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change? The comment on L520 needs updating.
Can we share an example of the new message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in a separate PR #10821, you notice it after then.
@@ -60,6 +60,9 @@ bool SimpleFunctionRegistry::registerFunctionInternal( | |||
metadata->defaultNullBehavior(), otherMetadata.defaultNullBehavior()); | |||
VELOX_USER_CHECK_EQ( | |||
metadata->isDeterministic(), otherMetadata.isDeterministic()); | |||
VELOX_USER_CHECK_EQ( | |||
metadata->signature()->variableArity(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a separate change, right? Perhaps, it can be extracted into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this Patch, signature is the key used for comparison, variableArity
is always equal in that case, after this change, we don't allow to register func(int)
and func(int...)
at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Would you extract this change into a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #10980
using SignatureMap = std::unordered_map< | ||
FunctionSignature, | ||
std::vector<std::unique_ptr<const FunctionEntry>>>; | ||
std::vector<std::unique_ptr<const FunctionEntry>>, | ||
SignatureHash, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is:
- SignatureHash includes argument types, but not return type
- SignatureCmd includes argument types, but in a limited way (map(k, v) compares equal to map(k1, v1) no matter what k, v, k1, v1 are.
This means that 2 entries may have different hashes, but compare the same. This is a violation of a core contract for hash/compare. We cannot do that.
Let me know if I missed something.
CC: @rui-mo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you are right.
I change the SignatureHash to only consider base name like SignatureCmd.
@jinchengchenghh I agree. Let's figure out how to achieve that. |
2d4cc57
to
d613251
Compare
@jinchengchenghh I feel this needs some more thinking / designing. A function has name, arguments and result type. Ideally, result type is a function of name + argument, but this is not the case for functions that use decimal types (either in argument, result type or both). I believe we need to fix that first. Like you mentioned earlier, for the purpose of function signatures we should have a single decimal(p, s) type and we should allow a single signature be backed by multiple implementations. If we leave the decimal problem aside for a moment, we need a way to tell whether two signatures are distinguishable or not. For example, foo(int) and foo(int...) are not distinguishable. We don't want to allow both to be present in the registry at the same time. This is a standard problem in compilers. Wondering how they solve this. Might be good to look that up and see if we can implement similar solution. |
// element type, so as Decimal, List, Struct.let the metadata | ||
// physicalSignatureEquals which will consider the child data type to | ||
// decide if overwrite. | ||
if (l.argumentTypes()[i].baseName() != r.argumentTypes()[i].baseName()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this works. We cannot store 2 entries in a map if their keys compare as equal. Hence, we won't be able to store foo(array(bigint)) and foo(array(boolean)) at the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IN the signature, we use array(T)
to represents array(bigint)
and array(boolean)
, so the baseName is equal.
Suppose we have registered array(bigint)
, now we try to register array<boolean>
, the functions
size is 1, then the metadata->physicalSignatureEquals(otherMetadata)
condition is false, so we can insert this FunctionEntry array(boolean)
.
https://github.com/facebookincubator/velox/blob/main/velox/expression/SimpleFunctionRegistry.cpp#L44
https://github.com/facebookincubator/velox/blob/main/velox/expression/SimpleFunctionRegistry.cpp#L49
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppose we have registered array(bigint), now we try to register array, the functions size is 1, then the metadata->physicalSignatureEquals(otherMetadata) condition is false, so we can insert this FunctionEntry array(boolean).
A hash map contract requires that all keys stores in the map compare not equal. We should not write code that don't honor this contract.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, so I have changed to both the SignatureMap and SignatureCmp only consider baseName
. It is equal.
This PR considers name, input arguments and physical result type to distinguish a function. For decimal, if we solve it, we could remove |
For this to be useful, we need to remove "physical result type". Otherwise, we allow to store foo(int) -> int and foo(int) -> bool at the same time. |
Yes, I will consider how to solve decimal issue, if you have any good idea, please ping me, thanks! @mbasmanova |
Before this PR, the
SignatureMap
key isFunctionSignature
and will compare all the fields of it. But fieldsstd::unordered_map<std::string, SignatureVariable> variables
andTypeSignature returnType
is different, causing registering functions only differs with result type. It should only consider name and input types and not allow registering functions that differ only in result type. But for decimal registration, (short, short) -> short, (short, short) -> long exists, so I only relax the result type computation logic, allow the result type diffs.This PR introduces
SignatureHash
andSignatureCmp
toSignatureMap
, which only consider the argument type base name, and check metadata's physical argument type and physical result type bykindEquals
which will check the child data type to determine whether to overwrite.Resolves: #10602