Support decimal operation precision not loss mode #10383
base: main
Conversation
Thanks! Added several comments.
}

std::pair<std::string, std::string>
getNotAllowPrecisionLossDivideResultScale() {
nit: name suggestion: getStrictDivideResultScale, and also document the difference with AllowPrecisionLoss being true.
velox/core/QueryConfig.h (outdated):

/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned in those cases, as previously
static constexpr const char* kSparkDecimalOperationsAllowPrecisionLoss =
    "spark.decimal_operations.allow_precision_loss";
Perhaps document this config and corresponding calculation for each arithmetic operation in https://github.com/facebookincubator/velox/blob/main/velox/docs/functions/spark/decimal.rst#division.
velox/core/QueryConfig.h (outdated):

@@ -304,6 +304,13 @@ class QueryConfig {
/// The current spark partition id.
static constexpr const char* kSparkPartitionId = "spark.partition_id";

/// When true, establishing the result type of an arithmetic operation
/// happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
nit: remove 'happens'
velox/core/QueryConfig.h (outdated):

/// When true, establishing the result type of an arithmetic operation
/// happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned in those cases, as previously
Otherwise, NULL is returned in those cases, as previously
nit: comment suggestion: NULL is returned when the actual result cannot be represented with the calculated decimal type.
velox/docs/configs.rst (outdated):

- bool
- When true, establishing the result type of an arithmetic operation happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
  rounding the decimal part of the result if an exact representation is not
  possible. Otherwise, NULL is returned in those cases, as previously
Ditto.
    createDecimalFunction<Divide>);

VELOX_DECLARE_STATEFUL_VECTOR_FUNCTION(
    udf_decimal_add_not_allow_precision_loss,
This decimal function signature is only used in tests, so register it in the test scope.
Downstream projects can only set the query config to enable this feature.
Looks like the declaration is only used by test code. Can we also test the false behavior by setting the query config?
Do we still need the declarations and registrations then? I wonder if it makes sense to register them as special forms if the result type of the signature needs to be determined at runtime.
https://github.com/facebookincubator/velox/pull/10383/files#diff-257bd1947469dfd8494c7e19f535d4a3247b3df086287110e53740c199104073R27-R37
Yes, it is different; with the registrations, we can test by function name.
I prefer not to; the current implementation covers the feature. Switching to a special form would require many refactors, but if the community decides to do that, I will help refactor.
I feel these declarations need to be reconsidered for the reasons below.
- They are in the runtime code but are only used by tests.
- Their signatures are generated with allowPrecisionLoss being false, but createDecimalFunction follows the query config to decide if the function allows precision loss. Therefore, its behavior can vary according to the config while its signature is fixed.
To solve the above issues, maybe we need to register them as special forms. What do you think? Thanks.
And it still has a problem: we would need to rename decimal add from add to decimal_add, because the name should be unique among special forms. Can we register it as add for Substrait plan conversion and downstream project plan conversion?
We should also change https://github.com/facebookincubator/velox/blob/main/velox/expression/ExprCompiler.cpp#L407 to relax the constraint that the special form name is unique, and check whether we can call constructSpecialForm.
Can we do that? @mbasmanova @rui-mo
Which way is better?
- the current implementation;
- a special form registered as decimal_add, with dedicated Substrait function resolution logic;
- a special form registered as add, breaking the unique special form name constraint.
Refactored to the current version: removed most of the signatures, kept only the input types as decimal and the output type as unknown, and constructed an Expr to test.
Now the function signature requires returnType, but only the input types and the function name should specify the function. This also aligns with the SQL function definition: we query by select hash(col); without a return type.
And we don't permit functions with the same arguments and different return types, so I propose to make returnType in FunctionSignature optional. For decimal functions, we then don't need to supply returnType, and the signature will be
return {exec::FunctionSignatureBuilder()
.integerVariable("a_precision")
.integerVariable("a_scale")
.integerVariable("b_precision")
.integerVariable("b_scale")
.argumentType("DECIMAL(a_precision, a_scale)")
.argumentType("DECIMAL(b_precision, b_scale)")
.build()};
@mbasmanova How about that?
CC @zhouyuan
Thanks for your comments, I have addressed all of them. @rui-mo
@@ -43,6 +57,8 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p ans s at 38 when not allowing precision loss.
Looks like a typo: ans
@@ -522,5 +545,50 @@ TEST_F(DecimalArithmeticTest, decimalDivTest) {
      {makeConstant<int128_t>(
          DecimalUtil::kLongDecimalMax, 1, DECIMAL(38, 0))});
}

TEST_F(DecimalArithmeticTest, notAllowPrecisionLoss) {
  setDecimalOperationsAllowPrecisionLoss(false);
I note the function name already contains "not_allow_precision_loss"; is the above setting still necessary?
Yes, because we construct the function class based on this config: https://github.com/facebookincubator/velox/pull/10383/files#diff-4baee9f82f9c347f753972415d345941d2c2a110b608d480b090650171da096bR754
And this function name only exists in test code
velox/core/QueryConfig.h (outdated):

/// When true, establishing the result type of an arithmetic operation
/// according to Hive behavior and SQL ANSI 2011 specification, i.e.
/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned when the actual result cannot be
Can you add some tests for NULL result case?
velox/core/QueryConfig.h (outdated):

/// possible. Otherwise, NULL is returned when the actual result cannot be
/// represented with the calculated decimal type.
static constexpr const char* kSparkDecimalOperationsAllowPrecisionLoss =
    "spark.decimal_operations.allow_precision_loss";
I find this config also controls the behavior of remainder and pmod in Spark. Maybe we need to clarify which operations are supported for now.
@@ -778,25 +728,40 @@ std::shared_ptr<exec::VectorFunction> createDecimalFunction(
    }
  }
}

// Find the function according to input types under expression compiler.
// Mark the return type as unknown, the result type will be computed by Function
This is a hack. Let's not do this.
It seems like function signatures are affected by configuration settings. If that's the case, let's come up with a way to allow access to configs during function registration time.
We do need to be able to resolve function calls without instantiating functions. This is what enables Fuzzer and unit testing.
@jinchengchenghh The PR is used by oap/velox, can you follow up?
Addressed all the comments, can you help review again? @rui-mo
Thanks. Added several comments on the documentation.
SparkRegistration Configuration properties
========================

Generic Configuration
allowPrecisionLoss only applies to several functions, so it is not a generic configuration. I feel this document might not be needed, as the description in struct SparkRegistrationConfig is clear enough.
::

p = p1 - s1 + s2 + max(6, s1 + p2 + 1)
s = max(6, s1 + p2 + 1)

When config ``SparkRegistrationConfig.allowPrecisionLoss`` is set to false.
Document suggestion:
If the function is registered with ``allowPrecisionLoss`` being false:
::

intDig = min(38, p1 - s1 + s2);
decDig = min(38, max(6, s1 + p2 + 1));
nit: add one empty line below.
If ``intDig + decDig`` is more than 38:
::
p = 38
s = decDig - (intDig + decDig - 38) / 2 - 1
ditto.
@@ -27,14 +27,28 @@ Multiplication

Division
--------

When config ``SparkRegistrationConfig.allowPrecisionLoss`` is set to true as default.
Document suggestion:
If the function is registered with ``allowPrecisionLoss`` being true (default behavior):
Since the description at L5 only covers part of the cases now, would you add some introduction for allowPrecisionLoss being false there?
For add, subtract, and multiply, the precision and scale computation is the same; we only adjust or bound them.
But for divide, the precision and scale computation is different, so I only added it for divide,
and documented the bounded behavior below.
    typename A /* Argument1 */,
    typename B /* Argument2 */,
    typename Operation /* Arithmetic operation */>
class DecimalBaseFunction : public exec::VectorFunction {
This PR also changes vector functions into simple functions. Does it make sense to do the vector-to-simple refactor in a separate PR, and support allowPrecisionLoss in this one?
Do you agree? If you think it is OK, I will only use SparkRegistrationConfig in this PR and revert to a vector function. @mbasmanova
Perhaps we don't need to change back to vector function in this PR. After the separate PR lands, we can do a rebase to it.
+1 to @rui-mo 's suggestion
    return DecimalUtil::adjustPrecisionScale(precision, scale);
  } else {
    auto intDig = std::min(38, aPrecision - aScale + bScale);
    auto decDig = std::min(38, std::max(6, aScale + bPrecision + 1));
Name suggestion:
intDig -> wholeDigits, decDig -> fractionDigits
  }
}

std::vector<exec::SignatureVariable> getDivideConstraitsAllowPrecisionLoss() {
Constraits -> Constraints, ditto for the others.
76c5795 to 182d32a
Do you have any more comments? @rui-mo
@@ -0,0 +1,21 @@
================================
SparkRegistration Configuration
Title suggestion:
Spark function registration configurations
================================

struct SparkRegistrationConfig
---------------------
nit: make the - of the same length as the letters above it.
// If result is representable with long decimal, the result
// scale is the maximum of 'aScale' and 'bScale'. If not, reduces result scale
// and caps the result precision at 38.
// Caps p and s at 38 when not allowing precision loss.
Do we "Caps p and s at 38" when allowPrecisionLoss is true or false?
The deny-precision-loss mode just caps p and s at 38, while the other mode may reduce the scale.
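To make the contrast concrete, here is a minimal sketch of the two divide result-type rules quoted in this thread. It only illustrates the formulas under discussion, not the actual Velox implementation; `divideResultType` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Sketch of the divide result-type rules discussed in this PR.
// Hypothetical helper, not the Velox implementation.
std::pair<int, int> divideResultType(
    int p1, int s1, int p2, int s2, bool allowPrecisionLoss) {
  if (allowPrecisionLoss) {
    int scale = std::max(6, s1 + p2 + 1);
    int precision = p1 - s1 + s2 + scale;
    if (precision <= 38) {
      return {precision, scale};
    }
    // Adjustment on overflow: cap precision at 38 and reduce the scale,
    // keeping at least min(scale, 6) fractional digits.
    return {38, std::max(38 - (precision - scale), std::min(scale, 6))};
  }
  // Deny-precision-loss mode: cap whole and fractional digits at 38 and
  // shrink the fractional part only as much as needed to fit.
  int wholeDigits = std::min(38, p1 - s1 + s2);
  int fractionDigits = std::min(38, std::max(6, s1 + p2 + 1));
  int diff = wholeDigits + fractionDigits - 38;
  if (diff > 0) {
    fractionDigits -= diff / 2 + 1;
    wholeDigits = 38 - fractionDigits;
  }
  return {wholeDigits + fractionDigits, fractionDigits};
}
```

For example, DECIMAL(38,10) / DECIMAL(38,10) yields DECIMAL(38,6) when precision loss is allowed, but DECIMAL(38,18) when it is denied.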
/// possible. Otherwise, NULL is returned when the actual result cannot be
/// represented with the calculated decimal type. Now we support add,
/// subtract, multiply and divide operations.
bool allowPrecisionLoss = true;
It seems this tries to enable a per-function config? In Spark this should be a session-level config that users may modify.
Looks like the current implementation only provides fixed behavior after the functions are registered. We might need a query config to make them adjustable at runtime.
==========================================

struct SparkRegistrationConfig
------------------------------
Perhaps remove this subtitle, as it is more of an implementation detail.
- When true, establishing the result type of an arithmetic operation according to Hive behavior and SQL ANSI 2011 specification, i.e.
  rounding the decimal part of the result if an exact representation is not
  possible. Otherwise, NULL is returned when the actual result cannot be represented with the calculated decimal type. Now we support add,
  subtract, multiply and divide operations.
Doc suggestion:
This configuration is currently in use for decimal add, subtract, multiply and divide operations.
@@ -43,6 +46,26 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p and s at 38 when not allowing precision loss.
Is it the case as below?
When the precision of result exceeds 38, caps p at 38 and the scale should not be reduced when not allowing precision loss.
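As a sketch of that distinction for add/subtract/multiply result types, the overflow handling could look like this. The formulas are taken from the documentation quoted in this thread; `boundResultType` is a hypothetical name, not Velox code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Sketch contrasting the two overflow behaviors for decimal
// add/subtract/multiply result types, per the docs under review.
std::pair<int, int> boundResultType(int p, int s, bool allowPrecisionLoss) {
  if (p <= 38) {
    return {p, s};
  }
  if (allowPrecisionLoss) {
    // Adjustment: cap precision at 38 and reduce the scale, keeping at
    // least min(s, 6) fractional digits.
    return {38, std::max(38 - (p - s), std::min(s, 6))};
  }
  // Denied mode: cap precision and scale at 38 without reducing the scale
  // any further.
  return {38, std::min(38, s)};
}
```

For instance, multiplying two DECIMAL(38,10) values gives a raw result type of (77, 20); the allow-loss mode adjusts it to (38, 6), while the deny-loss mode caps it to (38, 20).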
::

wholeDigits = min(38, p1 - s1 + s2);
fractionalDigits = min(38, max(6, s1 + p2 + 1));
The Decimal Precision and Scale Adjustment section discusses how to handle the case when the calculated precision exceeds 38, so I assume this part and L66 belong elsewhere (L33, I think). Perhaps only keep the description for precision and scale adjustment at L57.
"For decimal addition, subtraction, multiplication, the precision and scale computation logic is same": this can be described in the corresponding location, such as L12 or L20.
velox/docs/spark_functions.rst (outdated):

@@ -4,6 +4,8 @@ Spark Functions

The semantics of Spark functions match Spark 3.5 with ANSI OFF.

Spark functions can be registered by :doc:`struct SparkRegistrationConfig <functions/spark/config>`.
Doc suggestion:
Spark functions can be registered with :doc:`user specified configurations <functions/spark/config>`.
Any more comments? Thanks! @rui-mo
Soft reminder: do you have any more comments? @rui-mo
Talked with @jinchengchenghh offline, and she will help update this PR to support session-level precision loss mode.
9d14c62 to 0dc20f6
Updated, thanks! @rui-mo
Thanks. Added several comments on the documentation, and will take a look at the implementation later on.
@@ -9,6 +9,8 @@ https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_P

https://msdn.microsoft.com/en-us/library/ms190476.aspx

For decimal addition, subtraction, multiplication, the precision and scale computation logic is same.
Comment suggestion:
Additionally, the computation of decimal division adapts to the allowing precision loss parameter, while the decimal addition, subtraction, and multiplication do not.
::

p = p1 - s1 + s2 + max(6, s1 + p2 + 1)
s = max(6, s1 + p2 + 1)

When not allowing precision loss:
When denying precision loss:
Decimal Precision and Scale Adjustment
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

For above arithmetic operators, when the precision of result exceeds 38,
when the precision of result exceeds 38 and precision loss is allowed,
For above arithmetic operators, when the precision of result exceeds 38,
caps p at 38 and reduces the scale, in order to prevent the truncation of
caps p at 38 and reduces the scale when allowing precision loss, in order to prevent the truncation of
remove ' when allowing precision loss'.
@@ -43,6 +57,20 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p at 38 and the scale should not be reduced when not allowing precision loss.
When precision loss is denied, caps p at 38, and the scale should not be reduced.
@@ -43,6 +57,20 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p at 38 and the scale should not be reduced when not allowing precision loss.
For decimal addition, subtraction, multiplication, the adjustment logic is same.
The below formula shows how the precision and scale are adjusted for decimal addition, subtraction, and multiplication.
::
precision = 38
scale = min(38, s)
This part does not render correctly. Perhaps one empty line after L63 is needed.
precision = 38
scale = min(38, s)

But for decimal division, it is different as following:
Decimal division uses a different formula, which is as follows:
Thanks.
@@ -253,10 +254,12 @@ struct DecimalAddSubtractBase {
}

// Computes the result precision and scale for decimal add and subtract
// operations following Hive's formulas.
// operations following Hive's formulas when `allowPrecisionLoss` is true.
// If result is representable with long decimal, the result
// scale is the maximum of 'aScale' and 'bScale'. If not, reduces result scale
The difference is only in whether 'adjustPrecisionScale' or 'bounded' is called, so we perhaps need to move the change on L257 before 'If not'.
@@ -330,8 +337,10 @@ struct DecimalMultiplyFunction {
    B* /*b*/) {
  auto [aPrecision, aScale] = getDecimalPrecisionScale(*inputTypes[0]);
  auto [bPrecision, bScale] = getDecimalPrecisionScale(*inputTypes[1]);
  auto [rPrecision, rScale] = DecimalUtil::adjustPrecisionScale(
      aPrecision + bPrecision + 1, aScale + bScale);
  auto [rPrecision, rScale] = allowPrecisionLoss
nit: if constexpr (allowPrecisionLoss)
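A sketch of what the `if constexpr` variant could look like, assuming `allowPrecisionLoss` is a compile-time template parameter and using the multiply formulas quoted in this PR (illustrative only, not the actual Velox code):

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Illustrative sketch of the `if constexpr` nit: branch on a compile-time
// allowPrecisionLoss parameter when computing the multiply result type.
template <bool allowPrecisionLoss>
std::pair<int, int> multiplyResultType(
    int aPrecision, int aScale, int bPrecision, int bScale) {
  int precision = aPrecision + bPrecision + 1;
  int scale = aScale + bScale;
  if constexpr (allowPrecisionLoss) {
    // adjustPrecisionScale-style: cap at 38 and reduce the scale on overflow.
    if (precision > 38) {
      return {38, std::max(38 - (precision - scale), std::min(scale, 6))};
    }
    return {precision, scale};
  } else {
    // bounded-style: cap both precision and scale at 38.
    return {std::min(38, precision), std::min(38, scale)};
  }
}
```

Since the parameter is `constexpr`, the untaken branch is discarded at compile time and each instantiation carries only one code path.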
bool registerFunction(
    const std::vector<std::string>& aliases = {},
    const std::vector<exec::SignatureVariable>& constraints = {},
    bool overwrite = true) {
This function seems to be the same as an existing function. Can we reuse it?
velox/velox/functions/Registerer.h, line 63 in e13c591:
bool registerFunction(
No, the template bool is different, and using funcClass = typename Func::template udf<exec::VectorExec>; is different, so I added a new binder to support template functions with a variable bool.
template <template <class, bool> typename T, bool allowPrecisionLoss>
using ParameterBinder = TempWrapper<T<exec::VectorExec, allowPrecisionLoss>>;
I wonder if we should move it to velox/velox/functions/Registerer.h in a separate PR?
It could allow functions with constraints to accept templates.
It seems that, compared with the below function, the new one only added 'constraints'. Is this the only difference? Thanks.
velox/velox/functions/Registerer.h, lines 45 to 47 in e13c591:
bool registerFunction(
    const std::vector<std::string>& aliases = {},
    bool overwrite = true) {
Yes. I wonder if we can consolidate these APIs.
I agree. Perhaps we can change the API in a separate PR.
Do we need to keep only this one, or keep both functions but change the latter from Func<exec::VectorExec> to typename Func::template udf<exec::VectorExec>? For backward compatibility, I would prefer to keep both. The first could invoke the latter with the argument constraints = {}.
bool registerFunction(
    const std::vector<std::string>& aliases = {},
    const std::vector<exec::SignatureVariable>& constraints = {},
    bool overwrite = true) {
  using funcClass = typename Func::template udf<exec::VectorExec>;
}
As the comment indicates, the wrapper for the udf struct will eventually be removed, so can I do that? @mbasmanova
CC @rui-mo
// New registration function; mostly a copy from the function above, but taking
// the inner "udf" struct directly, instead of the wrapper. We can keep both for
// a while to maintain backwards compatibility, but the idea is to remove the
// one above eventually.
I refactored the registration by using
template <typename TExec>
using DivideFunctionAllowPrecisionLoss = DecimalDivideFunction<TExec, true>;
like https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/GreatestLeast.h#L127
Can you help review again? Thanks! @rui-mo
registerDecimalBinary<Func, true>(
    name, makeConstraints(rPrecision, rScale, true));
registerDecimalBinary<Func, false>(
    name + "_not_allow_precision_loss",
The suffix '_not_allow_precision_loss' is used several times. Consider defining a variable for it and renaming as '_deny_precision_loss'.
726d7b2 to 61f976b
Register the decimal functions by setting SparkRegistrationConfig.allowPrecisionLoss to false to enable this feature.
Spark implementation: https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L814