
Support decimal operation precision not loss mode #10383

Open · wants to merge 5 commits into base: main

Conversation

@jinchengchenghh (Contributor) commented Jul 3, 2024

Register the decimal functions with SparkRegistrationConfig.allowPrecisionLoss set to false to enable this feature.
Spark implementation: https://github.com/apache/spark/blob/branch-3.5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/arithmetic.scala#L814
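
For orientation, a minimal sketch of what enabling this mode at registration time could look like. Only the SparkRegistrationConfig struct and its allowPrecisionLoss field appear in this PR; the header path and the registerFunctions overload taking a config are my assumptions:

    #include "velox/functions/sparksql/Register.h"

    int main() {
      // Registration-time config from this PR; allowPrecisionLoss defaults to
      // true, mirroring Spark's spark.sql.decimalOperations.allowPrecisionLoss.
      facebook::velox::functions::sparksql::SparkRegistrationConfig config;
      config.allowPrecisionLoss = false; // deny precision loss
      // Hypothetical overload taking the config; the actual API may differ.
      facebook::velox::functions::sparksql::registerFunctions("", config);
      return 0;
    }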

@facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Jul 3, 2024
netlify bot commented Jul 3, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit: 204cab3
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/66f381fd7d6e7c000929ad26

@jinchengchenghh changed the title from "Support decimal operation not precision loss mode" to "Support decimal operation precision not loss mode" on Jul 3, 2024
@jinchengchenghh (Contributor, Author) commented:

Can you help review this PR? Thanks! @rui-mo @PHILO-HE

@jinchengchenghh (Contributor, Author) commented:

Hi @rui-mo @PHILO-HE, could you take some time to review this PR?

@rui-mo (Collaborator) left a comment:

Thanks! Added several comments.

}

std::pair<std::string, std::string>
getNotAllowPrecisionLossDivideResultScale() {
Collaborator:

nit: name suggestion: getStrictDivideResultScale; also document the difference from the behavior when allowPrecisionLoss is true.

/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned in those cases, as previously
static constexpr const char* kSparkDecimalOperationsAllowPrecisionLoss =
"spark.decimal_operations.allow_precision_loss";
Collaborator:

Perhaps document this config and the corresponding calculation for each arithmetic operation in https://github.com/facebookincubator/velox/blob/main/velox/docs/functions/spark/decimal.rst#division.

@@ -304,6 +304,13 @@ class QueryConfig {
/// The current spark partition id.
static constexpr const char* kSparkPartitionId = "spark.partition_id";

/// When true, establishing the result type of an arithmetic operation
/// happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
Collaborator:

nit: remove 'happens'

/// When true, establishing the result type of an arithmetic operation
/// happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned in those cases, as previously
Collaborator:

Otherwise, NULL is returned in those cases, as previously

nit: comment suggestion: NULL is returned when the actual result cannot be represented with the calculated decimal type.

- bool
- When true, establishing the result type of an arithmetic operation happens according to Hive behavior and SQL ANSI 2011 specification, i.e.
rounding the decimal part of the result if an exact representation is not
possible. Otherwise, NULL is returned in those cases, as previously
Collaborator:

Ditto.

velox/functions/sparksql/DecimalUtil.h (outdated comment, resolved)
createDecimalFunction<Divide>);

VELOX_DECLARE_STATEFUL_VECTOR_FUNCTION(
udf_decimal_add_not_allow_precision_loss,
Collaborator:

This decimal function signature is only used in tests, so register it in the test scope.
Downstream projects can only set the query config to enable this feature.

Looks like the declaration is only used by test code. Can we also test the false behavior by setting the query config?

@rui-mo (Collaborator) commented Jul 9, 2024:

Do we still need the declarations and registrations then? I wonder if it makes sense to register them as special forms if the result type of the signature needs to be determined at runtime.
https://github.com/facebookincubator/velox/pull/10383/files#diff-257bd1947469dfd8494c7e19f535d4a3247b3df086287110e53740c199104073R27-R37

@jinchengchenghh (Contributor, Author) commented Jul 10, 2024:

Yes, it is different: with the registrations, we can test by function name.
I prefer not to; the current implementation covers the feature, and switching to a special form would require a lot of refactoring. But if the community decides to do that, I will help refactor.

Collaborator:

I feel these declarations need to be reconsidered, for the reasons below.

  1. They are in the runtime code but are only used by tests.
  2. Their signatures are generated with allowPrecisionLoss being false, but createDecimalFunction follows the query config to decide whether the function allows precision loss. Therefore, its behavior can vary with the config while its signature is fixed.

To solve the above issues, maybe we need to register them as special forms. What do you think? Thanks.

@jinchengchenghh (Contributor, Author) commented Jul 11, 2024:

There is still a problem: we would need to rename decimal add from add to decimal_add, because names must be unique among special forms. Can we register it as add for Substrait plan conversion and downstream-project plan conversion?

@jinchengchenghh (Contributor, Author):

We should also change https://github.com/facebookincubator/velox/blob/main/velox/expression/ExprCompiler.cpp#L407 to relax the constraint that special form names are unique, and check whether we can constructSpecialForm.
Can we do that? @mbasmanova @rui-mo

Which way is better?

  1. the current implementation;
  2. a special form registered as decimal_add, with unique-name resolution logic on the Substrait side;
  3. a special form registered as add, breaking the unique-name constraint for special forms.

@jinchengchenghh (Contributor, Author):

Refactored to the current version: removed most of the signatures, kept only decimal input types with the output type as unknown, and constructed an Expr to test.
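
A sketch of the signature described above, reconstructed from this thread (the builder calls mirror the snippet quoted later in this conversation; the unknown return type is a placeholder for the runtime-computed result type):

      return {exec::FunctionSignatureBuilder()
                  .integerVariable("a_precision")
                  .integerVariable("a_scale")
                  .integerVariable("b_precision")
                  .integerVariable("b_scale")
                  .returnType("unknown") // placeholder; real type computed at runtime
                  .argumentType("DECIMAL(a_precision, a_scale)")
                  .argumentType("DECIMAL(b_precision, b_scale)")
                  .build()};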

@jinchengchenghh (Contributor, Author):

Now the function signature requires a returnType, but the input types and function name alone should identify the function. This also aligns with the SQL function definition: we query with select hash(col); without a return type.
And since we don't permit functions with the same arguments but different return types, I propose making returnType in FunctionSignature optional. Then, for decimal functions, we don't need to supply a returnType, and the signature would be:

  return {exec::FunctionSignatureBuilder()
              .integerVariable("a_precision")
              .integerVariable("a_scale")
              .integerVariable("b_precision")
              .integerVariable("b_scale")
              .argumentType("DECIMAL(a_precision, a_scale)")
              .argumentType("DECIMAL(b_precision, b_scale)")
              .build()};

@mbasmanova How about that?


@jinchengchenghh (Contributor, Author) commented:

Thanks for your comments; I have addressed them all. @rui-mo

@@ -43,6 +57,8 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p ans s at 38 when not allowing precision loss.
Contributor:

Looks like a typo: ans

@@ -522,5 +545,50 @@ TEST_F(DecimalArithmeticTest, decimalDivTest) {
{makeConstant<int128_t>(
DecimalUtil::kLongDecimalMax, 1, DECIMAL(38, 0))});
}

TEST_F(DecimalArithmeticTest, notAllowPrecisionLoss) {
setDecimalOperationsAllowPrecisionLoss(false);
Contributor:

I note the function name already contains "not_allow_precision_loss"; is the above setting still necessary?


@jinchengchenghh (Contributor, Author):

Also, this function name only exists in test code.

/// When true, establishing the result type of an arithmetic operation
/// according to Hive behavior and SQL ANSI 2011 specification, i.e.
/// rounding the decimal part of the result if an exact representation is not
/// possible. Otherwise, NULL is returned when the actual result cannot be
Contributor:

Can you add some tests for the NULL result case?
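
For illustration, a hypothetical shape such a test could take; the helpers are assumed from the test file quoted above, so treat this as a sketch rather than the PR's code:

    TEST_F(DecimalArithmeticTest, denyPrecisionLossNullOnOverflow) {
      setDecimalOperationsAllowPrecisionLoss(false);
      // multiply(DECIMAL(38,0), DECIMAL(2,0)) yields DECIMAL(38,0) in this mode;
      // kLongDecimalMax * 10 cannot be represented, so the row becomes NULL.
      testDecimalExpr<TypeKind::HUGEINT>(
          makeNullConstant(TypeKind::HUGEINT, 1),
          "multiply(c0, c1)",
          {makeConstant<int128_t>(
               DecimalUtil::kLongDecimalMax, 1, DECIMAL(38, 0)),
           makeConstant<int128_t>(10, 1, DECIMAL(2, 0))});
    }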

/// possible. Otherwise, NULL is returned when the actual result cannot be
/// represented with the calculated decimal type.
static constexpr const char* kSparkDecimalOperationsAllowPrecisionLoss =
"spark.decimal_operations.allow_precision_loss";
Collaborator:

I find this config also controls the behavior of remainder and pmod in Spark. Maybe we need to clarify which operations are supported for now.


@@ -778,25 +728,40 @@ std::shared_ptr<exec::VectorFunction> createDecimalFunction(
}
}
}

// Find the function according to input types under expression compiler.
// Mark the return type as unknown, the result type will be computed by Function
Contributor:

This is a hack. Let's not do this.

It seems like function signatures are affected by configuration settings. If that's the case, let's come up with a way to allow access to configs during function registration time.

We do need to be able to resolve function calls without instantiating functions. This is what enables Fuzzer and unit testing.

FelixYBW pushed a commit to oap-project/velox that referenced this pull request Jul 25, 2024
@FelixYBW (Contributor) commented:
@jinchengchenghh This PR is used by oap/velox; can you follow up?

zhztheplayer and GlutenPerfBot pushed commits to oap-project/velox (and zhztheplayer/velox) that referenced this pull request, Jul 25 to Jul 30, 2024
@jinchengchenghh (Contributor, Author) commented:

Addressed all the comments; can you help review again? @rui-mo

@rui-mo (Collaborator) left a comment:

Thanks. Added several comments on the documentation.

SparkRegistration Configuration properties
========================

Generic Configuration
@rui-mo (Collaborator) commented Jul 31, 2024:

allowPrecisionLoss only applies to several functions, so it is not a generic configuration. I feel this document might not be needed, as the description in struct SparkRegistrationConfig is clear enough.

velox/docs/functions/spark/decimal.rst (outdated comment, resolved)
::

p = p1 - s1 + s2 + max(6, s1 + p2 + 1)
s = max(6, s1 + p2 + 1)

When config ``SparkRegistrationConfig.allowPrecisionLoss`` is set to false.
@rui-mo (Collaborator) commented Jul 31, 2024:

Document suggestion:

If the function is registered with ``allowPrecisionLoss`` being false:

::

intDig = min(38, p1 - s1 + s2);
decDig = min(38, max(6, s1 + p2 + 1));
Collaborator:

nit: add one empty line below.

If ``intDig + decDig`` is more than 38:
::
p = 38
s = decDig - (intDig + decDig - 38) / 2 - 1
Collaborator:

ditto.
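
To make the denied-precision-loss division formulas concrete, a worked example (mine, not from the PR) for DECIMAL(38, 10) / DECIMAL(38, 10):

    wholeDigits    = min(38, 38 - 10 + 10)        = 38
    fractionDigits = min(38, max(6, 10 + 38 + 1)) = 38
    wholeDigits + fractionDigits = 76 > 38, so:
    p = 38
    s = 38 - (76 - 38) / 2 - 1 = 18
    result type: DECIMAL(38, 18)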

velox/docs/functions/spark/decimal.rst (outdated comment, resolved)
@@ -27,14 +27,28 @@ Multiplication

Division
--------

When config ``SparkRegistrationConfig.allowPrecisionLoss`` is set to true as default.
Collaborator:

Document suggestion:

If the function is registered with ``allowPrecisionLoss`` being true (default behavior):

Collaborator:

Since the description at L5 only covers part of the cases now, would you add some introduction for allowPrecisionLoss being false there?

@jinchengchenghh (Contributor, Author):

For add, subtract, and multiply, the precision and scale computation is the same; the results are only adjusted or bounded differently.
But for divide, the precision and scale computation itself is different, so I only added it for divide,
and documented the bounded behavior below.

typename A /* Argument1 */,
typename B /* Argument2 */,
typename Operation /* Arithmetic operation */>
class DecimalBaseFunction : public exec::VectorFunction {
Collaborator:

This PR also changes vector functions into simple functions. Does it make sense to do the vector-to-simple refactor in a separate PR, and support allowPrecisionLoss in this one?

@jinchengchenghh (Contributor, Author):

Do you agree? If you think it is OK, I will only use SparkRegistrationConfig in this PR and revert to vector functions. @mbasmanova

Collaborator:

Perhaps we don't need to change back to a vector function in this PR. After the separate PR lands, we can rebase onto it.

Contributor:

+1 to @rui-mo's suggestion.

return DecimalUtil::adjustPrecisionScale(precision, scale);
} else {
auto intDig = std::min(38, aPrecision - aScale + bScale);
auto decDig = std::min(38, std::max(6, aScale + bPrecision + 1));
Collaborator:

Name suggestion:
intDig -> wholeDigits, decDig -> fractionDigits

}
}

std::vector<exec::SignatureVariable> getDivideConstraitsAllowPrecisionLoss() {
Collaborator:

Constraits -> Constraints, ditto for the others.

GlutenPerfBot pushed commits to oap-project/velox that referenced this pull request, Jul 31 to Aug 4, 2024
@jinchengchenghh (Contributor, Author) commented:

Do you have any more comments? @rui-mo

@@ -0,0 +1,21 @@
================================
SparkRegistration Configuration
Collaborator:

Title suggestion:

Spark function registration configurations

================================

struct SparkRegistrationConfig
---------------------
Collaborator:

nit: make the ``-`` underline the same length as the text above it.

velox/docs/functions/spark/config.rst (outdated comment, resolved)
// If result is representable with long decimal, the result
// scale is the maximum of 'aScale' and 'bScale'. If not, reduces result scale
// and caps the result precision at 38.
// Caps p and s at 38 when not allowing precision loss.
Collaborator:

Do we cap p and s at 38 when allowPrecisionLoss is true, or when it is false?

@jinchengchenghh (Contributor, Author):

The deny-precision-loss mode just caps p and s at 38, while the other mode may reduce the scale.
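
A concrete contrast (my own example, applying the formulas quoted from the docs in this PR) for multiply on DECIMAL(38, 10) * DECIMAL(38, 10), where the raw result is p = 77, s = 20:

    allow precision loss (adjust): p = 38, s = max(38 - (77 - 20), min(20, 6)) = 6 -> DECIMAL(38, 6)
    deny precision loss (bound):   p = min(38, 77) = 38, s = min(38, 20) = 20      -> DECIMAL(38, 20)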

/// possible. Otherwise, NULL is returned when the actual result cannot be
/// represented with the calculated decimal type. Now we support add,
/// subtract, multiply and divide operations.
bool allowPrecisionLoss = true;
Contributor:

It seems this tries to enable a per-function config? In Spark this is a session-level config, and users may modify it.

Collaborator:

Looks like the current implementation only provides fixed behavior after the functions are registered. We might need a query config if we want them to be adjustable at runtime.

==========================================

struct SparkRegistrationConfig
------------------------------
Collaborator:

Perhaps remove this subtitle, as it is more of an implementation detail.

- When true, establishing the result type of an arithmetic operation according to Hive behavior and SQL ANSI 2011 specification, i.e.
rounding the decimal part of the result if an exact representation is not
possible. Otherwise, NULL is returned when the actual result cannot be represented with the calculated decimal type. Now we support add,
subtract, multiply and divide operations.
Collaborator:

Doc suggestion:

This configuration is currently in use for decimal add, subtract, multiply and divide operations.

@@ -43,6 +46,26 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p and s at 38 when not allowing precision loss.
Collaborator:

Is it the following case?

When the precision of result exceeds 38, caps p at 38 and the scale should not be reduced when not allowing precision loss.

::

wholeDigits = min(38, p1 - s1 + s2);
fractionalDigits = min(38, max(6, s1 + p2 + 1));
Collaborator:

The Decimal Precision and Scale Adjustment section discusses how to handle the case when the calculated precision exceeds 38, so I assume this part and L66 belong elsewhere (L33, I think). Perhaps only keep the description of the precision and scale adjustment at L57.

For decimal addition, subtraction, multiplication, the precision and scale computation logic is same

This can be described in the corresponding location, such as L12 or L20.

@@ -4,6 +4,8 @@ Spark Functions

The semantics of Spark functions match Spark 3.5 with ANSI OFF.

Spark functions can be registered by :doc:`struct SparkRegistrationConfig <functions/spark/config>`.
Collaborator:

Doc suggestion:
Spark functions can be registered with :doc:`user specified configurations <functions/spark/config>`.


@jinchengchenghh (Contributor, Author) commented:

Per the discussion in #10467, we cannot set it at runtime. @rui-mo

@jinchengchenghh (Contributor, Author) commented:

Any more comments? Thanks! @rui-mo

@jinchengchenghh (Contributor, Author) commented:

Gentle reminder: do you have any more comments? @rui-mo

@rui-mo (Collaborator) commented Sep 10, 2024:

Talked with @jinchengchenghh offline, and she will help update this PR to support session-level precision loss mode.

@jinchengchenghh (Contributor, Author) commented:

Updated, thanks! @rui-mo

@rui-mo (Collaborator) left a comment:

Thanks. Added several comments on the documentation, and will take a look at the implementation later on.

@@ -9,6 +9,8 @@ https://cwiki.apache.org/confluence/download/attachments/27362075/Hive_Decimal_P

https://msdn.microsoft.com/en-us/library/ms190476.aspx

For decimal addition, subtraction, multiplication, the precision and scale computation logic is same.
Collaborator:

Comment suggestion:

Additionally, the computation of decimal division adapts to the allowing precision loss parameter, while the decimal addition, subtraction, and multiplication do not.


::

p = p1 - s1 + s2 + max(6, s1 + p2 + 1)
s = max(6, s1 + p2 + 1)

When not allowing precision loss:
Collaborator:

When denying precision loss:


Decimal Precision and Scale Adjustment
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

For above arithmetic operators, when the precision of result exceeds 38,
Collaborator:

when the precision of result exceeds 38 and precision loss is allowed,

For above arithmetic operators, when the precision of result exceeds 38,
caps p at 38 and reduces the scale, in order to prevent the truncation of
caps p at 38 and reduces the scale when allowing precision loss, in order to prevent the truncation of
Collaborator:

remove ' when allowing precision loss'.

@@ -43,6 +57,20 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p at 38 and the scale should not be reduced when not allowing precision loss.
Collaborator:

When precision loss is denied, caps p at 38, and the scale should not be reduced.

@@ -43,6 +57,20 @@ precision and scale are adjusted.
precision = 38
scale = max(38 - (p - s), min(s, 6))

Caps p at 38 and the scale should not be reduced when not allowing precision loss.
For decimal addition, subtraction, multiplication, the adjustment logic is same.
Collaborator:

The below formula shows how the precision and scale are adjusted for decimal addition, subtraction, and multiplication.


::
precision = 38
scale = min(38, s)
Collaborator:

This part does not render correctly. Perhaps one empty line after L63 is needed.

precision = 38
scale = min(38, s)

But for decimal division, it is different as following:
Collaborator:

Decimal division uses a different formula, which is as follows:

@rui-mo (Collaborator) left a comment:

Thanks.

@@ -253,10 +254,12 @@ struct DecimalAddSubtractBase {
}

// Computes the result precision and scale for decimal add and subtract
// operations following Hive's formulas.
// operations following Hive's formulas when `allowPrecisionLoss` is true.
// If result is representable with long decimal, the result
// scale is the maximum of 'aScale' and 'bScale'. If not, reduces result scale
Collaborator:

The difference is only in whether 'adjustPrecisionScale' or 'bounded' is called, so perhaps we need to move the change on L257 before 'If not'.

@@ -330,8 +337,10 @@ struct DecimalMultiplyFunction {
B* /*b*/) {
auto [aPrecision, aScale] = getDecimalPrecisionScale(*inputTypes[0]);
auto [bPrecision, bScale] = getDecimalPrecisionScale(*inputTypes[1]);
auto [rPrecision, rScale] = DecimalUtil::adjustPrecisionScale(
aPrecision + bPrecision + 1, aScale + bScale);
auto [rPrecision, rScale] = allowPrecisionLoss
Collaborator:

nit: if constexpr (allowPrecisionLoss)
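
For readers unfamiliar with the suggestion, a self-contained sketch of the compile-time branch; the formulas come from the decimal docs in this PR, while the function and helper names are illustrative rather than the PR's code:

    #include <algorithm>
    #include <cstdint>
    #include <utility>

    // Caps precision at 38 and may reduce the scale (allow-precision-loss path).
    inline std::pair<uint8_t, uint8_t> adjustPrecisionScale(int p, int s) {
      if (p <= 38) {
        return {static_cast<uint8_t>(p), static_cast<uint8_t>(s)};
      }
      return {38, static_cast<uint8_t>(std::max(38 - (p - s), std::min(s, 6)))};
    }

    // Caps precision and scale at 38 without trading scale away (deny path).
    inline std::pair<uint8_t, uint8_t> bounded(int p, int s) {
      return {static_cast<uint8_t>(std::min(38, p)),
              static_cast<uint8_t>(std::min(38, s))};
    }

    template <bool allowPrecisionLoss>
    std::pair<uint8_t, uint8_t> multiplyResultType(
        int aPrecision, int aScale, int bPrecision, int bScale) {
      // Raw multiply result type: p = p1 + p2 + 1, s = s1 + s2.
      const int p = aPrecision + bPrecision + 1;
      const int s = aScale + bScale;
      if constexpr (allowPrecisionLoss) {
        return adjustPrecisionScale(p, s);
      } else {
        return bounded(p, s);
      }
    }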

bool registerFunction(
const std::vector<std::string>& aliases = {},
const std::vector<exec::SignatureVariable>& constraints = {},
bool overwrite = true) {
Collaborator:

This function seems to be the same as an existing function. Can we reuse it?

bool registerFunction(

@jinchengchenghh (Contributor, Author):

No, the template bool is different, and using funcClass = typename Func::template udf<exec::VectorExec>; is different, so I added a new binder to support template functions with a bool parameter.

template <template <class, bool> typename T, bool allowPrecisionLoss>
using ParameterBinder = TempWrapper<T<exec::VectorExec, allowPrecisionLoss>>;

@jinchengchenghh (Contributor, Author):

I wonder if we should move it to velox/velox/functions/Registerer.h in a separate PR?
It would allow functions with constraints to accept templates.

Collaborator:

It seems that, compared with the function below, the new one adds 'constraints'. Is this the only difference? Thanks.

bool registerFunction(
const std::vector<std::string>& aliases = {},
bool overwrite = true) {

@jinchengchenghh (Contributor, Author):

Yes. I wonder if we can consolidate these APIs.

Collaborator:

I agree. Perhaps we can change the API in a separate PR.

@jinchengchenghh (Contributor, Author):

Do we need to keep only this one, or keep both functions with the latter changed from Func<exec::VectorExec> to typename Func::template udf<exec::VectorExec>? For backward compatibility, I would prefer to keep both; the first could invoke the latter with the argument constraints = {}.

bool registerFunction(
    const std::vector<std::string>& aliases = {},
    const std::vector<exec::SignatureVariable>& constraints = {},
    bool overwrite = true) {
using funcClass = typename Func::template udf<exec::VectorExec>;
}

As the comment indicates, the wrapper for the udf struct will eventually be removed, so can I do that? @mbasmanova
CC @rui-mo

// New registration function; mostly a copy from the function above, but taking
// the inner "udf" struct directly, instead of the wrapper. We can keep both for
// a while to maintain backwards compatibility, but the idea is to remove the
// one above eventually.

@jinchengchenghh (Contributor, Author):

I refactored the registration by using

template <typename TExec>
using DivideFunctionAllowPrecisionLoss = DecimalDivideFunction<TExec, true>;

as in https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/GreatestLeast.h#L127

Can you help review again? Thanks! @rui-mo

registerDecimalBinary<Func, true>(
name, makeConstraints(rPrecision, rScale, true));
registerDecimalBinary<Func, false>(
name + "_not_allow_precision_loss",
Collaborator:

The suffix '_not_allow_precision_loss' is used several times. Consider defining a variable for it and renaming as '_deny_precision_loss'.
