Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent result for Spark Round function #10929

Open
rui-mo opened this issue Sep 5, 2024 · 1 comment
Open

Inconsistent result for Spark Round function #10929

rui-mo opened this issue Sep 5, 2024 · 1 comment
Assignees
Labels
bug Something isn't working triage Newly created issue that needs attention.

Comments

@rui-mo
Copy link
Collaborator

rui-mo commented Sep 5, 2024

Bug description

We noticed that the result was inconsistent with Spark for round(0.575, 2). The desired result is 0.58, but since double cannot accurately represent 0.575, which is 0.57499999999999999, we obtained 0.57. #5992 was proposed to fix this case, which uses std::nextafter to enlarge the input slightly, yielding a result of 0.58. This change was not accepted by Velox since receiving double means we can not recover the original value.

We introduced this function internally to Gluten to handle the majority of cases, but for a high-precision case like round (0.5549999999999999, 2), the result will become 0.56 rather than 0.55. This prompts us to reconsider this issue. One idea is to map the Spark double type to a higher-precision C++ type, such as boost::multiprecision::cpp_dec_float_50, but this will take a lot of changes in Velox.

The following table lists the expected and actual results from the Velox and Gluten rounds for some special cases. All cases can be reproduced with 8408d33.

Input Round to scale Expected Velox round Gluten round
0.575 2 0.58 0.57 0.58
0.5549999999999999 2 0.55 0.56 0.56
0.499999999999994 0 0 1 0

* Velox round: refers to the round implementation under presosql in Velox.
* Gluten round: refers to the Gluten proposed round implementation with nextafter used.

Additionally, the Velox round results do not fully match the Presto outputs.

presto:test> select round(cast(0.575 as double), 2), round(cast(0.5549999999999999 as double), 2), round(cast(0.499999999999994 as double), 0);
 _col0 | _col1 | _col2 
-------+-------+-------
  0.57 |  0.55 |   0.0 

System information

Velox System Info v0.0.2
Commit: fc5889f
CMake Version: 3.28.3
System: Linux-5.4.0-189-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 11.1.0
C Compiler: /usr/bin/cc
C Compiler Version: 11.1.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

@rui-mo rui-mo added bug Something isn't working triage Newly created issue that needs attention. labels Sep 5, 2024
@rui-mo
Copy link
Collaborator Author

rui-mo commented Sep 5, 2024

@majetideepak Opened this issue for the round result issue we met and added reproducible tests. Could you spare some time to take a look? Thank you.

@majetideepak majetideepak self-assigned this Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Newly created issue that needs attention.
Projects
None yet
Development

No branches or pull requests

2 participants