Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add high order function for checking equality after transformation #238

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

eborden
Copy link

@eborden eborden commented Oct 24, 2018

It is very common to transform two Text values and then compare them
for equality.

toCaseFold x == toCaseFold y

This operation ends up being costly because it must materialize each
newly transformed Text in order to check their equality. Instead we
can provide a helper for such patterns transformEq, which relies on
the underlying ustream to check equality and avoids materializing the
transformed Text in the average case.

Benchmark results for this operation show a worst case that is close
to the naive comparison and a best case which is much more agreeable.

benchmarking transformEq/Text ==: Eq
time                 629.6 ns   (625.7 ns .. 634.5 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 630.0 ns   (626.6 ns .. 634.2 ns)
std dev              12.93 ns   (9.856 ns .. 16.21 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking transformEq/Text transformEq: Eq
time                 578.4 ns   (575.6 ns .. 581.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 577.9 ns   (575.1 ns .. 581.3 ns)
std dev              10.35 ns   (8.092 ns .. 13.02 ns)
variance introduced by outliers: 21% (moderately inflated)
benchmarking transformEq/Text ==: Not Eq
time                 663.1 ns   (659.3 ns .. 666.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 660.4 ns   (657.9 ns .. 664.4 ns)
std dev              10.49 ns   (7.178 ns .. 16.44 ns)
variance introduced by outliers: 17% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Eq
time                 126.9 ns   (126.1 ns .. 127.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 126.7 ns   (126.2 ns .. 127.4 ns)
std dev              2.122 ns   (1.663 ns .. 2.636 ns)
variance introduced by outliers: 21% (moderately inflated)
benchmarking transformEq/Text ==: Not Length
time                 499.0 ns   (496.3 ns .. 502.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 499.8 ns   (497.2 ns .. 502.9 ns)
std dev              9.655 ns   (7.545 ns .. 11.84 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Length
time                 17.53 ns   (17.45 ns .. 17.62 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.65 ns   (17.55 ns .. 17.78 ns)
std dev              359.4 ps   (283.1 ps .. 500.4 ps)
variance introduced by outliers: 31% (moderately inflated)

It is very common to transform two `Text` values and then compare them
for equality.

```hs
toCaseFold x == toCaseFold y
```

This operation ends up being costly because it must materialize each
newly transformed `Text` in order to check their equality. Instead we
can provide a helper for such patterns `transformEq`, which relies on
the underlying `ustream` to check equality and avoids materializing the
transformed `Text` in the average case.

Benchmarks results for this operation show a worst case, which is close
to the naive comparison and a best case which is much more agreeable.

```
benchmarking transformEq/Text ==: Eq
time                 629.6 ns   (625.7 ns .. 634.5 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 630.0 ns   (626.6 ns .. 634.2 ns)
std dev              12.93 ns   (9.856 ns .. 16.21 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking transformEq/Text transformEq: Eq
time                 578.4 ns   (575.6 ns .. 581.2 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 577.9 ns   (575.1 ns .. 581.3 ns)
std dev              10.35 ns   (8.092 ns .. 13.02 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking transformEq/Text ==: Not Eq
time                 663.1 ns   (659.3 ns .. 666.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 660.4 ns   (657.9 ns .. 664.4 ns)
std dev              10.49 ns   (7.178 ns .. 16.44 ns)
variance introduced by outliers: 17% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Eq
time                 126.9 ns   (126.1 ns .. 127.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 126.7 ns   (126.2 ns .. 127.4 ns)
std dev              2.122 ns   (1.663 ns .. 2.636 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking transformEq/Text ==: Not Length
time                 499.0 ns   (496.3 ns .. 502.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 499.8 ns   (497.2 ns .. 502.9 ns)
std dev              9.655 ns   (7.545 ns .. 11.84 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking transformEq/Text transformEq: Not Length
time                 17.53 ns   (17.45 ns .. 17.62 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 17.65 ns   (17.55 ns .. 17.78 ns)
std dev              359.4 ps   (283.1 ps .. 500.4 ps)
variance introduced by outliers: 31% (moderately inflated)
```
@eborden
Copy link
Author

eborden commented Oct 24, 2018

Note, we've implemented this function in Freckle's codebase on a newtyped Text type (NameComponent) and saw a large performance improvement in a codepath that was utilizing this pattern heavily.

Relevant profiling details:

Before

name              location                                             %time %alloc
caseConvert.next  Data/Text/Internal/Fusion/Common.hs:(405,5)-(410,45)  13.4   28.8
onNameComponent   library/FrontRow/Types/NameComponent.hs:38:1-55        7.4    9.8

After

name               location                                               %time %alloc
caseInsensitiveEq  library/FrontRow/Types/NameComponent.hs:(54,1)-(58,78)   3.2    5.0

This version is ever so slightly more efficient and utilizes the
underlying stream's equality instead of using `zip` and `all`.
@Bodigrim Bodigrim marked this pull request as draft February 28, 2023 18:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants