
[WIP] Add implementation to forward linear iterators in strided views #2797

Open · spectre-ns wants to merge 3 commits into master from strided-view-linear-iterator-forwarding
Conversation

spectre-ns (Contributor)

Checklist

  • The title and commit message(s) are descriptive.
  • Small commits made to fix your PR have been squashed to avoid history pollution.
  • Tests have been added for new features or bug fixes.
  • The API of new functions and classes is documented.

Description

@spectre-ns (Contributor, Author) commented:

@JohanMabille I'm trying to forward linear iterators where possible, but I think the implementations expect steppers in certain places, and there are some subtle differences I haven't figured out yet. I thought it was weird that the linear methods on the strided view actually return xiterators of steppers rather than linear iterators, as the interface would suggest.

Test bench from issue #2734

Without changes:
case 1: 10.6235ms
case 2a: 48.4154ms
case 2b: 8.93694ms
case 2c: 48.7691ms
case 3a: 47.3316ms
case 3b: 9.05212ms
case 3c: 47.3312ms

With changes:
case 1: 10.0266ms
case 2a: 9.23594ms
case 2b: 8.97092ms
case 2c: 9.09802ms
case 3a: 8.86476ms
case 3b: 9.55695ms
case 3c: 8.67326ms

The implementation is still rough, but I wanted to show you the basic idea. I ran into a pesky issue with reshaped views that forced me to fall back to the old implementation. I might need some help there, but the performance improvements speak for themselves.
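(For illustration only, here is a minimal, self-contained sketch of the forwarding idea described above. It is not the actual patch; it uses a hypothetical contiguous_view class over a std::vector. When the inner stride is one, the view simply hands out the underlying container's iterators instead of computing positions through a stepper.)

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical, simplified view over a contiguous container. When the inner
// stride is one, adjacent view elements are adjacent in memory, so the view
// can forward the container's own (linear) iterators instead of stepping
// through multi-dimensional index arithmetic.
template <class Container>
class contiguous_view
{
public:
    contiguous_view(Container& data, std::size_t offset, std::size_t size, std::ptrdiff_t stride)
        : m_data(data), m_offset(offset), m_size(size), m_stride(stride)
    {
    }

    bool is_linearly_iterable() const
    {
        return m_stride == 1;
    }

    auto linear_begin()
    {
        return m_data.begin() + static_cast<std::ptrdiff_t>(m_offset);
    }

    auto linear_end()
    {
        return linear_begin() + static_cast<std::ptrdiff_t>(m_size);
    }

private:
    Container& m_data;
    std::size_t m_offset;
    std::size_t m_size;
    std::ptrdiff_t m_stride;
};

int main()
{
    std::vector<double> storage(10);
    std::iota(storage.begin(), storage.end(), 0.0);

    contiguous_view<std::vector<double>> view(storage, 2, 5, 1);
    if (view.is_linearly_iterable())
    {
        // Assignment runs over the forwarded linear iterators (elements 2..6).
        std::fill(view.linear_begin(), view.linear_end(), -1.0);
    }
    for (double v : storage)
    {
        std::cout << v << ' ';
    }
    std::cout << '\n';
}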

@spectre-ns (Contributor, Author) commented on Jul 12, 2024:

Can a linear assignment be used when the inner stride is not one? It seems like the linear iterators increment by one, but when converting from row-major to column-major the strides between adjacent elements are non-unity.

spectre-ns force-pushed the strided-view-linear-iterator-forwarding branch from bdde934 to b4f7e3d on July 12, 2024 at 23:52
@spectre-ns (Contributor, Author) commented:

> Can a linear assignment be used when the inner stride is not one? It seems like the linear iterators increment by one, but when converting from row-major to column-major the strides between adjacent elements are non-unity.

The issue here was trying to iterate over the object in column-major order when the internal storage is row-major. That requires a stepper. In these cases the implementation falls back to a stepper even though the underlying expression might be linearly iterable.
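(A small standalone example of why the fallback is needed: visiting a row-major buffer in column-major order touches memory at non-unit strides, so a plain unit-stride linear iterator cannot express the traversal.)

#include <array>
#include <cstddef>
#include <iostream>

int main()
{
    constexpr std::size_t rows = 2;
    constexpr std::size_t cols = 3;
    std::array<int, rows * cols> data = {0, 1, 2, 3, 4, 5};  // row-major storage

    std::cout << "column-major element order from row-major storage: ";
    for (std::size_t j = 0; j < cols; ++j)
    {
        for (std::size_t i = 0; i < rows; ++i)
        {
            // Row-major offset of element (i, j): the step between consecutive
            // visits is the column stride (3 here), not 1.
            std::size_t offset = i * cols + j;
            std::cout << data[offset] << ' ';  // prints 0 3 1 4 2 5
        }
    }
    std::cout << '\n';
}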

spectre-ns force-pushed the strided-view-linear-iterator-forwarding branch from b4f7e3d to 2d268ad on July 12, 2024 at 23:55
spectre-ns force-pushed the strided-view-linear-iterator-forwarding branch from 2d268ad to 7935a69 on July 13, 2024 at 00:15
@spectre-ns (Contributor, Author) commented:

See benchmark code below:

#include <chrono>
#include <iostream>

#include <xtensor/xbuilder.hpp>       // xt::zeros
#include <xtensor/xnoalias.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor/xstrided_view.hpp>  // xt::strided_view
#include <xtensor/xtensor.hpp>

// Convert the accumulated duration into an average per-iteration time in milliseconds.
double mean_milliseconds_from_total(std::chrono::nanoseconds total,
    size_t num_repeats) {
    std::chrono::duration<double, std::milli> total_ms = total;
    return total_ms.count() / static_cast<double>(num_repeats);
}

int main() {
    size_t num_repeats = 100;
    xt::xtensor<double, 1> a = xt::random::rand<double>({ 10000000 });
    xt::xtensor<double, 1> b = xt::random::rand<double>({ 10000000 });
    xt::xtensor<double, 1> c = xt::zeros<double>({ 10000000 });

    // case 1: full tensor
    auto started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(c) = a + b;
    auto finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 1:  "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 2a: view of tensor and expression with xt::all()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(xt::strided_view(c, { xt::all() })) = xt::strided_view(a + b, { xt::all() });
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 2a: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 2b: view of only tensor with xt::all()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(xt::strided_view(c, { xt::all() })) = a + b;
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 2b: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 2c: view of only expression with xt::all()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(c) = xt::strided_view(a + b, { xt::all() });
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 2c: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 3a: view of tensor and expression with xt::range()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(xt::strided_view(c, { xt::range(0, c.size()) })) =
        xt::strided_view(a + b, { xt::range(0, c.size()) });
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 3a: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 3b: view of only tensor with xt::range()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(xt::strided_view(c, { xt::range(0, c.size()) })) = a + b;
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 3b: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;

    // case 3c: view of only expression with xt::range()
    started = std::chrono::high_resolution_clock::now();
    for (size_t i = 0; i < num_repeats; ++i)
        xt::noalias(c) = xt::strided_view(a + b, { xt::range(0, c.size()) });
    finished = std::chrono::high_resolution_clock::now();
    std::cout << "case 3c: "
        << mean_milliseconds_from_total(finished - started, num_repeats)
        << "ms" << std::endl;
    return 0;
}

@JohanMabille (Member) left a comment:

Thanks a lot for tackling this! I think the linear / non-linear iteration was supposed to be handled in the flat_storage_adaptor, but we probably missed some cases.

I wonder if it would make sense to merge the new linear_flat_expression_adaptor with the existing flat_expression_adaptor. This would put the responsibility of choosing between the linear iterator and the regular one in the flat adaptor rather than in the strided_view. It would probably also benefit other classes using the flat_expression_adaptor.
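(A rough sketch of that alternative, using simplified stand-in classes rather than the real xtensor adaptors: a hypothetical has_linear_iterator trait lets a single flat adaptor pick between the linear iterator and the regular one at compile time, so callers such as strided_view would not have to choose.)

#include <iostream>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: does E expose a linear_begin() method?
template <class E, class = void>
struct has_linear_iterator : std::false_type
{
};

template <class E>
struct has_linear_iterator<E, std::void_t<decltype(std::declval<E&>().linear_begin())>>
    : std::true_type
{
};

// Simplified stand-in for a merged flat adaptor: it decides by itself whether
// to forward the fast linear iterator or fall back to the regular one.
template <class E>
class flat_adaptor
{
public:
    explicit flat_adaptor(E& e)
        : m_e(e)
    {
    }

    auto begin()
    {
        if constexpr (has_linear_iterator<E>::value)
        {
            return m_e.linear_begin();  // contiguous fast path
        }
        else
        {
            return m_e.begin();  // regular (stepper-backed) fallback
        }
    }

private:
    E& m_e;
};

// Two toy expressions: one with a linear interface, one without.
struct with_linear
{
    int v[3] = {1, 2, 3};
    int* linear_begin()
    {
        return v;
    }
};

struct without_linear
{
    std::vector<int> v = {4, 5, 6};
    auto begin()
    {
        return v.begin();
    }
};

int main()
{
    with_linear a;
    without_linear b;
    flat_adaptor<with_linear> fa(a);
    flat_adaptor<without_linear> fb(b);
    std::cout << *fa.begin() << ' ' << *fb.begin() << '\n';  // 1 4
}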

include/xtensor/xstrided_view.hpp (outdated review comments, resolved)
include/xtensor/xstrided_view.hpp (outdated review comments, resolved)
include/xtensor/xstrided_view_base.hpp (outdated review comments, resolved)
@@ -75,6 +76,52 @@ namespace xt
size_type m_size;
};

template <class CT, layout_type L>
class linear_flat_expression_adaptor : public flat_expression_adaptor<CT, L>
@JohanMabille (Member) commented:

Why inherit from flat_expression_adaptor here? It does not seem that you reuse anything from it.

@spectre-ns (Contributor, Author) replied:

@JohanMabille I only implemented the linear iterator methods in the new class; linear_flat_expression_adaptor relies on inheriting the implementation of flat_expression_adaptor for calls to methods such as cbegin.
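(A self-contained toy version of that structure, with simplified stand-ins for flat_expression_adaptor and linear_flat_expression_adaptor: the derived adaptor adds only the linear entry points and inherits everything else, such as cbegin and size, from the base.)

#include <cstddef>
#include <iostream>
#include <vector>

// Simplified stand-in for flat_expression_adaptor: provides the regular interface.
template <class Storage>
class base_adaptor
{
public:
    explicit base_adaptor(Storage& s)
        : m_storage(s)
    {
    }

    std::size_t size() const
    {
        return m_storage.size();
    }

    auto cbegin() const
    {
        return m_storage.cbegin();
    }

    auto cend() const
    {
        return m_storage.cend();
    }

protected:
    Storage& storage()
    {
        return m_storage;
    }

private:
    Storage& m_storage;
};

// Simplified stand-in for linear_flat_expression_adaptor: adds only the linear
// entry points; size(), cbegin(), cend(), ... are inherited from the base.
template <class Storage>
class linear_adaptor : public base_adaptor<Storage>
{
public:
    using base_type = base_adaptor<Storage>;
    using base_type::base_type;  // reuse the base constructors

    auto linear_begin()
    {
        return this->storage().begin();
    }

    auto linear_end()
    {
        return this->storage().end();
    }
};

int main()
{
    std::vector<double> v = {1.0, 2.0, 3.0};
    linear_adaptor<std::vector<double>> adaptor(v);
    std::cout << adaptor.size() << ' ' << *adaptor.linear_begin() << '\n';  // 3 1
}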

@spectre-ns (Contributor, Author) replied:

@JohanMabille what do you think? Should I implement this differently?

@spectre-ns (Contributor, Author) commented:

> Thanks a lot for tackling this! I think the linear / non-linear iteration was supposed to be handled in the flat_storage_adaptor, but we probably missed some cases.
>
> I wonder if it would make sense to merge the new linear_flat_expression_adaptor with the existing flat_expression_adaptor. This would put the responsibility of choosing between the linear iterator and the regular one in the flat adaptor rather than in the strided_view. It would probably also benefit other classes using the flat_expression_adaptor.

@JohanMabille I debated that as well. Let me play around with the logic and see if I can make this work without modifying the strided_view class, instead using the flat_expression_adaptor implementation. That would likely be cleaner. I was hoping to demonstrate the kind of performance improvement possible before settling on a final solution.

spectre-ns force-pushed the strided-view-linear-iterator-forwarding branch from 9f834af to 7935a69 on July 21, 2024 at 01:15
@spectre-ns (Contributor, Author) commented on Jul 21, 2024:

> Thanks a lot for tackling this! I think the linear / non-linear iteration was supposed to be handled in the flat_storage_adaptor, but we probably missed some cases.
>
> I wonder if it would make sense to merge the new linear_flat_expression_adaptor with the existing flat_expression_adaptor. This would put the responsibility of choosing between the linear iterator and the regular one in the flat adaptor rather than in the strided_view. It would probably also benefit other classes using the flat_expression_adaptor.

@JohanMabille combining them turned out to be difficult because the uvector implementation doesn't expose the linear_iterator interface. Its begin and end methods are effectively linear iterators, but the interface is not consistent. I'm not sure whether this is intentional.
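(A toy illustration of the inconsistency, using a uvector-like stand-in rather than the real class: the flat storage's begin()/end() already behave like linear iterators, but because there is no linear_begin()/linear_end(), generic code written against the linear-iterator interface cannot use it directly.)

#include <algorithm>
#include <vector>

// uvector-like stand-in (not the real class): begin()/end() already behave
// like linear iterators, but there is no linear_begin()/linear_end().
struct flat_storage
{
    std::vector<double> data;

    double* begin()
    {
        return data.data();
    }

    double* end()
    {
        return data.data() + data.size();
    }
    // no linear_begin() / linear_end()
};

template <class C>
void assign_from(C& dst, const std::vector<double>& src)
{
    // Written against the linear-iterator interface, this would not compile
    // for flat_storage:
    //     std::copy(src.begin(), src.end(), dst.linear_begin());
    // Only the plain interface works, even though it is effectively linear:
    std::copy(src.begin(), src.end(), dst.begin());
}

int main()
{
    flat_storage fs;
    fs.data.resize(3);
    std::vector<double> src = {1.0, 2.0, 3.0};
    assign_from(fs, src);
}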
