Avoid device-to-host copy in ∇getindex! #800

Open
pxl-th opened this issue Jun 19, 2024 · 2 comments · May be fixed by #801
pxl-th commented Jun 19, 2024

Can we use a custom kernel with atomics for ∇getindex!(dx::AbstractGPUArray, dy, inds...) instead of copying everything to the CPU?

This way we'd avoid host-device synchronizations, and we could add such a kernel via a package extension.
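A minimal sketch of what such a kernel could look like, assuming KernelAbstractions.jl for a backend-agnostic launch and Atomix.jl for the atomic add. The names scatter_add_kernel! and scatter_add! are hypothetical and not taken from the linked PR. It handles only the 1-D case of a single index vector. Each work-item handles one element of dy. When inds contains repeats, several work-items target the same slot of dx, which is why the update has to be atomic:

```julia
using KernelAbstractions
using Atomix

# Hypothetical kernel: one work-item per element of dy.
# Repeated entries in inds make several work-items hit the same dx slot,
# so the read-modify-write is done atomically.
@kernel function scatter_add_kernel!(dx, @Const(dy), @Const(inds))
    i = @index(Global, Linear)
    Atomix.@atomic dx[inds[i]] += dy[i]
end

# Hypothetical helper: launches on whichever backend owns dx (CPU, CUDA, ROCm, ...)
# without copying to the host. The launch is asynchronous, so callers
# synchronize only when they actually need the result.
function scatter_add!(dx::AbstractVector, dy::AbstractVector, inds::AbstractVector{<:Integer})
    backend = get_backend(dx)
    scatter_add_kernel!(backend)(dx, dy, inds; ndrange = length(dy))
    return dx
end

dx = zeros(Float32, 3)
scatter_add!(dx, Float32[1, 2, 4], [1, 3, 1])
KernelAbstractions.synchronize(get_backend(dx))
```

With a GPU array in place of zeros(Float32, 3), the same call would run on the device. Note that floating-point atomics make the summation order nondeterministic.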

mcabbott (Member) commented

To be clear, the method which copies to the CPU should only be used for inds which are arrays, since that is where you have to worry about races (an index vector may contain repeats). For simpler things like A[1,:] or B[3:end-3] it should not do this.
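To illustrate where the race comes from, here is a serial reference for the pullback of y = x[inds] when inds is a vector. The function name grad_getindex_ref! is hypothetical and only spells out the semantics. It does not reproduce the package's method. An index that appears twice in the forward pass receives the sum of both incoming gradients:

```julia
# Hypothetical serial reference for the pullback of y = x[inds] with an index vector.
function grad_getindex_ref!(dx::AbstractVector, dy::AbstractVector, inds::AbstractVector{<:Integer})
    for k in eachindex(dy, inds)
        dx[inds[k]] += dy[k]   # repeated entries of inds accumulate into the same slot
    end
    return dx
end

dy   = [1.0, 2.0, 4.0]
inds = [1, 3, 1]               # index 1 is read twice in the forward pass
dx   = grad_getindex_ref!(zeros(3), dy, inds)
```

Done in parallel without atomics, the two updates to dx[1] could overwrite each other. With a range such as 3:5, every slot of dx is touched at most once, so a plain elementwise update like view(dx, 3:5) .+= dy has no such conflict.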

I think this method was added as the simplest way to solve the problem. But having a faster kernel in a package extension would be fine. I believe it's a lot like NNlib.scatter.
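For comparison, NNlib's in-place scatter!(op, dst, src, idx) with op = + computes the same accumulation as the serial loop. For each k it applies dst[idx[k]] = dst[idx[k]] + src[k]. The short example below assumes the 1-D case with an integer index vector:

```julia
using NNlib

dy   = [1.0, 2.0, 4.0]
inds = [1, 3, 1]

# Scatter-add dy into a zeroed gradient buffer; repeated indices are summed.
dx = zeros(3)
NNlib.scatter!(+, dx, dy, inds)
```

This is why reusing or mirroring a GPU scatter kernel is a natural fit for the array-index case of ∇getindex!.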

mcabbott added the GPU label Jun 19, 2024
pxl-th (Author) commented Jun 19, 2024

> To be clear the method which copies to CPU should only be for inds which are arrays

Yes, that's exactly my situation :)

I can try to come up with a PR for this soon.

pxl-th linked a pull request Jun 21, 2024 that will close this issue