Avoid device-to-host copy in `∇getindex!` #800

pxl-th · 2024-06-19T20:43:32Z

Can we use a custom kernel with atomics for `∇getindex!(dx::AbstractGPUArray, dy, inds...)` instead of copying everything to the CPU?

This way we'd avoid synchronizations, and we could add such a kernel via an extension.


mcabbott · 2024-06-19T20:55:07Z

To be clear, the method which copies to CPU should only apply when `inds` are arrays, which is where you have to worry about races. For simpler things like `A[1,:]` or `B[3:end-3]` it should not do this.
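For illustration, here is a minimal CPU-only sketch of why array indices are the hard case, assuming the concern is repeated indices: several entries of `dy` must then accumulate into the same slot of `dx`. The function names `naive_scatter` and `accumulate_scatter` are hypothetical and are not part of Zygote.

```julia
# Gradient of y = x[inds] with respect to x, where `inds` is an integer vector.
# Both function names below are hypothetical, for illustration only.

# Broadcasted update: with repeated indices, later writes overwrite earlier ones.
function naive_scatter(dy, inds, n)
    dx = zeros(eltype(dy), n)
    dx[inds] .+= dy
    return dx
end

# Sequential accumulation: every contribution is added, including repeats.
function accumulate_scatter(dy, inds, n)
    dx = zeros(eltype(dy), n)
    for (i, k) in enumerate(inds)
        dx[k] += dy[i]
    end
    return dx
end

inds = [1, 3, 3]              # index 3 appears twice
dy = [10.0, 20.0, 30.0]

naive = naive_scatter(dy, inds, 4)         # loses the contribution of 20.0
correct = accumulate_scatter(dy, inds, 4)  # slot 3 receives 20.0 + 30.0
```

On a GPU the same accumulation runs in parallel, so unsynchronized `+=` on a shared slot becomes a data race rather than a deterministic overwrite.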

I think this method was added as the simplest way to solve the problem. But having a faster kernel in a package extension would be fine. I believe it's a lot like NNlib.scatter.
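A rough sketch of what such a kernel might look like, assuming KernelAbstractions.jl for the kernel and Atomix.jl for the atomic update (neither choice is confirmed in this thread), and restricted to a single vector of linear indices. The names `scatter_add_kernel!` and `scatter_add!` are hypothetical; this is not the implementation in Zygote or NNlib.

```julia
using KernelAbstractions
using Atomix

# Hypothetical kernel: one work-item per entry of `inds`.
# The atomic update makes repeated indices accumulate safely.
@kernel function scatter_add_kernel!(dx, @Const(dy), @Const(inds))
    i = @index(Global)
    Atomix.@atomic dx[inds[i]] += dy[i]
end

# Hypothetical wrapper: launches on whichever backend `dx` lives on,
# so no data has to be copied to the host.
function scatter_add!(dx, dy, inds)
    backend = get_backend(dx)
    scatter_add_kernel!(backend)(dx, dy, inds; ndrange = length(inds))
    return dx
end

# Usage on the CPU backend with plain Arrays; a GPU array type would
# select the corresponding GPU backend instead.
dx = zeros(Float32, 4)
scatter_add!(dx, Float32[10, 20, 30], [1, 3, 3])
KernelAbstractions.synchronize(get_backend(dx))  # only needed before reading dx
```

The launch itself is asynchronous, so the explicit `synchronize` is needed only when the result is read; a gradient pipeline that keeps working on the device could skip it.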

pxl-th · 2024-06-19T21:04:01Z

> To be clear the method which copies to CPU should only be for inds which are arrays

Yes, that's exactly my situation :)

I can try to come up with a PR for this soon

mcabbott added the GPU label Jun 19, 2024

pxl-th linked a pull request Jun 21, 2024 that will close this issue

Avoid device-to-host copy in ∇getindex! #801

Open
