
only move model to device when model is in cpu and target device is xpu #3133

Open
wants to merge 1 commit into base: main
Conversation

@faaany (Contributor) commented Sep 29, 2024

What does this PR do?

When a model is loaded across multiple devices, fine-tuning on XPU crashes with:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, xpu:0 and xpu:3! (when checking argument for argument mat2 in method wrapper_XPU__bmm)

The reason is that the _prepare_ipex_or_xpu method moves the whole model back to xpu:0, undoing the multi-device placement. After the fix, fine-tuning works.
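The guard described above can be sketched as follows. This is a minimal illustration of the idea, not the actual accelerate source; the helper name `should_move_to_device` and the string-based device arguments are assumptions for the sketch:

```python
def should_move_to_device(model_device: str, target_device: str) -> bool:
    """Decide whether the whole model may be moved to the target device.

    Moving is safe only when the model currently sits on CPU and the
    target is an XPU. A model already dispatched across several devices
    (e.g. with device_map="auto", leaving layers on xpu:0, xpu:3, ...)
    must keep its per-layer placement; moving it wholesale to xpu:0
    would undo that and trigger cross-device errors such as the
    wrapper_XPU__bmm RuntimeError above.
    """
    # Compare device *types*, so "xpu" and "xpu:0" are both recognized.
    return model_device == "cpu" and target_device.split(":")[0] == "xpu"


# Usage sketch: only call model.to(...) when the guard allows it.
# if should_move_to_device(str(next(model.parameters()).device), str(device)):
#     model = model.to(device)
```

With this check, a CPU-resident model is still moved to XPU as before, while a model already spread across several XPUs is left untouched.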

Who can review?

@SunMarc and @muellerzr

@faaany (Contributor, Author) commented Sep 30, 2024

@yao-matrix

@faaany changed the title from "fix tensor device misalignment on xpu for model loaded with device_map="auto"" to "only move model to device when model is in cpu and target device in xpu" on Sep 30, 2024
@faaany changed the title from "only move model to device when model is in cpu and target device in xpu" to "only move model to device when model is in cpu and target device is xpu" on Sep 30, 2024
@yao-matrix

fine for me.

@SunMarc (Member) left a comment


Makes sense! Thanks for fixing!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

4 participants