
Open question: support for non-IEEE 754 floating-point types #23

Open · wacky6 opened this issue on Feb 10, 2022 · 1 comment

Comments

@wacky6 (Contributor) commented on Feb 10, 2022

Relates to webmachinelearning/webnn#252

Some accelerators use non-standard floating-point types (e.g. bfloat16 and TF32). These are important for achieving high performance (e.g. by using NVIDIA's tensor cores) and/or for reducing resource usage (e.g. FP32→FP16 halves memory usage).
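
For context, the bit layouts of the formats in question (sign / exponent / mantissa):

| Format | Sign | Exponent | Mantissa | Stored bits |
| --- | --- | --- | --- | --- |
| FP32 (IEEE binary32) | 1 | 8 | 23 | 32 |
| TF32 (NVIDIA) | 1 | 8 | 10 | 19 |
| BF16 | 1 | 8 | 7 | 16 |
| FP16 (IEEE binary16) | 1 | 5 | 10 | 16 |

BF16 and TF32 keep FP32's 8-bit exponent (same dynamic range) and trim the mantissa, while FP16 shrinks the exponent too. So "acceptable quantization" is about more than just bit count.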

How could MLLoader leverage these types? Some ideas:

  • Do it transparently: auto-convert based on what the accelerator supports.
  • Should the API allow JS code to specify acceptable quantization levels (e.g. use bf16 but not fp16)? See the sketch after this list.
  • What if the chip doesn't support the model's declared data type (e.g. a BF16 chip with an FP32 model)?
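
To make the second bullet concrete, here is a minimal sketch of what an option bag might look like. Everything in it is hypothetical: neither `MLLoadOptions` nor `acceptablePrecisions` exists in any spec; the names only illustrate the idea of letting JS declare which quantization levels it will accept.

```ts
// Hypothetical option bag, illustrative only, not part of any spec.
interface MLLoadOptions {
  // Precisions the page is willing to accept, in preference order.
  // The loader would pick the first one the accelerator supports,
  // and reject (or fall back to software) if none match.
  acceptablePrecisions?: Array<'fp32' | 'tf32' | 'bf16' | 'fp16'>;
}

// Usage sketch: accept bf16 but not fp16, with an fp32 fallback.
// const model = await loader.load(modelBuffer, {
//   acceptablePrecisions: ['bf16', 'fp32'],
// } as MLLoadOptions);
```
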
@josephrocca commented on Feb 13, 2022

Another factor is download time. IIUC, the current tfjs format (for example) doesn't support float16, so tfjs-converter converts the weights to float32. This isn't ideal because it doubles the model size. I think it makes more sense to always optimistically serve the model in its "native" floating-point format and do the conversion at run time based on the device's hardware.
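
A minimal sketch of that runtime-conversion idea in plain TypeScript, assuming the model ships its weights as raw little-endian IEEE 754 binary16 bytes. This is not tied to tfjs or the Model Loader API, and `widenWeights` is a made-up helper name for the example:

```ts
// Decode one IEEE 754 binary16 value (1 sign, 5 exponent, 10 fraction bits).
function fp16ToFp32(bits: number): number {
  const sign = (bits & 0x8000) >> 15;
  const exp = (bits & 0x7c00) >> 10;
  const frac = bits & 0x03ff;
  let value: number;
  if (exp === 0) {
    value = frac * 2 ** -24;                      // subnormal or zero: no implicit leading 1
  } else if (exp === 0x1f) {
    value = frac ? NaN : Infinity;                // NaN / infinity
  } else {
    value = (1 + frac / 1024) * 2 ** (exp - 15);  // normal number
  }
  return sign ? -value : value;
}

// Widen fp16 weight bytes to a Float32Array on devices without fp16 support.
// Assumes little-endian data with an even byte length.
function widenWeights(fp16Bytes: ArrayBuffer): Float32Array {
  const halves = new Uint16Array(fp16Bytes);
  const out = new Float32Array(halves.length);
  for (let i = 0; i < halves.length; i++) {
    out[i] = fp16ToFp32(halves[i]);
  }
  return out;
}
```

Shipping fp16 halves the download (two bytes per weight instead of four), and devices with native fp16 support could skip the widening step entirely.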

@anssiko changed the title from "Open question: support for non-IETF 754 float point types" to "Open question: support for non-IEEE 754 float point types" on Mar 9, 2022