
Open question: support for non-IEEE 754 floating-point types #23

Open · wacky6 opened this issue on Feb 10, 2022 · 1 comment

Comments

@wacky6 (Contributor) commented on Feb 10, 2022

Relates to webmachinelearning/webnn#252

Some accelerators use non-standard floating-point types (e.g. bfloat16 and TF32). These are important for achieving high performance (e.g. by using NVIDIA's tensor cores) and/or for reducing resource usage (e.g. FP32→FP16 halves memory usage).
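
For context, the bit layouts of the formats in question (sign / exponent / mantissa):

| Format | Sign | Exponent | Mantissa | Stored bits |
| --- | --- | --- | --- | --- |
| FP32 (IEEE binary32) | 1 | 8 | 23 | 32 |
| TF32 (NVIDIA) | 1 | 8 | 10 | 19 |
| BF16 | 1 | 8 | 7 | 16 |
| FP16 (IEEE binary16) | 1 | 5 | 10 | 16 |

BF16 and TF32 keep FP32's 8-bit exponent (same dynamic range) and trim the mantissa, while FP16 shrinks the exponent too. So "acceptable quantization" is about more than just bit count.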

How could MLLoader leverage these types? Some ideas:

  • Do it transparently: auto-convert based on what the accelerator supports.
  • Should the API allow JS code to specify acceptable quantization levels (e.g. use bf16 but not fp16)? See the sketch after this list.
  • What if the chip doesn't support the model's declared data type (e.g. a BF16 chip with an FP32 model)?
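
To make the second bullet concrete, here is a minimal sketch of what an option bag might look like. Everything in it is hypothetical: neither `MLLoadOptions` nor `acceptablePrecisions` exists in any spec; the names only illustrate the idea of letting JS declare which quantization levels it will accept.

```ts
// Hypothetical option bag, illustrative only, not part of any spec.
interface MLLoadOptions {
  // Precisions the page is willing to accept, in preference order.
  // The loader would pick the first one the accelerator supports,
  // and reject (or fall back to software) if none match.
  acceptablePrecisions?: Array<'fp32' | 'tf32' | 'bf16' | 'fp16'>;
}

// Usage sketch: accept bf16 but not fp16, with an fp32 fallback.
// const model = await loader.load(modelBuffer, {
//   acceptablePrecisions: ['bf16', 'fp32'],
// } as MLLoadOptions);
```
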
@josephrocca commented on Feb 13, 2022

Another factor is download time. IIUC, the current tfjs format (for example) doesn't support float16, so tfjs-converter converts the weights to float32. This isn't ideal because it doubles the model size. I think it makes more sense to always optimistically serve the model in its "native" floating-point format and do the conversion at run time based on the device's hardware.
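
A minimal sketch of that runtime-conversion idea in plain TypeScript, assuming the model ships its weights as raw little-endian IEEE 754 binary16 bytes. This is not tied to tfjs or the Model Loader API, and `widenWeights` is a made-up helper name for the example:

```ts
// Decode one IEEE 754 binary16 value (1 sign, 5 exponent, 10 fraction bits).
function fp16ToFp32(bits: number): number {
  const sign = (bits & 0x8000) >> 15;
  const exp = (bits & 0x7c00) >> 10;
  const frac = bits & 0x03ff;
  let value: number;
  if (exp === 0) {
    value = frac * 2 ** -24;                      // subnormal or zero: no implicit leading 1
  } else if (exp === 0x1f) {
    value = frac ? NaN : Infinity;                // NaN / infinity
  } else {
    value = (1 + frac / 1024) * 2 ** (exp - 15);  // normal number
  }
  return sign ? -value : value;
}

// Widen fp16 weight bytes to a Float32Array on devices without fp16 support.
// Assumes little-endian data with an even byte length.
function widenWeights(fp16Bytes: ArrayBuffer): Float32Array {
  const halves = new Uint16Array(fp16Bytes);
  const out = new Float32Array(halves.length);
  for (let i = 0; i < halves.length; i++) {
    out[i] = fp16ToFp32(halves[i]);
  }
  return out;
}
```

Shipping fp16 halves the download (two bytes per weight instead of four), and devices with native fp16 support could skip the widening step entirely.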

@anssiko changed the title from "Open question: support for non-IETF 754 float point types" to "Open question: support for non-IEEE 754 float point types" on Mar 9, 2022