new binaryNumberRep values #36

mbeckerle · 2022-12-01T16:53:21Z

New dfdl:binaryNumberRep values are needed, e.g., for Google zigzag integers, offset binary, etc.

mbeckerle · 2022-12-01T22:20:08Z

Issue #7 specifies a new binaryNumberRep 'offsetBinary'.

Per the email discussion of Jun 8, 2022, titled "binaryNumberRep limitations, xs:decimal and binaryDecimalVirtualPoint", several others are also needed:

This table comes from a format specification we use:

Ignore the 'Logical' column above; that's about enums. Also ignore the "*", which only marks the value suggested for use when one must be reserved as an in-band null indicator.

What is called 'Mod Twos Complement' here is what our existing proposed DFDL 2.0 feature (Issue #7) calls 'offsetBinary'.

So this table suggests the need for 'unsignedBinary' (already mentioned), but also two others: 'signPlusMagnitudeBinary', and 'onesComplementBinary'.

Google Protocol Buffers has popularized zigZag, a signed integer representation:

Binary Value   Zig Zag

000             0
001            -1
010             1
011            -2
100             2
101            -3
110             3
111            -4
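The mapping in the table can be sketched in a few lines of Python. The function names `zigzag_decode` and `zigzag_encode` are hypothetical, chosen for illustration; they are not part of DFDL or Daffodil. The sketch works on arbitrary-precision integers rather than a fixed bit width.

```python
def zigzag_decode(n: int) -> int:
    """Map an unsigned zigzag code to the signed integer it represents."""
    # Even codes are the non-negative values (n / 2);
    # odd codes are the negative values (-(n + 1) / 2).
    return (n >> 1) ^ -(n & 1)


def zigzag_encode(v: int) -> int:
    """Map a signed integer to its unsigned zigzag code."""
    return (v << 1) if v >= 0 else ((-v << 1) - 1)


if __name__ == "__main__":
    # Reproduce the 3-bit table from the issue text.
    for code in range(8):
        print(format(code, "03b"), zigzag_decode(code))
```

The interleaving of positive and negative values is what makes small magnitudes map to small codes, which is why the scheme pairs well with variable-length encodings.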

The above can be summarized as the following binaryNumberRep values:

unsignedBinary
twosComplementBinary
offsetBinary
signPlusMagnitudeBinary
onesComplementBinary
zigZagBinary

Those are about how the bit string is interpreted once it is assembled.
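To make "how the bit string is interpreted" concrete, here is a minimal sketch that takes the already-assembled bits as an unsigned value plus a width, and interprets them under each proposed name. The function `interpret_bits` is hypothetical and only for illustration. It assumes that offsetBinary uses an offset of 2^(width-1), and that signPlusMagnitudeBinary and onesComplementBinary both map their "negative zero" pattern to 0.

```python
def interpret_bits(u: int, width: int, rep: str) -> int:
    """Interpret the unsigned value u of a width-bit string under rep."""
    if not 0 <= u < (1 << width):
        raise ValueError("u does not fit in width bits")
    sign_bit = 1 << (width - 1)
    if rep == "unsignedBinary":
        return u
    if rep == "twosComplementBinary":
        return u - (1 << width) if u & sign_bit else u
    if rep == "offsetBinary":
        # Assumption: the offset is the mid-point of the unsigned range.
        return u - sign_bit
    if rep == "signPlusMagnitudeBinary":
        magnitude = u & (sign_bit - 1)
        return -magnitude if u & sign_bit else magnitude
    if rep == "onesComplementBinary":
        # A negative value is the bitwise complement of its magnitude.
        return u - ((1 << width) - 1) if u & sign_bit else u
    if rep == "zigZagBinary":
        return (u >> 1) ^ -(u & 1)
    raise ValueError("unknown rep: " + rep)


if __name__ == "__main__":
    reps = ["unsignedBinary", "twosComplementBinary", "offsetBinary",
            "signPlusMagnitudeBinary", "onesComplementBinary", "zigZagBinary"]
    for u in range(8):
        print(format(u, "03b"), [interpret_bits(u, 3, r) for r in reps])
```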

There is also the issue of variable-length integers. Numerous formats exist where the integer consists of a number of bytes, but there is no stored length to tell us how many. Rather, the most-significant bit of each byte is used as a flag: 0 means "last byte", 1 means "there is another byte". The 7-bit contributions from each byte are concatenated (taking dfdl:bitOrder into account), and the resulting bits are then interpreted per one of the applicable schemes above and dfdl:byteOrder.
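A rough sketch of the assembly step, before any binaryNumberRep interpretation is applied. The function `read_flagged_bits` and its `msb_group_first` parameter are hypothetical names used only for illustration; the parameter stands in for the byte-order choice, and the sketch assumes the flag is always the most-significant bit of each byte.

```python
def read_flagged_bits(data: bytes, start: int = 0, msb_group_first: bool = True):
    """Read one flag-bit-per-byte integer.

    Returns (value, bit_count, next_index), where value is the unsigned
    concatenation of the 7-bit groups and bit_count is its width in bits.
    """
    groups = []
    i = start
    while True:
        if i >= len(data):
            raise ValueError("ran out of data before a byte with flag 0")
        byte = data[i]
        i += 1
        groups.append(byte & 0x7F)      # low 7 bits are the payload
        if (byte & 0x80) == 0:          # flag 0 means this was the last byte
            break
    if not msb_group_first:
        groups.reverse()                # first byte held the least-significant group
    value = 0
    for g in groups:
        value = (value << 7) | g
    return value, 7 * len(groups), i


if __name__ == "__main__":
    # 0xAC 0x02 is the usual Protocol Buffers varint example for 300,
    # which stores the least-significant group first.
    print(read_flagged_bits(bytes([0xAC, 0x02]), msb_group_first=False))
```

The returned bit count matters because the sign-dependent representations need to know where the most-significant bit of the assembled string is.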

This notion of variable length where the most-significant-bit (or least) of a byte is used as a flag is effectively a new dfdl:lengthKind (perhaps called flagBitPerByte) which can be combined with many of the above binaryNumberRep values.

Since the length is variable, binaryNumberReps that depend on the most-significant bit being a sign bit are problematic. For those, an extra byte of 0x80 (0b10000000) must be added if the MSB of the integer would have been 1, as it would otherwise be interpreted as a sign. (ASN.1 BER uses this convention.)
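A small worked example of the padding problem, assuming two's complement, most-significant group first, and the flag in the top bit of each byte. Both function names are hypothetical. The value 64 needs the bit pattern 1000000, which as a single 7-bit group would read back as -64, so the encoder emits a leading byte with the flag set and seven zero payload bits (0b10000000).

```python
def encode_twos_complement_varlen(v: int) -> bytes:
    """Encode v as flag-bit-per-byte two's complement, MSB group first."""
    # Smallest group count n such that v fits in 7*n bits of two's complement.
    n = 1
    while not -(1 << (7 * n - 1)) <= v < (1 << (7 * n - 1)):
        n += 1
    u = v & ((1 << (7 * n)) - 1)            # two's complement bit pattern
    out = []
    for k in range(n - 1, -1, -1):
        group = (u >> (7 * k)) & 0x7F
        flag = 0x80 if k > 0 else 0x00      # flag 1 on every byte but the last
        out.append(flag | group)
    return bytes(out)


def decode_twos_complement_varlen(data: bytes) -> int:
    """Decode one integer produced by encode_twos_complement_varlen."""
    u = 0
    bits = 0
    for byte in data:
        u = (u << 7) | (byte & 0x7F)
        bits += 7
        if not byte & 0x80:
            break
    if bits == 0:
        raise ValueError("no data")
    # The top bit of the assembled string is the sign bit.
    return u - (1 << bits) if u & (1 << (bits - 1)) else u


if __name__ == "__main__":
    for v in (63, 64, -64, -65):
        print(v, encode_twos_complement_varlen(v).hex())
```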

Of the above suggested binaryNumberReps, only offsetBinary makes no sense for variable-length representation because the mid-point of the potential integer range must be known.

mbeckerle · 2023-01-19T16:58:21Z

Next step would be to create experimental implementations, and an experimental features document to propose for DFDL v2.0 inclusion.

mbeckerle · 2023-07-20T18:18:03Z

Another number rep, though this is for decimal, not integer.

EXI stores decimal numbers as two integers. One for the integer part, one for the fraction part, but the fraction part integer is created by taking the digits (base 10) of the fraction part, reversing their order, then converting to a binary integer. This preserves the exact number of leading zeros in the fraction part. Trailing zeros in the fraction part are not captured.
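The digit reversal described above can be sketched as follows. This is only an illustration of the reversal idea for non-negative values written as decimal strings; the function names are hypothetical, and sign handling and the actual EXI integer encoding are left out.

```python
def split_decimal(text: str) -> tuple:
    """Split a non-negative decimal string into (integer part, reversed fraction)."""
    integer_digits, _, fraction_digits = text.partition(".")
    fraction_digits = fraction_digits or "0"
    # Reversing the digits turns leading zeros into trailing zeros, which
    # survive conversion to an integer; original trailing zeros become
    # leading zeros and are lost.
    return int(integer_digits), int(fraction_digits[::-1])


def join_decimal(integer_part: int, reversed_fraction: int) -> str:
    """Rebuild the decimal string from the two integers."""
    return f"{integer_part}.{str(reversed_fraction)[::-1]}"


if __name__ == "__main__":
    for text in ("12.0034", "12.500"):
        parts = split_decimal(text)
        print(text, parts, join_decimal(*parts))
```

For "12.0034" the fraction integer is 4300 and the round trip is exact; for "12.500" the fraction integer is 5 and the value comes back as "12.5".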

mbeckerle · 2024-05-09T16:22:31Z

This feature is "in use" in that dfdl:inputValueCalc and dfdl:outputValueCalc are used to synthesize the proper integers from these different representations.

mbeckerle · 2024-05-09T16:25:43Z

Closing #7 as duplicate.

This was the description in #7.

We have found a number of places that use offset-binary numeric representation. This is also called excess-K, or biased, but I think offset binary is a better description of it.

In this representation you take an unsigned binary, and just subtract an offset. E.g., for a 3-bit number, mostSignificantBitFirst:

bits   unsigned   twos-comp   offsetBinary

000    0           0          -4
001    1           1          -3
010    2           2          -2
011    3           3          -1
100    4          -4           0
101    5          -3           1
110    6          -2           2
111    7          -1           3

At the moment, users have to work around this in Daffodil using inputValueCalc and outputValueCalc. This is feasible, but really awkward for such a simple concept.
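The arithmetic behind the table is a single addition or subtraction, sketched here in Python. The function names are hypothetical, and the offset is assumed to be 2^(width-1), which matches the 3-bit table (offset 4); formats using a different bias would need it passed in explicitly.

```python
def offset_binary_decode(u: int, width: int) -> int:
    """Unsigned bit-string value -> logical signed value."""
    if not 0 <= u < (1 << width):
        raise ValueError("u does not fit in width bits")
    return u - (1 << (width - 1))


def offset_binary_encode(v: int, width: int) -> int:
    """Logical signed value -> unsigned bit-string value."""
    offset = 1 << (width - 1)
    if not -offset <= v < offset:
        raise ValueError("v is out of range for width bits")
    return v + offset


if __name__ == "__main__":
    for u in range(8):
        print(format(u, "03b"), offset_binary_decode(u, 3))
```

With this choice of offset, the bit pattern is the two's complement pattern with its most-significant bit inverted, which is visible when comparing the last two columns of the table.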

So we suggest that the next revision of DFDL include dfdl:binaryNumberRep="offsetBinary" as a required feature.

mbeckerle added the "DFDL 2.0" label (for issues associated with DFDL v2.0, the next major revision) Dec 1, 2022

mbeckerle closed this as completed Dec 1, 2022

mbeckerle reopened this Dec 1, 2022

smhdfdl added the experimental label Feb 2, 2023

mbeckerle added the "in use" label (indicates implemented and feature being used) May 9, 2024

mbeckerle mentioned this issue May 9, 2024

dfdl:binaryNumberRep="offsetBinary" new behavior #7

Closed

mbeckerle removed the "in use" label (indicates implemented and feature being used) Jun 13, 2024
