new binaryNumberRep values #36

Open
mbeckerle opened this issue Dec 1, 2022 · 5 comments
Labels
DFDL 2.0 (for issues associated with DFDL v2.0, the next major revision), experimental

Comments

@mbeckerle
Collaborator

New binaryNumberRep values are needed, e.g. for Google zigzag integers, offset binary, etc.

@mbeckerle added the DFDL 2.0 label on Dec 1, 2022
@mbeckerle reopened this on Dec 1, 2022
@mbeckerle
Collaborator Author

Issue #7 specifies a new binaryNumberRep 'offsetBinary'.

Per the email discussion of Jun 8, 2022, titled "binaryNumberRep limitations, xs:decimal and binaryDecimalVirtualPoint", several others are also needed.

This table comes from a format specification we use:

[image: table from the format specification, not reproduced here]

Ignore the 'Logical' column above; that is about enums. Also ignore the "*", which just marks the suggested value to reserve when an in-band null indicator is needed.

What is called 'Mod Twos Complement' here is what our existing proposed DFDL 2.0 feature (Issue #7) calls 'offsetBinary'.

So this table suggests the need for 'unsignedBinary' (already mentioned), but also two others: 'signPlusMagnitudeBinary', and 'onesComplementBinary'.

Google Protocol Buffers has popularized zigZag, a signed integer representation:

Binary Value   Zig Zag
000             0
001            -1
010             1
011            -2
100             2
101            -3
110             3
111            -4
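
For reference, the mapping in the zigzag table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a proposed DFDL implementation; the function names zigzag_decode and zigzag_encode are made up for this sketch.

```python
def zigzag_decode(n: int) -> int:
    """Map an unsigned code to its signed zigzag value (0, -1, 1, -2, ...)."""
    return (n >> 1) ^ -(n & 1)


def zigzag_encode(v: int) -> int:
    """Map a signed integer back to its unsigned zigzag code."""
    return (v << 1) if v >= 0 else ((-v) << 1) - 1


# Reproduce the 3-bit table: one line per bit pattern.
for code in range(8):
    print(f"{code:03b} {zigzag_decode(code)}")
```

Small magnitudes, positive or negative, get small unsigned codes, which is why this representation pairs well with variable-length integers.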

The above can be summarized as the following list of binaryNumberRep values:

  • unsignedBinary
  • twosComplementBinary
  • offsetBinary
  • signPlusMagnitudeBinary
  • onesComplementBinary
  • zigZagBinary

Those are about how the bit string is interpreted once it is assembled.
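
To make the differences concrete, here is a rough Python sketch that interprets the same assembled bit string under each of the six names listed above. The function name interpret is hypothetical, the bit string is assumed to be most-significant-bit first, and offsetBinary is assumed to use the mid-point of the range as its offset.

```python
def interpret(bits: str, rep: str) -> int:
    """Interpret a most-significant-bit-first bit string under one representation."""
    width = len(bits)
    u = int(bits, 2)                      # raw unsigned value
    sign = u >> (width - 1)               # most significant bit
    if rep == "unsignedBinary":
        return u
    if rep == "twosComplementBinary":
        return u - (1 << width) if sign else u
    if rep == "offsetBinary":
        return u - (1 << (width - 1))     # assumption: offset is the mid-point
    if rep == "signPlusMagnitudeBinary":
        magnitude = u & ((1 << (width - 1)) - 1)
        return -magnitude if sign else magnitude
    if rep == "onesComplementBinary":
        return u - ((1 << width) - 1) if sign else u
    if rep == "zigZagBinary":
        return (u >> 1) ^ -(u & 1)
    raise ValueError(f"unknown representation: {rep}")
```

Note that signPlusMagnitudeBinary and onesComplementBinary each have two bit patterns for zero (for 3 bits, 100 and 111 respectively), so unparsing has to pick one of them.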

There is also the issue of variable-length integers. Numerous formats exist where the integer consists of a number of bytes, but there is no stored length to tell us how many bytes. Rather, the most-significant-bit is used as a flag. 0 means 'last byte', 1 means "there is another byte". The 7-bit contributions from each byte are concatenated (taking dfdl:bitOrder into account) and the resulting bits are then interpreted per one of the applicable above schemes and dfdl:byteOrder.

This notion of variable length where the most-significant-bit (or least) of a byte is used as a flag is effectively a new dfdl:lengthKind (perhaps called flagBitPerByte) which can be combined with many of the above binaryNumberRep values.
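
A minimal sketch of the flag-bit-per-byte idea, in Python. It assumes the flag is the most-significant bit of each byte and leaves the group ordering as a parameter, since formats differ on whether the first byte carries the most or least significant 7 bits. The names read_flagged_groups and assemble are invented for this example.

```python
def read_flagged_groups(data: bytes, start: int = 0) -> tuple[list[int], int]:
    """Collect 7-bit groups until a byte whose most significant bit is 0.

    Returns the groups in stored order and the number of bytes consumed.
    """
    groups = []
    for pos in range(start, len(data)):
        byte = data[pos]
        groups.append(byte & 0x7F)        # low 7 bits are the payload
        if (byte & 0x80) == 0:            # flag 0: this was the last byte
            return groups, pos - start + 1
    raise ValueError("data ended before a byte with flag bit 0")


def assemble(groups: list[int], first_group_most_significant: bool) -> tuple[int, int]:
    """Concatenate 7-bit groups into (unsigned value, bit width)."""
    ordered = groups if first_group_most_significant else list(reversed(groups))
    value = 0
    for g in ordered:
        value = (value << 7) | g
    return value, 7 * len(groups)
```

For example, the bytes AC 02 with the least significant group first assemble to 300, as do the bytes 82 2C with the most significant group first. The resulting (value, width) pair is what would then be interpreted per the chosen binaryNumberRep.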

Since the length is variable, binaryNumberReps that depend on the most-significant bit being a sign bit are problematic. For those, an extra byte of 0x80 (0b10000000) must be added if the MSB of the integer would otherwise have been 1, since that bit would be interpreted as a sign. (ASN.1 BER uses this convention.)
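
The padding rule can be illustrated with a small Python sketch. It assumes two's complement, most-significant group first, and a continuation flag in the top bit of every byte except the last; encode_signed and decode_signed are hypothetical names, and decode_signed expects exactly the bytes of one number.

```python
def encode_signed(value: int) -> bytes:
    """Encode as two's complement in the fewest 7-bit groups, most significant first."""
    n = 1
    while not -(1 << (7 * n - 1)) <= value < (1 << (7 * n - 1)):
        n += 1                             # widen until the sign bit is unambiguous
    u = value & ((1 << (7 * n)) - 1)       # two's complement bit pattern
    groups = [(u >> (7 * (n - 1 - i))) & 0x7F for i in range(n)]
    return bytes(g | 0x80 for g in groups[:-1]) + bytes([groups[-1]])


def decode_signed(data: bytes) -> int:
    """Decode the bytes of one number; the top payload bit is the sign."""
    u = 0
    for byte in data:
        u = (u << 7) | (byte & 0x7F)       # drop the flag bit, keep 7 payload bits
    width = 7 * len(data)
    return u - (1 << width) if u >> (width - 1) else u
```

Here 63 fits in the single byte 0x3F, but 64 has to be written as 0x80 0x40: without the leading 0x80 byte, the lone byte 0x40 would decode as -64.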

Of the above suggested binaryNumberReps, only offsetBinary makes no sense for variable-length representation because the mid-point of the potential integer range must be known.

@mbeckerle
Collaborator Author

Next step would be to create experimental implementations, and an experimental features document to propose for DFDL v2.0 inclusion.

@mbeckerle
Collaborator Author

Another number rep, though this is for decimal, not integer.

EXI stores decimal numbers as two integers. One for the integer part, one for the fraction part, but the fraction part integer is created by taking the digits (base 10) of the fraction part, reversing their order, then converting to a binary integer. This preserves the exact number of leading zeros in the fraction part. Trailing zeros in the fraction part are not captured.
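
As a rough illustration of the fraction-part handling only (based on the description above, not on the EXI specification text), in Python with made-up helper names:

```python
def fraction_to_int(fraction_digits: str) -> int:
    """'0025' (meaning .0025) -> 5200: reverse the digits, then read as an integer."""
    return int(fraction_digits[::-1]) if fraction_digits else 0


def int_to_fraction(n: int) -> str:
    """5200 -> '0025': write the integer in base 10, then reverse the digits."""
    return str(n)[::-1]
```

So .0025 round-trips with its leading zeros intact, while .2500 comes back as .25 because the trailing zeros become leading zeros of the reversed integer and are dropped.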

@mbeckerle
Collaborator Author

This feature is "in use" in that dfdl:inputValueCalc and dfdl:outputValueCalc are used to synthesize the proper integers from these different representations.

@mbeckerle added the "in use" label (indicates implemented and feature being used) on May 9, 2024
@mbeckerle
Collaborator Author

Closing #7 as duplicate.

This was the description in #7:

We have found a number of places that use offset-binary numeric representation. This is also called excess-K, or biased, but I think offset binary is a better description of it.

In this representation you take an unsigned binary, and just subtract an offset. E.g., for a 3-bit number, mostSignificantBitFirst:

bits   unsigned   twos-comp   offsetBinary

000    0           0          -4
001    1           1          -3
010    2           2          -2
011    3           3          -1
100    4          -4           0
101    5          -3           1
110    6          -2           2
111    7          -1           3

At the moment, users have to work around this in Daffodil using inputValueCalc and outputValueCalc. This is feasible, but really awkward for such a simple concept.
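
The arithmetic that such a workaround has to express is roughly the following, shown here in Python rather than DFDL expressions. The offset is assumed to be 2^(width-1), matching the 3-bit table above; other excess-K formats may use a different bias. The function names are invented for this sketch.

```python
def offset_decode(unsigned: int, width: int) -> int:
    """Physical unsigned value -> logical value (the parse direction)."""
    return unsigned - (1 << (width - 1))


def offset_encode(logical: int, width: int) -> int:
    """Logical value -> physical unsigned value (the unparse direction)."""
    offset = 1 << (width - 1)
    if not -offset <= logical < offset:
        raise ValueError(f"{logical} does not fit in {width} offset-binary bits")
    return logical + offset
```

For width 3, offset_decode maps 0 through 7 to -4 through 3, and offset_encode is its inverse.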

So we suggest that the next revision of DFDL include dfdl:binaryNumberRep="offsetBinary" as a required feature.

@mbeckerle removed the "in use" label (indicates implemented and feature being used) on Jun 13, 2024