Diagnostic notation wrong for floating-point numbers #6

PJK · 2017-01-12T12:44:12Z

Hi, I came across the following inconsistency:

cbor2diag ignores width of floating-point numbers and thus is not bijective, as one would expect.

The RFC specifies that the width should be encoded using _n suffix, but it's not, therefore the information is lost, as demonstrated by the following example:

perl -e' print "\xfb\x00\x00\x00\x00\x00\x00\x00\x00"' | cbor2diag.rb | diag2cbor.rb | hexdump

prints

0000000 f9 00 00                                       
0000003

instead of \xfb\x00\x00\x00\x00\x00\x00\x00\x00.

Moreover, the parser doesn't seem to support parsing the _ width suffixes at all:

pavel@dt3:~|⇒  echo 1.0_2 | diag2cbor.rb
*** can't parse 1.0_2
*** Expected one of [0-9], [Ee], [ \t\n\r], "/" at line 1, column 4 (byte 4) after

This in general goes back to the design of Diagnostic Notation which doesn't specify the exact rules for encoding 'logical' values to CBOR items. The same problem occurs with e.g. lengths of all items, where this implementation also chooses the 'canonical' encoding over preserving the full information:

pavel@dt3:~|⇒  perl -e' print "\x98\x00"' | cbor2diag.rb | diag2cbor.rb | hexdump  
0000000 80                                             
0000001

If these are the intended semantics of DN, then the standard should be amended with unambiguous specification of 'Canonical' CBOR, which would render section 6.1 unnecessary.

Otherwise the implementation should be changed to at least allow parsing all valid DN inputs. In this case, cbor2diag 'composed' with diag2cbor can never be an identity unless DN is extended to allow specifying width in all constructs where multiple choices produce valid and logically equivalent item.

The text was updated successfully, but these errors were encountered:

jyasskin · 2017-05-02T19:59:13Z

cbor2diag.rb also loses the fact that integers are encoded with a non-minimal encoding. This comes at least partially from the fact that the cbor decoder represents integers and floating numbers as native Ruby types, rather than including their size information.

cabo · 2017-05-02T20:13:47Z

Indeed, cbor-diag is based on cbor-pure, which maps the CBOR data model to Ruby data structures. Adding meta information such as encoding sizes is not easy in the latter (I have a partial solution, but there are some Ruby bugs around floating point numbers). Maybe we need to go away from actual Ruby data here and treat integers and primitives the same way we already treat unknown simples.

So far, I haven't needed these capabilities -- what is your use case that requires full roundtripping?

jyasskin · 2017-05-02T20:19:07Z

In https://github.com/dimich-g/webpackage/pull/50/files#diff-04c6e90faac2675aa89e2176d2eec7d8R128, I'm requiring that the trailing length be encoded in 8 bytes, so that tools parsing from the end of the file can tell whether h'1b0000001a00191817' encodes 111670794263, 1644567, 6167, or 23. I'd like the diagnostic format to be able to diagnose mistakes in that encoding and to produce that encoding.

cabo · 2017-05-02T22:51:52Z

I see. You would get the same effect from using any CBOR encoder, including those that don't give you control over encoding lengths, by just describing that field as a byte string of eight bytes and then just saying at the semantic level that this is a length encoded in network byte order. Small cognitive dissonance, much better leverage of the existing CBOR ecosystem.

jyasskin · 2017-05-03T00:05:52Z

Yeah, that's fair. Another use, which I don't need yet, is for cases that require Canonical CBOR. If an item fails to parse due to being non-canonical, the diagnostic form should make it possible to distinguish that item from its canonical form.

cabo · 2017-05-05T12:25:34Z

The above thread notwithstanding, it should indeed be possible to use this gem to get more representation information. I will try to extract some code from the cddl gem to use here, but it will probably take a week or two until we have something to play with.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Diagnostic notation wrong for floating-point numbers #6

Diagnostic notation wrong for floating-point numbers #6

PJK commented Jan 12, 2017

jyasskin commented May 2, 2017

cabo commented May 2, 2017

jyasskin commented May 2, 2017

cabo commented May 2, 2017

jyasskin commented May 3, 2017

cabo commented May 5, 2017

Diagnostic notation wrong for floating-point numbers #6

Diagnostic notation wrong for floating-point numbers #6

Comments

PJK commented Jan 12, 2017

jyasskin commented May 2, 2017

cabo commented May 2, 2017

jyasskin commented May 2, 2017

cabo commented May 2, 2017

jyasskin commented May 3, 2017

cabo commented May 5, 2017