Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diagnostic notation wrong for floating-point numbers #6

Open
PJK opened this issue Jan 12, 2017 · 6 comments
Open

Diagnostic notation wrong for floating-point numbers #6

PJK opened this issue Jan 12, 2017 · 6 comments

Comments

@PJK
Copy link

PJK commented Jan 12, 2017

Hi, I came across the following inconsistency:

cbor2diag ignores width of floating-point numbers and thus is not bijective, as one would expect.

The RFC specifies that the width should be encoded using _n suffix, but it's not, therefore the information is lost, as demonstrated by the following example:

perl -e' print "\xfb\x00\x00\x00\x00\x00\x00\x00\x00"' | cbor2diag.rb | diag2cbor.rb | hexdump

prints

0000000 f9 00 00                                       
0000003

instead of \xfb\x00\x00\x00\x00\x00\x00\x00\x00.

Moreover, the parser doesn't seem to support parsing the _ width suffixes at all:

pavel@dt3:~|⇒  echo 1.0_2 | diag2cbor.rb
*** can't parse 1.0_2
*** Expected one of [0-9], [Ee], [ \t\n\r], "/" at line 1, column 4 (byte 4) after 

This in general goes back to the design of Diagnostic Notation which doesn't specify the exact rules for encoding 'logical' values to CBOR items. The same problem occurs with e.g. lengths of all items, where this implementation also chooses the 'canonical' encoding over preserving the full information:

pavel@dt3:~|⇒  perl -e' print "\x98\x00"' | cbor2diag.rb | diag2cbor.rb | hexdump  
0000000 80                                             
0000001

If these are the intended semantics of DN, then the standard should be amended with unambiguous specification of 'Canonical' CBOR, which would render section 6.1 unnecessary.

Otherwise the implementation should be changed to at least allow parsing all valid DN inputs. In this case, cbor2diag 'composed' with diag2cbor can never be an identity unless DN is extended to allow specifying width in all constructs where multiple choices produce valid and logically equivalent item.

@jyasskin
Copy link

jyasskin commented May 2, 2017

cbor2diag.rb also loses the fact that integers are encoded with a non-minimal encoding. This comes at least partially from the fact that the cbor decoder represents integers and floating numbers as native Ruby types, rather than including their size information.

@cabo
Copy link
Owner

cabo commented May 2, 2017

Indeed, cbor-diag is based on cbor-pure, which maps the CBOR data model to Ruby data structures. Adding meta information such as encoding sizes is not easy in the latter (I have a partial solution, but there are some Ruby bugs around floating point numbers). Maybe we need to go away from actual Ruby data here and treat integers and primitives the same way we already treat unknown simples.

So far, I haven't needed these capabilities -- what is your use case that requires full roundtripping?

@jyasskin
Copy link

jyasskin commented May 2, 2017

In https://github.com/dimich-g/webpackage/pull/50/files#diff-04c6e90faac2675aa89e2176d2eec7d8R128, I'm requiring that the trailing length be encoded in 8 bytes, so that tools parsing from the end of the file can tell whether h'1b0000001a00191817' encodes 111670794263, 1644567, 6167, or 23. I'd like the diagnostic format to be able to diagnose mistakes in that encoding and to produce that encoding.

@cabo
Copy link
Owner

cabo commented May 2, 2017

I see. You would get the same effect from using any CBOR encoder, including those that don't give you control over encoding lengths, by just describing that field as a byte string of eight bytes and then just saying at the semantic level that this is a length encoded in network byte order. Small cognitive dissonance, much better leverage of the existing CBOR ecosystem.

@jyasskin
Copy link

jyasskin commented May 3, 2017

Yeah, that's fair. Another use, which I don't need yet, is for cases that require Canonical CBOR. If an item fails to parse due to being non-canonical, the diagnostic form should make it possible to distinguish that item from its canonical form.

@cabo
Copy link
Owner

cabo commented May 5, 2017

The above thread notwithstanding, it should indeed be possible to use this gem to get more representation information. I will try to extract some code from the cddl gem to use here, but it will probably take a week or two until we have something to play with.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants