diff --git a/README.md b/README.md
index 3dca17d..b868e37 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,7 @@ This is the working area for the IETF [CORE Working Group](https://datatracker.i
 * [Working Group Draft](https://datatracker.ietf.org/doc/html/draft-ietf-core-href)
 * [Compare Editor's Copy to Working Group Draft](https://core-wg.github.io/href/#go.draft-ietf-core-href.diff)
+[Access to all branches](https://core-wg.github.io/href/)
 ## Contributing
diff --git a/code/schemes-numbers.md b/code/schemes-numbers.md
index 150f247..67784e7 100644
--- a/code/schemes-numbers.md
+++ b/code/schemes-numbers.md
@@ -1,15 +1,15 @@
-| CRI scheme number | URI scheme | Reference |
-|-------------------|------------|-------------|
-| 0 | coap | \[RFC-XXXX] |
-| 1 | coaps | \[RFC-XXXX] |
-| 2 | http | \[RFC-XXXX] |
-| 3 | https | \[RFC-XXXX] |
-| 4 | urn | \[RFC-XXXX] |
-| 5 | did | \[RFC-XXXX] |
-| 6 | coap+tcp | \[RFC-XXXX] |
-| 7 | coaps+tcp | \[RFC-XXXX] |
-| 8 | coap+ws | \[RFC-XXXX] |
-| 9 | coaps+ws | \[RFC-XXXX] |
+| CRI scheme number | URI scheme name | Reference |
+|-------------------|-----------------|-------------|
+| 0 | coap | \[RFC-XXXX] |
+| 1 | coaps | \[RFC-XXXX] |
+| 2 | http | \[RFC-XXXX] |
+| 3 | https | \[RFC-XXXX] |
+| 4 | urn | \[RFC-XXXX] |
+| 5 | did | \[RFC-XXXX] |
+| 6 | coap+tcp | \[RFC-XXXX] |
+| 7 | coaps+tcp | \[RFC-XXXX] |
+| 8 | coap+ws | \[RFC-XXXX] |
+| 9 | coaps+ws | \[RFC-XXXX] |
 | 1059 | ms-gamingoverlay | \[RFC-XXXX] |
 | 1165 | snmp | \[RFC-XXXX] |
 | 1220 | cast | \[RFC-XXXX] |
@@ -95,6 +95,7 @@
 | 5410 | ms-sttoverlay | \[RFC-XXXX] |
 | 5425 | irc | \[RFC-XXXX] |
 | 5472 | sieve | \[RFC-XXXX] |
+| 5477 | machineProvisioningProgressReporter | \[RFC-XXXX] |
 | 5480 | lvlt | \[RFC-XXXX] |
 | 5492 | sftp | \[RFC-XXXX] |
 | 5536 | ms-excel | \[RFC-XXXX] |
@@ -283,6 +284,7 @@
 | 13068 | git | \[RFC-XXXX] |
 | 13094 | openpgp4fpr | \[RFC-XXXX] |
 | 13098 | ms-secondary-screen-controller | \[RFC-XXXX] |
+| 13228 | mvrps | \[RFC-XXXX] |
 | 13285 |
snews | \[RFC-XXXX] | | 13340 | smtp | \[RFC-XXXX] | | 13348 | pack | \[RFC-XXXX] | @@ -291,6 +293,7 @@ | 13404 | afp | \[RFC-XXXX] | | 13440 | msrps | \[RFC-XXXX] | | 13442 | ldap | \[RFC-XXXX] | +| 13451 | mvrp | \[RFC-XXXX] | | 13499 | nntp | \[RFC-XXXX] | | 13608 | onenote | \[RFC-XXXX] | | 13650 | sarif | \[RFC-XXXX] | diff --git a/draft-ietf-core-href.md b/draft-ietf-core-href.md index a6f4064..5e288ba 100644 --- a/draft-ietf-core-href.md +++ b/draft-ietf-core-href.md @@ -7,6 +7,7 @@ submissiontype: IETF consensus: true lang: en title: Constrained Resource Identifiers +updates: 7595 wg: CoRE Working Group author: @@ -49,24 +50,29 @@ venue: informative: RFC7228: term - RFC9110: http + STD97: http +# RFC9110 RFC7252: coap RFC8141: urn RFC8288: web-linking - RFC8820: lawn + BCP190: lawn +# BCP205: + # RFC8820 W3C.REC-html52-20171214: I-D.ietf-cbor-edn-literals: edn + I-D.carpenter-6man-rfc6874bis: zonebis normative: - RFC3986: uri + STD66: uri +# RFC 3986 RFC3987: iri RFC6874: zone - RFC7595: schemes + BCP35: schemes +# RFC7595 IANA.uri-schemes: BCP26: -: ianacons # =: RFC8126 IANA.core-parameters: - I-D.carpenter-6man-rfc6874bis: zonebis RFC8610: cddl Unicode: target: https://www.unicode.org/versions/Unicode13.0.0/ @@ -76,7 +82,8 @@ normative: date: 2020-03 seriesinfo: ISBN: 978-1-936213-26-9 - RFC8949: cbor + STD94: cbor +# RFC8949 RFC9165: cddlcontrol --- abstract @@ -84,7 +91,7 @@ normative: The Constrained Resource Identifier (CRI) is a complement to the Uniform Resource Identifier (URI) that represents the URI components in Concise -Binary Object Representation (CBOR) instead of a sequence of characters. +Binary Object Representation (CBOR) instead of in a sequence of characters. This simplifies parsing, comparison, and reference resolution in environments with severe limitations on processing power, code size, and memory size. @@ -109,7 +116,7 @@ memory size. 
# Introduction -The [Uniform Resource Identifier (URI)](#RFC3986) and its most common +The [Uniform Resource Identifier (URI)](#STD66) and its most common usage, the URI reference, are the Internet standard for linking to resources in hypertext formats such as [HTML](#W3C.REC-html52-20171214) or the [HTTP "Link" header field](#RFC8288). @@ -130,36 +137,43 @@ This can be a problem especially in [constrained environments](#RFC7228), where nodes often have severe code size and memory size limitations. As a result, many implementations in such environments support only an ad-hoc, informally-specified, bug-ridden, non-interoperable subset of -half of {{RFC3986}}. +half of {{STD66}}. This document defines the *Constrained Resource Identifier (CRI)* by constraining URIs to a simplified subset and representing their -components in [Concise Binary Object Representation (CBOR)](#RFC8949) +components in [Concise Binary Object Representation (CBOR)](#STD94) instead of a sequence of characters. This allows typical operations on URI references such as parsing, comparison, and reference resolution (including all corner cases) to be implemented in a comparatively small amount of code. As a result of simplification, however, CRIs are not capable of -expressing all URIs permitted by the generic syntax of {{RFC3986}} (hence +expressing all URIs permitted by the generic syntax of {{STD66}} (hence the "constrained" in "Constrained Resource Identifier"). The supported subset includes all URIs of the [Constrained Application Protocol (CoAP)](#RFC7252), most URIs of the -[Hypertext Transfer Protocol (HTTP)](#RFC9110), +[Hypertext Transfer Protocol (HTTP)](#STD97), [Uniform Resource Names (URNs)](#RFC8141), and other similar URIs. The exact constraints are defined in {{constraints}}. ## Notational Conventions -{::boilerplate bcp14-tagged} +{::boilerplate bcp14-tagged-bcp} In this specification, the term "byte" is used in its now customary sense as a synonym for "octet". 
Terms defined in this document appear in *cursive* where they -are introduced (rendered in plain text as the new term surrounded by -underscores). +are introduced (in the plaintext form of this document, they are +rendered as the new term surrounded by underscores). +The general structure of data items is shown in the [Concise Data Definition +Language (CDDL)](#RFC8610) [including its control +extensions](#RFC9165). +Specific examples are notated in CBOR Extended +Diagnostic Notation (EDN), as originally introduced in {{Section 8 of +RFC8949@-cbor}} and extended in {{Appendix G of -cddl}}. +({{-edn}} more rigorously defines and further extends EDN.) # Constraints {#constraints} @@ -171,7 +185,7 @@ The components are subject to the following constraints: {: type="C%d."} 1. {:#c-scheme} The scheme name can be any Unicode string (see Definition D80 in {{Unicode}}) that matches the syntax of a URI - scheme (see {{Section 3.1 of RFC3986}}, which constrains schemes to + scheme (see {{Section 3.1 of RFC3986@-uri}}, which constrains scheme names to ASCII) and is lowercase (see Definition D139 in {{Unicode}}). The scheme is always present. @@ -180,7 +194,7 @@ The components are subject to the following constraints: and optionally preceded by user information. Alternatively, the authority can be absent; the two cases for this - defined in {{Section 3.3 of RFC3986}} are modeled by two different + defined in {{Section 3.3 of RFC3986@-uri}} are modeled by two different values used in place of an absent authority: * the path can be root-based (zero or more path components that are @@ -192,8 +206,8 @@ The components are subject to the following constraints: not all CRI implementations will support authority-less URIs.) 3. {:#c-userinfo} A userinfo is a text string built out of unreserved - characters ({{Section 2.3 of RFC3986}}) or "sub-delims" ({{Section 2.2 - of RFC3986}}); any other character needs to be percent-encoded ({{pet}}). 
+ characters ({{Section 2.3 of RFC3986@-uri}}) or "sub-delims" ({{Section 2.2 + of RFC3986@-uri}}); any other character needs to be percent-encoded ({{pet}}). Note that this excludes the ":" character, which is commonly deprecated as a way to delimit a cleartext password in a userinfo. @@ -209,12 +223,12 @@ The components are subject to the following constraints: result in a Unicode string that is lowercase and in Unicode Normalization Form C (NFC) (see Definition D120 in {{Unicode}}). (The syntax may be further restricted by the scheme. - As per {{Section 3.2.2 of -uri}}, a registered name can be empty, for + As per {{Section 3.2.2 of RFC3986@-uri}}, a registered name can be empty, for which case a scheme can define a default for the host.) 6. {:#c-port-range} A port is always an integer in the range from 0 to 65535. Ports outside this range, empty ports (port subcomponents with no - digits, see {{Section 3.2.3 of RFC3986}}), or ports with redundant + digits, see {{Section 3.2.3 of RFC3986@-uri}}), or ports with redundant leading zeros, are not supported. 7. {:#c-port-omitted} The port is omitted if and only if the port would be the same as the @@ -228,8 +242,8 @@ The components are subject to the following constraints: this is considered equivalent to a path of zero path segments by HTTP and CoAP, but this equivalence does not hold for CRIs in general as they only perform normalization on the Syntax-Based Normalization level ({{Section - 6.2.2 of -uri}}), not on the scheme-specific Scheme-Based - Normalization level ({{Section 6.2.3 of -uri}}). + 6.2.2 of RFC3986@-uri}}), not on the scheme-specific Scheme-Based + Normalization level ({{Section 6.2.3 of RFC3986@-uri}}). 
(A CRI implementation may want to offer scheme-cognizant interfaces, performing this scheme-specific normalization for @@ -292,12 +306,12 @@ Examples of this are: * `[0, null, []]`: leaves the path alone but unsets the query and the fragment (Full) CRIs that do not correspond to a valid URI are not valid on their own, and cannot be used. -Normatively they are characterized by the {{cri-to-uri}} process producing a valid and syntax-normalized URI. +Normatively they are characterized by the {{cri-to-uri}} process not producing a valid and syntax-normalized URI. For easier understanding, they are listed here: * CRIs (and CRI references) containing a path component "." or "..". - These would be removed by the remove_dot_segments algorithm of {{RFC3986}}, + These would be removed by the remove_dot_segments algorithm of {{STD66}}, and thus never produce a normalized URI after resolution. (In CRI references, the `discard` value is used to afford segment removal, @@ -321,7 +335,7 @@ of a resource under a particular resource identifier. A Constrained Resource Identifier SHOULD be created by the naming authority that governs the namespace of the resource -identifier (see also {{RFC8820}}). +identifier (see also {{BCP190}}). For example, for the resources of an HTTP origin server, that server is responsible for creating the CRIs for those resources. @@ -355,7 +369,8 @@ fail gracefully in the face of malicious inputs.) # Comparison One of the most common operations on CRIs is comparison: determining -whether two CRIs are equivalent, without dereferencing the CRIs (using +whether two CRIs are equivalent, without dereferencing the CRIs (i.e., +using them to access their respective resource(s)). Determination of equivalence or difference of CRIs is based on simple @@ -385,8 +400,8 @@ it in resource representations, e.g., to express a hyperlink between the represented resource and the resource identified by the CRI. 
This section defines the representation of CRIs in -[Concise Binary Object Representation (CBOR)](#RFC8949). -When reduced representation size is desired, CRIs are not represented directly. +[Concise Binary Object Representation (CBOR)](#STD94). +When reduced representation size is desired, CRIs are often not represented directly. Instead, CRIs are indirectly referenced through *CRI references*. These take advantage of hierarchical locality and provide a very compact encoding. @@ -416,9 +431,9 @@ resolved to their respective CRI before comparison. [^replace-xxxx] -A CRI or CRI reference is encoded as a CBOR array {{RFC8949}}, with the -structure as described in the [Concise Data Definition Language -(CDDL)](#RFC8610) [including its control extensions](#RFC9165) as follows: +A CRI or CRI reference is encoded as a CBOR array (Major type 4 in +{{Section 3.1 of RFC8949@-cbor}}), with the structure described in CDDL as +follows: ~~~~ cddl {::include cddl/cri.cddl} @@ -446,7 +461,7 @@ references: For interchange as separate encoded data items, CRIs MUST NOT use indefinite length encoding (see -{{Section 3.2 of RFC8949}}); this requirement is relaxed for +{{Section 3.2 of RFC8949@-cbor}}); this requirement is relaxed for specifications that embed CRIs into an encompassing CBOR representation that does provide for indefinite length encoding. @@ -476,7 +491,7 @@ scheme-id `-1`. The `discard` section can be used in a CRI reference when neither a scheme nor an authority is present. It then expresses the operations performed on a base CRI by CRI references that -are equivalent to URI references with relative paths and path prefixes such as "/", "./", "../", "../../", etc. +are equivalent to URI references with relative paths and path prefixes such as "/", "./", "../", "../../", etc.\\ "." and ".." are not available in CRIs and are therefore expressed using `discard` after a normalization step, as is the presence or absence of a leading "/". 
@@ -488,11 +503,11 @@ and "/foo" removes all segments, represented in the `discard` section as the val The exact semantics of the section values are defined by {{reference-resolution}}. -Most URI references that {{Section 4.2 of RFC3986}} calls "relative +Most URI references that {{Section 4.2 of RFC3986@-uri}} calls "relative references" (i.e., references that need to undergo a resolution process to obtain a URI) correspond to the CRI reference form that starts with `discard`. The exception are relative references with an `authority` -(called a "network-path reference" in {{Section 4.2 of RFC3986}}), which +(called a "network-path reference" in {{Section 4.2 of RFC3986@-uri}}), which discard the entire path of the base CRI. These CRI references never carry a `discard` section: the value of `discard` defaults to `true`. @@ -529,9 +544,9 @@ This visualization does not go into the details of the elements. ~~~~ ~~~~ cbor-diag -[-6, / scheme-id -- equivalent to "did" / - true, / authority = NOAUTH-ROOTLESS / - ["web:alice:bob"] / path / +[-6, / scheme-id -- equivalent to "did" / + true, / authority = NOAUTH-ROOTLESS / + ["web:alice:bob"] / path / ] ~~~~ @@ -579,14 +594,14 @@ they are both null, they are both left out and only discard is transferred. Trailing null values are removed from the array. As a special case, an empty array is sent in place for a remaining -`[0]` (URI ""). +`[0]` (URI reference ""). ### Error handling and extensibility {#unprocessable} It is recommended that specifications that describe the use of CRIs in CBOR-based protocols use the error handling mechanisms outlined in this section. Implementations of this document MUST adhere to these rules -unless the containing document overrides them. +unless a containing document overrides them. When encountering a CRI that is well-formed in terms of CBOR, but that @@ -610,7 +625,7 @@ that is distinct from all processable CRIs, and distinct from all unprocessable CRIs with different CBOR representations. 
It is up to implementation whether unprocessable CRIs with identical representations are treated as identical to each other or not. -Unprocessable CRIs can not be dereferenced, +Unprocessable CRIs cannot be dereferenced, and it is an error to query any of their components. This mechanism ensures that CRI extensions @@ -671,7 +686,7 @@ an absolute CRI reference: # Relationship between CRIs, URIs, and IRIs -CRIs are meant to replace both [Uniform Resource Identifiers (URIs)](#RFC3986) +CRIs are meant to replace both [Uniform Resource Identifiers (URIs)](#STD66) and [Internationalized Resource Identifiers (IRIs)](#RFC3987) in [constrained environments](#RFC7228). Applications in these environments may never need to use URIs and IRIs @@ -718,7 +733,7 @@ references, and IRI references. Applications MUST convert a CRI reference to a URI reference by determining the components of the URI reference according to the following steps and then recomposing the components to a URI -reference string as specified in {{Section 5.3 of RFC3986}}. +reference string as specified in {{Section 5.3 of RFC3986@-uri}}. {:vspace} scheme @@ -744,15 +759,15 @@ authority appending a "@". Otherwise, both the subcomponent and the "@" sign are omitted. Any character in the value of the `userinfo` elements that is not in - the set of unreserved characters ({{Section 2.3 of RFC3986}}) or - "sub-delims" ({{Section 2.2 of RFC3986}}) MUST be + the set of unreserved characters ({{Section 2.3 of RFC3986@-uri}}) or + "sub-delims" ({{Section 2.2 of RFC3986@-uri}}) MUST be percent-encoded. The `host-name` is turned into a single string by joining the elements separated by dots ("."). 
Any character in the elements of a `host-name` item that is not in - the set of unreserved characters ({{Section 2.3 of RFC3986}}) or - "sub-delims" ({{Section 2.2 of RFC3986}}) MUST be + the set of unreserved characters ({{Section 2.3 of RFC3986@-uri}}) or + "sub-delims" ({{Section 2.2 of RFC3986@-uri}}) MUST be percent-encoded. If there are dots (".") in such elements, the conversion fails (percent-encoding is not able to represent such elements, as @@ -762,10 +777,12 @@ authority {: #host-ip-to-uri} The value of a `host-ip` item MUST be represented as a string that matches the "IPv4address" or - "IP-literal" rule ({{Section 3.2.2 of RFC3986}}). - Any zone-id is appended to the string, separated by "%25" as - defined in {{Section 2 of -zone}}, or as specified in a superseding - zone-id specification document {{-zonebis}}; this also leads to a modified + "IP-literal" rule ({{Section 3.2.2 of RFC3986@-uri}}). + Any zone-id is appended to the string; the details for how this is + done are currently in flux in the URI specification: {{Section 2 of + -zone}} uses percent-encoding and a separator of "%25", while + proposals for a future superseding zone-id specification document + (such as {{-zonebis}}) are being prepared; this also leads to a modified "IP-literal" rule as specified in these documents. If the CRI reference contains a `port` item, the port @@ -784,7 +801,7 @@ path one indicates. If the discard value is `1` and the first element of the path contains a `:`, the path component is prefixed by "./" (this avoids the first element to appear as supplying a URI scheme; - compare `path-noscheme` in {{Section 4.2 of -uri}}). + compare `path-noscheme` in {{Section 4.2 of RFC3986@-uri}}). {:#colon} If the discard item is not present and the CRI reference contains an @@ -811,16 +828,16 @@ path If the authority component is present (not `null` or `true`) and the path component does not match the "path-abempty" rule ({{Section 3.3 - of RFC3986}}), the conversion fails. 
+ of RFC3986@-uri}}), the conversion fails. If the authority component is not present, but the scheme component is, and the path component does not match the "path-absolute", "path-rootless" (authority == `true`) or "path-empty" rule ({{Section - 3.3 of RFC3986}}), the conversion fails. + 3.3 of RFC3986@-uri}}), the conversion fails. If neither the authority component nor the scheme component are present, and the path component does not match the "path-absolute", - "path-noscheme" or "path-empty" rule ({{Section 3.3 of RFC3986}}), the + "path-noscheme" or "path-empty" rule ({{Section 3.3 of RFC3986@-uri}}), the conversion fails. query @@ -858,7 +875,7 @@ It should be sufficient for all applications that use the CoAP protocol, as well as most other protocols employing URIs. However, Basic CRIs have one limitation: They do not support URI -components that *require* percent-encoding ({{Section 2.1 of -uri}}) to +components that *require* percent-encoding ({{Section 2.1 of RFC3986@-uri}}) to represent them in the URI syntax, except where that percent-encoding is used to escape the main delimiter in use. @@ -989,8 +1006,8 @@ of the CoAP protocol {{-coap}}. ## Converting Between CoAP CRIs and Sets of CoAP Options This section provides an analogue to {{Sections 6.4 and 6.5 of -coap}}: -Computing a set of CoAP options from a request CRI {{decompose-coap}} and computing a -request CRI from a set of COAP options {{compose-coap}}. +Computing a set of CoAP options from a request CRI ({{decompose-coap}}) and computing a +request CRI from a set of COAP options ({{compose-coap}}). This section makes use of the mapping between CRI scheme numbers and URI scheme names shown in {{scheme-map}}: @@ -1174,10 +1191,10 @@ in the face of malicious inputs. Additionally, parsers MUST be prepared to deal with resource exhaustion (e.g., resulting from the allocation of big data items) or exhaustion of the call stack (stack overflow). 
-See {{Section 10 of RFC8949}} for additional +See {{Section 10 of RFC8949@-cbor}} for additional security considerations relating to CBOR. -The security considerations discussed in {{Section 7 of RFC3986}} and +The security considerations discussed in {{Section 7 of RFC3986@-uri}} and {{Section 8 of RFC3987}} for URIs and IRIs also apply to CRIs. @@ -1252,7 +1269,7 @@ provided in {{tab-numbers}} in {{sec-numbers}}. ## Update to "Uniform Resource Identifier (URI) Schemes" Registry {#upd} -{{-schemes}} is updated to add the following note in the "Uniform +{{RFC7595@-schemes}} is updated to add the following note in the "Uniform Resource Identifier (URI) Schemes" Registry {{IANA.uri-schemes}}: {:quote} @@ -1334,7 +1351,7 @@ representative of the normal operation of CRIs. without Uri-Path Options from the other URI-related CoAP Options produces `s://x/`, not `s://x` -- CoAP prefers the lone empty path segment form. - Similarly, after discussing HTTP semantics, {{Section 6.2.3 of -uri}} states: + Similarly, after discussing HTTP semantics, {{Section 6.2.3 of RFC3986@-uri}} states: {:quote} > In general, a URI that uses the generic syntax for authority with an @@ -1391,7 +1408,7 @@ representative of the normal operation of CRIs. {{-edn}} more rigorously defines and further extends the CBOR Extended Diagnostic Notation (EDN), as originally introduced in {{Section 8 of --cbor}} and extended in {{Appendix G of -cddl}}. +RFC8949@-cbor}} and extended in {{Appendix G of -cddl}}. Among others, it provides an extension point for "application-extension identifiers" that can be used to notate CBOR data items in application-specific ways. @@ -1428,8 +1445,9 @@ See {{cri-grammar}} for an ABNF definition for the content of `cri` literals. ## cri: ABNF Definition of URI Representation of a CRI {#cri-grammar} The syntax of the content of `cri` literals can be described by the -ABNF for `URI-reference` in {{Section 4.1 of -uri}}, as reproduced -in {{abnf-grammar-cri}}. 
+ABNF for `URI-reference` in {{Section 4.1 of RFC3986@-uri}} with certain
+re-arrangements taken from {{figure-5 (Figure 5)