diff --git a/appendix.md b/appendix.md new file mode 100644 index 0000000..e7692f6 --- /dev/null +++ b/appendix.md @@ -0,0 +1,38 @@ +# Appendix + + +## ZMQ + +Some useful hints regarding how the zmq package works. + + +### ROUTER sockets + +A small introduction to ROUTER sockets. For more details, see [zmq guide chapter 3](https://zguide.zeromq.org/docs/chapter3/#Exploring-ROUTER-Sockets). + +The ROUTER socket is mostly used in a server role as it can maintain connections to many peers. +In order to distinguish peers, it assigns a random _identity_ to each connected peer. +Application code does not know in advance which peer gets which identity. However, the identity of a peer stays the same for the lifetime of the connection. + +If, for example, two Components `CA`, `CB` connect to a ROUTER socket, the socket assigns identities, which will be called `IA`, `IB` here (they can be any byte sequence). + +Whenever a peer sends a message to a ROUTER socket, the socket prepends that peer's identity to the message frames before handing the message to the application code. +For example, if `CA` sends the message `Request A`, the application code will read `IA|Request A` from the ROUTER socket. +That way, an answer to that message can be returned to this same peer, and not another one (a ROUTER socket may have many connected peers). +Consequently, in order to send such an answer, the identity has to be prepended to the frames to send: if the ROUTER's send command is called with `IA|Reply A`, the socket will send `Reply A` to the peer whose identity is `IA`, which in this case is `CA`. 
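The identity handling described above can be tried out with a few lines of pyzmq. This is only a minimal sketch, assuming that the `pyzmq` package is installed. The endpoint name `inproc://router-demo` and all variable names are made up for illustration and are not part of LECO.

```python
import zmq

context = zmq.Context()

# The ROUTER socket plays the server role and is bound first.
router = context.socket(zmq.ROUTER)
router.bind("inproc://router-demo")  # hypothetical endpoint, for this demo only

# Two peers ("CA" and "CB") connect with DEALER sockets.
ca = context.socket(zmq.DEALER)
cb = context.socket(zmq.DEALER)
ca.connect("inproc://router-demo")
cb.connect("inproc://router-demo")

# CA sends a single frame; the ROUTER hands over identity + frames.
ca.send_multipart([b"Request A"])
identity_a, *frames_a = router.recv_multipart()

# To answer, the identity is prepended again; the peer receives only the payload.
router.send_multipart([identity_a, b"Reply A"])
reply_a = ca.recv_multipart()

# The same exchange with CB uses a different identity.
cb.send_multipart([b"Request B"])
identity_b, *frames_b = router.recv_multipart()
router.send_multipart([identity_b, b"Reply B"])
reply_b = cb.recv_multipart()

print("CA identity:", identity_a, "frames:", frames_a, "reply:", reply_a)
print("CB identity:", identity_b, "frames:", frames_b, "reply:", reply_b)

# Clean up without waiting for unsent messages.
for sock in (ca, cb, router):
    sock.close(linger=0)
context.term()
```

The identities are chosen by the socket, so their values differ from run to run; only the payload frames are predictable.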
+ +The following diagram shows this example communication with two Components: +sequenceDiagram + participant Code as Message handling + participant ROUTER as ROUTER socket + Note over Code, ROUTER: Coordinator + CA ->> ROUTER: "Request A" + ROUTER ->> Code: "IA|Request A" + Code ->> ROUTER: "IA|Reply A" + ROUTER ->> CA: "Reply A" + CB ->> ROUTER: "Request B" + ROUTER ->> Code: "IB|Request B" + Code ->> ROUTER: "IB|Reply B" + ROUTER ->> CB: "Reply B" +::: + diff --git a/conf.py b/conf.py index d9a9d06..5a1d0c4 100644 --- a/conf.py +++ b/conf.py @@ -18,7 +18,7 @@ myst_enable_extensions = [ "colon_fence", ] -myst_heading_anchors = 3 +myst_heading_anchors = 5 templates_path = ['_templates'] exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] diff --git a/control_protocol.md b/control_protocol.md new file mode 100644 index 0000000..e38ce96 --- /dev/null +++ b/control_protocol.md @@ -0,0 +1,334 @@ +# Control protocol + +The control protocol transmits messages via its {ref}`control_protocol.md#transport-layer` from one Component to another. +The {ref}`control_protocol.md#message-layer` is the common language to understand commands, thus creating a remote procedure call. + + +## Transport layer + +The transport layer ensures that a message arrives at its destination. + + +### Protocol basics + + +#### Socket Configuration + +Each {ref}`Coordinator ` shall offer one {ref}`ROUTER ` socket, bound to an address. +The address consists of a host (this can be the host name, an IP address of the device, or "\*" for all IP addresses of the device) and a port number, for example `*:12345` for all IP addresses at the port `12345`. + +{ref}`Components ` shall have one DEALER socket connecting to one Coordinator's ROUTER socket. + +Coordinators shall have one DEALER socket per other Coordinator in the Network. +This DEALER socket shall connect to the other Coordinator's ROUTER socket. 
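To put a number on the socket requirement above, the following sketch counts only the DEALER sockets used between Coordinators (the DEALER sockets of ordinary Components are not included). The function names are hypothetical and serve only this illustration.

```python
def dealer_sockets_per_coordinator(n_coordinators: int) -> int:
    """One DEALER socket per *other* Coordinator in the Network."""
    return max(n_coordinators - 1, 0)


def dealer_sockets_in_network(n_coordinators: int) -> int:
    """Total number of Coordinator-to-Coordinator DEALER sockets."""
    return n_coordinators * dealer_sockets_per_coordinator(n_coordinators)


for n in (2, 3, 10):
    print(f"{n} Coordinators -> {dealer_sockets_in_network(n)} DEALER sockets")
```

The total grows quadratically with the number of Coordinators.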
+ +:::{note} +While the number of DEALER sockets thus required scales badly with the number of Coordinators in a LECO Network, the scope of the protocol means that at most a few Coordinators will be involved. +::: + +Messages to a Coordinator must be sent to that Coordinator's ROUTER socket. +Sending a message to a Coordinator's DEALER socket is permitted only for acknowledging a {ref}`control_protocol.md#coordinator-sign-in`. + + +#### Naming scheme + +Each Component must have an individual name, given by the user, the _Component name_. +Component names must be unique in a {ref}`Node `, i.e. among the Components (except other Coordinators) connected to a single Coordinator. +A Coordinator itself must have the Component name `COORDINATOR`. + +Similarly, every Node must have a name, the _Namespace_. +Every Namespace must be unique in the Network. + +A Component name or a Namespace must be a series of printable ASCII characters (byte values 0x20 to 0x7E), without the character "." (byte value 0x2E). + +As each Component belongs to exactly one Node, it is fully identified by the combination of Namespace and Component name, which is globally unique. +This _Full name_ is the composition of Namespace, ".", and Component name. +For example, `N1.CA` is the Full name of the Component `CA` in the Node `N1`. + +The receiver of a message may be specified by Component name alone if the receiver belongs to the same Node as the sender. +In all other cases, the receiver of a message must be specified by the Full name. + +The sender of a message must be specified by Full name, except during SIGNIN, when the Component name alone is sufficient. + + +#### Message composition + +A message consists of 4 or more frames: +1. The protocol version (abbreviated with "V" in examples). +2. The receiver Full name or Component name, as appropriate. +3. The sender Full name. +4. A content header (abbreviated with "H" in examples). +5. 
Message content: The optional payload, which can be 0 or more frames. + + +#### Directory + +Each Coordinator shall have a list of the Components connected to it. +This is its _local Directory_. + +Each Coordinator shall also keep a list of the addresses of all Coordinators to which it is connected. + +Additionally, each Coordinator shall maintain a _global Directory_, which is its copy of the union of the local Directories of all Coordinators in a Network. + + +### Conversation protocol + +In the protocol examples, `CA`, `CB`, etc. indicate Component names. +`N1`, `N2`, etc. indicate Node Namespaces and `Co1`, `Co2` their corresponding Coordinators. + +Here the Message content is expressed in plain English and placed in the Content frame. For the exact definition, see {ref}`control_protocol.md#message-layer`. + +:::{note} +TBD: How to show the encoded content in the examples? +::: + + +The message exchanges show only the messages sent over the wire. The connection identity used by the ROUTER socket is not shown. + + +#### Communication with the Coordinator + + +##### Signing-in + +After connecting to a Coordinator (`Co1`), a Component (`CA`) shall send a SIGNIN message indicating its Component name. +The Coordinator shall indicate success/acceptance with an ACKNOWLEDGE response, giving the Namespace and other relevant information, or reply with an ERROR, e.g. if the Component name is already taken. +In that case, the Coordinator may indicate a suitable, still available variation on the indicated Component name. +The Component may retry SIGNIN with a different name. + +After a successful handshake, the Coordinator shall store the Component name in its {ref}`control_protocol.md#directory` and shall ensure message delivery to that Component (e.g. by storing the (zmq) connection identity with the local directory). +It shall also notify the other Coordinators in the network that this Component signed in, see {ref}`control_protocol.md#coordinator-coordination`. 
+Similarly, the Component shall store the Namespace and use it from then on to generate its Full name. + +If a Component sends a message without having signed in, the Coordinator shall refuse to handle the message and shall return an error. + +:::{mermaid} +sequenceDiagram + Note over CA,N1: Name "CA" is still free + participant N1 as N1.COORDINATOR + CA ->> N1: V|COORDINATOR|CA|H|SIGNIN + Note right of N1: Connection identity "IA" + Note right of N1: Stores "CA" with identity "IA" + N1 ->> CA: V|N1.CA|N1.COORDINATOR|H|ACKNOWLEDGE: Namespace is "N1" + Note left of CA: Stores "N1" as Namespace + Note over CA,N1: Name "CA" is already used + CA ->> N1: V|COORDINATOR|CA|H|SIGNIN + N1 ->> CA: V|CA|N1.COORDINATOR|H|ERROR: Name "CA" is already used. + Note left of CA: May retry with another Name + Note over CA,N1: "CA" has not sent SIGNIN + Note left of CA: Wants to send a message to CB + CA ->> N1: V|N1.CB|CA|H|Content + Note right of N1: Does not know CA + N1 ->> CA: V|CA|N1.COORDINATOR|H|ERROR: I do not know you + Note left of CA: Must send a SIGNIN message before further messaging. +::: + + +##### Heartbeat + +Heartbeats indicate whether a communication peer is still online. + +Every message received counts as a heartbeat. + +A Component should and a Coordinator shall send a PING and wait some time before considering a connection dead. +A Coordinator shall follow the {ref}`control_protocol.md#signing-out` procedure for a signed-in Component that it considers dead. + +:::{note} +TBD: Heartbeat details are still to be determined. +::: + + +##### Signing out + +A Component should send a SIGNOUT message to its Coordinator when it stops participating in the Network. +The Coordinator shall ACKNOWLEDGE the sign-out and remove the Component name from its local {ref}`control_protocol.md#directory`. +It shall also notify the other Coordinators in the network that this Component signed out, see {ref}`control_protocol.md#coordinator-coordination`. 
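The bookkeeping a Coordinator does for sign-in and sign-out can be sketched in plain Python. This is not a reference implementation: the class `LocalDirectory`, its method names, and the reply strings are hypothetical. Checking the connection identity on sign-out is an assumption, and notifying the other Coordinators is left out.

```python
class LocalDirectory:
    """Hypothetical in-memory local Directory of one Coordinator."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        # Maps Component name -> (zmq) connection identity.
        self.components: dict = {}

    def full_name(self, name: str) -> str:
        """Compose the Full name from Namespace, ".", and Component name."""
        return f"{self.namespace}.{name}"

    def sign_in(self, name: str, identity: bytes) -> str:
        """Handle SIGNIN: reject names already taken, otherwise store them."""
        if name in self.components:
            return f'ERROR: Name "{name}" is already used.'
        self.components[name] = identity
        return f'ACKNOWLEDGE: Namespace is "{self.namespace}"'

    def sign_out(self, name: str, identity: bytes) -> str:
        """Handle SIGNOUT: remove the name if it belongs to this connection."""
        # Assumption: only the connection that signed in may sign out.
        if self.components.get(name) != identity:
            return "ERROR: I do not know you"
        del self.components[name]
        return "ACKNOWLEDGE"


directory = LocalDirectory("N1")
first = directory.sign_in("CA", b"IA")     # name is still free
second = directory.sign_in("CA", b"IX")    # name is already used
signed_out = directory.sign_out("CA", b"IA")
print(first, second, signed_out, sep="\n")
```

Storing the identity next to the name is what later allows the Coordinator to address the right peer on its ROUTER socket.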
+ +:::{mermaid} +sequenceDiagram + CA ->> N1: V|COORDINATOR|N1.CA|H|SIGNOUT + participant N1 as N1.COORDINATOR + N1 ->> CA: V|N1.CA|N1.COORDINATOR|H|ACKNOWLEDGE + Note right of N1: Removes "CA" with identity "IA"
from local Directory + Note right of N1: Notifies other Coordinators about sign-out of "CA" + Note left of CA: Shall not send any further message except SIGNIN +::: + + +#### Communication with other Components + +The following two examples show how a message is transferred between two Components `CA` and `CB` via one or two Coordinators. + +Coordinators shall route the message to the corresponding Coordinator or connected Component. + + +:::{mermaid} +sequenceDiagram + alt Full name + CA ->> N1: V|N1.CB|N1.CA|H| Give me property A. + else only Component name + CA ->> N1: V|CB|N1.CA|H| Give me property A. + end + participant N1 as N1.COORDINATOR + N1 ->> CB: V|N1.CB|N1.CA|H| Give me property A. + Note left of CB: Reads property A + CB ->> N1: V|N1.CA|N1.CB|H| Property A has value 5. + N1 ->> CA: V|N1.CA|N1.CB|H| Property A has value 5. +::: + + +:::{mermaid} +sequenceDiagram + CA ->> N1: V|N2.CB|N1.CA|H| Give me property A. + participant N1 as N1.COORDINATOR + Note over N1,N2: N1 DEALER socket sends to N2 ROUTER + participant N2 as N2.COORDINATOR + N1 ->> N2: V|N2.CB|N1.CA|H| Give me property A. + N2 ->> CB: V|N2.CB|N1.CA|H| Give me property A. + Note left of CB: Reads property A + CB ->> N2: V|N1.CA|N2.CB|H| Property A has value 5. + Note over N1,N2: N2 DEALER socket sends to N1 ROUTER + N2 ->> N1: V|N1.CA|N2.CB|H| Property A has value 5. + N1 ->> CA: V|N1.CA|N2.CB|H| Property A has value 5. +::: + +Communication between two Components has the following prerequisites: +- Both Components are connected to a Coordinator and {ref}`signed in`. +- Both Components are either connected to the same Coordinator (example one), or their Coordinators are connected to each other (example two). + + +The following flow chart shows the decision scheme and message modification in the Coordinator `Co1` of Node `N1`. +Its Full name is `N1.COORDINATOR`. +`nS`, `nR` are placeholders for sender and recipient Namespaces. +`recipient` is a placeholder for the recipient Component name. 
+`iA` is a placeholder for the connection identity of the incoming message and `iB` that of `N1.Recipient`. +Bold arrows indicate message flow, thin lines indicate decision flow. +Thin, dotted lines indicate decision flow in case of errors. +Placeholder values are written in lowercase, while actually known values begin with an uppercase letter. + +:::{mermaid} +flowchart TB + C1([N1.CA DEALER]) == "V|nR.recipient|nS.CA|H|Content" ==> R0 + C0([nS.COORDINATOR DEALER]) == "V|nR.recipient|nS.CA|H|Content" ==> R0 + R0[receive] == "iA|V|nR.recipient|nS.CA|H|Content" ==> CnS{nS == N1?} + CnS-->|no| RemIdent + CnS-->|yes| Clocal{CA in
local Directory?} + Clocal -->|yes| CidKnown{iA is CA's identity?} + CidKnown -->|yes| RemIdent + Clocal -.->|no| E1[ERROR: Sender unknown] ==>|"iA|V|nS.CA|N1.COORDINATOR|H|ERROR: Sender unknown"| S + S[send] ==> WA([N1.CA DEALER]) + CidKnown -.->|no| E2[ERROR: Name and identity do not match]==>|"iA|V|nS.CA|N1.COORDINATOR|H|ERROR: Name and identity do not match"| S + RemIdent[remove sender identity] == "V|nR.recipient|nS.CA|H|Content" ==> CnR + CnR -- "is None" --> Local + CnR{nR?} -- "== N1"--> Local + Local{recipient
==
COORDINATOR?} -- "yes" --> Self[Message for Co1
itself] + Self == "V|nR.recipient|nS.CA|H|Content" ==> SC([Co1 Message handling]) + Local -- "no" --> Local2a{recipient in local Directory?} + Local2a -->|yes, with Identity iB| Local2 + Local2[add recipient identity iB] == "iB|V|nR.recipient|nS.CA|H|Content" ==> R1[send] + R1 == "V|nR.recipient|nS.CA|H|Content" ==> W1([Wire to N1.recipient DEALER]) + Local2a -.->|no| E3[ERROR recipient unknown
send Error to original sender] ==>|"V|nS.CA|N1.COORDINATOR|H|ERROR N1.recipient is unknown"|CnR + CnR -- "== N2" --> Keep + Keep[send to N2.COORDINATOR] == "V|nR.recipient|nS.CA|H|Content" ==> R2[send] + R2 == "V|nR.recipient|nS.CA|H|Content" ==> W2([Wire to N2.COORDINATOR ROUTER]) + subgraph Co1 ROUTER socket + R0 + end + subgraph Co1 ROUTER socket + R1 + S + end + subgraph Co1 DEALER socket
to N2.COORDINATOR + R2 + end +::: + + +#### Coordinator coordination + +Coordinators are the backbone of the Network and need to coordinate themselves. + + +##### Coordinator sign-in + +A Coordinator joins a Network by signing in to any Coordinator of that Network. +The sign-in/sign-out procedure between two Coordinators is more thorough than that of Components. +During the sign-in procedure, Coordinators exchange their local Directories and the addresses of all known Coordinators. +They shall sign in to all Coordinators to which they are not yet signed in. +The sign-in might happen because the Coordinator learns a new Coordinator address via Directory updates or at startup. +The sign-out might happen because the Coordinator shuts down. + +These are the sign-in/sign-out sequences between Coordinators, where `address` is, for example, the host name and port number of the Coordinator's ROUTER socket. + +:::{mermaid} +sequenceDiagram + participant r1 as ROUTER + participant d1 as DEALER + participant r2 as ROUTER + participant d2 as DEALER + Note over r1,d1: N1 Coordinator
at address1 + Note over r2,d2: N2 Coordinator
at address2 + Note over r1,d2: Sign in between two Coordinators + Note right of r1: shall connect
to address2 + activate d1 + Note left of d1: created with
name "temp-NS" + d1-->>r2: connect to address2 + d1->>r2: V|COORDINATOR|N1.COORDINATOR|H|
CO_SIGNIN + Note right of r2: stores N1 identity + r2->>d1: V|N1.COORDINATOR|N2.COORDINATOR|H|ACK + Note left of d1: DEALER name
set to "N2" + d1->>r2: V|N1.COORDINATOR|N2.COORDINATOR|H|
Here is my local directory
and Coordinator addresses + Note right of r2: Updates global
Directory and signs
in to all unknown
Coordinators,
also N1 + Note over d1,r2: Mirror of above sign-in procedure + activate d2 + Note left of d2: created with
name "N1" + d2-->>r1: connect to address1 + d2->>r1: V|COORDINATOR|N2.COORDINATOR|H|
CO_SIGNIN + Note right of r1: stores N2 identity + r1->>d2: V|N2.COORDINATOR|N1.COORDINATOR|H|ACK + Note left of d2: Name is already "N1" + d2->>r1: V|N2.COORDINATOR|N1.COORDINATOR|H|
Here is my local directory
and Coordinator addresses + Note right of r1: Updates global
Directory and signs
in to all unknown
Coordinators + Note over r1,d2: Sign out between two Coordinators + Note right of r1: shall sign out from N2 + d1->>r2: CO_SIGNOUT + Note right of r2: removes N1 identity + d2->>-r1: CO_SIGNOUT + Note right of r1: removes N2 identity + deactivate d1 +::: + +:::{note} +Note that the DEALER socket responds with the local Directory and Coordinator addresses to the received Acknowledgment. +::: + + +##### Coordinator updates + +Each Coordinator shall keep an up-to-date global {ref}`control_protocol.md#directory` with the Full names of all Components in the Network. +For this, whenever a Component signs in to or out from its Coordinator, the Coordinator shall notify all the other Coordinators regarding this event. +The other Coordinators shall update their global Directory according to this message (add or remove an entry). + +On request, Coordinators shall send the Names of their local or global Directory, depending on the request type. + +For the format of the Messages, see {ref}`control_protocol.md#message-layer`. + + +## Message layer + + +### Messages for Transport Layer + +- SIGNIN +- SIGNOUT +- ACKNOWLEDGE +- ERROR +- PING +- CO_SIGNIN +- CO_SIGNOUT + + + +:::{note} +TODO +::: diff --git a/glossary.md b/glossary.md index 8580c3d..e4f4afe 100644 --- a/glossary.md +++ b/glossary.md @@ -10,6 +10,9 @@ Actor Component A type of entity, a set of which make up the LECO communication Network, see {ref}`components.md#components`. +Component name + The individual name in a Node, under which a Component can be addressed, see {ref}`control_protocol.md#naming-scheme`. + Coordinator A Component primarily concerned with routing/coordinating the message flow between other Components, see {ref}`components.md#coordinator`. There are Control Coordinators, Data Coordinators, and Logging Coordinators. @@ -20,9 +23,16 @@ Device Director A Component which takes part in orchestrating a (i.e. LECO-controlled) measurement setup, see {ref}`components.md#director`. 
+Directory + Each Coordinator maintains a local Directory with all the Components connected to it (i.e. other Coordinators and the Components of its own Node), and a global Directory with all Components in the whole Network, see {ref}`control_protocol.md#directory`. + Driver An object that takes care of communicating with a Device. This object is external to LECO, for example coming from and instrument control library like `pymeasure`, `instrumentkit` or `yaq`. See {ref}`components.md#driver`. +Full name + The name of a Component unique for the whole setup. + It consists of the namespace and component-name, see {ref}`control_protocol.md#naming-scheme`. + LECO The **L**aboratory **E**xperiment **CO**ntrol protocol framework. @@ -39,6 +49,9 @@ Message Layer Message Transport Mode (LMT/DMT) The Node-local Message Layer can have a local or distributed mode, see {ref}`network-structure.md#message-transport-mode-lmtdmt`. +Namespace + The name of a Node in the Network, see {ref}`control_protocol.md#naming-scheme`. + Node A Node is a local context in which (part of) a LECO deployment runs. This may be a single application using one or more threads or processes. diff --git a/index.rst b/index.rst index 172aa26..6c887d1 100644 --- a/index.rst +++ b/index.rst @@ -15,8 +15,10 @@ components messages network-structure + control_protocol glossary Hello_world + appendix Indices and tables ================== diff --git a/network-structure.md b/network-structure.md index 180c1d1..84c4b93 100644 --- a/network-structure.md +++ b/network-structure.md @@ -47,12 +47,27 @@ flowchart LR Control Coordinator to Control Coordinator communication always uses DMT. ## Message Transport Mode (LMT/DMT) + The Node-local Message Layer can have a local or distributed mode. -The Distributed Message Transport (DMT, using the zeromq TCP protocol) works within or across Nodes. -The Local Message Transport (LMT) only works within a Node _and_ within a process. 
+### Distributed Message Transport (DMT) + +The Distributed Message Transport works within or across Nodes. +Currently, the only defined transport layer uses [Zmq](https://zeromq.org/) sockets for the communication. For more details see the [zmq guide](https://zguide.zeromq.org/) or [zmq API](http://api.zeromq.org/). + +Zmq messages consist of a series of frames, each of which is a byte sequence. +In this documentation, the separation between frames is indicated by `|`. +An empty frame is indicated with two frame separators `||`, even at the beginning or end of a message. +For example, the message `||Second frame|Third frame||Fifth frame` consists of 5 frames, with the first and fourth frames being empty frames. + +For some useful hints, see our {ref}`appendix.md#zmq`. + + +### Local Message Transport (LMT) + +The Local Message Transport only works within a Node _and_ within a process. Local Message Transport options include queues between threads/processes and zeromq inproc. :::{admonition} Warning -LMT details are still notional and not to be relied upon this will be fleshed out at a later date. +LMT details are still notional and not to be relied upon; this will be fleshed out at a later date. The list of LMT options is not definitive yet. :::