Control protocol header format #33

Closed
BenediktBurger opened this issue Jan 30, 2023 · 40 comments · Fixed by #38
Labels
distributed_ops Aspects of a distributed operation, networked or on a node messages Concerns the message format

Comments

@BenediktBurger
Member

How do we separate the individual parts of the header?

  1. We could use separate frames, but that creates some overhead
  2. We could use specified lengths
  3. We could use a separation character
  4. a mix: Separate recipient and sender with a char and have a fixed length for the messageID. The rest is the conversation ID

I propose the following scheme with a separation character (i.e. forbidden character):
"Recipient name";"Sender name";"ConversationID";"MessageID".
The message ID might be some binary data (#16). So we can look for the first three semicolons, and the rest is the message ID, independent of its ASCII representation.

Consequences:

  • ConversationID/reply-reference cannot contain a semicolon and therefore cannot be an arbitrary binary number.

Alternative:
If we have a messageID with a fixed byte length, we can use the following header: "Recipient";"Sender";MessageIDConversationID
The code would look for the first two semicolons to find the recipient and sender. Afterwards it takes a fixed number of bytes for the message ID, and the rest is the ConversationID, which can be flexible in length and can have any byte value.
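
A minimal Python sketch of both variants, to make the parsing rules concrete. The function names and the 4-byte message ID length are assumptions for illustration only; nothing here is a settled format.

```python
MESSAGE_ID_LENGTH = 4  # assumed fixed length, for illustration only


def parse_separated(header: bytes) -> tuple[bytes, bytes, bytes, bytes]:
    """Split at the first three semicolons; the rest is the message ID."""
    recipient, sender, conversation_id, message_id = header.split(b";", 3)
    return recipient, sender, conversation_id, message_id


def parse_fixed_length(header: bytes) -> tuple[bytes, bytes, bytes, bytes]:
    """Split at the first two semicolons, then cut a fixed-length message ID."""
    recipient, sender, rest = header.split(b";", 2)
    message_id = rest[:MESSAGE_ID_LENGTH]
    conversation_id = rest[MESSAGE_ID_LENGTH:]
    return recipient, sender, conversation_id, message_id


# In the first variant the message ID may contain a semicolon.
print(parse_separated(b"N1.actor;N1.director;conv7;\x00;\x01\x02"))
# In the second variant the conversation ID may contain a semicolon.
print(parse_fixed_length(b"N1.actor;N1.director;\x00\x01\x02\x03co;nv"))
```

In both variants the names themselves still must not contain a semicolon.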

@BenediktBurger BenediktBurger added distributed_ops Aspects of a distributed operation, networked or on a node discussion-needed A solution still needs to be determined messages Concerns the message format labels Jan 30, 2023
@bklebel
Collaborator

bklebel commented Jan 30, 2023

I think the most convenient way would be to use separate frames. Yes, it does create a bit of overhead, but I do not think this is critical for us - speed is important, but readability and versatility are too. With frames, we can use zmq's multipart framework, and directly get a well separated list of items, where we can have the first 1 to n frames as header, and then the n+1st frame (which is also the last) as the payload.
Like this, we do not have problems with forbidden characters, or need anything extra in the implementation code which separates them, we can simply use whatever methods are present (in python we have framelist.pop(0)) in the programming language in use for an implementation. So essentially, we always have some overhead, I think, either directly inside of zmq, or in our implementation around it, just that if we do it within zmq we do not have to concern ourselves too much with it.
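
A small sketch of that idea, assuming the list of frames is what pyzmq's `recv_multipart()` hands back (a plain list of `bytes`). The helper name `split_frames` is made up for this example, and a plain list stands in for the socket.

```python
def split_frames(frames: list[bytes]) -> tuple[list[bytes], bytes]:
    """Return (header frames, payload), treating the last frame as the payload."""
    if not frames:
        raise ValueError("a message needs at least one frame")
    remaining = list(frames)  # copy, so the caller's list is left untouched
    payload = remaining.pop()  # the last frame is the payload
    return remaining, payload


# A semicolon inside a frame is harmless, because frames are already separated.
header, payload = split_frames([b"N1.actor", b"N1.director", b"\x00;\x01", b"payload"])
print(header, payload)
```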

@BenediktBurger
Member Author

For the names, the dot "." is already reserved for namespaces.

Ok, so we keep it simple: Frame separation is already needed, so why not use that even more instead of inventing new methods.

In that case (separated by frames), we can choose any order. I'd say: Recipient, Sender, MessageID (because necessary), ConversationID (because optional).

@bilderbuchi
Member

As I understood it so far, Avro can shoulder a lot of that burden (but maybe I skim-read too quickly).
How about: we put in the first frame(s) only enough info so that zmq can do its thing (topic for pub/sub filtering, whatever else), and the rest gets put into Avro messages, where this gets (de)composed into fields, and validated, etc., and we don't need to mess around with separation characters or separate frames?
Does that work?

The nice thing about Avro is that we get schemas for our messages, so the structure is "in the code", schemas get exchanged during handshake, so participating Components know what to expect, the schemas are used to validate and process the messages, and we don't have to keep track of those details manually (with hand-wrought formats).

@bklebel
Collaborator

bklebel commented Jan 30, 2023

I am not perfectly sure about Avro (I did not look at it in as much detail as I looked at zmq), but what we are looking at here will not be included, I think, since it is mostly about routing. For routing, we need separate frames, because when sending a multipart message from the ROUTER socket, the first frame needs to be the recipient's ID (I think), and nothing else, so we cannot stuff everything inside of it. Also, the conversationID is there to track the ongoing conversation, and to be sure that the answer we got belongs to the question we just asked, and not to a previous question to an instrument which reacted more slowly. This matters especially for asynchronously working Directors, where we need to be able to tie replies to the requests on one and the same socket.
I don't think Avro is supposed to do routing, is it?

@BenediktBurger
Member Author

Oh, you intend, that we serialize the receiver, sender, message, and conversation IDs via avro?

I intended to do this header (routing information) manually (this issue) and add avro on top (another issue).

The only requirement, from zmq, is, that the topic of the data protocol is in the first frame. Everything else is up to us.

@bilderbuchi
Member

but what we are looking at here will not be included, I think, since it is mostly about routing.

Ah, then I got confused, because you guys were mentioning "Recipient, Sender, MessageID, ConversationID". I thought that for the zmq routing part you indeed only need the recipient.
All the other parts I'd have considered part of the message, e.g. the conversationID to associate it with the correct conversation -- this would/could happen with an unpacked message anyway, so we might as well serialise everything together.

The only thing I was not sure about is that we would duplicate the "recipient" information, once in the first zmq frame, then in the message header (packed in the zmq payload in the avro message).

I don't think Avro is supposed to do routing, is it?

No, afaik not. But zmq only needs the recipient for the routing, right? all the rest (convo tracking, message id) is stuff that LECO components put on top, right, as zmq is not doing that, either?

@bilderbuchi
Member

bilderbuchi commented Jan 30, 2023

Oh, you intend, that we serialize the receiver, sender, message, and conversation IDs via avro?

I intended to do this header (routing information) manually (this issue) and add avro on top (another issue).

Yeah, indeed, I did. If we are already using a specified/schematised message format, why do the header stuff manually separately if we don't need it for the transport layer?

The only requirement, from zmq, is, that the topic of the data protocol is in the first frame. Everything else is up to us.

or the recipient of a control message, IIUC Benjamin's reply above.

@BenediktBurger
Member Author

The only thing I was not sure about is that we would duplicate the "recipient" information, once in the first zmq frame, then in the message header (packed in the zmq payload in the avro message).

ZMQ does not need the recipient information in the first header. ZMQ just sends messages over TCP connections. Our Coordinators somehow need the "recipient" information to match that (human-readable) name to their own (arbitrary, not selectable) zmq socket addresses.

Basically, the data protocol does not need any "zmq" header. We could do everything in avro (or whatever).

I wanted to have a three step communication: zmq manages the connections. Then we have routing information (human readable) to route messages where we want them. Finally some serialization helps to transport data. The last two are arbitrary distinctions in my head.

@BenediktBurger
Member Author

For routing, we need separate frames, because when sending a multipart message from the ROUTER socket, the first frame needs to be the recipient's ID (I think)

@bilderbuchi that first frame is the address the zmq socket assigns to the other side. We map our own names to that (arbitrarily chosen address).

@BenediktBurger
Member Author

Yeah, indeed, I did. If we are already using a specified/schematised message format, why do the header stuff manually separately if we don't need it for the transport layer?

I understood the routing that we set up as the "Transport Layer", hence my requests in the PR.

It is good to have the routing information external to the payload, such that the Coordinators do not have to deserialize the whole payload, just to know, to whom to deliver the message.

@bilderbuchi
Member

It is good to have the routing information external to the payload, such that the Coordinators do not have to deserialize the whole payload, just to know, to whom to deliver the message.

Thanks, I'll have to think.
We have to find out if a Coordinator could pull out routing info from an Avro message without unpacking the whole thing -- unpacking a multi-MB oscilloscope trace or image just to get the recipient's human-readable name would indeed be quite wasteful. Maybe the header is separately deserialisable.

At the same time, we need to make sure that the Avro message retains enough context (I had considered the routing info just that) so that it remains "meaningful"/interpretable once it's separated from the transport layer and goes through our system.

@BenediktBurger
Member Author

Let me try to explain how a zmq ROUTER socket works:

  • the socket binds to some host and port (as known from tcp).
  • Other sockets (from different programs, for example Components) connect to this ROUTER socket, let's call these other sockets "peer". (I think, that each connecting peer gets assigned an address, some random byte number, at this point)
  • Whenever some peer sends a message, for example "some example message", the ROUTER socket returns (at readout) the peer's address (i.e. "iojy") and the message, that is for example "iojy", "some example message" (as two frames).
  • If you want to send a response to that peer (and not any other one), you again send two frames: "iojy", "my response" to the ROUTER socket. Now the ROUTER knows to which peer it has to send the message, and it sends (over the wire) only "my response", as the underlying TCP connection is one-to-one.
  • These addresses exist ONLY in that single ROUTER socket and are there, to distinguish the different connections (as the ROUTER is 1 to many).
  • If a peer destroys its socket and connects with a new socket, it will get a new address!

What our Coordinators have to do is to know the peer by more than its (arbitrary) address. Therefore we introduced (for our own code) the recipient name. Now a Coordinator can look up the address that corresponds to that recipient's name and send it a message.
For ease of use, we kept the recipient's name, the sender's name, and the message ID (basic message information) outside of the payload.
That way, we can route messages without knowing the content. Similar to a letter: on the outside you have the recipient and sender (in case of an error or rejection) and a date (postmark).

This issue is exactly about that envelope.
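
To illustrate the lookup described above, here is a rough sketch of the name-to-address table a Coordinator could keep. It is plain Python without any zmq calls; the class and method names are invented for this example, and the address `b"iojy"` is the one from the explanation above.

```python
class CoordinatorDirectory:
    """Maps human-readable Component names to ROUTER socket addresses."""

    def __init__(self) -> None:
        self._addresses: dict[bytes, bytes] = {}

    def remember(self, address: bytes, sender_name: bytes) -> None:
        """Store (or refresh) the address at which a name was last seen."""
        self._addresses[sender_name] = address

    def frames_for(self, recipient_name: bytes, message: list[bytes]) -> list[bytes]:
        """Prepend the recipient's socket address, as a ROUTER socket expects."""
        return [self._addresses[recipient_name]] + message


directory = CoordinatorDirectory()
# The ROUTER reported address b"iojy" for a peer that calls itself b"actor".
directory.remember(b"iojy", b"actor")
print(directory.frames_for(b"actor", [b"actor", b"director", b"my response"]))
```

Because a reconnecting peer gets a new address, `remember` simply overwrites the old entry. An unknown recipient raises a `KeyError` here; a real Coordinator would presumably answer with an error message instead.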

@bilderbuchi
Member

bilderbuchi commented Jan 30, 2023

Thanks, that's informative!
I have read parts of the Avro spec, and there does not seem to be anything in it regarding selectively unpacking a message. There was a section on message frames, but that doesn't seem applicable for us.
Therefore, I agree now that the routing information should not live exclusively in the Avro message, and has to be included in the zmq message somehow.

When planning the message format (this will not be restricted to the header, I think), please keep in mind the further flow of the message -- how do we keep the routing info near/associated with the Avro message? Do we wrap both in some container? Duplicate the info in the Avro message? Do both parts get deserialised into some data structure upon arrival (that is then passed on to the business logic)? Something else?

Also, by gut feeling I would prefer if the header fields are separated by something more resilient than a magic character (zmq frames sounds fine), even if we pay a little overhead cost. If that should turn out to be a bottleneck later, we can always change that then.

@bklebel
Collaborator

bklebel commented Jan 30, 2023

(I think, that each connecting peer gets assigned an address, some random byte number, at this point)

This does not necessarily need to be true. If the peer first sets an identity for itself, this identity is what the ROUTER socket will see. If the ROUTERs are allowed to choose the identities of peer Components by themselves (random bytes), how would you know that identity (even more so if you do a second hop through a second Coordinator) when writing your Director which tries to request data from a specific Actor? The random peer ID works well if you have a ROUTER connected to a REQ socket, because then the ROUTER takes the first frame from the REQ as its identity, does something with the payload, and answers using the identity which it generated (and whose content does not matter for this purpose). In the zmqguide it is sometimes written that the ROUTER cannot start sending messages, because it does not know in advance where to send them. This is only true in certain messaging patterns, especially with all these service-oriented servers which do some jobs for clients who do not care who actually provides them with some data (imagine a weather information service, as used as an example in the zmqguide). We have a fundamentally different problem, in that we want data from very precisely defined Components, and yes, I think the user has to hand out those identities at some point. After all, they need to know whom to talk to, the temperature of which controller they now need/want to change, or the motor setting of a very specific mirror. This needs to be transparent when writing sequences, and we need some names for it.

What is important for the ROUTER-DEALER connection here, is that the DEALER sets their identity BEFORE connecting to the ROUTER, so that the ROUTER receives this set identity, and does not set its own random ID for this particular connection.

For this reason, we really need the Recipient's ID and the Sender's ID in the header, preferably in the header frames, so we can use them without deserializing whatever payload comes with the message, as discussed above. For me, this is almost as @bmoneke put it:

zmq manages the connections. Then we have routing information (human readable) to route messages where we want them. Finally some serialization helps to transport data. The last two are arbitrary distinctions in my head.

It is just that the human-readable names can be used and are important for the zmq connection management, because this is intimately tied to how zmq routes messages (in our current way of thinking with ROUTER sockets).

@BenediktBurger
Member Author

BenediktBurger commented Jan 30, 2023

Oh, thanks @bklebel for reminding of that possibility to set your own identity beforehand, which I had forgotten (I did not need it).
Here is the corresponding text from the guide https://zguide.zeromq.org/docs/chapter3/#Identities-and-Addresses

With that, it could be necessary to overhaul the routing concept. I have to think about it.

Here an example http://wiki.zeromq.org/tutorials:dealer-and-router

@bklebel
Collaborator

bklebel commented Jan 30, 2023

I think the routing concept is just fine, it's only that for sending something between Coordinators, the corresponding sending socket needs to get an additional frame prepended to the rest for routing to the second Coordinator, who in turn needs to scrap that first frame (if this happens to be a ROUTER socket which will make it visible), as well as the first part of the recipientID which contains its own identity (zmq would try to send something to "Coord2.Comp1" which does not exist, because the only zmq identities which might exist are "Coord2" and "Comp1", connected to each other respectively).
So the principle of the routing stays the same, but the frames and implementation might get a bit hairy.
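
One way to picture the two hops, as a sketch only: it assumes full names of the form `Namespace.Component`, with the namespace equal to the Coordinator's zmq identity. The function `next_hop` is hypothetical and not part of any agreed design.

```python
def next_hop(recipient: bytes, own_namespace: bytes) -> tuple[bytes, bytes]:
    """Return (zmq identity to send to, recipient name to put in the message)."""
    namespace, dot, component = recipient.partition(b".")
    if not dot:
        # No namespace given: the recipient is a local Component.
        return recipient, recipient
    if namespace == own_namespace:
        # Addressed to this Coordinator's own Node: strip the namespace.
        return component, component
    # Another Node: hand the message, with the full name, to that Coordinator.
    return namespace, recipient


# First hop, seen from Coord1: send to the zmq identity "Coord2".
print(next_hop(b"Coord2.Comp1", b"Coord1"))
# Second hop, seen from Coord2: strip the own name and send to "Comp1".
print(next_hop(b"Coord2.Comp1", b"Coord2"))
```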

(I did not need it).

How do you send your messages if you do not set your IDs yourself? How do you know where you're sending messages? I am intrigued :D - or did you choose a fundamentally different approach to send command/control messages to Actors so far?

@BenediktBurger
Member Author

Until recently, I did not send to Actors, just to a handful of Observers, and I used a new port number for each one...
Only in my test implementation, sparked by our discussion, did I implement a routing mechanism. I put the receiver name in the first data frame (together with sender name and message ID). Then the Coordinator would create a dictionary of sender names (from data) and their addresses (very first ROUTER frame).
If we use, as we should, routing via frames, we have to change the routing (and framing) a bit. I'll open a new issue.

@BenediktBurger
Member Author

  • We agreed upon using separate frames for the header.
  • In "New message routing concept" (#34) we found out that we need more than one frame (for better routing) for the recipient and that it should be the first (few) frames.

Message format is: Namespace|Recipient||More frames to be determined (Namespace frame is optional)

Where do we place the additional information (sender, messageID, etc.)?
The sender is added by the Coordinator (in #34), therefore it is fixed to two frames. So we have some options:

  1. Namespace|Recipient||Senderframes|| Content (Separation indicates begin of more data, so number of senderframes is flexible)
  2. Namespace|Recipient||2 Senderframes| More content (Senderframes are the first two content frames)
  3. Namespace|Recipient||Content||Senderframes (Senderframes are the frames after the last empty frame)

As we agreed upon only two Coordinators per message (and even multi-hop would not increase the number of address frames), we could do the following (based on proposal 2), regarding the full header:
A. Namespace|Recipient||2 Senderframes| MessageID|ConversationID|Content
B. Namespace|Recipient||2 Senderframes| Content (including MessageID and ConversationID, determined by message layer)

I prefer option 2, but I'm unsure whether to take suboption A or B; I lean slightly towards B.
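
A sketch of how option 2 could be taken apart, assuming the empty frame is an empty `bytes` object and the Namespace frame is optional. The function name `split_option_2` is made up for this example.

```python
def split_option_2(frames: list[bytes]) -> tuple[list[bytes], list[bytes], list[bytes]]:
    """Return (recipient frames, sender frames, content frames)."""
    delimiter = frames.index(b"")  # the first empty frame ends the recipient part
    recipient = frames[:delimiter]
    sender = frames[delimiter + 1:delimiter + 3]  # always exactly two frames
    content = frames[delimiter + 3:]
    if not 1 <= len(recipient) <= 2 or len(sender) != 2:
        raise ValueError("malformed header")
    return recipient, sender, content


# With a Namespace frame:
print(split_option_2([b"N1", b"actor", b"", b"N2", b"director", b"payload"]))
# Without a Namespace frame:
print(split_option_2([b"actor", b"", b"N2", b"director", b"payload"]))
```

With suboption A, the first two content frames would be the MessageID and ConversationID; with suboption B, the message layer decides how to read the content.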

@BenediktBurger BenediktBurger linked a pull request Feb 2, 2023 that will close this issue
5 tasks
@BenediktBurger
Member Author

Current state is:

  • Recipient full Id|Sender full Id|More frames, because the first two frames are necessary for the routing.

Now the question is, what else should enter the header, what the content.

I think it is good, if we have at least a third header frame, which indicates the type of content, such that the recipient knows how to interpret it.

  • The third frame could be one byte for the content type (for example binary, avro, json..., maybe also the serialization version) and the message ID. (my preference)
  • Alternatively, we could require, that the third frame (if present) is always serialized in our serialization scheme (current contender is avro). Then this third frame can indicate whether bytes follow thereafter (for example) and it could contain message ID and reply reference.
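
A sketch of the first variant (one content-type byte followed by the message ID). The numeric codes and the helper names are placeholders chosen for this example, not an agreed assignment.

```python
from enum import IntEnum


class ContentType(IntEnum):
    """Hypothetical one-byte content type codes."""
    BINARY = 0
    AVRO = 1
    JSON = 2


def pack_third_frame(content_type: ContentType, message_id: bytes) -> bytes:
    """One byte for the content type, followed by the message ID."""
    return bytes([content_type]) + message_id


def unpack_third_frame(frame: bytes) -> tuple[ContentType, bytes]:
    """Inverse of pack_third_frame; unknown type codes raise ValueError."""
    if not frame:
        raise ValueError("third frame is empty")
    return ContentType(frame[0]), frame[1:]


frame = pack_third_frame(ContentType.AVRO, b"\x00\x2a")
print(frame, unpack_third_frame(frame))
```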

@bilderbuchi
Member

bilderbuchi commented Feb 4, 2023

If the third frame contains "metadata", we would remain more flexible.
E.g. if we place the content type in there, in case we need special rules for sender/recipient of SIGN_IN messages, we could determine that the message is a SIGN_IN without deserializing the payload frame.
Also, we would be free to put more stuff in there like the message id etc.
Maybe include a version identifier, so we can deal with it cleanly if the format of the third frame evolves.
So, 👍 on a third header frame.

@bilderbuchi
Member

For what else we could/should include in the header, maybe this can serve as a "reading inspiration". (Not judging technically if that is applicable/relevant for us).

@bklebel
Collaborator

bklebel commented Feb 5, 2023

I think it would be a good idea to have one or more additional header frames for metadata. The CoAP protocol @bilderbuchi linked to is quite interesting I think, especially regarding the header. From an implementation point of view, it might be good to put the header frames in the order in which they are needed when reading them. Initially, having the routing frames at the front was motivated by using them together with the zmq identity itself, but this is not a thing for us anymore. Therefore the order could be (strongly taking from CoAP):

  1. version
  2. type (e.g. SIGN_IN/GET/whatever)
  3. recipient full ID
  4. sender full ID
  5. message ID and conversation ID (could also be separated into multiple frames, with the conversation ID coming first)
  6. payload

The version of the protocol in use is a good point from CoAP which @bilderbuchi linked to, although this opens a whole new rabbit hole of "how to deal with messages from a different version than my current one", as seen from a particular Component's view. Still, that might be better than "wth do you want to tell me with that?". Also, if we put the version all the way to the front, we are free to change anything and everything behind that from version to version (even though it might not be advisable in itself), and still get a working "this version is no longer supported" response, without errors, since the rest of the message does not need to be looked at.
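
A sketch of that version-first check. The version encoding `b"\x01"` and the error text are invented for illustration; the point is only that nothing after the first frame is inspected.

```python
from typing import Optional

SUPPORTED_VERSION = b"\x01"  # hypothetical encoding of the protocol version


def check_version(frames: list[bytes]) -> Optional[bytes]:
    """Return an error text for an unsupported version, else None.

    Only the first frame is read, so the remaining frames may have any layout.
    """
    if not frames or frames[0] != SUPPORTED_VERSION:
        return b"this version is not supported"
    return None


print(check_version([b"\x02", b"frames", b"in", b"an", b"unknown", b"layout"]))
print(check_version([b"\x01", b"N1.actor", b"N1.director", b"payload"]))
```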

In general I would like to avoid serializing this too much, I simply do not know how we would benefit from it, if you have a good reason I am open for it.

@BenediktBurger
Member Author

CoAP is a message protocol between two endpoints (typically over UDP), so no routing is necessary.
We, however, have to include routing information.
The Coordinators have to use the routing information to get the message to wherever it should go.

Therefore, I'd put the routing information in front of the type.
Maybe even in front of the version.
If the version is first, the formatting of the routing information could potentially change (depending on the version).
If the routing is fixed, the version could be combined with different things (message ID...) into one frame.

@bilderbuchi
Member

I think version first is a safer choice, in case we want to make changes to the routing info in the future (e.g. allow more levels to address Parameters directly, like Node1.CA.voltage, or for deeper hierarchies).

Question: Is it even necessary (with the current status) anymore to have separate frames for the sender/recipient, or could we just have one header frame (with a number of bytes containing all the necessary info)?
Or is that inconvenient from a technical/performance point of view? IIRC, we stepped away from mutating the sender/recipient info along the message flow, so the header would be unchanged for the message lifetime (which seems convenient), correct?

I would not serialise the header info at all, just encode info into an array of bytes (easily decoded).

@bklebel
Collaborator

bklebel commented Feb 6, 2023

By separating it into different frames, we can avoid additional forbidden symbols, which we would otherwise need to separate the information. Also, at least in python, you can do a very straightforward implementation for the CCoordinator:

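msg = socket.recv_multipart()  # `socket` is the Coordinator's zmq socket
version = msg.pop(0)
if version != my_version:  # compare by value, not by identity
    ...  # send an error
msg_type = msg.pop(0)  # renamed to avoid shadowing the builtin `type`
if msg_type == SIGN_IN:
    ...  # do something myself
else:
    recipient = msg.pop(0)
    # look up the recipient in the zmq identity table (here a dict `identities`)
    address = identities[recipient]
    socket.send_multipart([address, version, msg_type, recipient] + msg)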

Here my reason for putting the type before the routing maybe also becomes more apparent, in that we can pick out the SIGN_IN easily. In the example above, the messageID and conversationID are in the rest of the msg; this could be a separate frame, but it is not relevant for the Control Coordinator.

@BenediktBurger
Member Author

Just an implementation detail: I would not pop parts of the msg, as the Coordinator normally gives the message as is to the recipient (or other Coordinator).

We do not need to pick the SIGN_IN first: The Coordinator looks, whether a message is for itself (Recipient==self) and then it handles the message, be it SIGN_IN or something else.
In most cases, the Coordinator looks in vain at the type.
In fact, it has to look twice at the type if the message is destined for the Coordinator:

  1. Look at type: It is not sign_in
  2. look at recipient: Oh, it is for me.
  3. Look at type again: What should I do?

With a fixed Recipient name (Coordinator or None or whatever we decide), the Coordinator does not need to look at any other field than the Recipient.
We allowed leaving out the Recipient namespace, therefore "Coordinator" is an allowed recipient ID.
We could also allow to leave the Recipient empty for the SIGN_IN message. So:

msg = socket.recv_multipart()
if msg["version"] != my_version:
    send_error()
elif msg["recipient"]:
    do_routing()  # might include routing to myself and then doing sign_in()
elif msg["type"] == SIGN_IN:
    sign_in()
else:
    send_error()

Note: I used dictionary keys as placeholders for the actual position.

@BenediktBurger
Member Author

I would not serialise the header info at all, just encode info into an array of bytes (easily decoded).

👍

IIRC, we stepped away from mutating the sender/recipient info along the message flow, so the header would be unchanged for the message lifetime (which seems convenient), correct?

You're correct.

By separating it into different frames, we can avoid additional forbidden symbols, which we would otherwise need to separate the information.

👍 For that reason, I'm for four header frames: version, recipient, sender, "content header" (type etc.).
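
As a sketch of what reading these four header frames could look like: the `Message` type and `parse_message` are invented names, and treating all frames after the fourth as payload is an assumption.

```python
from typing import NamedTuple


class Message(NamedTuple):
    version: bytes
    recipient: bytes
    sender: bytes
    content_header: bytes
    payload: list[bytes]  # assumed: every frame after the four header frames


def parse_message(frames: list[bytes]) -> Message:
    """Assign the first four frames to the header fields by position."""
    if len(frames) < 4:
        raise ValueError("expected at least four header frames")
    version, recipient, sender, content_header, *payload = frames
    return Message(version, recipient, sender, content_header, payload)


msg = parse_message([b"\x01", b"N1.actor", b"N1.director", b"header", b"payload"])
print(msg.recipient, msg.payload)
```

A Coordinator would only need `msg.recipient` (and `msg.sender` for error replies), and could forward the original frame list unchanged.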

@bilderbuchi
Member

Thanks! I failed to consider that our addresses are variable-length, which makes separation into frames very attractive.

For that reason, I'm for four header frames: version, recipient, sender, "content header" (type etc.).

LGTM

@bilderbuchi
Member

We allowed leaving out the Recipient namespace, therefore "Coordinator" is an allowed recipient ID.

Only for the signin, right? In that case I guess we might as well leave it out, completely.

@BenediktBurger
Member Author

No, we allowed the Recipient to be only the Component ID, if it is in the same Node.

The Sender ID always has to be complete.
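
Expressed as a sketch, assuming the "Namespace.Component" naming with the dot as separator; both helper names are hypothetical.

```python
def complete_recipient(recipient: bytes, own_namespace: bytes) -> bytes:
    """Prefix the local namespace if the recipient is only a Component ID."""
    if b"." in recipient:
        return recipient
    return own_namespace + b"." + recipient


def is_complete_sender(sender: bytes) -> bool:
    """A sender ID must consist of a namespace and a Component ID."""
    namespace, dot, component = sender.partition(b".")
    return bool(namespace and dot and component)


print(complete_recipient(b"actor", b"N1"))
print(is_complete_sender(b"N1.director"), is_complete_sender(b"director"))
```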

@bilderbuchi
Member

Ok, thanks. We could keep in mind that we could revisit that if it makes our logic flow less complex.

@bklebel
Collaborator

bklebel commented Feb 7, 2023

For that reason, I'm for four header frames: version, recipient, sender, "content header" (type etc.).

That looks good to me too, I agree with the reasoning about looking at the type twice, and very often unnecessarily looking at the type.

@BenediktBurger
Member Author

We reached a conclusion, so I am closing this issue.

@bilderbuchi
Member

Should the issue not be closed when the conclusion/relevant information turns up in main? I.e. when a PR that records the issue's results is merged?

@bklebel
Collaborator

bklebel commented Feb 8, 2023

I think so too - it is a bit cumbersome to have those issues still open even though we reached a conclusion, but as long as it is not done with a merged PR, I would also leave the corresponding issues open. I am going to reopen this now.

@bklebel bklebel reopened this Feb 8, 2023
@BenediktBurger
Member Author

it is a bit cumbersome to have those issues still open even though we reached a conclusion

That was my reasoning for those many issues with a single PR, as an exception to the default rule to close with PRs.

So, no exception 😁

@bilderbuchi
Member

Well, you can summarize the conclusion/takeaway in a "last" comment. Then people looking at the issue know immediately what the result of a lengthy discussion was.

When it has been added in a PR, you can link to that, too, in a comment.

If you put "closes #blabla" in the description/first post of a PR, that issue will automatically be closed when the PR is merged.

@bklebel
Collaborator

bklebel commented Feb 8, 2023

Good, now I wanted to write a summarizing comment, in which I say that we agreed on everything, and find that I am not sure we did ^^

Here is the summary so far

Summary

We agreed on the following Control protocol header format:

  1. version
  2. recipient
  3. sender
  4. content header

where "content header" includes

not agreed on

the content header:
it could include the message type (e.g. SIGN_IN), as well as the conversation ID and message ID, separated by....a ";"? Or do we want to break that into multiple frames too?

@BenediktBurger
Member Author

BenediktBurger commented Feb 8, 2023

For the transport layer, it does not matter (so for my PR it is done).

The format of the header (four frames) is done.

The content of these frames is not yet fully agreed upon (version and content header).

The content header enters the message layer, I'd say. (it has its issue #41)

The version formatting is also open, but can be discussed separately (#42).

@BenediktBurger BenediktBurger removed the discussion-needed A solution still needs to be determined label Feb 8, 2023
@bilderbuchi
Member

Yes, let's discuss the content of these frames separately.
