TIP-2 #2

Draft
wants to merge 13 commits into
base: main

@guiltygyoza (Contributor) commented Jun 18, 2024

TODO

  • add author GitHub link
  • get the Vercel preview working so we can review the post internally
  • suggested: have abstract section (summary of this TIP) as common sections across TIPs
  • suggested: change context to motivation

vercel bot commented Jun 18, 2024

| Name | Status | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| tips | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Sep 19, 2024 9:04am |

---
layout: post
title: tip-2
topic: "TIP-2: CRO snapshot algorithm using threshold logical clock"

Member

"CRO" is never defined within the document nor referenced to an external resource

This TIP proposes a snapshot algorithm for CROs, which uses an asynchronous consensus algorithm based on a threshold logical clock.

## 2. Motivation
In distributed systems, a global snapshot (or simply, snapshot) captures the states of all processes and all in-flight messages between processes at an instant of time. It is important that taking a global snapshot does not require halting the distributed system.

Member

Afaik, snapshots don't capture inflight messages between processes (might be wrong)

Contributor Author

Chandy & Lamport's algorithm does


At any given moment, a running CRO can have multiple replicas progressing at the same time. When individual replicas make progress faster than they synchronize among themselves, the replicas temporarily diverge. We can therefore assume that a running CRO may always have diverging replicas, or diverging "versions" of itself. A CRO snapshot, then, is defined as the squashing of its multiple versions into a single representative version.

Member

"... defined as the squashing of its multiple versions into a single representative version." - I don't agree with this sentence, we are not squashing different versions, we are agreeing on a specific version (consensus)

A CRO snapshot is useful in at least 3 ways:
1. A snapshot can serve as a single "state save" in time for the CRO to be persisted somewhere.
2. A snapshot can be used to determine which part of the CRO history (append-only log) is safe to prune (compact) away. State pruning or compaction is needed to solve the unbounded memory problem that plagues append-only logs and CRDTs. To bound memory usage, it is helpful to take snapshots periodically.
3. A snapshot can be used to generate transactions to be submitted to a blockchain.

Member

Why is this useful?

If it's storing state, it's already included in point 1. If it's to verify integrity (with hashes), we should specify it

We need to devise a snapshot algorithm for CROs.
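
To make use case 2 in the list above concrete, here is a minimal sketch of snapshot-driven log compaction. The `AppendOnlyLog` class, its integer "operations", and the `compact` method are all hypothetical illustrations, not part of the CRO implementation; the sketch assumes the replicas have already agreed on a snapshot index by some other means.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyLog:
    """Hypothetical append-only operation log for one replica (not the CRO API)."""

    base_state: int = 0   # state captured by the last snapshot
    base_index: int = 0   # number of operations already folded into base_state
    ops: list = field(default_factory=list)  # operations recorded after the snapshot

    def append(self, delta: int) -> None:
        self.ops.append(delta)

    def state(self) -> int:
        # Current state = snapshot state plus every operation still retained.
        return self.base_state + sum(self.ops)

    def compact(self, snapshot_index: int) -> None:
        """Fold all operations up to the agreed snapshot_index into base_state."""
        upper = self.base_index + len(self.ops)
        if not self.base_index <= snapshot_index <= upper:
            raise ValueError("snapshot index outside the retained log")
        cut = snapshot_index - self.base_index
        self.base_state += sum(self.ops[:cut])
        self.ops = self.ops[cut:]          # the pruned prefix is no longer stored
        self.base_index = snapshot_index


log = AppendOnlyLog()
for delta in [1, 2, 3, 4]:
    log.append(delta)
log.compact(3)  # keep only the operations after the third one
```

Compaction leaves `state()` unchanged while shrinking the retained log, which is the property that bounds memory when snapshots are taken periodically.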

## 3. System model
Following [paper draft](https://github.com/topology-foundation/paper) v7.5 (and replacing "live objects" with "CROs"), we define the following system model:

Member

Some thoughts here:

  • Our paper is missing in the references
  • The paper version that should be referenced here is the final version. Improvement proposals should be on top of the basic version of the protocol (which is the 1st version of the paper that was implemented aka final version)
  • We can remove the parentheses

2. In `t=3r+1`, the message payload is the process's `WitSet` from `t=3r`.
3. In `t=3r+2`, each process simply broadcasts its `WitSet` from `t=3r+1`.

When `t=3r+2` completes, each process yields a list of proposals (denoted as $$\vec{P}$$) for the current round `r`. Each of these proposals ($$P$$) is either "not witnessed" (denoted as $$W^0$$), "singly witnessed" (denoted as $$W^1$$), or "doubly witnessed" (denoted as $$W^2$$).
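
The relationship between the logical time step `t` and the round `r` in the steps above can be sketched as follows. The function names `split_tick` and `witset_source_tick` are hypothetical; the sketch only encodes what this excerpt states (steps `3r+1` and `3r+2` carry the `WitSet` of the previous step) and deliberately returns `None` for step `3r`, whose payload is not shown here.

```python
from typing import Optional, Tuple


def split_tick(t: int) -> Tuple[int, int]:
    """Split a logical time step t into (round r, step k) such that t = 3r + k."""
    if t < 0:
        raise ValueError("logical time must be non-negative")
    return divmod(t, 3)


def witset_source_tick(t: int) -> Optional[int]:
    """Return the time step whose WitSet is carried as the payload at step t.

    Steps 3r+1 and 3r+2 rebroadcast the WitSet of the preceding step.
    Step 3r is not covered by this excerpt, so None is returned for it.
    """
    _, step = split_tick(t)
    return t - 1 if step in (1, 2) else None
```

For example, `split_tick(7)` places step 7 in round 2 at offset 1, so its payload is the `WitSet` from step 6.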

Member

notation seems bad for a list of proposals

Member

P was previously defined as Payload, might confuse the reader

Member

What do singly and doubly witnessed mean? Witnessed by 1 and 2 nodes?
If so, why wouldn't being witnessed by more nodes be the weight for ranking?

Contributor Author

Singly witnessed means witnessed in one TLC tick; doubly witnessed means witnessed in two ticks. In every tick, witnessed = ACKed by a threshold number of processes.
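
The author's reply can be read as the following classification rule. This is a sketch under stated assumptions: the function name `witness_level` is hypothetical, ACK counts are given as plain integers per tick, and a proposal is assumed to need the threshold in the first tick before the second tick can count.

```python
from typing import Sequence


def witness_level(acks_per_tick: Sequence[int], threshold: int) -> int:
    """Return 0, 1, or 2 for not, singly, or doubly witnessed.

    acks_per_tick[0] is the ACK count in the first witnessing tick and
    acks_per_tick[1] the count in the second. Assumption: ticks must be
    witnessed in order, so a failed first tick yields level 0.
    """
    level = 0
    for acks in acks_per_tick[:2]:
        if acks < threshold:
            break
        level += 1
    return level
```

Under this reading the level counts ticks, not nodes, which answers the reviewer's question: a proposal ACKed by many nodes in a single tick is still only singly witnessed.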

4. Otherwise, $$P_0$$ is decided. In the next round, this process's proposal will be built on top of $$P_0$$.

#### "Compatibility" between subsequent proposals
Vidigueira's notion of compatibility means that for process $$i$$ in round $$r$$, if a proposal $$P_r$$ is decided, then $$i$$'s proposal for round $$r+1$$, denoted $$P_{r+1}$$, must build on top of $$P_r$$. In other words, $$P_{r+1}$$ causally depends on $$P_r$$. This is analogous to blockchains, where a new block is always proposed on top of the main (e.g. longest) chain.
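
A compatibility check of this kind might look like the sketch below. The `Proposal` type, its `parent` pointer, and `builds_on` are hypothetical names introduced for illustration; the sketch assumes each proposal records the single proposal it was built on, and treats causal dependence as reachability through those parent links.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposal:
    """Hypothetical proposal record: an identifier, a round, and a parent link."""

    ident: str
    round: int
    parent: Optional["Proposal"] = None


def builds_on(candidate: Proposal, decided: Proposal) -> bool:
    """Return True if candidate causally depends on decided via parent links."""
    node = candidate.parent
    while node is not None:
        if node == decided:
            return True
        node = node.parent
    return False


genesis = Proposal("g", 0)
p1 = Proposal("a", 1, genesis)      # decided in round 1
p2 = Proposal("b", 2, p1)           # compatible: built on top of p1
fork = Proposal("c", 2, genesis)    # incompatible: ignores the decided p1
```

Here `builds_on(p2, p1)` holds while `builds_on(fork, p1)` does not, mirroring a block proposed on the main chain versus one proposed on a stale fork.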

Member

missing reference for Vidigueira's work (link)

A lemma is provided in [Vidigueira 2019]:

***Lemma 5.1**
If any node sees a message m as K-certified by round R, all honest nodes will see m as (K-1)-certified by round R+1.*

Member

Missing context for K, should be paraphrased


However, the degree of consistency guaranteed by this snapshot algorithm requires further study. The basic question to answer here is:

*If an honest node $$i$$ decides proposal $$P$$ (finality) in round $$r$$, is there a lower bound on $$\delta$$ such that another random honest node $$j$$ decides $$P$$ in round $$r+\delta$$ with probability $$p=1$$? Or is this finality always probabilistic, i.e. $$j$$ decides $$P$$ in round $$r+\delta$$ with $$p=1-\epsilon$$, where $$\epsilon$$ decreases (how fast?) as $$\delta$$ increases?*

Member

introduced delta and epsilon out of nowhere. What do they mean?
delta - time interval?
epsilon - error?

Member

and after re-reading multiple times, i don't understand the question

Contributor Author

Ah, delta is the number of additional rounds and epsilon is the difference from probability one. These are standard notations from calculus (the epsilon-delta definition of a limit). Agreed, it can be explained more clearly.
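
With that reading of $$\delta$$ and $$\epsilon$$, the two alternatives in the question could be written out as below. This is one possible formalization, not the TIP's own statement: the threshold $$\delta^*$$ is a symbol introduced here, and interpreting "decides in round $$r+\delta$$" as "has decided by round $$r+\delta$$" is an assumption.

$$
\text{Deterministic finality:}\quad \exists\, \delta^* \ge 0 \;\; \forall\, \delta \ge \delta^* :\;\; \Pr\big[\, j \text{ has decided } P \text{ by round } r+\delta \;\big|\; i \text{ decides } P \text{ in round } r \,\big] = 1
$$

$$
\text{Probabilistic finality:}\quad \Pr\big[\, j \text{ has decided } P \text{ by round } r+\delta \;\big|\; i \text{ decides } P \text{ in round } r \,\big] = 1 - \epsilon(\delta), \qquad \epsilon(\delta) \to 0 \text{ as } \delta \to \infty
$$

The open question is then whether the first statement holds for some finite $$\delta^*$$, and if not, how quickly $$\epsilon(\delta)$$ shrinks.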

_posts/TIPs/2024-06-18-tip-2.md Outdated