Fail to start a distributed DB using unidirectional out-of-band peering #2386

rmedina97 · 2024-03-13T09:46:36Z

What happened:

I attempted to deploy the Liqo example of a stateful application in my hierarchical architecture, comprising one consumer and two provider clusters. However, only the first POD, db-mariadb-galera-0, successfully starts. The second POD fails to connect with the first and enters a CrashLoopBackOff state. Both PODs were scheduled in the provider clusters.

What you expected to happen:

I expected the PODs to be able to communicate with each other.

How to reproduce it (as minimally and precisely as possible):

Create 3 clusters using k3s (with different POD and service CIDR), peer them with Liqo as 1 consumer and 2 providers, and install the example Helm chart.
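
The "different POD and service CIDR" precondition can be checked before peering. The snippet below is a minimal sketch using Python's standard `ipaddress` module. The helper name `overlapping_cidrs` and all CIDR values are hypothetical and not taken from this issue; only 10.42.0.0/16 and 10.43.0.0/16 correspond to the k3s defaults.

```python
import ipaddress
from itertools import combinations


def overlapping_cidrs(cidrs):
    """Return the pairs of names whose CIDR ranges overlap.

    `cidrs` maps an arbitrary label (for example "consumer/pod") to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in cidrs.items()}
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(nets.items(), 2)
        if net_a.overlaps(net_b)
    ]


# Hypothetical addressing plan for one consumer and two providers.
plan = {
    "consumer/pod": "10.42.0.0/16",
    "consumer/svc": "10.43.0.0/16",
    "provider-1/pod": "10.52.0.0/16",
    "provider-1/svc": "10.53.0.0/16",
    "provider-2/pod": "10.62.0.0/16",
    "provider-2/svc": "10.63.0.0/16",
}

conflicts = overlapping_cidrs(plan)
print("overlapping ranges:", conflicts)
```

An empty list means every cluster uses distinct pod and service ranges, as in the setup described above.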

Anything else we need to know?:

I found a working solution: the entire DB is able to start only if there is a working POD in the consumer cluster. Otherwise, only the first POD starts. Additionally, bidirectional peering between every cluster resolves the issue, but my preference is to adhere to the hierarchical structure.
I first noticed this problem using the Percona XtraDB operator (another distributed DB application) with three PODs. If the POD in the consumer cluster is deleted and rescheduled to a provider cluster, that POD again enters CrashLoopBackOff, while the other running PODs continue to work as normal.

Environment:

Liqo version: v0.10.1
Liqoctl version: v0.10.1
Kubernetes version (use kubectl version): k3s v1.24.17+k3s1
Cloud provider or hardware configuration:
Node image: Linux Ubuntu Server 20.04
Network plugin and version:
Install tools:
Others:


aleoli · 2024-03-13T15:41:06Z

Hi @RiccardoStud! For better reproducibility, how do you install the MariaDB-galera cluster? Do you use an operator or a chart? If so, please indicate which one.

rmedina97 · 2024-03-14T10:01:08Z

Hi @aleoli! I used the Helm chart from the Liqo guide, running the command:
helm install db bitnami/mariadb-galera -n liqo-demo -f manifests/values.yaml
I only changed the namespace name to match mine. For additional context, when I deploy the chart using only two of my clusters (one provider and one consumer), it functions normally.

fra98 · 2024-03-25T14:33:47Z

Hi @rmedina97.
I reproduced your deployment and can confirm it is not working with this specific topology.
This is because, by design, pods on different leaf clusters in Liqo cannot reach each other using their original IPs: their addresses are remapped into the external CIDR of the originating cluster. The deployment can still work in the following cases:

- Pods in leaf clusters connect to a pod in the originating cluster (if the DB does not require a full mesh between all replicas).
- Leaf clusters are peered directly (a bidirectional peering is not needed; a peering in either direction is enough). This works because the leaf clusters then know each other's pod CIDRs and do not need to use the ExternalCIDR for IP mappings.
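
To make the remapping concrete, here is a toy model of which address one cluster sees for a pod running in another. It is only a sketch of the behavior described above, not Liqo's implementation: the cluster names, the CIDR values, the `ToyIpam` class, and its "next free address" allocation are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical external CIDR of the originating (consumer) cluster.
CONSUMER_EXTERNAL_CIDR = ipaddress.ip_network("10.70.0.0/16")


class ToyIpam:
    """Hands out addresses from an external CIDR (assumed scheme: next free host)."""

    def __init__(self, external_cidr):
        self._free = iter(external_cidr.hosts())
        self._table = {}

    def remap(self, ip):
        # Reuse the same mapping if this IP was already remapped.
        if ip not in self._table:
            self._table[ip] = next(self._free)
        return self._table[ip]


def address_seen_by(observer, pod_cluster, pod_ip, peerings, ipam):
    """Return the address `observer` must use to reach a pod in `pod_cluster`."""
    ip = ipaddress.ip_address(pod_ip)
    directly_known = (
        observer == pod_cluster
        or frozenset((observer, pod_cluster)) in peerings
    )
    # Without a direct peering the original pod IP is not usable:
    # the pod is only visible through a remapped address.
    return ip if directly_known else ipam.remap(ip)


# Hierarchical topology: the two leaves are peered only with the consumer.
hierarchical = {
    frozenset(("consumer", "provider-a")),
    frozenset(("consumer", "provider-b")),
}
# Same topology plus one peering between the leaves (direction is irrelevant here).
with_leaf_peering = hierarchical | {frozenset(("provider-a", "provider-b"))}

ipam = ToyIpam(CONSUMER_EXTERNAL_CIDR)
pod_ip = "10.20.0.5"  # hypothetical pod running in provider-a

remapped = address_seen_by("provider-b", "provider-a", pod_ip, hierarchical, ipam)
direct = address_seen_by("provider-b", "provider-a", pod_ip, with_leaf_peering, ipam)
print("hierarchical:", remapped)
print("leaves peered:", direct)
```

In this model, a replica that advertises its original pod IP to the other members is unreachable from the other leaf in the hierarchical case, which is consistent with the second replica failing to join.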

Please note that a redesigned network will be merged soon, and we will test distributed DB scenarios again.

rmedina97 · 2024-03-29T07:34:31Z

Thanks for the comprehensive answer. I will adopt one of the suggested solutions for now.

rmedina97 added the kind/bug Something isn't working label Mar 13, 2024
