TLS Certificate Verification Error During Liqo Peering Setup #2392

Open
jonathon2nd opened this issue Mar 14, 2024 · 4 comments
Labels
kind/bug Something isn't working

Comments

@jonathon2nd

Hello :D
It has been a while 👋
I have reviewed the docs and the recent updates, but if I have missed something basic, please let me know. I feel like I have, but my searches have not turned up anything.

What happened:

During the setup of Liqo peering using liqoctl, I encountered a TLS certificate verification error. The specific error message was: ERRO Failed peering clusters: Error from server (InternalError): Internal error occurred: failed calling webhook 'fc.mutate.liqo.io': failed to call webhook: Post 'https://liqo-controller-manager.liqo.svc:9443/mutate/foreign-cluster?timeout=10s': tls: failed to verify certificate: x509: certificate signed by unknown authority.

What you expected to happen:

I expected the Liqo peering process to complete successfully without any TLS certificate errors.

How to reproduce it (as minimally and precisely as possible):

  1. Set up two Liqo clusters.
  2. Run liqoctl generate peer-command on the first cluster to generate a peering command.
  3. Execute the generated peering command on the second cluster using liqoctl peer out-of-band.
  4. Observe the TLS certificate verification error.
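The failure mode itself is easy to reproduce outside Kubernetes: a certificate only verifies against the CA that actually signed it. A self-contained openssl sketch (all file names and subjects here are illustrative, not taken from the clusters in this issue):

```shell
# Two independent CAs: one plays the trusted in-pod CA, the other the actual signer
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca1.key -out ca1.crt -subj "/CN=ca1" -days 1
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca2.key -out ca2.crt -subj "/CN=ca2" -days 1

# A leaf certificate (standing in for the webhook serving cert) signed by ca1
openssl req -newkey rsa:2048 -nodes -keyout leaf.key -out leaf.csr -subj "/CN=liqo-controller-manager.liqo.svc"
openssl x509 -req -in leaf.csr -CA ca1.crt -CAkey ca1.key -CAcreateserial -out leaf.crt -days 1

# Verification succeeds against the signing CA...
openssl verify -CAfile ca1.crt leaf.crt
# ...and fails against any other CA, which is exactly the x509 error above
openssl verify -CAfile ca2.crt leaf.crt || true
```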

Anything else we need to know?:

One cluster is on prem:
k8s v1.27.11+rke2r1
rocky 9.3 vms
Calico v3.27.0

The remote cluster is OVH:
k8s v1.28.3
Canal
registry.kubernatine.ovh/public/flannel:v0.21.3
registry.kubernatine.ovh/public/calico-node:v3.26.1-amd64

Both clusters have Liqo installed with Helm via Argo CD, with values generated and then modified by liqoctl install k3s -n cluster1 --only-output-values

Environment:

  • Liqo version:
  • Liqoctl version: v0.10.1
  • Kubernetes version (use kubectl version): See above
  • Cloud provider or hardware configuration: See above
  • Node image: See above
  • Network plugin and version: See above
  • Install tools:
  • Others:

On prem cluster

jonathon@jonathon-framework:~$ liqoctl status
┌─ Namespace existence check ──────────────────────────────────────────────────────┐
|  INFO  ✔ liqo control plane namespace liqo exists                                |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Control plane check ────────────────────────────────────────────────────────────┐
|  Deployment                                                                      |
|      liqo-metric-agent:       Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-auth:               Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-proxy:              Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-network-manager:    Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-gateway:            Desired: 2, Ready: 2/2, Available: 2/2             |
|      liqo-controller-manager: Desired: 2, Ready: 2/2, Available: 2/2             |
|      liqo-crd-replicator:     Desired: 1, Ready: 1/1, Available: 1/1             |
|  DaemonSet                                                                       |
|      liqo-route:              Desired: 4, Ready: 4/4, Available: 4/4             |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Local cluster information ──────────────────────────────────────────────────────┐
|  Cluster identity                                                                |
|      Cluster ID:   1c8d5e90-f076-4916-88f5-b1b6dc356caf                          |
|      Cluster name: autumn-waterfall                                              |
|      Cluster labels                                                              |
|          liqo.io/provider: k3s                                                   |
|  Network                                                                         |
|      Pod CIDR:         10.42.0.0/16                                              |
|      Service CIDR:     10.43.0.0/16                                              |
|      External CIDR:    10.2.0.0/16                                               |
|      Reserved Subnets                                                            |
|      • 10.1.0.0/24                                                               |
|      • 10.1.1.0/24                                                               |
|      • 10.1.3.0/28                                                               |
|      • 10.1.8.0/22                                                               |
|      • 10.0.1.0/24                                                               |
|      • 10.0.2.0/24                                                               |
|  Endpoints                                                                       |
|      Network gateway:       udp://ip:5871                                |
|      Authentication:        https://ip                                   |
|      Kubernetes API server: https://ip:6443                          |
└──────────────────────────────────────────────────────────────────────────────────┘
jonathon@jonathon-framework:~$ liqoctl peer out-of-band green-surf --auth-url https://ip:31691 --cluster-id 5fc4e25e-937e-4df2-9521-a66c1c402831 --auth-token EEE
 ERRO  Failed peering clusters: Error from server (InternalError): Internal error occurred: failed calling webhook "fc.mutate.liqo.io": failed to call webhook: Post "https://liqo-controller-manager.liqo.svc:9443/mutate/foreign-cluster?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority

Remote cluster

jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl generate peer-command
 INFO  Peering information correctly retrieved                                                                                                                                                                                                                                            

Execute this command on a *different* cluster to enable an outgoing peering with the current cluster:

liqoctl peer out-of-band green-surf --auth-url https://ip:31691 --cluster-id 5fc4e25e-937e-4df2-9521-a66c1c402831 --auth-token EEE
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl status
┌─ Namespace existence check ──────────────────────────────────────────────────────┐
|  INFO  ✔ liqo control plane namespace liqo exists                                |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Control plane check ────────────────────────────────────────────────────────────┐
|  Deployment                                                                      |
|      liqo-metric-agent:       Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-auth:               Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-proxy:              Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-network-manager:    Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-gateway:            Desired: 2, Ready: 2/2, Available: 2/2             |
|      liqo-controller-manager: Desired: 2, Ready: 2/2, Available: 2/2             |
|      liqo-crd-replicator:     Desired: 1, Ready: 1/1, Available: 1/1             |
|  DaemonSet                                                                       |
|      liqo-route:              Desired: 3, Ready: 3/3, Available: 3/3             |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Local cluster information ──────────────────────────────────────────────────────┐
|  Cluster identity                                                                |
|      Cluster ID:   5fc4e25e-937e-4df2-9521-a66c1c402831                          |
|      Cluster name: green-surf                                                    |
|      Cluster labels                                                              |
|          liqo.io/provider: k3s                                                   |
|  Network                                                                         |
|      Pod CIDR:         10.2.0.0/16                                               |
|      Service CIDR:     10.3.0.0/16                                               |
|      External CIDR:    10.4.0.0/16                                               |
|      Reserved Subnets                                                            |
|      • 10.1.0.0/24                                                               |
|      • 10.1.1.0/24                                                               |
|      • 10.1.3.0/28                                                               |
|      • 10.1.8.0/22                                                               |
|      • 10.0.1.0/24                                                               |
|      • 10.0.2.0/24                                                               |
|  Endpoints                                                                       |
|      Network gateway:       udp://ip:30109                               |
|      Authentication:        https://ip:31691                         |
|      Kubernetes API server: https://remote-ip                   |
└──────────────────────────────────────────────────────────────────────────────────┘
@jonathon2nd jonathon2nd added the kind/bug Something isn't working label Mar 14, 2024
@jonathon2nd
Author

Dumping information here from the Slack thread, as that will not be readable later.

Before, on RKE, I had to add the following to cluster.yaml:

  kube-controller: 
    extra_args: 
      cluster-signing-cert-file: "/etc/kubernetes/ssl/kube-ca.pem"
      cluster-signing-key-file: "/etc/kubernetes/ssl/kube-ca-key.pem"   

Now I am on RKE2, and things are different. I did not find any reference in the docs, and the only one in the repo is this issue.
Which of these certs do you think would work best? The certificate setup seems completely different.

[root@ovbh-vtest-k8s03-master02 user]# ls /var/lib/rancher/rke2/server/tls
client-admin.crt       client-ca.crt          client-controller.key      client-kube-proxy.crt             client-rke2-controller.crt  client-supervisor.crt  kube-controller-manager  server-ca.crt          service.key                 temporary-certs
client-admin.key       client-ca.key          client-kube-apiserver.crt  client-kube-proxy.key             client-rke2-controller.key  client-supervisor.key  kube-scheduler           server-ca.key          serving-kube-apiserver.crt
client-auth-proxy.crt  client-ca.nochain.crt  client-kube-apiserver.key  client-rke2-cloud-controller.crt  client-scheduler.crt        dynamic-cert.json      request-header-ca.crt    server-ca.nochain.crt  serving-kube-apiserver.key
client-auth-proxy.key  client-controller.crt  client-kubelet.key         client-rke2-cloud-controller.key  client-scheduler.key        etcd                   request-header-ca.key    service.current.key    serving-kubelet.key

Adding the following

    kube-controller:
      extra_args:
        cluster-signing-cert-file: "/var/lib/rancher/rke2/server/tls/server-ca.crt"
        cluster-signing-key-file: "/var/lib/rancher/rke2/server/tls/server-ca.key"

results in this error
E0401 22:51:00.828155 1 run.go:74] "command failed" err="cannot specify --cluster-signing-{cert,key}-file and other --cluster-signing-*-file flags at the same time"
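That error message implies RKE2 already starts kube-controller-manager with the per-signer --cluster-signing-*-file flags, which upstream forbids combining with the legacy --cluster-signing-{cert,key}-file pair. An untested sketch of what an override might look like in /etc/rancher/rke2/config.yaml, using the per-signer variants instead (the choice of client-ca.crt/.key from the directory listing above is a guess, not a verified fix):

```yaml
# /etc/rancher/rke2/config.yaml -- hypothetical sketch, not verified on RKE2.
# RKE2 already passes per-signer --cluster-signing-*-file flags, so any
# override must use the per-signer variants rather than the legacy pair.
kube-controller-manager-arg:
  - "cluster-signing-kube-apiserver-client-cert-file=/var/lib/rancher/rke2/server/tls/client-ca.crt"
  - "cluster-signing-kube-apiserver-client-key-file=/var/lib/rancher/rke2/server/tls/client-ca.key"
```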
These are the kube-controller-manager logs without modification:

I0401 22:53:15.812803       1 controllermanager.go:187] "Starting" version="v1.27.11+rke2r1"
2024-04-01T15:53:15.813124034-07:00 I0401 22:53:15.812929       1 controllermanager.go:189] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
2024-04-01T15:53:15.817730506-07:00 I0401 22:53:15.817595       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
2024-04-01T15:53:15.817746036-07:00 I0401 22:53:15.817673       1 shared_informer.go:311] Waiting for caches to sync for RequestHeaderAuthRequestController
2024-04-01T15:53:15.817777777-07:00 I0401 22:53:15.817667       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0401 22:53:15.817719       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0401 22:53:15.817754       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0401 22:53:15.817758       1 shared_informer.go:311] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
2024-04-01T15:53:15.818800235-07:00 I0401 22:53:15.818752       1 secure_serving.go:213] Serving securely on 127.0.0.1:10257
2024-04-01T15:53:15.819215433-07:00 I0401 22:53:15.819116       1 leaderelection.go:245] attempting to acquire leader lease kube-system/kube-controller-manager...
2024-04-01T15:53:15.819508638-07:00 I0401 22:53:15.819458       1 dynamic_serving_content.go:132] "Starting controller" name="serving-cert::/var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.crt::/var/lib/rancher/rke2/server/tls/kube-controller-manager/kube-controller-manager.key"
I0401 22:53:15.819738       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
2024-04-01T15:53:15.918025203-07:00 I0401 22:53:15.917877       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
2024-04-01T15:53:15.918245270-07:00 I0401 22:53:15.917906       1 shared_informer.go:318] Caches are synced for RequestHeaderAuthRequestController
2024-04-01T15:53:15.921282637-07:00 I0401 22:53:15.920067       1 shared_informer.go:318] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file

@jonathon2nd
Author

I am not able to peer using an unmodified OVH k8s cluster (Kubernetes v1.28.3) either.

jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl version
Client version: v0.10.2
Server version: v0.10.2
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl status
┌─ Namespace existence check ──────────────────────────────────────────────────────┐
|  INFO  ✔ liqo control plane namespace liqo exists                                |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Control plane check ────────────────────────────────────────────────────────────┐
|  Deployment                                                                      |
|      liqo-controller-manager: Desired: 2, Ready: 2/2, Available: 2/2             |
|      liqo-crd-replicator:     Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-metric-agent:       Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-auth:               Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-proxy:              Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-network-manager:    Desired: 1, Ready: 1/1, Available: 1/1             |
|      liqo-gateway:            Desired: 2, Ready: 2/2, Available: 2/2             |
|  DaemonSet                                                                       |
|      liqo-route:              Desired: 3, Ready: 3/3, Available: 3/3             |
└──────────────────────────────────────────────────────────────────────────────────┘

┌─ Local cluster information ──────────────────────────────────────────────────────┐
|  Cluster identity                                                                |
|      Cluster ID:   7829a681-9dd5-4840-b6e8-f3a1f1a19d53                          |
|      Cluster name: frosty-wave                                                   |
|      Cluster labels                                                              |
|          liqo.io/provider: k3s                                                   |
|  Configuration                                                                   |
|      Version: v0.10.2                                                            |
|  Network                                                                         |
|      Pod CIDR:         10.2.0.0/16                                               |
|      Service CIDR:     10.3.0.0/16                                               |
|      External CIDR:    10.4.0.0/16                                               |
|      Reserved Subnets                                                            |
|      • 10.1.0.0/24                                                               |
|      • 10.1.1.0/24                                                               |
|      • 10.1.3.0/28                                                               |
|      • 10.1.8.0/22                                                               |
|      • 10.0.1.0/24                                                               |
|      • 10.0.2.0/24                                                               |
|  Endpoints                                                                       |
|      Network gateway:       udp://10.1.10.40:32110                               |
|      Authentication:        https://redacted:31343                         |
|      Kubernetes API server: https://redacted.c1.bhs5.k8s.ovh.net                   |
└──────────────────────────────────────────────────────────────────────────────────┘
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl install k3s -n liqo-test-ovh --only-output-values --pod-cidr 10.2.0.0/16 --service-cidr 10.3.0.0/16 --enable-ha --verbose --api-server-url=https://redacted.c1.bhs5.k8s.ovh.net --reserved-subnets 10.1.0.0/24,10.1.1.0/24,10.1.3.0/28,10.1.8.0/22,10.0.1.0/24,10.0.2.0/24
 INFO  Using chart from "liqo/liqo"                                                                                                                                                                                                                                                       
 INFO  Installer initialized                                                                                                                                                                                                                                                              
 INFO  Cluster name: damp-mountain                                                                                                                                                                                                                                                        
 INFO  Kubernetes API Server: https://redacted.c1.bhs5.k8s.ovh.net                                                                                                                                                                                                                          
 INFO  Pod CIDR: 10.2.0.0/16                                                                                                                                                                                                                                                              
 INFO  Service CIDR: 10.3.0.0/16                                                                                                                                                                                                                                                          
 INFO  Cluster configuration correctly retrieved                                                                                                                                                                                                                                          
 INFO  Installation parameters correctly generated                                                                                                                                                                                                                                        
 INFO  All Set! Chart values written to "./values.yaml"
jonathon@jonathon-framework:~/liqo-test-ovh$ liqoctl peer out-of-band nameless-silence --auth-url https://redacted:31223 --cluster-id f441ba85-1b35-42e3-b58c-0da2303b03f8 --auth-token EEE
 ERRO  Failed peering clusters: Error from server (InternalError): Internal error occurred: failed calling webhook "fc.mutate.liqo.io": failed to call webhook: Post "https://liqo-controller-manager.liqo.svc:9443/mutate/foreign-cluster?timeout=10s": tls: failed to verify certificate: x509: certificate signed by unknown authority

@aleoli
Member

aleoli commented Apr 3, 2024

It expects that the CA found in the connection parameters inside the pod (the pod kubeconfig, for simplicity; you can find it at something like /var/run/secrets/kubernetes.io/serviceaccount) and the CA signing remote user certificates are the same.

We can consider adding the possibility for the user to override this value in the new auth module #2382
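One way to check this condition mechanically: dump the in-pod CA (/var/run/secrets/kubernetes.io/serviceaccount/ca.crt) and the CA signing remote user certificates, then compare fingerprints. A minimal sketch with a hypothetical pem_fingerprint helper; the dummy PEM bodies stand in for the real files:

```python
import hashlib
import re

def pem_fingerprint(pem: str) -> str:
    """Return the SHA-256 fingerprint of the first certificate body in a PEM blob."""
    match = re.search(
        r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", pem, re.S
    )
    if not match:
        raise ValueError("no certificate found in input")
    body = "".join(match.group(1).split())  # drop newlines and indentation
    return hashlib.sha256(body.encode()).hexdigest()

# Dummy PEM bodies standing in for the two real CA files:
sa_ca = "-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
signing_ca = "-----BEGIN CERTIFICATE-----\nBBBB\n-----END CERTIFICATE-----\n"

# Different signers, so the webhook call would fail verification
print(pem_fingerprint(sa_ca) == pem_fingerprint(signing_ca))  # -> False
```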

@dennispan
Contributor

Here's what we do when we peer an RKE2 cluster (from an EKS cluster):

Prerequisites:

  1. Enable Authorized Cluster Endpoint on RKE2.
  2. For apiServer.address (in the Helm values), use the ACE, e.g. https://40.52.12.84:6443. Alternatively, if a load balancer is properly set up, use that instead.
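The second prerequisite corresponds to a Helm values fragment along these lines (the address is the illustrative ACE endpoint from the step above, not a real value):

```yaml
# Liqo Helm values fragment -- address is the example ACE endpoint
apiServer:
  address: "https://40.52.12.84:6443"
```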

Peering:

  1. Run liqoctl generate peer-command on the RKE2 cluster to generate the peering command
  2. On the EKS cluster, run the liqoctl peer out-of-band command. Expect this to fail due to a missing CA and/or a lack of permissions (we haven't had a chance to verify the exact cause).
  3. After the liqoctl peer command, a secret (with a liqo-identity- prefixed name) is created under the tenant namespace (e.g. liqo-tenant-test-cluster-c37ac5). We need to update its certificate and CA.
  4. Make sure the secret has apiServerCa, apiServerUrl, certificate, namespace and private-key.
    • apiServerCa. Use the CA from /etc/rancher/rke2/rke2.yaml (find it on a control plane node)
    • certificate. Use cert client-kube-apiserver.crt from /var/lib/rancher/rke2/server/tls (find it on a control plane node)
    • private-key. Use key client-kube-apiserver.key from /var/lib/rancher/rke2/server/tls (find it on a control plane node)
    • The rest should be left as is, but it doesn't hurt to double-check. For example, apiServerUrl should use the ACE.
  5. Restart liqo controller manager and crd replicator to pick up the changes

We haven't had a chance to verify whether we have to update the certificate and private-key data. Once the peering was established, we didn't bother to touch it.
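Steps 3-5 above can be sketched as a shell snippet. Everything here is illustrative: the namespace and secret names are placeholders, the CA content is dummy data, and the kubectl lines are commented out because they need live cluster access; only the patch payload is built locally.

```shell
# Build the base64-encoded patch payload locally (dummy CA content for illustration;
# in reality, use the CA from /etc/rancher/rke2/rke2.yaml on a control-plane node)
printf 'dummy-ca-pem' > rke2-ca.crt
CA_B64=$(base64 -w0 < rke2-ca.crt)
PATCH="{\"data\":{\"apiServerCa\":\"$CA_B64\"}}"
echo "$PATCH"

# Then, against the real cluster (placeholder names):
# kubectl -n liqo-tenant-<suffix> patch secret liqo-identity-<suffix> --type merge -p "$PATCH"
# kubectl -n liqo rollout restart deployment liqo-controller-manager liqo-crd-replicator
```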
