Replies: 3 comments 2 replies
-
Can we move this to a discussion rather than an issue?
-
So I ran into this issue with multiple replicas in Kubernetes. I'm using a shared storage system (EFS), set up in a cluster: one leader node and many read-only replicas, with ActiveMQ and a Postgres database for persistence (3 replicas in a StatefulSet plus a headless Service so each pod can be contacted via cluster DNS).

Initially, with small deployments of 1-2 read-only replicas as StatefulSets, we had no problems. Once I changed the type to Deployment, because I wanted to improve the rollout time of a read-only (stateless) pod, I started running into issues. We mount the same storage from the leader onto the read-only replicas, but we cannot mount the data_dir read-only: each GeoServer instance expects to write some small data (logs, cluster-node config), and herein lies our problem. Each pod that starts will attempt to write a newly hashed password over an already existing hash that is shared between pods, causing the leader, when it sees the change, to send a request to all workers to reload configuration. With more than 2-3 pods starting at the same time, I would often see a few files go blank: ./data_dir/security/roles/default/roles.xml and roles.xml.orig. These files would be wiped out and Tomcat would not be able to start, causing the pod to fail its health check. This is confirmed in /usr/local/tomcat/logs/localhost.x.log (a more detailed error there describes the problem).

My fix: create a custom read-only image that removes some of the wrapper scripts around password handling (only our master instance should be making these changes); see the sketch below. Since this change, I can spawn as many pods as I like without a problem.
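For anyone who wants to approximate this without rebuilding the image, here is a minimal, hypothetical sketch of a read-only Deployment that bypasses the image's entrypoint wrapper scripts. It assumes a Tomcat-based GeoServer image where `catalina.sh run` starts Tomcat directly; the image tag, PVC name, and data_dir path are placeholders, not values from this thread.

```yaml
# Hypothetical sketch: read-only GeoServer replicas sharing the leader's data_dir.
# Assumes a Tomcat-based image where "catalina.sh run" starts the server directly,
# skipping the entrypoint scripts that rewrite shared security/password files.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: geoserver-readonly
spec:
  replicas: 3
  selector:
    matchLabels:
      app: geoserver-readonly
  template:
    metadata:
      labels:
        app: geoserver-readonly
    spec:
      containers:
        - name: geoserver
          image: example/geoserver:readonly   # placeholder image
          # Bypass the stock entrypoint so no password/role files are rewritten;
          # only the leader instance should modify the shared security files.
          command: ["catalina.sh", "run"]
          env:
            - name: GEOSERVER_DATA_DIR
              value: /opt/geoserver/data_dir
          volumeMounts:
            - name: shared-data
              mountPath: /opt/geoserver/data_dir
      volumes:
        - name: shared-data
          persistentVolumeClaim:
            claimName: geoserver-shared-data   # placeholder PVC backed by EFS
```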
-
I am seeing this on OpenShift with nfs-provisioner, and with only a single pod that is redeployed.
-
What is the bug or the crash?
GeoServer running in a Kubernetes pod fails to start after a pod restart. The primary symptoms are ThreadLocal errors and WebappClassLoaderBase illegal-state exceptions. This occurs specifically when the pod, which mounts a persistent volume for the GeoServer data directory, is restarted.
Steps to reproduce the issue
1. Deploy GeoServer on Kubernetes using the kartoza charts with this value for persistence (an illustrative values sketch follows this list).
2. Attach a persistent volume to the GeoServer pod for storing the data directory.
3. Add workspaces, stores, and layers to the instance.
4. Kill the pod.
5. Let Kubernetes automatically restart the GeoServer pod.
6. Observe the errors in the pod logs.
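Since the original persistence values are not shown here, below is a purely hypothetical illustration of what such a chart values fragment could look like. The key names (`persistence.enabled`, `storageClass`, `accessModes`, `size`) are assumptions for illustration, not confirmed kartoza chart keys; check the chart's own values.yaml for the actual names.

```yaml
# Hypothetical values.yaml fragment for a GeoServer Helm deployment.
# Key names are illustrative assumptions; verify against the chart's values.yaml.
persistence:
  enabled: true
  storageClass: standard-rwo   # placeholder storage class
  accessModes:
    - ReadWriteOnce
  size: 10Gi
```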
Versions
GeoServer Version: 2.23.2 (but appears with any version)
Docker Image: docker.io/kartoza/geoserver:2.23.2
Google Kubernetes Engine (GKE) standard cluster
Filestore instance as persistent volume
Additional context
The problem arises specifically when restarting the pod. The initial deployment, with no data linked to the GeoServer instance, does not show these errors. The persistent volume seems to be correctly configured, and this setup worked seamlessly before. I found recommendations online suggesting a change to the default data_dir, so I mounted my persistent volume to a different location (/opt/persistence/data_dir) and updated the container environment variables accordingly:
GEOSERVER_DATA_DIR: /opt/persistence/data_dir
GEOWEBCACHE_CACHE_DIR: /opt/persistence/data_dir/gwc
This change was expected to resolve the issue, but the startup errors persisted.
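For reference, a minimal sketch of how that mount and those environment variables would be wired in the pod spec is below; the volume and claim names are placeholders, while the image tag and env values come from this report.

```yaml
# Sketch of the relevant pod spec fragment; volume/claim names are placeholders.
containers:
  - name: geoserver
    image: docker.io/kartoza/geoserver:2.23.2
    env:
      - name: GEOSERVER_DATA_DIR
        value: /opt/persistence/data_dir
      - name: GEOWEBCACHE_CACHE_DIR
        value: /opt/persistence/data_dir/gwc
    volumeMounts:
      - name: persistence
        mountPath: /opt/persistence
volumes:
  - name: persistence
    persistentVolumeClaim:
      claimName: geoserver-persistence   # placeholder, backed by Filestore
```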
Additionally, as a solution to this problem, I am open to recommendations on best practices for maintaining GeoServer state when deploying on Kubernetes while ensuring horizontal scalability.