
Running in Production

The mc-operator container image is built on Ubuntu Noble Chiseled — an ultra-minimal, distroless-style image with:

  • No shell, no package manager, no curl/wget
  • Minimal attack surface: only the ASP.NET Core runtime and direct dependencies
  • Non-root execution by default (built-in app user, UID 1654)

The Helm chart enforces the security context at the pod level:

securityContext:
  runAsNonRoot: true
  runAsUser: 1001
  fsGroup: 1001

The operator requires cluster-scoped RBAC to watch MinecraftServer resources across namespaces and manage apps/statefulsets. The Helm chart creates a ClusterRole with exactly the permissions needed — nothing more. Review charts/mc-operator/templates/clusterrole.yaml for the exact grants.

The Helm chart automatically generates self-signed TLS certificates for the webhook service and injects the caBundle into the webhook configurations. Certificates are valid for 10 years and persist across helm upgrade operations.

For environments that require certificate rotation or integration with an organizational PKI, you can use cert-manager or another certificate management solution alongside the operator.
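To check how long a webhook certificate remains valid, you can read it from its Secret and ask openssl for the expiry date. This is a minimal sketch: the function name webhook_cert_expiry is hypothetical, and it assumes the chart stores the certificate in a kubernetes.io/tls Secret under the tls.crt key. The Secret name is chart-specific, so it is passed in as an argument rather than guessed.

```shell
# Hypothetical helper: print the expiry date of a TLS certificate stored in a Secret.
# Assumption: the certificate is kept in a kubernetes.io/tls Secret (key tls.crt).
# Arguments: <namespace> <secret-name>
webhook_cert_expiry() {
  ns="$1"
  secret="$2"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}
```

Usage: list candidates with kubectl get secrets -n mc-operator-system, then run webhook_cert_expiry mc-operator-system <secret-name>.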

The operator supports running multiple replicas with built-in Kubernetes leader election. Leader election is automatically enabled when deploying more than one replica:

# values-prod.yaml
replicaCount: 2

Only one replica is the active leader at any time. Passive replicas take over within seconds of leader failure. There is no cache to warm up, because the operator is stateless between reconciles.
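To see which replica currently holds leadership, you can inspect the Lease objects in the operator namespace. This sketch assumes the operator's leader election is backed by a standard coordination.k8s.io Lease in mc-operator-system; the function name show_leader is hypothetical.

```shell
# Hypothetical helper: list Lease objects with their current holder and last renewal.
# Assumption: leader election uses a coordination.k8s.io Lease in the operator namespace.
# Argument: [namespace] (defaults to mc-operator-system)
show_leader() {
  ns="${1:-mc-operator-system}"
  kubectl get leases -n "$ns" \
    -o custom-columns='NAME:.metadata.name,HOLDER:.spec.holderIdentity,RENEWED:.spec.renewTime'
}
```

Usage: run show_leader, delete the pod named in the HOLDER column, and run it again to confirm that another replica has taken over.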

The operator uses a 5-minute requeue on success (drift detection) and a 30-second requeue on error. This means:

  • Spec changes are applied within seconds
  • Transient errors (e.g. Kubernetes API throttling) self-heal quickly
  • Continuous reconciliation catches out-of-band changes to child resources

The operator does not yet export Prometheus metrics directly. You can monitor the operator through:

  • Pod logs: The operator logs reconcile cycles, errors, and phase transitions.
  • Kubernetes events: Check kubectl get events -n mc-operator-system.
  • MinecraftServer status: kubectl get mcs -n minecraft shows Phase and Ready columns.

# Watch all servers across all namespaces
kubectl get minecraftservers -A -w

# Inspect a specific server's status
kubectl describe minecraftserver paper-survival -n minecraft
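
The log and event checks listed above can be wrapped in two small helpers. This is a sketch under assumptions: the function names are hypothetical, and operator_logs assumes the operator Deployment is named mc-operator, which may differ for your Helm release.

```shell
# Hypothetical helpers for day-to-day monitoring.

# Events in a namespace, sorted so the most recent appear last.
# Argument: [namespace] (defaults to mc-operator-system)
recent_events() {
  kubectl get events -n "${1:-mc-operator-system}" --sort-by=.lastTimestamp
}

# Operator logs from the last N minutes.
# Assumption: the Deployment is named mc-operator; adjust to your release name.
# Argument: [minutes] (defaults to 10)
operator_logs() {
  kubectl logs -n mc-operator-system deployment/mc-operator --since="${1:-10}m"
}
```

Usage: recent_events minecraft shows events in the server namespace, and operator_logs 30 shows the last half hour of operator output.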

mc-operator does not implement automated backups in v1. Recommended approaches:

Velero can take VolumeSnapshot-backed PVC backups on a schedule:

velero backup create minecraft-backup \
  --include-namespaces minecraft \
  --snapshot-volumes=true
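
The command above takes a single backup. For a recurring one, Velero provides velero schedule create, which accepts a cron expression. The sketch below is illustrative only: the function name, the daily 03:00 schedule, and the 7-day retention are assumptions, not project recommendations.

```shell
# Hypothetical helper: create a recurring Velero backup for one namespace.
# Assumptions: schedule name "<namespace>-daily", daily at 03:00, 7-day retention.
# Arguments: [namespace] [cron expression]
create_backup_schedule() {
  ns="${1:-minecraft}"
  cron="${2:-0 3 * * *}"
  velero schedule create "${ns}-daily" \
    --schedule="$cron" \
    --include-namespaces "$ns" \
    --snapshot-volumes=true \
    --ttl 168h0m0s
}
```

Usage: create_backup_schedule minecraft "0 */6 * * *" would back up the namespace every six hours instead.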

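For a running server, you can copy world data out while world saving is paused. The pod must stay running for kubectl exec to work, so pause saving rather than scaling the server down:

# Pause world saving and flush pending chunks to disk
# (assumes the image ships rcon-cli, as itzg/minecraft-server does, and that the pod is named paper-survival-0)
kubectl exec -n minecraft paper-survival-0 -c minecraft -- rcon-cli save-off
kubectl exec -n minecraft paper-survival-0 -c minecraft -- rcon-cli save-all flush
# Copy world data
kubectl exec -n minecraft paper-survival-0 -c minecraft -- \
  tar czf - /data > backup-$(date +%Y%m%d).tar.gz
# Resume world saving
kubectl exec -n minecraft paper-survival-0 -c minecraft -- rcon-cli save-on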

Players  CPU Request  Memory Request  JVM Heap
1–5      250m         1.5Gi           1G max
5–20     500m         2.5Gi           2G max
20–50    1–2          5Gi             4G max
50+      2–4          8Gi+            6–8G max

These are rough guidelines. Profile your specific server (plugins, world size, chunk loading patterns) for accurate sizing.
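
For scripting, the guideline table can be expressed as a small lookup. This is a sketch only: the function name size_for_players is hypothetical, and because the table's ranges overlap at 5, 20, and 50 players, assigning those boundary values to the lower tier is an assumption.

```shell
# Map an expected player count to the guideline row from the sizing table.
# Prints: <cpu request> <memory request> <max JVM heap>
# Assumption: boundary values (5, 20, 50) fall into the lower tier.
size_for_players() {
  players="$1"
  if [ "$players" -le 5 ]; then
    echo "250m 1.5Gi 1G"
  elif [ "$players" -le 20 ]; then
    echo "500m 2.5Gi 2G"
  elif [ "$players" -le 50 ]; then
    echo "1-2 5Gi 4G"
  else
    echo "2-4 8Gi+ 6-8G"
  fi
}
```

Usage: size_for_players 12 prints the 5–20 player row. Treat the result as a starting point and adjust after profiling.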

Use a high-throughput, low-latency StorageClass for Minecraft world data. World I/O is write-heavy (chunk saving).

By default the operator creates PVCs with ReadWriteOnce access mode. When spec.prePull: true is set, the PVC is created with ReadWriteMany so a short-lived pre-pull Job can mount the data volume simultaneously with the running server pod during version upgrades. If you enable pre-pull, ensure your StorageClass supports ReadWriteMany:

prePull: true
storage:
  storageClassName: "premium-rwx" # Cloud-provider SSD class with RWX support
  size: "30Gi"

Common StorageClass options with ReadWriteMany support:

  • GKE: Filestore (with the Filestore CSI driver); standard Persistent Disk does not support ReadWriteMany
  • EKS: EFS (with the EFS CSI driver)
  • AKS: Azure Files (azurefile-csi)
  • On-prem: Longhorn, Rook-Ceph (CephFS), NFS-based CSI drivers
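
To find out which of these backends your cluster already offers, you can list the StorageClasses together with their provisioner. Supported access modes are not recorded on the StorageClass itself, so the provisioner name is the thing to look up in the driver's documentation. The function name below is hypothetical.

```shell
# Hypothetical helper: list StorageClasses with their provisioner and binding mode,
# so an RWX-capable backend (Filestore, EFS, Azure Files, CephFS, NFS) can be spotted.
list_storage_classes() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode'
}
```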

Deploy each server environment to its own namespace:

kubectl create namespace minecraft-production
kubectl create namespace minecraft-staging

This provides:

  • Clear resource isolation
  • Namespace-scoped RBAC if needed
  • Easy cost attribution via namespace labels
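
Cost attribution depends on the namespaces carrying consistent labels. A minimal sketch of creating and labelling an environment namespace in one step follows; the function name and the label keys environment and cost-center are illustrative assumptions, so substitute whatever keys your cost tooling expects.

```shell
# Hypothetical helper: create a namespace if it is missing and label it.
# Assumption: label keys "environment" and "cost-center" are illustrative only.
# Arguments: <environment> <cost-center>
create_env_namespace() {
  environment="$1"
  cost_center="$2"
  ns="minecraft-${environment}"
  # Render the namespace client-side and apply it, so reruns do not fail
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f - &&
    kubectl label namespace "$ns" --overwrite \
      "environment=${environment}" "cost-center=${cost_center}"
}
```

Usage: create_env_namespace production games creates minecraft-production and labels it.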

To upgrade the operator, run helm upgrade with the new chart version:

helm upgrade mc-operator oci://ghcr.io/danihengeveld/charts/mc-operator \
  --version <new-version> \
  --namespace mc-operator-system

Operator upgrades do not affect running Minecraft servers. The reconciler is non-destructive by default — it updates child resources only when the spec changes.

CRD upgrades may include schema changes. Always apply the new CRD before upgrading the operator:

kubectl apply -f https://raw.githubusercontent.com/danihengeveld/mc-operator/v<version>/manifests/crd/minecraftservers.yaml
helm upgrade mc-operator ...
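
The CRD-first ordering can be enforced in a script so that the chart upgrade only runs if the CRD was applied successfully. This is a sketch with a hypothetical function name, and it assumes the Git tag (v<version>) and the chart version share the same number, which you should verify for the release you are installing.

```shell
# Hypothetical helper: apply the CRD for a release, then upgrade the chart.
# Assumption: the Git tag v<version> and the chart version use the same number.
# Argument: <version> (without the leading "v")
upgrade_operator() {
  version="$1"
  # "&&" ensures helm upgrade is skipped if applying the CRD fails
  kubectl apply -f "https://raw.githubusercontent.com/danihengeveld/mc-operator/v${version}/manifests/crd/minecraftservers.yaml" &&
    helm upgrade mc-operator oci://ghcr.io/danihengeveld/charts/mc-operator \
      --version "$version" \
      --namespace mc-operator-system
}
```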

To change a server's Minecraft version, patch spec.server.version:

kubectl patch minecraftserver paper-survival -n minecraft \
  --type merge -p '{"spec": {"server": {"version": "1.21.0"}}}'

When spec.prePull: true is set, the operator detects the image change and begins a zero-downtime upgrade sequence:

  1. A short-lived batch/v1 pre-pull Job is created on the server’s current node. It mounts the data PVC and runs the itzg startup scripts with a fake java stub — this downloads the new server jar to the PVC while the old server is still running, then exits 0.
  2. Once the Job completes, the StatefulSet rolling update is applied. Because both the OCI image layers and the server jar are already present, the new pod starts almost immediately.

The server status message shows "Pre-pulling image: <image>" during step 1. You can watch the upgrade:

kubectl get minecraftserver paper-survival -n minecraft -w
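
The two steps of the upgrade sequence can also be observed on the child resources. The helpers below are a sketch: the function names are hypothetical, and wait_for_rollout assumes the StatefulSet carries the same name as the MinecraftServer.

```shell
# Hypothetical helpers for following a pre-pull upgrade.

# Step 1: list Jobs in the namespace; "-o wide" includes each Job's image,
# which helps identify the short-lived pre-pull Job.
# Argument: [namespace] (defaults to minecraft)
prepull_jobs() {
  kubectl get jobs -n "${1:-minecraft}" -o wide
}

# Step 2: block until the StatefulSet rolling update has finished.
# Assumption: the StatefulSet is named after the MinecraftServer.
# Arguments: <server-name> [namespace]
wait_for_rollout() {
  kubectl rollout status "statefulset/$1" -n "${2:-minecraft}" --timeout=10m
}
```

Usage: run prepull_jobs until the pre-pull Job shows as complete, then wait_for_rollout paper-survival. Calling wait_for_rollout before the StatefulSet has been updated returns immediately for the old revision.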