Troubleshooting CloudBees Jenkins Enterprise 2.x on GKE
Caution
This guide is an old version of Troubleshooting CloudBees Jenkins Enterprise 2.x on GKE and is superseded by Troubleshooting CloudBees Core on GKE. Please refer to Troubleshooting CloudBees Core on GKE for updated content.
If your laptop is already set up to interact with the GKE Kubernetes cluster, you may go directly to the general troubleshooting section. Otherwise, follow the next few sections to set up your laptop.
Enable kubectl to Interact with GKE-Deployed Cluster
If the Kubernetes cluster was created using the GKE UI or the gcloud CLI, run the following commands on your laptop so that you can interact with the cluster using kubectl.
Authenticate Google Account with gcloud
Ask your GKE admin to grant your Google account the following IAM roles:
- Kubernetes Engine Admin
- Kubernetes Engine Cluster Admin
Use the following command to authenticate your laptop. By default, the command launches a browser to handle the Google login. On a headless system, add the --no-launch-browser option.
> gcloud auth login [--no-launch-browser]
Retrieve Name of Cluster
Ask your GKE admin for the name of the Kubernetes cluster, or run the following command to list all clusters and choose the correct one.
> gcloud container clusters list
Here is an example of the output:
NAME LOCATION MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
test20180410c us-central1-a 1.8.8-gke.0 35.194.43.68 custom-2-5120 1.8.8-gke.0 2 RUNNING
team3-cluster us-east1-b 1.9.4-gke.1 35.196.148.64 n1-standard-2 1.9.4-gke.1 3 RUNNING
team2-cluster us-east4-b 1.9.6-gke.0 35.199.21.233 custom-4-16384 1.9.6-gke.0 2 RUNNING
team1-cluster us-west1-a 1.9.6-gke.0 104.199.113.196 custom-2-8192 1.9.6-gke.0 1 RUNNING
Make a note of the LOCATION of the cluster. For day-to-day work, it is most convenient to set a default zone so that you do not have to specify it on every gcloud command against the cluster. For example, to set the zone of team1-cluster (us-west1-a):
> gcloud config set compute/zone us-west1-a
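If you script this setup, the zone can be read from the listing instead of copied by hand. The following is a minimal sketch, not part of the CloudBees tooling: the helper name cluster_location is hypothetical, and it assumes the default table layout shown above, with NAME in the first column and LOCATION in the second.

```shell
#!/usr/bin/env bash
# cluster_location NAME  (hypothetical helper)
# Reads `gcloud container clusters list` table output on stdin and prints the
# LOCATION (column 2) of the row whose NAME (column 1) equals NAME.
# Assumes the default table layout; the header row is skipped.
cluster_location() {
  awk -v name="$1" 'NR > 1 && $1 == name { print $2 }'
}

# Example (requires an authenticated gcloud):
#   zone=$(gcloud container clusters list | cluster_location team1-cluster)
#   gcloud config set compute/zone "$zone"
```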
Set Laptop Environment with Cluster Info
In this section, the cluster "team1-cluster" will be used in the examples.
To make team1-cluster the default cluster for gcloud:
> gcloud config set container/cluster team1-cluster
To fetch the cluster’s credentials and configure a kubectl context for it:
> gcloud container clusters get-credentials team1-cluster
Ensure that kubectl can use Application Default Credentials to authenticate to the cluster:
> gcloud auth application-default login [--no-launch-browser]
Grant your Google account cluster-admin rights in the cluster:
> kubectl create clusterrolebinding cluster-admin-binding --clusterrole cluster-admin --user $(gcloud config get-value account)
At this point, you should have a working kubectl environment. You can confirm this with the following command:
> kubectl config get-contexts
If this is the first time you have followed these instructions, there should be one active kubectl context, for example:
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* gke_account-name-west1-a_team1-cluster gke_account-name-west1-a_team1-cluster gke_account-name-west1-a_team1-cluster
Note that no default namespace is set in the above example. Your CloudBees Jenkins Enterprise installation is likely installed in a namespace, for example "cje". Setting it as the default namespace saves you from specifying it every time you run a kubectl command against the CloudBees Jenkins Enterprise cluster.
> kubectl config set-context $(kubectl config current-context) --namespace="cje"
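To confirm that the default namespace was applied, run kubectl config get-contexts again: the NAMESPACE column of the row marked with * should now show cje. A minimal sketch of that check follows; current_namespace is a hypothetical helper, and it assumes the column order shown above (CURRENT, NAME, CLUSTER, AUTHINFO, NAMESPACE).

```shell
#!/usr/bin/env bash
# current_namespace  (hypothetical helper)
# Reads `kubectl config get-contexts` output on stdin and prints the NAMESPACE
# (column 5) of the current context, i.e. the row whose first field is "*".
# Prints "(none)" when the current context has no default namespace.
current_namespace() {
  awk '$1 == "*" { print ($5 == "" ? "(none)" : $5) }'
}

# Example:
#   kubectl config get-contexts | current_namespace
```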
kubectl is now ready to interact with the Kubernetes cluster.
There are several approaches you can use to troubleshoot a CloudBees Jenkins Enterprise failure. This section covers each of them.
Consult the Knowledge Base
The Knowledge Base can be very helpful in troubleshooting problems with CloudBees Jenkins Enterprise and can be accessed on the CloudBees Support site.
Instance Provisioning
Operations Center Provisioning
- Check the pod status
- Check that all associated objects have been created: pod, svc, statefulset (DESIRED 1, CURRENT 1), ingress, pvc and pv
- Check the events related to the pod and its associated objects (see the pod events table below)
- Check the Jenkins logs
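The checklist above can be run in one go. This is a minimal sketch with a hypothetical helper name, assuming the naming used throughout this guide (StatefulSet, service and ingress named after the instance, pod <instance>-0, PVC jenkins-home-<instance>-0), that the default namespace is already set, and that Jenkins writes its log to the container's standard output so that kubectl logs shows it.

```shell
#!/usr/bin/env bash
# provisioning_report INSTANCE  (hypothetical helper, e.g. INSTANCE=cjoc)
# Runs the provisioning checks in order. Each kubectl call is independent,
# so a missing object is reported by kubectl and the remaining checks still run.
provisioning_report() {
  local name="$1"
  echo "== Pod status"
  kubectl get pod "${name}-0" -o wide
  echo "== Associated objects"
  # TYPE/NAME form; the PVC output also shows the bound PV in its VOLUME column
  kubectl get "statefulset/${name}" "svc/${name}" "ingress/${name}" "pvc/jenkins-home-${name}-0"
  echo "== Events"
  kubectl describe pod "${name}-0"
  echo "== Jenkins logs"
  kubectl logs "${name}-0"
}

# Example:
#   provisioning_report cjoc > cjoc-report.txt 2>&1
```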
CloudBees Jenkins Enterprise basic operations
Viewing Cluster Resources
# Summary view, one line per resource
$ kubectl get -a pod,statefulset,svc,ingress,pvc,pv -o wide
# Full definitions of the resources
$ kubectl get -a pod,statefulset,svc,ingress,pvc,pv -o yaml
# Detailed description of a single resource, including its events
$ kubectl describe <TYPE> <NAME>
Pod Access
# Open an interactive bash login shell in the pod
$ kubectl exec <POD_NAME> -i -t -- bash -li
master2-0:/$ ps -ef
PID USER TIME COMMAND
1 jenkins 0:00 /sbin/tini -- /usr/local/bin/launch.sh
5 jenkins 1:53 java -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Duser.home=/var/jenkins_home -Xmx1433m -Xms1433m -Djenkins.model.Jenkins.slaveAgentPortEnforce=true -Djenkins.model.Jenkins.slav
481 jenkins 0:00 bash -li
485 jenkins 0:00 ps -ef
# Run a single command in the pod without opening a shell
$ kubectl exec <POD_NAME> -- ps -ef
PID USER TIME COMMAND
1 jenkins 0:00 /sbin/tini -- /usr/local/bin/launch.sh
5 jenkins 2:05 java -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Duser.home=/var/jenkins_home -Xmx1433m -Xms1433m -Djenkins.model.Jenkins.slaveAgentPortEnforce=true -Djenkins.model.Jenkins.slaveAgentPort=50000 -DMASTER_GRANT_ID=270bd80c-3e5c-498c-88fe-35ac9e11f3d3 -Dcb.IMProp.warProfiles.cje=kubernetes.json -DMASTER_INDEX=1 -Dcb.IMProp.warProfiles=kubernetes.json -DMASTER_OPERATIONSCENTER_ENDPOINT=http://cjoc/cjoc -DMASTER_NAME=master2 -DMASTER_ENDPOINT=http://cje.support-cje2.beescloud.k8s.local/master2/ -jar -Dcb.distributable.name=Docker Common CJE -Dcb.distributable.commit_sha=888f01a54c12cfae5c66ec27fd4f2a7346097997 /usr/share/jenkins/jenkins.war --webroot=/tmp/jenkins/war --pluginroot=/tmp/jenkins/plugins --prefix=/master2/
645 jenkins 0:00 ps -ef
Pod Scale Down/Up
$ kubectl scale statefulset/master2 --replicas=0
statefulset "master2" scaled
$ kubectl get -a statefulset -o wide
NAME DESIRED CURRENT AGE CONTAINERS IMAGES
cjoc 1 1 1d jenkins cloudbees/cje-oc:2.121.3.1
master1 1 1 2h jenkins cloudbees/cje-mm:2.121.3.1
master2 0 0 36m jenkins cloudbees/cje-mm:2.121.3.1
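The example above only scales the master down. A minimal sketch of a full down/up cycle follows; restart_master is a hypothetical helper that assumes the pod is named <statefulset>-0, as in this guide, and simply polls kubectl get pod until the pod is gone before scaling back to 1.

```shell
#!/usr/bin/env bash
# restart_master NAME [TIMEOUT_SECONDS]  (hypothetical helper)
# Scales StatefulSet NAME to 0, waits until pod NAME-0 no longer exists,
# then scales it back to 1. Gives up after TIMEOUT_SECONDS (default 300).
restart_master() {
  local name="$1" timeout="${2:-300}" waited=0
  kubectl scale "statefulset/${name}" --replicas=0 || return 1
  # `kubectl get pod` exits non-zero once the pod has been deleted
  while kubectl get pod "${name}-0" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "timed out waiting for ${name}-0 to terminate" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  kubectl scale "statefulset/${name}" --replicas=1
}

# Example:
#   restart_master master2
```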
CloudBees Jenkins Enterprise Cluster Resources
During installation, CloudBees Jenkins Enterprise creates the following service accounts, roles and role bindings:
$ kubectl get sa,role,rolebinding
NAME SECRETS AGE
sa/cjoc 1 21h
sa/default 1 21h
sa/jenkins 1 21h
NAME AGE
roles/master-management 21h
roles/pods-all 21h
NAME AGE
rolebindings/cjoc 21h
rolebindings/jenkins 21h
Once the installation is done and the CloudBees Jenkins Enterprise cluster is up and running, you can check the status of the most important CloudBees Jenkins Enterprise resources: pod, statefulset, svc, ingress, pvc and pv.
$ kubectl get pod,statefulset,svc,ingress,pvc,pv
NAME READY STATUS RESTARTS AGE
po/cjoc-0 1/1 Running 0 21h
po/master1-0 1/1 Running 0 14h
NAME DESIRED CURRENT AGE
statefulsets/cjoc 1 1 21h
statefulsets/master1 1 1 14h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/cjoc ClusterIP 100.66.207.191 <none> 80/TCP,50000/TCP 21h
svc/master1 ClusterIP 100.67.1.49 <none> 80/TCP,50000/TCP 14h
NAME HOSTS ADDRESS PORTS AGE
ing/cjoc cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 21h
ing/default cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 21h
ing/master1 cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 14h
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc/jenkins-home-cjoc-0 Bound pvc-c5cad012-2b69-11e8-80fc-12582571ed5c 20Gi RWO gp2 21h
pvc/jenkins-home-master1-0 Bound pvc-e4b5e473-2ba2-11e8-80fc-12582571ed5c 50Gi RWO gp2 14h
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv/pvc-c5cad012-2b69-11e8-80fc-12582571ed5c 20Gi RWO Delete Bound cje-on-support-cje2/jenkins-home-cjoc-0 gp2 21h
pv/pvc-e4b5e473-2ba2-11e8-80fc-12582571ed5c 50Gi RWO Delete Bound cje-on-support-cje2/jenkins-home-master1-0 gp2 14h
The following sections describe the expected output for each Kubernetes resource. The definition of each resource is taken from the official Kubernetes documentation.
Pods
A pod is the smallest and simplest Kubernetes object and represents a set of running containers on your cluster. A pod is typically set up to run a single primary container, although it can also run optional sidecar containers that add supplementary features like logging. Pods are commonly managed by a Deployment; in CloudBees Jenkins Enterprise, the Operations Center and master pods are managed by StatefulSets.
The kubectl get pod command lists the applications currently running in the cluster. Applications that are stopped or not deployed do not appear as pods.
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
po/cjoc-0 1/1 Running 0 21h
po/master1-0 1/1 Running 0 14h
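When the list is long, a filter helps spot pods that need attention. A minimal sketch, assuming the column layout shown above (NAME, READY, STATUS, ...): it prints every pod that is not Running or whose READY count is incomplete. The helper name is hypothetical, and completed pods, if any, are reported too.

```shell
#!/usr/bin/env bash
# not_running_pods  (hypothetical helper)
# Reads `kubectl get pod` output on stdin and prints NAME READY STATUS for pods
# whose STATUS is not Running or whose READY is not n/n (e.g. 0/1).
# The "po/" or "pod/" prefix printed by some kubectl versions is stripped.
not_running_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) {
      sub(/.*\//, "", $1)
      print $1, $2, $3
    }
  }'
}

# Example:
#   kubectl get pod | not_running_pods
```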
Pod Events
Pod events explain why a specific pod fails to start in the cluster, that is, why a specific application cannot start or be deployed.
The table below summarizes the most common pod events that might occur in CloudBees Jenkins Enterprise.
To list the events associated with a given pod, run:
$ kubectl describe pod the_pod_name
For example:
$ kubectl describe pod cjoc-0
| Status | Events | Cause |
|---|---|---|
| | | The image you are using cannot be found in the Docker registry, or, when using a private registry, no secret is configured |
| | | Node issues. See below. Get node info with |
| | | Not enough memory: either increase the number of nodes or the node size in the cluster, or reduce the memory requirement of Operations Center (yaml file) or Master (under configuration) |
| | | Not enough CPUs: either increase the number of nodes or the node size in the cluster, or reduce the CPU requirement of Operations Center (yaml file) or Master (under configuration) |
| | | There are no nodes available in the zone where the persistent volume was created; start more nodes in that zone |
| | | Find out why the Docker container crashes. The first and easiest check is whether there are any errors in the output of the previous startup, e.g.: |
| | | If several pods on that node are in the same state, this usually indicates a bad node. Check with `kubectl get pods --all-namespaces -o wide` |
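To follow up on the bad-node case, it helps to see at a glance whether pods in the same non-Running state are concentrated on one node. A minimal sketch, using kubectl's custom-columns output to print each pod's node and phase and counting them per node; the helper name is hypothetical, and unscheduled pods appear under <none>.

```shell
#!/usr/bin/env bash
# pods_by_node_status  (hypothetical helper)
# Prints "NODE PHASE COUNT" for every node/phase combination across all
# namespaces, sorted byte-wise so the output order does not depend on locale.
pods_by_node_status() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NODE:.spec.nodeName,PHASE:.status.phase |
    awk '{ count[$1 " " $2]++ } END { for (key in count) print key, count[key] }' |
    LC_ALL=C sort
}

# Example:
#   pods_by_node_status | grep -v ' Running '
```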
StatefulSet
A StatefulSet manages the deployment and scaling of a set of Pods and provides guarantees about the ordering and uniqueness of these Pods.
Like a Deployment, a StatefulSet manages Pods that are based on an identical container spec. Unlike a Deployment, a StatefulSet maintains a sticky identity for each of their Pods. These pods are created from the same spec, but are not interchangeable: each has a persistent identifier that it maintains across any rescheduling.
A StatefulSet operates under the same pattern as any other Controller. You define your desired state in a StatefulSet object and the StatefulSet controller makes any necessary updates to get there from the current state.
$ kubectl get statefulset
NAME DESIRED CURRENT AGE
statefulsets/cjoc 1 1 21h
statefulsets/master1 1 1 14h
In CloudBees Jenkins Enterprise, the expected DESIRED and CURRENT values for every application are 1. Neither Jenkins nor build agents support more than one instance running at the same time.
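A minimal sketch that flags any StatefulSet whose DESIRED or CURRENT value is not 1. It assumes the DESIRED/CURRENT column layout shown above (newer kubectl versions print a READY column instead), and the helper name is hypothetical. A master that you scaled to 0 on purpose, like master2 in the scale-down example earlier, is reported as well.

```shell
#!/usr/bin/env bash
# check_statefulsets  (hypothetical helper)
# Reads `kubectl get statefulset` output (NAME DESIRED CURRENT ...) on stdin and
# prints every StatefulSet whose DESIRED or CURRENT differs from 1.
check_statefulsets() {
  awk 'NR > 1 && ($2 != 1 || $3 != 1) {
    sub(/.*\//, "", $1)   # drop a "statefulsets/" prefix if present
    print $1, "DESIRED=" $2, "CURRENT=" $3
  }'
}

# Example:
#   kubectl get statefulset | check_statefulsets
```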
Service
A service is the API object that describes how to access applications (such as a set of Pods) and can describe ports and load-balancers.
The access point can be internal or external to the cluster.
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/cjoc ClusterIP 100.66.207.191 <none> 80/TCP,50000/TCP 21h
svc/master1 ClusterIP 100.67.1.49 <none> 80/TCP,50000/TCP 14h
A service must exist for each application running in the cluster; otherwise, the application is not accessible.
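Since each application needs a service, a quick cross-check is to look for a service with the same name as each StatefulSet. A minimal sketch, assuming that naming convention (as with cjoc and master1 above); missing_services is a hypothetical helper.

```shell
#!/usr/bin/env bash
# missing_services  (hypothetical helper)
# Prints the name of every StatefulSet in the current namespace that has no
# service of the same name.
missing_services() {
  local name
  for name in $(kubectl get statefulset --no-headers -o custom-columns=NAME:.metadata.name); do
    kubectl get svc "$name" >/dev/null 2>&1 || echo "$name"
  done
}

# Example:
#   missing_services
```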
Ingress
Ingresses represent the routes used to access the applications; an ingress can be thought of as a load balancer.
$ kubectl get ingress
NAME HOSTS ADDRESS PORTS AGE
ing/cjoc cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 21h
ing/default cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 21h
ing/master1 cje.support-cje2.beescloud.k8s.local af9463f6a2b68... 80 14h
The ingresses required for CloudBees Jenkins Enterprise to work are:
- ing/default, the default entry point to the cluster
- ing/cjoc, for access to the Operations Center
- ing/<MASTER_ID>, for access to each master
Important
The product expects these ingresses to be present, so they must not be modified, not even to simplify their scope. Modifying ingresses at the Kubernetes level might cause problems in the product, such as Managed Masters becoming unable to communicate correctly with the Operations Center.
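A read-only check that the required ingresses exist does not modify them. A minimal sketch follows: it reads kubectl get ingress output on stdin and takes the master IDs as arguments. It assumes each master's ingress is named after the master, as with ing/master1 above; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# missing_ingresses MASTER_ID...  (hypothetical helper)
# Reads `kubectl get ingress` output on stdin and prints each required ingress
# (default, cjoc, and one per MASTER_ID given) that is not in the list.
missing_ingresses() {
  local present name
  # Ingress names from column 1, without an "ing/" prefix
  present=$(awk 'NR > 1 { sub(/.*\//, "", $1); print $1 }')
  for name in default cjoc "$@"; do
    printf '%s\n' "$present" | grep -qx -- "$name" || echo "$name"
  done
}

# Example:
#   kubectl get ingress | missing_ingresses master1 master2
```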
Persistent Volume Claims (PVC)
Persistent volume claims (PVCs) represent the volumes associated with each application running in the cluster.
$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
pvc/jenkins-home-cjoc-0 Bound pvc-c5cad012-2b69-11e8-80fc-12582571ed5c 20Gi RWO gp2 21h
pvc/jenkins-home-master1-0 Bound pvc-e4b5e473-2ba2-11e8-80fc-12582571ed5c 50Gi RWO gp2 14h
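A minimal sketch that lists every PVC whose STATUS is not Bound, for example one stuck in Pending. It assumes the column layout shown above; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# unbound_pvcs  (hypothetical helper)
# Reads `kubectl get pvc` output on stdin and prints NAME STATUS for every
# claim whose STATUS (column 2) is not "Bound".
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { sub(/.*\//, "", $1); print $1, $2 }'
}

# Example:
#   kubectl get pvc | unbound_pvcs
```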
PVC Events
The table below summarizes the most common events associated with PVCs that might occur in CloudBees Jenkins Enterprise.
To list the events associated with a given PVC, run:
$ kubectl describe pvc the_pvc_name
For example:
$ kubectl describe pvc jenkins-home-cjoc-0
| Status | Events | Cause |
|---|---|---|
| | | There is no default storageclass; follow these instructions to set a default storageclass |
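To check for this cause, look for the class marked (default) in the output of kubectl get storageclass. A minimal sketch, assuming kubectl marks the default class by printing (default) next to its name; has_default_storageclass is a hypothetical helper.

```shell
#!/usr/bin/env bash
# has_default_storageclass  (hypothetical helper)
# Prints the default storage class and returns 0, or prints a message to
# stderr and returns 1 when no class is marked "(default)".
has_default_storageclass() {
  local name
  name=$(kubectl get storageclass --no-headers | awk '/\(default\)/ { print $1 }')
  if [ -n "$name" ]; then
    echo "default storageclass: $name"
  else
    echo "no default storageclass found" >&2
    return 1
  fi
}

# Example:
#   has_default_storageclass || echo "set a default storageclass first"
```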
Persistent Volume (PV)
Persistent volumes (PVs) represent the volumes created in the cluster.
$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pv/pvc-c5cad012-2b69-11e8-80fc-12582571ed5c 20Gi RWO Delete Bound cje-on-support-cje2/jenkins-home-cjoc-0 gp2 21h
pv/pvc-e4b5e473-2ba2-11e8-80fc-12582571ed5c 50Gi RWO Delete Bound cje-on-support-cje2/jenkins-home-master1-0 gp2 14h
Accessing $JENKINS_HOME
Accessing Jenkins Home Directory (Pod Running)
The following sequence of commands shows how to find the path of $JENKINS_HOME inside the pod of a specific CloudBees Jenkins Enterprise instance and how to work with it.
# Get the location of the $JENKINS_HOME
$ kubectl describe pod master2-0 | grep " jenkins-home " | awk '{print $1}'
/var/jenkins_home
# Open a bash login shell in the pod
$ kubectl exec master2-0 -i -t -- bash -i -l
master2-0:/$ cd /var/jenkins_home/
master2-0:~$ ps -ef
PID USER TIME COMMAND
1 jenkins 0:00 /sbin/tini -- /usr/local/bin/launch.sh
5 jenkins 1:46 java -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Duser.home=/var/jenkins_home -Xmx1433m -Xms1433m -Djenkins.model.Jenkins.slaveAgentPortEnforce=true -Djenkins.model.Jenkins.slav
516 jenkins 0:00 bash -i -l
524 jenkins 0:00 ps -ef
master2-0:~$ ps -ef | grep java
5 jenkins 1:46 java -Dhudson.slaves.NodeProvisioner.initialDelay=0 -Duser.home=/var/jenkins_home -Xmx1433m -Xms1433m -Djenkins.model.Jenkins.slaveAgentPortEnforce=true -Djenkins.model.Jenkins.slaveAgentPort=50000 -DMASTER_GRANT_ID=270bd80c-3e5c-498c-88fe-35ac9e11f3d3 -Dcb.IMProp.warProfiles.cje=kubernetes.json -DMASTER_INDEX=1 -Dcb.IMProp.warProfiles=kubernetes.json -DMASTER_OPERATIONSCENTER_ENDPOINT=http://cjoc/cjoc -DMASTER_NAME=master2 -DMASTER_ENDPOINT=http://cje.support-cje2.beescloud.k8s.local/master2/ -jar -Dcb.distributable.name=Docker Common CJE -Dcb.distributable.commit_sha=888f01a54c12cfae5c66ec27fd4f2a7346097997 /usr/share/jenkins/jenkins.war --webroot=/tmp/jenkins/war --pluginroot=/tmp/jenkins/plugins --prefix=/master2/
528 jenkins 0:00 grep java
# Example operation, run from your laptop after exiting the pod shell:
# copy the jobs directory out of $JENKINS_HOME
$ kubectl cp master2-0:/var/jenkins_home/jobs/ ./jobs/
tar: removing leading '/' from member names
Accessing Jenkins Home Directory (Pod Not Running)
# Stop the master by scaling its StatefulSet down to 0
$ kubectl scale statefulset/master2 --replicas=0
statefulset "master2" scaled
# Create a rescue pod that mounts the master's $JENKINS_HOME volume and runs
# a container (nginx here) that has no effect on its contents
$ cat <<EOF | kubectl create -f -
kind: Pod
apiVersion: v1
metadata:
  name: rescue-pod
spec:
  volumes:
    - name: rescue-storage
      persistentVolumeClaim:
        claimName: jenkins-home-master2-0
  containers:
    - name: rescue-container
      image: nginx
      volumeMounts:
        - mountPath: "/tmp/jenkins-home"
          name: rescue-storage
EOF
# Open a bash login shell in the rescue pod
$ kubectl exec rescue-pod -i -t -- bash -i -l
mesg: ttyname failed: Success
root@rescue-pod:/# cd /tmp/jenkins-home/
root@rescue-pod:/tmp/jenkins-home#
# Example operation, run from your laptop: copy the jobs directory
$ kubectl cp rescue-pod:/tmp/jenkins-home/jobs/ ./jobs/
tar: removing leading '/' from member names
# Delete the rescue pod
$ kubectl delete pod rescue-pod
pod "rescue-pod" deleted
# Start the pod
$ kubectl scale statefulset/master2 --replicas=1
statefulset "master2" scaled
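To rescue a master other than master2, only the claim name changes. A minimal sketch that generates the same rescue pod manifest for any master, assuming the PVC naming jenkins-home-<master>-0 seen in the PVC listing; rescue_pod_manifest is a hypothetical helper. Scale the master down first, as above, before creating the pod.

```shell
#!/usr/bin/env bash
# rescue_pod_manifest MASTER  (hypothetical helper)
# Prints the rescue pod manifest used above, mounting PVC
# jenkins-home-MASTER-0 at /tmp/jenkins-home.
rescue_pod_manifest() {
  local master="$1"
  cat <<EOF
kind: Pod
apiVersion: v1
metadata:
  name: rescue-pod
spec:
  volumes:
    - name: rescue-storage
      persistentVolumeClaim:
        claimName: jenkins-home-${master}-0
  containers:
    - name: rescue-container
      image: nginx
      volumeMounts:
        - mountPath: "/tmp/jenkins-home"
          name: rescue-storage
EOF
}

# Example:
#   rescue_pod_manifest master1 | kubectl create -f -
```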
Operations Center Setup Customization
The Operations Center instance can be configured either by editing {YAML-CONFIG-FILE} or by using the Kubernetes command line, for example:
# Set the memory to 2G
$ kubectl patch statefulset cjoc -p '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","resources":{"limits":{"memory": "2G"}}}]}}}}'
statefulset "cjoc" patched
# Set the liveness probe initial delay to 320 seconds (the value must be an integer)
$ kubectl patch statefulset cjoc -p '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","livenessProbe":{"initialDelaySeconds":320}}]}}}}'
statefulset "cjoc" patched
# Set the liveness probe timeout to 10 seconds
$ kubectl patch statefulset cjoc -p '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","livenessProbe":{"timeoutSeconds":10}}]}}}}'
statefulset "cjoc" patched
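The three settings can also be applied in a single strategic merge patch, which changes the pod template once instead of three times. A minimal sketch of a helper (hypothetical name) that builds that patch; printf's %d reports an error for non-integer probe values, which helps keep them as JSON numbers rather than strings.

```shell
#!/usr/bin/env bash
# cjoc_patch MEMORY INITIAL_DELAY TIMEOUT  (hypothetical helper)
# Prints one strategic merge patch for the "jenkins" container that sets the
# memory limit and the liveness probe initialDelaySeconds/timeoutSeconds.
cjoc_patch() {
  local memory="$1" delay="$2" timeout="$3"
  printf '{"spec":{"template":{"spec":{"containers":[{"name":"jenkins","resources":{"limits":{"memory":"%s"}},"livenessProbe":{"initialDelaySeconds":%d,"timeoutSeconds":%d}}]}}}}' \
    "$memory" "$delay" "$timeout"
}

# Example:
#   kubectl patch statefulset cjoc -p "$(cjoc_patch 2G 320 10)"
```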
Performance Issues - High CPU / Blocked Threads
# Export the definitions of the CloudBees Jenkins Enterprise cluster resources to a YAML file
$ kubectl get pod,svc,endpoints,statefulset,ingress,pvc,pv,sa,role,rolebinding -o yaml > to-el-cluster.yml
# Copy the jenkinshangWithJstack.sh script into the pod and find the Jenkins PID with jps
$ kubectl cp ~/Downloads/jenkinshangWithJstack.sh master1-0:/tmp/
$ kubectl exec master1-0 -- jps
5 jenkins.war
8807 Jps
$ kubectl exec master1-0 -- chmod u+x /tmp/jenkinshangWithJstack.sh
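# The script currently has to be run from an interactive shell inside the pod;
# 5 is the Jenkins PID reported by jps above
$ kubectl exec master1-0 -it -- bash -il
master1-0:/$ /tmp/jenkinshangWithJstack.sh 5 60 5
# Exit the pod shell, then copy the output archive to your laptop
$ kubectl cp master1-0:/tmp/jenkinshangWithJstack.5.output.tar ./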
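As an alternative that needs neither the downloaded script nor an interactive shell, thread dumps can be taken with jstack directly through kubectl exec. This is a different technique from jenkinshangWithJstack.sh and only a minimal sketch: it assumes jstack is available in the image (jps is, as shown above) and that kubectl exec runs as the jenkins user, as the ps output earlier suggests. collect_thread_dumps is a hypothetical helper.

```shell
#!/usr/bin/env bash
# collect_thread_dumps POD PID [COUNT] [INTERVAL_SECONDS]  (hypothetical helper)
# Takes COUNT (default 3) thread dumps of PID inside POD, INTERVAL_SECONDS
# (default 10) apart, saving them locally as threaddump-POD-N.txt.
collect_thread_dumps() {
  local pod="$1" pid="$2" count="${3:-3}" interval="${4:-10}" i=1
  while [ "$i" -le "$count" ]; do
    # jstack -l also prints information about locks held by each thread
    kubectl exec "$pod" -- jstack -l "$pid" > "threaddump-${pod}-${i}.txt" || return 1
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# Example:
#   collect_thread_dumps master1-0 5 3 10
```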
Online version published by CloudBees, Inc.