
Playing in the lab today I came across a scenario in which a pks create-cluster failed:
```
pks cluster team

Name:                     team
Plan Name:                small
UUID:                     3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action:              CREATE
Last Action State:        failed
Last Action Description:  Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create
Kubernetes Master Host:   team.pks.lab01.pcf.pw
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress
```
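The broker error packs the diagnostics you need (the BOSH task ID and the service-instance GUID) into one line of the Last Action Description. A quick shell sketch, using the description from this failure, pulls both out so they can be reused in later bosh commands:

```shell
# Extract the BOSH task ID and service-instance GUID from the broker's
# "Last Action Description" text (copied from the failed create above).
desc='service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create'

task_id=$(printf '%s' "$desc" | grep -o 'task-id: [0-9]*' | awk '{print $2}')
guid=$(printf '%s' "$desc" | grep -o 'service-instance-guid: [0-9a-f-]*' | awk '{print $2}')

echo "task-id: $task_id"   # feed this to 'bosh task'
echo "guid:    $guid"
```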
First of all I looked at why the cluster create task failed (task 561 from the error message) using bosh task 561. That highlighted a typo I had made with the access keys. Now that I knew what the issue was, I wanted to delete the cluster:
```
pks delete-cluster team

Deletion of team in progress
```
```
pks cluster team

Name:                     team
Plan Name:                small
UUID:                     3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action:              DELETE
Last Action State:        failed
Last Action Description:  Instance deletion failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: cb5f0ad3-9aa1-4b88-a857-d14fe0a6860f, task-id: 610, operation: delete
Kubernetes Master Host:   team.pks.lab01.pcf.pw
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress
```
However, this failed as well! Looking at the bosh logs (using the task ID from the error message) showed that the delete task could not find a deployment to remove:
```
bosh -e gcp task 610

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 610

Task 610 | 15:58:40 | Preparing deployment: Preparing deployment (00:00:00)
                    L Error: - Deployment '' doesn't exist
Task 610 | 15:58:40 | Error: - Deployment '' doesn't exist

Task 610 Started  Tue Aug 14 15:58:40 UTC 2018
Task 610 Finished Tue Aug 14 15:58:40 UTC 2018
Task 610 Duration 00:00:00
Task 610 error

Capturing task '610' output:
  Expected task '610' to succeed but state is 'error'

Exit code 1
```
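The empty deployment name in that error ("Deployment '' doesn't exist") suggests the broker never recorded a BOSH deployment for the failed create. A hypothetical guard for a cleanup script (the variable and message here are illustrative, not part of PKS) would skip the BOSH call when the recorded name is empty rather than fail the way task 610 did:

```shell
# Illustrative only: decide whether a BOSH delete is even possible, given
# what the broker recorded. An empty name reproduces the task 610 failure.
deployment=''   # what the broker effectively handed to BOSH in task 610

if [ -n "$deployment" ]; then
  action="bosh -e gcp -d $deployment delete-deployment"
else
  action="skip: no deployment name recorded"
fi
echo "$action"
```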
As a result I decided to look at what BOSH thought it had deployed, via a different command. As you can see, deployment service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc has no VMs (and the UUID matches the one from the pks commands).
```
bosh -e gcp vms

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Deployment 'pivotal-container-service-20a20b27578b472d13ce'

Instance                                                        Process State  AZ              IPs             VM CID                                   VM Type  Active
pivotal-container-service/5cd61225-c054-49c7-8173-99bb4178b493  running        europe-west2-a  192.168.101.11  vm-f37ec602-73e0-4205-5f17-94d609a24182  large    true

1 vms

Deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'

Instance  Process State  AZ  IPs  VM CID  VM Type  Active

0 vms

Succeeded
```
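That listing also shows the naming convention that links the two CLIs: the BOSH deployment backing a PKS cluster appears to be "service-instance_" plus the cluster UUID that pks reports. A small sketch of deriving the deployment name from the UUID (the convention is an observation from the output above, not documented behaviour I have verified):

```shell
# Derive the BOSH deployment name from the cluster UUID shown by 'pks cluster'.
# Assumption (matching the listing above): PKS names its deployments
# "service-instance_<uuid>".
uuid='3ef60125-5f93-4b98-b50c-7050b6877fdc'
deployment="service-instance_${uuid}"

echo "$deployment"
# which is the name to pass to bosh via -d when deleting the deployment
```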
So rather than trying to get the pks CLI to force delete the deployment, I used the bosh CLI to delete the deployment in question.
```
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Using deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'

Continue? [yN]: y

Task 926

Task 926 | 08:01:28 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0)
Task 926 | 08:01:28 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0)
Task 926 | 08:01:28 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1)
Task 926 | 08:01:28 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0)
Task 926 | 08:01:28 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2)
Task 926 | 08:01:29 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2) (00:00:01)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching stemcells (00:00:00)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching releases (00:00:00)
Task 926 | 08:01:29 | Deleting properties: Destroying deployment (00:00:00)

Task 926 Started  Wed Aug 15 08:01:28 UTC 2018
Task 926 Finished Wed Aug 15 08:01:29 UTC 2018
Task 926 Duration 00:00:01
Task 926 done

Succeeded
```
Now that that is complete, let's make sure the pks CLI reflects that the deployment (the K8s cluster) has gone.
```
pks clusters

Name  Plan Name  UUID                                  Status  Action
team  small      3ef60125-5f93-4b98-b50c-7050b6877fdc  failed  DELETE
```
Initially it still showed up in the list, so to kick the CLI into querying the latest BOSH state I reissued the delete-cluster command:
```
pks delete-cluster team

Error: Cluster has already been deleted
```
That cleared up the errored cluster, letting me start again once I had fixed the original typo with the service accounts:
```
pks clusters

Name  Plan Name  UUID  Status  Action
```