Playing in the lab today I came across a scenario in which a pks create-cluster
failed:
pks cluster team
Name: team
Plan Name: small
UUID: 3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action: CREATE
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create
Kubernetes Master Host: team.pks.lab01.pcf.pw
Kubernetes Master Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): In Progress
First of all, I looked at why the cluster create task failed (task 561 from the error message) using bosh task 561. That highlighted a typo I had made with the service account access keys.
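If the task's event summary alone doesn't make the root cause obvious, the full debug log usually will. A minimal sketch, assuming the same gcp environment alias used throughout this post:
bosh -e gcp task 561 --debug
The --debug flag dumps the director's full task log rather than the event summary, which is where details such as bad credentials tend to surface.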
Now that I knew what the issue was, I wanted to delete the cluster:
pks delete-cluster team
Deletion of team in progress
pks cluster team
Name: team
Plan Name: small
UUID: 3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action: DELETE
Last Action State: failed
Last Action Description: Instance deletion failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: cb5f0ad3-9aa1-4b88-a857-d14fe0a6860f, task-id: 610, operation: delete
Kubernetes Master Host: team.pks.lab01.pcf.pw
Kubernetes Master Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): In Progress
However, this failed as well! Looking at the bosh task output (task ID 610 from the error message) showed the delete erroring out because it couldn't find the deployment.
bosh -e gcp task 610
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 610
Task 610 | 15:58:40 | Preparing deployment: Preparing deployment (00:00:00)
L Error: - Deployment '' doesn't exist
Task 610 | 15:58:40 | Error: - Deployment '' doesn't exist
Task 610 Started Tue Aug 14 15:58:40 UTC 2018
Task 610 Finished Tue Aug 14 15:58:40 UTC 2018
Task 610 Duration 00:00:00
Task 610 error
Capturing task '610' output:
Expected task '610' to succeed but state is 'error'
Exit code 1
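As an aside, if the task ID from the error message has been lost, recent tasks (including failed ones) can be listed; a sketch:
bosh -e gcp tasks --recent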
As a result, I decided to look at what BOSH thought it had deployed via a different command. As you can see below, deployment service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc
has no VMs (and its name contains the same UUID that the pks
commands reported).
bosh -e gcp vms
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Deployment 'pivotal-container-service-20a20b27578b472d13ce'
Instance Process State AZ IPs VM CID VM Type Active
pivotal-container-service/5cd61225-c054-49c7-8173-99bb4178b493 running europe-west2-a 192.168.101.11 vm-f37ec602-73e0-4205-5f17-94d609a24182 large true
1 vms
Deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'
Instance Process State AZ IPs VM CID VM Type Active
0 vms
Succeeded
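You can also list the deployments themselves directly, which shows the same service-instance_&lt;UUID&gt; naming convention PKS uses for each cluster; a sketch:
bosh -e gcp deployments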
So rather than trying to get the pks
CLI to force delete the cluster, I used the bosh
CLI to delete the deployment in question.
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Using deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'
Continue? [yN]: y
Task 926
Task 926 | 08:01:28 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0)
Task 926 | 08:01:28 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0)
Task 926 | 08:01:28 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1)
Task 926 | 08:01:28 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0)
Task 926 | 08:01:28 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2)
Task 926 | 08:01:29 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2) (00:00:01)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching stemcells (00:00:00)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching releases (00:00:00)
Task 926 | 08:01:29 | Deleting properties: Destroying deployment (00:00:00)
Task 926 Started Wed Aug 15 08:01:28 UTC 2018
Task 926 Finished Wed Aug 15 08:01:29 UTC 2018
Task 926 Duration 00:00:01
Task 926 done
Succeeded
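Had that deletion itself hit errors part-way through, the bosh CLI also accepts a --force flag on delete-deployment, which ignores individual errors and carries on; something like:
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment --force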
Now that that is complete, let's make sure the pks
CLI reflects that the deployment (the K8s cluster) has gone.
pks clusters
Name Plan Name UUID Status Action
team small 3ef60125-5f93-4b98-b50c-7050b6877fdc failed DELETE
Initially it still showed up in the list, so to try to kick the CLI into querying the latest bosh
state I reissued the delete-cluster
command.
pks delete-cluster team
Error: Cluster has already been deleted
That cleared up the errored cluster and let me start again once I had fixed the original typo with the service account keys.
pks clusters
Name Plan Name UUID Status Action
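From here, with the service account keys corrected, the cluster can be recreated. A sketch, assuming the same name, hostname, plan and worker count as before:
pks create-cluster team --external-hostname team.pks.lab01.pcf.pw --plan small --num-nodes 3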