Playing in the lab today I came across a scenario in which a pks create-cluster
failed:
pks cluster team
Name: team
Plan Name: small
UUID: 3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action: CREATE
Last Action State: failed
Last Action Description: Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create
Kubernetes Master Host: team.pks.lab01.pcf.pw
Kubernetes Master Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): In Progress
First of all, I looked at why the cluster create task failed (task 561 from the error message) using bosh task 561. That highlighted a typo I had made with the service account access keys.
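If the task's event summary alone doesn't make the root cause obvious, the full debug log usually will. A minimal sketch, assuming the same gcp environment alias used throughout this post:
bosh -e gcp task 561 --debug
The --debug flag dumps the director's full task log rather than the event summary, which is where details such as bad credentials tend to surface.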
Now that I knew what the issue was, I wanted to delete the cluster:
pks delete-cluster team
Deletion of team in progress
pks cluster team
Name: team
Plan Name: small
UUID: 3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action: DELETE
Last Action State: failed
Last Action Description: Instance deletion failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: cb5f0ad3-9aa1-4b88-a857-d14fe0a6860f, task-id: 610, operation: delete
Kubernetes Master Host: team.pks.lab01.pcf.pw
Kubernetes Master Port: 8443
Worker Instances: 3
Kubernetes Master IP(s): In Progress
However, this failed as well! Looking at the bosh task output (task ID 610 from the error message) showed the delete erroring out because it couldn't find the deployment.
bosh -e gcp task 610
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Task 610
Task 610 | 15:58:40 | Preparing deployment: Preparing deployment (00:00:00)
L Error: - Deployment '' doesn't exist
Task 610 | 15:58:40 | Error: - Deployment '' doesn't exist
Task 610 Started Tue Aug 14 15:58:40 UTC 2018
Task 610 Finished Tue Aug 14 15:58:40 UTC 2018
Task 610 Duration 00:00:00
Task 610 error
Capturing task '610' output:
Expected task '610' to succeed but state is 'error'
Exit code 1
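As an aside, if the task ID from the error message has been lost, recent tasks (including failed ones) can be listed; a sketch:
bosh -e gcp tasks --recent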
As a result, I decided to look at what BOSH thought it had deployed via a different command. As you can see below, deployment service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc
has no VMs (and its name contains the same UUID that the pks
commands reported).
bosh -e gcp vms
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Deployment 'pivotal-container-service-20a20b27578b472d13ce'
Instance Process State AZ IPs VM CID VM Type Active
pivotal-container-service/5cd61225-c054-49c7-8173-99bb4178b493 running europe-west2-a 192.168.101.11 vm-f37ec602-73e0-4205-5f17-94d609a24182 large true
1 vms
Deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'
Instance Process State AZ IPs VM CID VM Type Active
0 vms
Succeeded
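You can also list the deployments themselves directly, which shows the same service-instance_&lt;UUID&gt; naming convention PKS uses for each cluster; a sketch:
bosh -e gcp deployments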
So rather than trying to get the pks
CLI to force delete the cluster, I used the bosh
CLI to delete the deployment in question.
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment
Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)
Using deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'
Continue? [yN]: y
Task 926
Task 926 | 08:01:28 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0)
Task 926 | 08:01:28 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0)
Task 926 | 08:01:28 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1)
Task 926 | 08:01:28 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0)
Task 926 | 08:01:28 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2)
Task 926 | 08:01:29 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2) (00:00:01)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching stemcells (00:00:00)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching releases (00:00:00)
Task 926 | 08:01:29 | Deleting properties: Destroying deployment (00:00:00)
Task 926 Started Wed Aug 15 08:01:28 UTC 2018
Task 926 Finished Wed Aug 15 08:01:29 UTC 2018
Task 926 Duration 00:00:01
Task 926 done
Succeeded
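Had that deletion itself hit errors part-way through, the bosh CLI also accepts a --force flag on delete-deployment, which ignores individual errors and carries on; something like:
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment --force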
Now that that is complete, let's make sure the pks
CLI reflects that the deployment (the K8s cluster) has gone.
pks clusters
Name Plan Name UUID Status Action
team small 3ef60125-5f93-4b98-b50c-7050b6877fdc failed DELETE
Initially it still showed up in the list, so to try to kick the CLI into querying the latest bosh
state I reissued the delete-cluster
command.
pks delete-cluster team
Error: Cluster has already been deleted
That cleared up the errored cluster and let me start again once I had fixed the original typo with the service account keys.
pks clusters
Name Plan Name UUID Status Action
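From here, with the service account keys corrected, the cluster can be recreated. A sketch, assuming the same name, hostname, plan and worker count as before:
pks create-cluster team --external-hostname team.pks.lab01.pcf.pw --plan small --num-nodes 3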