
Playing in the lab today I came across a scenario in which a pks create-cluster failed:
```
pks cluster team

Name:                     team
Plan Name:                small
UUID:                     3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action:              CREATE
Last Action State:        failed
Last Action Description:  Instance provisioning failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create
Kubernetes Master Host:   team.pks.lab01.pcf.pw
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress
```
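The broker error packs the diagnostics you need (the BOSH task ID and the service-instance GUID) into one line of the Last Action Description. A quick shell sketch, using the description from this failure, pulls both out so they can be reused in later bosh commands:

```shell
# Extract the BOSH task ID and service-instance GUID from the broker's
# "Last Action Description" text (copied from the failed create above).
desc='service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: dc11b0b4-2a76-4b01-ad42-fe40507b70e6, task-id: 561, operation: create'

task_id=$(printf '%s' "$desc" | grep -o 'task-id: [0-9]*' | awk '{print $2}')
guid=$(printf '%s' "$desc" | grep -o 'service-instance-guid: [0-9a-f-]*' | awk '{print $2}')

echo "task-id: $task_id"   # feed this to 'bosh task'
echo "guid:    $guid"
```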
First of all I looked at why the cluster create task failed (task 561 from the error message) using bosh task 561. That highlighted a typo I had made with the access keys. Now that I knew what the issue was, I wanted to delete the cluster:
```
pks delete-cluster team

Deletion of team in progress
```
```
pks cluster team

Name:                     team
Plan Name:                small
UUID:                     3ef60125-5f93-4b98-b50c-7050b6877fdc
Last Action:              DELETE
Last Action State:        failed
Last Action Description:  Instance deletion failed: There was a problem completing your request. Please contact your operations team providing the following information: service: p.pks, service-instance-guid: 3ef60125-5f93-4b98-b50c-7050b6877fdc, broker-request-id: cb5f0ad3-9aa1-4b88-a857-d14fe0a6860f, task-id: 610, operation: delete
Kubernetes Master Host:   team.pks.lab01.pcf.pw
Kubernetes Master Port:   8443
Worker Instances:         3
Kubernetes Master IP(s):  In Progress
```
However, this failed as well! Looking at the bosh logs (using the task ID from the error message) showed that the delete task could not find a deployment to remove:
```
bosh -e gcp task 610

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 610

Task 610 | 15:58:40 | Preparing deployment: Preparing deployment (00:00:00)
                    L Error: - Deployment '' doesn't exist
Task 610 | 15:58:40 | Error: - Deployment '' doesn't exist

Task 610 Started  Tue Aug 14 15:58:40 UTC 2018
Task 610 Finished Tue Aug 14 15:58:40 UTC 2018
Task 610 Duration 00:00:00
Task 610 error

Capturing task '610' output:
  Expected task '610' to succeed but state is 'error'

Exit code 1
```
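The empty deployment name in that error ("Deployment '' doesn't exist") suggests the broker never recorded a BOSH deployment for the failed create. A hypothetical guard for a cleanup script (the variable and message here are illustrative, not part of PKS) would skip the BOSH call when the recorded name is empty rather than fail the way task 610 did:

```shell
# Illustrative only: decide whether a BOSH delete is even possible, given
# what the broker recorded. An empty name reproduces the task 610 failure.
deployment=''   # what the broker effectively handed to BOSH in task 610

if [ -n "$deployment" ]; then
  action="bosh -e gcp -d $deployment delete-deployment"
else
  action="skip: no deployment name recorded"
fi
echo "$action"
```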
As a result I decided to look at what BOSH thought it had deployed, via a different command. As you can see, deployment service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc has no VMs (and the UUID matches the one from the pks commands).
```
bosh -e gcp vms

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Deployment 'pivotal-container-service-20a20b27578b472d13ce'

Instance                                                        Process State  AZ              IPs             VM CID                                   VM Type  Active
pivotal-container-service/5cd61225-c054-49c7-8173-99bb4178b493  running        europe-west2-a  192.168.101.11  vm-f37ec602-73e0-4205-5f17-94d609a24182  large    true

1 vms

Deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'

Instance  Process State  AZ  IPs  VM CID  VM Type  Active

0 vms

Succeeded
```
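That listing also shows the naming convention that links the two CLIs: the BOSH deployment backing a PKS cluster appears to be "service-instance_" plus the cluster UUID that pks reports. A small sketch of deriving the deployment name from the UUID (the convention is an observation from the output above, not documented behaviour I have verified):

```shell
# Derive the BOSH deployment name from the cluster UUID shown by 'pks cluster'.
# Assumption (matching the listing above): PKS names its deployments
# "service-instance_<uuid>".
uuid='3ef60125-5f93-4b98-b50c-7050b6877fdc'
deployment="service-instance_${uuid}"

echo "$deployment"
# which is the name to pass to bosh via -d when deleting the deployment
```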
So rather than trying to get the pks CLI to force delete the deployment, I used the bosh CLI to delete the deployment in question.
```
bosh -e gcp -d service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc delete-deployment

Using environment '192.168.101.10' as user 'director' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Using deployment 'service-instance_3ef60125-5f93-4b98-b50c-7050b6877fdc'

Continue? [yN]: y

Task 926

Task 926 | 08:01:28 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0)
Task 926 | 08:01:28 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0)
Task 926 | 08:01:28 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1)
Task 926 | 08:01:28 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0)
Task 926 | 08:01:28 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2)
Task 926 | 08:01:29 | Deleting instances: master/10b23cee-b223-45e0-964c-d3ffdf18abcc (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/1fcfc532-7a9f-4799-b65d-e6bcfd94750e (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: apply-addons/41df5316-c626-4c24-837d-2261d7cd4bf7 (0) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/4ce63992-1886-4f99-a844-aabe675849e5 (1) (00:00:01)
Task 926 | 08:01:29 | Deleting instances: worker/0ac0f25a-8738-4e55-9a62-789ffec98572 (2) (00:00:01)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching stemcells (00:00:00)
Task 926 | 08:01:29 | Removing deployment artifacts: Detaching releases (00:00:00)
Task 926 | 08:01:29 | Deleting properties: Destroying deployment (00:00:00)

Task 926 Started  Wed Aug 15 08:01:28 UTC 2018
Task 926 Finished Wed Aug 15 08:01:29 UTC 2018
Task 926 Duration 00:00:01
Task 926 done

Succeeded
```
Now that that is complete, let's make sure the pks CLI reflects that the deployment (the K8s cluster) has gone.
```
pks clusters

Name  Plan Name  UUID                                  Status  Action
team  small      3ef60125-5f93-4b98-b50c-7050b6877fdc  failed  DELETE
```
Initially it still showed up in the list, so to kick the CLI into querying the latest BOSH state I reissued the delete-cluster command:
```
pks delete-cluster team

Error: Cluster has already been deleted
```
That cleared up the errored cluster, letting me start again once I had fixed the original typo with the service accounts:
```
pks clusters

Name  Plan Name  UUID  Status  Action
```