Conversation

@isabella-janssen
Member

@isabella-janssen isabella-janssen commented Dec 18, 2025

Closes: OCPBUGS-67229

- What I did
This updates the set-degraded flow so that it also populates the NodeDegraded condition in the node's MachineConfigNode (MCN). This prevents a degrade from being reported in the node annotations without a matching condition in the MCN resource. It also prevents incorrect MCP reporting in TechPreview, where degraded machine counts are determined by MCN conditions.
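
To make the mismatch concrete, here is a toy Python model of the two views. It is only a sketch of the idea described above, not the MCO's Go implementation: the classes, the helper names, and the annotation key are assumptions introduced for illustration.

```python
from dataclasses import dataclass, field

# Assumed identifiers for illustration only; not confirmed against the MCO source.
STATE_ANNOTATION = "machineconfiguration.openshift.io/state"
NODE_DEGRADED = "NodeDegraded"


@dataclass
class Node:
    """Hypothetical stand-in for a node and its annotations."""
    name: str
    annotations: dict = field(default_factory=dict)


@dataclass
class MachineConfigNode:
    """Hypothetical stand-in for an MCN; maps condition type -> (status, message)."""
    name: str
    conditions: dict = field(default_factory=dict)


def set_degraded(node, mcn, message):
    """Record the degrade in both places so the two views cannot diverge."""
    node.annotations[STATE_ANNOTATION] = "Degraded"
    mcn.conditions[NODE_DEGRADED] = ("True", message)


def degraded_count_from_annotations(nodes):
    """Count nodes whose state annotation says Degraded."""
    return sum(1 for n in nodes if n.annotations.get(STATE_ANNOTATION) == "Degraded")


def degraded_count_from_mcn(mcns):
    """Count MCNs whose NodeDegraded condition is True."""
    return sum(
        1 for m in mcns
        if m.conditions.get(NODE_DEGRADED, ("False", ""))[0] == "True"
    )
```

If only the annotation is written, the annotation-based count and the MCN-based count disagree; routing every degrade through a single helper like `set_degraded` keeps them equal.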

- How to verify it

  1. Launch a TechPreview cluster with this PR enabled.
launch 4.22 gcp,techpreview
  2. Force a config drift to degrade a node. I did this through the following flow, outlined in the original bug:
    a. Create an MC to deploy a dropin file:
$ oc apply -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: drifted-dropins-test
spec:
  config:
    ignition:
      version: 3.5.0
    passwd:
      users: []
    storage:
      files: []
    systemd:
      units:
      - dropins:
        - contents: |-
            [Service]
            Environment="FAKE_OPTS=fake-value"
          name: 10-chrony-drop-test.conf
        enabled: true
        name: chronyd.service
  extensions: []
  kernelArguments: []
  osImageURL: ""
EOF

    b. Once the update has completed, manually modify the deployed dropin file to force a configuration drift:

$ oc debug node/ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz
  # chroot /host
  # nano /etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf
  # cat /etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf
[Service]
Environment="FAKE_OPTS=fake-value-new"
  3. Check that the MCP correctly reports as degraded.
$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-dd3f981617fd37b167db506d3a2cfd84   False     True       True       3              2                   2                     1                      112m
$ oc describe mcp worker
Name:         worker
...
Status:
...
  Conditions:
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               All nodes are updating to MachineConfig rendered-worker-dd3f981617fd37b167db506d3a2cfd84
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz upgrade failure. unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\"", Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz upgrade failure. unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\"", Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\""
    Reason:                
    Status:                True
    Type:                  Degraded
...
  Degraded Machine Count:  1
  Machine Count:           3
...
  Ready Machine Count:          2
  Unavailable Machine Count:    1
  Updated Machine Count:        2
...
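
The check in the last step can also be scripted. The following Python sketch parses the tabular `oc get mcp` output shown above and asserts the degraded fields. The helper names are hypothetical, and the parser assumes every column holds a non-empty, whitespace-free value, which is true for the sample here but not guaranteed in general.

```python
def parse_mcp_table(text):
    """Parse whitespace-separated `oc get mcp` output into a list of row dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def pool_reports_degraded(row, expected_degraded=1):
    """True if the pool is marked degraded with the expected degraded machine count."""
    return (
        row["DEGRADED"] == "True"
        and int(row["DEGRADEDMACHINECOUNT"]) == expected_degraded
    )


# Sample copied from the verification output above.
SAMPLE = """\
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-dd3f981617fd37b167db506d3a2cfd84   False     True       True       3              2                   2                     1                      112m
"""

rows = parse_mcp_table(SAMPLE)
worker = rows[0]
print(worker["NAME"], pool_reports_degraded(worker))
```

In practice the text would come from capturing the command's output rather than from a pasted sample.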

- Description for the changelog
OCPBUGS-67229: When a node's state annotation is set to Degraded, also set the NodeDegraded condition in the node's MCN

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 18, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci bot commented Dec 18, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025
@isabella-janssen isabella-janssen changed the title from "(WIP) OCPBUGS-67229" to "(WIP) OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded" Dec 18, 2025
@openshift-ci-robot
Contributor

@isabella-janssen: An error was encountered adding this pull request to the external tracker bugs for bug OCPBUGS-67229 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. failed to add remote link: failed to add link: No Link Issue Permission for issue 'OCPBUGS-67229'.: request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Contributor

@isabella-janssen: An error was encountered updating to the POST state for bug OCPBUGS-67229 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message. No transition status with name `POST` could be found. Please select from the following list: []

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.

@openshift-ci-robot openshift-ci-robot added jira/severity-important Referenced Jira bug's severity is important for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. labels Dec 19, 2025
@openshift-ci-robot
Contributor

@isabella-janssen: This pull request references Jira Issue OCPBUGS-67229, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.22.0) matches configured target version for branch (4.22.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci bot requested a review from sergiordlr December 19, 2025 14:37
@isabella-janssen isabella-janssen changed the title from "(WIP) OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded" to "OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded" Dec 19, 2025
@isabella-janssen isabella-janssen marked this pull request as ready for review December 19, 2025 14:39
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 19, 2025
@openshift-ci
Contributor

openshift-ci bot commented Dec 19, 2025

@isabella-janssen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name               Commit   Details  Required  Rerun command
ci/prow/e2e-hypershift  edd6fb7  link     true      /test e2e-hypershift
ci/prow/bootstrap-unit  edd6fb7  link     false     /test bootstrap-unit

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
