OCPBUGS-67229: Set `NodeDegraded` MCN condition when node state annotation is set to `Degraded` #5509

isabella-janssen · 2025-12-18T20:38:49Z

Closes: OCPBUGS-67229

- What I did
This updates the set-degraded flow so that it also populates the `NodeDegraded` condition in the node's MachineConfigNode (MCN). This prevents discrepancies between the degraded state reported in the node annotations and the conditions in the MCN resource. In turn, it prevents discrepancies in MachineConfigPool (MCP) reporting in TechPreview, where degraded machine counts are determined by MCN conditions.

- How to verify it

Launch a TechPreview cluster with this PR enabled.

launch 4.22 gcp,techpreview

Force a config drift to degrade a node. I did this using the following flow, outlined in the original bug:
a. Create an MC to deploy a drop-in file:

$ oc apply -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: drifted-dropins-test
spec:
  config:
    ignition:
      version: 3.5.0
    passwd:
      users: []
    storage:
      files: []
    systemd:
      units:
      - dropins:
        - contents: |-
            [Service]
            Environment="FAKE_OPTS=fake-value"
          name: 10-chrony-drop-test.conf
        enabled: true
        name: chronyd.service
  extensions: []
  kernelArguments: []
  osImageURL: ""
EOF

b. When the update has completed, manually modify the deployed drop-in file to force a configuration drift:

$ oc debug node/ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz
  # chroot /host
  # nano /etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf
  # cat /etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf
[Service]
Environment="FAKE_OPTS=fake-value-new"

Check that the MCP correctly reports as degraded.

$ oc get mcp worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-dd3f981617fd37b167db506d3a2cfd84   False     True       True       3              2                   2                     1                      112m
$ oc describe mcp worker
Name:         worker
...
Status:
...
  Conditions:
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               
    Reason:                
    Status:                False
    Type:                  Updated
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               All nodes are updating to MachineConfig rendered-worker-dd3f981617fd37b167db506d3a2cfd84
    Reason:                
    Status:                True
    Type:                  Updating
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz upgrade failure. unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\"", Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\""
    Reason:                1 nodes are reporting degraded status on sync
    Status:                True
    Type:                  NodeDegraded
    Last Transition Time:  2025-12-19T14:32:56Z
    Message:               Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz upgrade failure. unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\"", Node ci-ln-tgf5wlt-72292-5l6mx-worker-a-jgwqz is reporting: "unexpected on-disk state validating against rendered-worker-dd3f981617fd37b167db506d3a2cfd84: content mismatch for file \"/etc/systemd/system/chronyd.service.d/10-chrony-drop-test.conf\""
    Reason:                
    Status:                True
    Type:                  Degraded
...
  Degraded Machine Count:  1
  Machine Count:           3
...
  Ready Machine Count:          2
  Unavailable Machine Count:    1
  Updated Machine Count:        2
...
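The `Degraded Machine Count` above is the number this PR keeps consistent. Per the PR description, in TechPreview the MCP derives degraded machine counts from MCN conditions, so a node that is degraded only in its annotation would not be counted. A minimal Go sketch of that counting logic, using hypothetical simplified types and helper names rather than the MCO's actual pool status code:

```go
package main

import "fmt"

// Hypothetical, simplified MCN types.
type Condition struct {
	Type   string
	Status string
}

type MachineConfigNode struct {
	Name       string
	Conditions []Condition
}

// isNodeDegraded reports whether the MCN carries NodeDegraded=True.
func isNodeDegraded(mcn MachineConfigNode) bool {
	for _, c := range mcn.Conditions {
		if c.Type == "NodeDegraded" {
			return c.Status == "True"
		}
	}
	return false
}

// degradedMachineCount counts degraded machines from MCN conditions only;
// node annotations are never consulted.
func degradedMachineCount(mcns []MachineConfigNode) int {
	n := 0
	for _, m := range mcns {
		if isNodeDegraded(m) {
			n++
		}
	}
	return n
}

func main() {
	mcns := []MachineConfigNode{
		{Name: "worker-a", Conditions: []Condition{{Type: "NodeDegraded", Status: "True"}}},
		{Name: "worker-b", Conditions: []Condition{{Type: "NodeDegraded", Status: "False"}}},
		{Name: "worker-c"}, // no condition set yet, as before this fix
	}
	fmt.Println(degradedMachineCount(mcns)) // prints 1
}
```

Without the fix, `worker-c` above stands in for a node whose annotation says `Degraded` but whose MCN was never updated, which is exactly the case that went uncounted.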

- Description for the changelog
OCPBUGS-67229: Update the flow that sets a node's state annotation to Degraded so that it also sets the NodeDegraded condition in the MCN

openshift-ci · 2025-12-18T20:38:54Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2025-12-18T20:39:22Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: isabella-janssen

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [isabella-janssen]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2025-12-18T20:41:04Z

@isabella-janssen: An error was encountered adding this pull request to the external tracker bugs for bug OCPBUGS-67229 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message.


failed to add remote link: failed to add link: No Link Issue Permission for issue 'OCPBUGS-67229'.: request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci-robot · 2025-12-18T20:42:08Z

@isabella-janssen: An error was encountered adding this pull request to the external tracker bugs for bug OCPBUGS-67229 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message.


failed to add remote link: failed to add link: No Link Issue Permission for issue 'OCPBUGS-67229'.: request failed. Please analyze the request body for more details. Status code: 403:

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.


openshift-ci-robot · 2025-12-18T20:59:40Z

@isabella-janssen: An error was encountered updating to the POST state for bug OCPBUGS-67229 on the Jira server at https://issues.redhat.com/. No known errors were detected, please see the full error message for details.

Full error message.


No transition status with name `POST` could be found. Please select from the following list: []

Please contact an administrator to resolve this issue, then request a bug refresh with /jira refresh.


openshift-ci-robot · 2025-12-19T14:37:39Z

@isabella-janssen: This pull request references Jira Issue OCPBUGS-67229, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.22.0) matches configured target version for branch (4.22.0)
bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @sergiordlr

The bug has been updated to refer to the pull request using the external bug tracker.


openshift-ci · 2025-12-19T18:27:49Z

@isabella-janssen: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                 Commit     Required  Rerun command
ci/prow/e2e-hypershift    `edd6fb7`  true      `/test e2e-hypershift`
ci/prow/bootstrap-unit    `edd6fb7`  false     `/test bootstrap-unit`

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 18, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 18, 2025

isabella-janssen changed the title ~~(WIP) OCPBUGS-67229~~ (WIP) OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded Dec 18, 2025

openshift-ci bot requested a review from sergiordlr December 19, 2025 14:37

isabella-janssen changed the title ~~(WIP) OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded~~ OCPBUGS-67229: Set NodeDegraded MCN condition when node state annotation is set to Degraded Dec 19, 2025

isabella-janssen marked this pull request as ready for review December 19, 2025 14:39

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 19, 2025

openshift-ci bot requested review from RishabhSaini and yuqi-zhang December 19, 2025 14:39

isabella-janssen force-pushed the ocpbugs-67229 branch from 6e93926 to 2867a27 December 19, 2025 14:51

daemon: update 'SetDegrade' flow to include updating the degrade condition in the MCN (edd6fb7)

isabella-janssen force-pushed the ocpbugs-67229 branch from 2867a27 to edd6fb7 December 19, 2025 15:03
