Appframeworks s1 tests stability fix by gabrielm-splunk · Pull Request #1712 · splunk/splunk-operator

gabrielm-splunk · 2026-02-19T16:22:36Z

Description

Small fix to the appframeworksS1 tests: the Standalone specs that reference the Monitoring Console (MC) cause some issues (as identified by Claude). I will post Claude's explanation of the bug and the fix in the comments.

Key Changes

Just adding Replicas: 1 to standalone specs with MC ref

Testing and Verification

Ran tests locally

Related Issues

Stemmed from 10.2 certification: https://splunk.atlassian.net/browse/CSPL-4531

PR Checklist

Code changes adhere to the project's coding standards.
Relevant unit and integration tests are included.
Documentation has been updated accordingly.
All tests pass locally.
The PR description follows the project's guidelines.


gabrielm-splunk · 2026-02-19T16:24:58Z

Claude analysis

Bug Explanation: MonitoringConsole Restart Loop Due to Empty SPLUNK_STANDALONE_URL

The Problem

When a Standalone CR with a MonitoringConsoleRef was deployed without an explicit Replicas value, the MonitoringConsole pod would enter a restart loop: its startup probe timed out after ~6.7 minutes and the container was restarted, over and over.

Root Cause

The bug stems from an ordering problem in the operator's reconciliation logic (a default is applied only after a resource that depends on it has been built), combined with Go's zero-value semantics:

1. Go Zero Values: In Go, when you create a struct without specifying a field, numeric types default to 0. So when the test created a StandaloneSpec without setting Replicas, it defaulted to 0:

    spec := enterpriseApi.StandaloneSpec{
        CommonSplunkSpec: enterpriseApi.CommonSplunkSpec{...},
        AppFrameworkConfig: appFrameworkSpec,
        // Replicas is not set, so it defaults to int32(0)
    }

2. ConfigMap Creation Order: During reconciliation, the operator creates the MonitoringConsole's ConfigMap before applying the default replica count. The flow is:
   - pkg/splunk/enterprise/standalone.go:228 calls ApplyMonitoringConsoleEnvConfigMap()
   - This happens before lines 291-292, where the default is applied:

    if cr.Spec.Replicas == 0 {
        cr.Spec.Replicas = 1
    }

3. URL Generation with 0 Replicas: The function GetSplunkStatefulsetUrls() in pkg/splunk/enterprise/names.go:267-272 generates URLs based on replica count:

    func GetSplunkStatefulsetUrls(..., replicas int32, ...) string {
        urls := make([]string, replicas) // With replicas=0, this creates an empty slice
        for i := int32(0); i < replicas; i++ {
            urls[i] = GetSplunkStatefulsetURL(...)
        }
        return strings.Join(urls, ",") // Returns empty string ""
    }

4. MonitoringConsole Startup Failure: The MonitoringConsole pod starts with an empty SPLUNK_STANDALONE_URL in its ConfigMap. Inside the container, an Ansible playbook tries to configure peer connections to the standalones listed in this URL. With an empty/invalid URL list, the playbook hangs indefinitely waiting for non-existent hosts.

5. Startup Probe Timeout: After ~6.7 minutes (400 seconds), the Kubernetes startup probe kills the container, causing a restart. This creates an infinite restart loop.
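
The first and third steps above can be reproduced in isolation. The following is a minimal sketch, not the operator's code: standaloneSpec and statefulSetURLs are hypothetical stand-ins for enterpriseApi.StandaloneSpec and GetSplunkStatefulsetUrls(), and the host name format is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// standaloneSpec is a hypothetical stand-in for enterpriseApi.StandaloneSpec;
// only the field relevant to the bug is modeled.
type standaloneSpec struct {
	Replicas int32
}

// statefulSetURLs mimics the shape of the URL helper quoted above.
// The "<name>-<index>" host format is an assumption, not the operator's format.
func statefulSetURLs(name string, replicas int32) string {
	urls := make([]string, replicas) // length 0 when replicas == 0
	for i := int32(0); i < replicas; i++ {
		urls[i] = fmt.Sprintf("%s-%d", name, i)
	}
	return strings.Join(urls, ",") // joining an empty slice yields ""
}

func main() {
	var implicit standaloneSpec // Replicas omitted, so it holds the zero value 0
	explicit := standaloneSpec{Replicas: 1}

	fmt.Printf("implicit: %q\n", statefulSetURLs("standalone", implicit.Replicas))
	fmt.Printf("explicit: %q\n", statefulSetURLs("standalone", explicit.Replicas))
}
```

With the omitted field the helper returns an empty string, while an explicit Replicas of 1 yields a single host entry. The empty string is what would end up in SPLUNK_STANDALONE_URL in the scenario described above.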

Why the Fix Works

By explicitly setting Replicas: 1 in the test spec:

    spec := enterpriseApi.StandaloneSpec{
        CommonSplunkSpec: enterpriseApi.CommonSplunkSpec{...},
        Replicas: 1, // Explicitly set
        AppFrameworkConfig: appFrameworkSpec,
    }

The Replicas field is already 1 when ApplyMonitoringConsoleEnvConfigMap() is called, so GetSplunkStatefulsetUrls() correctly generates SPLUNK_STANDALONE_URL=splunk--standalone-0...svc.cluster.local instead of an empty string.

Potential Operator-Level Fix

The proper fix in the operator code would be to apply default values before creating dependent resources like ConfigMaps. This would involve reordering the logic in pkg/splunk/enterprise/standalone.go to apply defaults before line 228.
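
As a rough illustration of that reordering, the sketch below contrasts the two orders on a toy model. Everything in it is hypothetical (spec, applyDefaults, buildPeerList and both reconcile functions are invented names); it only assumes that the dependent resource reads Replicas at the moment it is built.

```go
package main

import (
	"fmt"
	"strings"
)

// spec is a toy model of a CR spec; it is not the operator's type.
type spec struct {
	Replicas int32
}

// applyDefaults mirrors the "if Replicas == 0 then 1" defaulting quoted earlier.
func applyDefaults(s *spec) {
	if s.Replicas == 0 {
		s.Replicas = 1
	}
}

// buildPeerList stands in for building a dependent resource (for example a
// ConfigMap value) from the replica count. The host format is an assumption.
func buildPeerList(s spec) string {
	hosts := make([]string, 0, s.Replicas)
	for i := int32(0); i < s.Replicas; i++ {
		hosts = append(hosts, fmt.Sprintf("standalone-%d", i))
	}
	return strings.Join(hosts, ",")
}

// reconcileDefaultsLate reproduces the problematic order: the dependent value
// is built first, so it never sees the defaulted replica count.
func reconcileDefaultsLate(s spec) string {
	peers := buildPeerList(s)
	applyDefaults(&s)
	return peers
}

// reconcileDefaultsFirst applies defaults before anything reads the spec.
func reconcileDefaultsFirst(s spec) string {
	applyDefaults(&s)
	return buildPeerList(s)
}

func main() {
	var s spec // Replicas left at its zero value
	fmt.Printf("defaults late:  %q\n", reconcileDefaultsLate(s))
	fmt.Printf("defaults first: %q\n", reconcileDefaultsFirst(s))
}
```

In this toy model the late-defaulting order produces an empty peer list and the defaults-first order produces one host. Whether the real reconcile function can be reordered this simply depends on what else runs between the two points, which this sketch does not model.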

coveralls · 2026-02-19T16:33:43Z

Pull Request Test Coverage Report for Build 22190232827

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 85.924%

Totals
Change from base Build 22145448988:	0.0%
Covered Lines:	11287
Relevant Lines:	13136

💛 - Coveralls

gabrielm-splunk added 2 commits February 19, 2026 11:16

small fix related to appframeworksS1 smoke test stability that claude…

38298fd

… was able to identify while running CI for 10.2 certification

adding this branch to run integration tests to test out this fix

e9cdf29

gabrielm-splunk requested review from Igor-splunk, kasiakoziol, kubabuczak, minjieqiu, patrykw-splunk, qingw-splunk, rlieberman-splunk and vivekr-splunk February 19, 2026 16:25

gabrielm-splunk wants to merge 2 commits into develop from appframeworksS1-tests-stability-fix
