Appframeworks s1 tests stability fix #1712

Open
gabrielm-splunk wants to merge 2 commits into develop from appframeworksS1-tests-stability-fix

Conversation

@gabrielm-splunk
Collaborator

@gabrielm-splunk gabrielm-splunk commented Feb 19, 2026

Description

Small fix to the appframeworksS1 tests: the Standalone specs that reference the MC (MonitoringConsole) cause issues, as identified by Claude. I will post Claude's explanation of the bug and the fix in the comments.

Key Changes

Just adding Replicas: 1 to standalone specs with MC ref

Testing and Verification

Ran tests locally

Related Issues

Stemmed from 10.2 certification: https://splunk.atlassian.net/browse/CSPL-4531

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

@gabrielm-splunk
Collaborator Author

Claude analysis

Bug Explanation: MonitoringConsole Restart Loop Due to Empty SPLUNK_STANDALONE_URL

The Problem

When a Standalone CR with a MonitoringConsoleRef was deployed, the MonitoringConsole pod would enter a restart loop, timing out on its startup probe after ~6.7 minutes and continuously restarting.

Root Cause

The bug stems from an ordering problem in the operator's reconciliation logic (defaults are applied too late) combined with Go's zero-value semantics:

  1. Go Zero Values: In Go, when you create a struct without specifying a field, numeric types default to 0. So when the test created a StandaloneSpec without setting Replicas, it defaulted to 0:

     spec := enterpriseApi.StandaloneSpec{
         CommonSplunkSpec:   enterpriseApi.CommonSplunkSpec{...},
         AppFrameworkConfig: appFrameworkSpec,
         // Replicas is not set, so it defaults to int32(0)
     }

  2. ConfigMap Creation Order: During reconciliation, the operator creates the MonitoringConsole's ConfigMap before applying the default replica count. The flow is:
     - pkg/splunk/enterprise/standalone.go:228 calls ApplyMonitoringConsoleEnvConfigMap()
     - This happens before lines 291-292, where the default is applied:

     if cr.Spec.Replicas == 0 {
         cr.Spec.Replicas = 1
     }

  3. URL Generation with 0 Replicas: The function GetSplunkStatefulsetUrls() in pkg/splunk/enterprise/names.go:267-272 generates URLs based on replica count:

     func GetSplunkStatefulsetUrls(..., replicas int32, ...) string {
         urls := make([]string, replicas) // With replicas=0, this creates an empty slice
         for i := int32(0); i < replicas; i++ {
             urls[i] = GetSplunkStatefulsetURL(...)
         }
         return strings.Join(urls, ",") // Returns empty string ""
     }

  4. MonitoringConsole Startup Failure: The MonitoringConsole pod starts with an empty SPLUNK_STANDALONE_URL in its ConfigMap. Inside the container, an Ansible playbook tries to configure peer connections to the standalones listed in this URL. With an empty/invalid URL list, the playbook hangs indefinitely waiting for non-existent hosts.
  5. Startup Probe Timeout: After ~6.7 minutes (400 seconds), the Kubernetes startup probe kills the container, causing a restart. This creates an infinite restart loop.
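
To make steps 1 and 3 concrete, here is a minimal, self-contained sketch of how an unset Replicas field ends up as an empty URL string. The type `standaloneSpec`, the function `statefulsetURLs`, and the hostname format are hypothetical stand-ins for illustration only; they are not the operator's actual types or naming scheme.

```go
package main

import (
	"fmt"
	"strings"
)

// standaloneSpec is a hypothetical stand-in for enterpriseApi.StandaloneSpec.
type standaloneSpec struct {
	Replicas int32
}

// statefulsetURLs mimics the shape of GetSplunkStatefulsetUrls: it builds one
// entry per replica and joins them with commas. The "<name>-<index>" format
// is an assumption made for this sketch.
func statefulsetURLs(name string, replicas int32) string {
	urls := make([]string, replicas) // zero replicas yields an empty slice
	for i := int32(0); i < replicas; i++ {
		urls[i] = fmt.Sprintf("%s-%d", name, i)
	}
	return strings.Join(urls, ",") // joining an empty slice yields ""
}

func main() {
	var spec standaloneSpec // Replicas omitted, so it is the zero value 0
	fmt.Printf("replicas=%d urls=%q\n", spec.Replicas, statefulsetURLs("standalone", spec.Replicas))

	spec.Replicas = 1 // what the test fix sets explicitly
	fmt.Printf("replicas=%d urls=%q\n", spec.Replicas, statefulsetURLs("standalone", spec.Replicas))
}
```

The first line printed shows an empty string for the URL list, and the second shows a single entry, which matches the difference between the failing and the fixed test specs.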

Why the Fix Works

By explicitly setting Replicas: 1 in the test spec:

    spec := enterpriseApi.StandaloneSpec{
        CommonSplunkSpec:   enterpriseApi.CommonSplunkSpec{...},
        Replicas:           1, // Explicitly set
        AppFrameworkConfig: appFrameworkSpec,
    }

The Replicas field is already 1 when ApplyMonitoringConsoleEnvConfigMap() is called, so GetSplunkStatefulsetUrls() correctly generates SPLUNK_STANDALONE_URL=splunk--standalone-0...svc.cluster.local instead of an empty string.

Potential Operator-Level Fix

The proper fix in the operator code would be to apply default values before creating dependent resources like ConfigMaps. This would involve reordering the logic in pkg/splunk/enterprise/standalone.go to apply defaults before line 228.
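
The reordering described above could look roughly like the following sketch: defaults are applied first, and only then is the dependent environment data derived. All names here (`demoSpec`, `applyDefaults`, `buildMCEnv`, `reconcileSketch`) and the hostname format are hypothetical and chosen for illustration; the real change would have to be made within the operator's existing reconcile flow in standalone.go.

```go
package main

import (
	"fmt"
	"strings"
)

// demoSpec is a hypothetical stand-in for the Standalone spec.
type demoSpec struct {
	Replicas int32
}

// applyDefaults fills in zero-valued fields, mirroring the
// "if Replicas == 0 { Replicas = 1 }" check quoted in the analysis.
func applyDefaults(s *demoSpec) {
	if s.Replicas == 0 {
		s.Replicas = 1
	}
}

// buildMCEnv is a hypothetical helper that derives the MonitoringConsole
// environment entries from an already-defaulted spec.
func buildMCEnv(name string, s demoSpec) map[string]string {
	urls := make([]string, 0, s.Replicas)
	for i := int32(0); i < s.Replicas; i++ {
		urls = append(urls, fmt.Sprintf("%s-%d", name, i))
	}
	return map[string]string{"SPLUNK_STANDALONE_URL": strings.Join(urls, ",")}
}

// reconcileSketch shows the proposed ordering: defaults first, then
// dependent resources.
func reconcileSketch(name string, s *demoSpec) map[string]string {
	applyDefaults(s)
	return buildMCEnv(name, *s)
}

func main() {
	s := demoSpec{} // Replicas left unset on purpose
	env := reconcileSketch("standalone", &s)
	fmt.Printf("SPLUNK_STANDALONE_URL=%q\n", env["SPLUNK_STANDALONE_URL"])
}
```

With this ordering, a spec that omits Replicas still produces a non-empty URL list, so the tests would not need to set the field explicitly.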

@coveralls
Collaborator

coveralls commented Feb 19, 2026

Pull Request Test Coverage Report for Build 22190232827

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 85.924%

Totals Coverage Status
Change from base Build 22145448988: 0.0%
Covered Lines: 11287
Relevant Lines: 13136

💛 - Coveralls
