feat(metrics): add comprehensive submission debugging metrics with env control #29
Conversation
Summary of Changes

Hello @auricom, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed.

This pull request enhances the monitoring capabilities for the data availability (DA) submission process by introducing a set of new Prometheus metrics. These metrics provide granular insight into the total number of submission attempts, distinguish between successful and failed submissions, and track the timestamps of these critical events. The changes aim to improve system observability, making it easier to monitor performance and troubleshoot issues related to DA submissions.
Code Review
This pull request introduces new Prometheus metrics to track DA submission attempts, including total attempts, failures, and timestamps for the last attempt and last successful submission. The changes are logical and the new metrics will provide valuable insights. My review focuses on improving logging consistency and test code quality.
Specifically, I've pointed out that the new debug logs use the standard log package, which is inconsistent with the project's use of zerolog. I've also suggested refactoring the new test function in pkg/metrics/metrics_test.go to improve its structure and reduce code duplication, which will make it more maintainable.
```go
if success {
    m.LastSuccessfulSubmissionTime.WithLabelValues(chainID, submissionType).Set(float64(now.Unix()))
    log.Printf("DEBUG: Successful submission - chain: %s, type: %s, timestamp: %d", chainID, submissionType, now.Unix())
} else {
    log.Printf("DEBUG: Failed submission attempt - chain: %s, type: %s, timestamp: %d", chainID, submissionType, now.Unix())
}
```
The use of log.Printf for debugging introduces an inconsistency with the project's logging approach, which appears to favor the structured logger zerolog (as seen in pkg/exporters/verifier/verifier.go). Using the standard log package can be problematic as it lacks levels and structured output.
For better consistency and maintainability, please consider one of the following:
- Replace these `log.Printf` calls with a proper `zerolog` logger instance. You might need to add a logger field to the `Metrics` struct or pass it into the methods.
- If these logs are for temporary debugging purposes, they should be removed before this PR is merged.
This comment applies to all new log.Printf calls added in this file.
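For illustration, a minimal sketch of the first option, assuming a `zerolog.Logger` is injected into the `Metrics` struct (the field name, the helper, and the log fields below are illustrative, not the project's actual API):

```go
package metrics

import (
    "time"

    "github.com/rs/zerolog"
)

// Metrics would carry an injected structured logger instead of calling the
// global standard-library logger (sketch; the existing Prometheus collectors are omitted).
type Metrics struct {
    logger zerolog.Logger
    // ... existing metric vectors ...
}

// logSubmission emits a leveled, structured debug entry for one submission attempt.
func (m *Metrics) logSubmission(chainID, submissionType string, success bool, now time.Time) {
    m.logger.Debug().
        Str("chain", chainID).
        Str("type", submissionType).
        Bool("success", success).
        Int64("timestamp", now.Unix()).
        Msg("submission attempt recorded")
}
```

Because zerolog entries already carry a level, the `EVMETRICS_DEBUG` gate could also be mapped to the logger's level instead of wrapping every call in a conditional.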
```go
func TestMetrics_RecordSubmissionAttempt(t *testing.T) {
    reg := prometheus.NewRegistry()
    m := NewWithRegistry("test", reg)

    // Test successful submission
    beforeAttempt := time.Now()
    m.RecordSubmissionAttempt("testchain", "header", true)
    afterAttempt := time.Now()

    // Verify counters
    attempts := getMetricValue(t, reg, "test_submission_attempts_total", map[string]string{
        "chain_id": "testchain",
        "type":     "header",
    })
    require.Equal(t, float64(1), attempts, "should have 1 attempt")

    // For successful submission, failures should be 0 (metric may not be exported if 0)
    // We'll check if the metric exists and has value 0, or doesn't exist (both are valid)
    failuresMetricFound := false
    failures := float64(0)
    metrics, err := reg.Gather()
    require.NoError(t, err)
    for _, mf := range metrics {
        if mf.GetName() == "test_submission_failures_total" {
            for _, m := range mf.GetMetric() {
                // check if labels match
                match := true
                for _, label := range m.GetLabel() {
                    if expectedVal, ok := map[string]string{"chain_id": "testchain", "type": "header"}[label.GetName()]; ok {
                        if label.GetValue() != expectedVal {
                            match = false
                            break
                        }
                    }
                }
                if match && len(m.GetLabel()) == 2 {
                    failuresMetricFound = true
                    if m.GetCounter() != nil {
                        failures = m.GetCounter().GetValue()
                    }
                    break
                }
            }
        }
    }
    if failuresMetricFound {
        require.Equal(t, float64(0), failures, "should have 0 failures")
    } // else: metric not exported because value is 0, which is expected behavior

    // Verify timestamps are within expected range
    lastAttemptTime := getMetricValue(t, reg, "test_last_submission_attempt_time", map[string]string{
        "chain_id": "testchain",
        "type":     "header",
    })
    require.GreaterOrEqual(t, lastAttemptTime, float64(beforeAttempt.Unix()))
    require.LessOrEqual(t, lastAttemptTime, float64(afterAttempt.Unix()))

    lastSuccessTime := getMetricValue(t, reg, "test_last_successful_submission_time", map[string]string{
        "chain_id": "testchain",
        "type":     "header",
    })
    require.GreaterOrEqual(t, lastSuccessTime, float64(beforeAttempt.Unix()))
    require.LessOrEqual(t, lastSuccessTime, float64(afterAttempt.Unix()))

    // Test failed submission
    beforeFailure := time.Now()
    m.RecordSubmissionAttempt("testchain", "data", false)
    afterFailure := time.Now()

    // Verify counters
    attempts = getMetricValue(t, reg, "test_submission_attempts_total", map[string]string{
        "chain_id": "testchain",
        "type":     "data",
    })
    require.Equal(t, float64(1), attempts, "should have 1 attempt for data")

    failures = getMetricValue(t, reg, "test_submission_failures_total", map[string]string{
        "chain_id": "testchain",
        "type":     "data",
    })
    require.Equal(t, float64(1), failures, "should have 1 failure for data")

    // Verify timestamps - should have attempt time but not success time for failed submission
    lastAttemptTime = getMetricValue(t, reg, "test_last_submission_attempt_time", map[string]string{
        "chain_id": "testchain",
        "type":     "data",
    })
    require.GreaterOrEqual(t, lastAttemptTime, float64(beforeFailure.Unix()))
    require.LessOrEqual(t, lastAttemptTime, float64(afterFailure.Unix()))

    // Last successful submission time should still be 0 for data type (never succeeded)
    // Gauge metrics with 0 values may not be exported, so we need to check if it exists
    var lastSuccessTimeData float64
    var successMetricFoundData bool
    metrics, err = reg.Gather()
    require.NoError(t, err)
    for _, mf := range metrics {
        if mf.GetName() == "test_last_successful_submission_time" {
            for _, m := range mf.GetMetric() {
                // check if labels match
                match := true
                for _, label := range m.GetLabel() {
                    if expectedVal, ok := map[string]string{"chain_id": "testchain", "type": "data"}[label.GetName()]; ok {
                        if label.GetValue() != expectedVal {
                            match = false
                            break
                        }
                    }
                }
                if match && len(m.GetLabel()) == 2 {
                    successMetricFoundData = true
                    if m.GetGauge() != nil {
                        lastSuccessTimeData = m.GetGauge().GetValue()
                    }
                    break
                }
            }
        }
    }
    if successMetricFoundData {
        require.Equal(t, float64(0), lastSuccessTimeData, "should have no successful submission time for data")
    } // else: metric not exported because value is 0, which is expected behavior
}
```
This test function is quite long and contains significant code duplication, especially for verifying metrics that might not be present (e.g., when their value is 0). This makes the test harder to read and maintain.
I suggest refactoring it to improve clarity and reusability:
- Use sub-tests: Split the logic for successful and failed submissions into separate sub-tests using `t.Run()`. This will improve test organization and make test failures easier to diagnose.
- Create a helper for optional metrics: The logic to check for metrics that might be zero (and thus not exported by Prometheus) is repeated. You could extract this into a helper function, for example `getMetricValueOptional`, that returns `(float64, bool)` to indicate whether the metric was found. This would greatly simplify the test body.
Here's a sketch of how the refactored test could look:
```go
func TestMetrics_RecordSubmissionAttempt(t *testing.T) {
    reg := prometheus.NewRegistry()
    m := NewWithRegistry("test", reg)

    t.Run("successful submission", func(t *testing.T) {
        beforeAttempt := time.Now()
        m.RecordSubmissionAttempt("testchain", "header", true)
        afterAttempt := time.Now()

        attempts := getMetricValue(t, reg, "test_submission_attempts_total", map[string]string{"chain_id": "testchain", "type": "header"})
        require.Equal(t, float64(1), attempts)

        // Using a new helper for optional metrics
        failures, found := getMetricValueOptional(t, reg, "test_submission_failures_total", map[string]string{"chain_id": "testchain", "type": "header"})
        if found {
            require.Equal(t, float64(0), failures)
        }

        // ... timestamp checks
    })

    t.Run("failed submission", func(t *testing.T) {
        // ... similar logic for failed case
    })
}
```
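To make that sketch compile, the suggested helper could look roughly like this (the name `getMetricValueOptional` comes from the review comment; the label-matching logic simply mirrors the inline checks in the original test and is otherwise an assumption):

```go
package metrics

import (
    "testing"

    "github.com/prometheus/client_golang/prometheus"
    dto "github.com/prometheus/client_model/go"
    "github.com/stretchr/testify/require"
)

// getMetricValueOptional returns the value of the metric with the given name
// and exact label set, plus a bool reporting whether the metric was exported
// at all (zero-valued children may simply be absent from the registry).
func getMetricValueOptional(t *testing.T, reg *prometheus.Registry, name string, labels map[string]string) (float64, bool) {
    t.Helper()
    families, err := reg.Gather()
    require.NoError(t, err)
    for _, mf := range families {
        if mf.GetName() != name {
            continue
        }
        for _, metric := range mf.GetMetric() {
            if !labelsMatch(metric.GetLabel(), labels) {
                continue
            }
            if c := metric.GetCounter(); c != nil {
                return c.GetValue(), true
            }
            if g := metric.GetGauge(); g != nil {
                return g.GetValue(), true
            }
        }
    }
    return 0, false
}

// labelsMatch reports whether the metric carries exactly the expected labels.
func labelsMatch(pairs []*dto.LabelPair, expected map[string]string) bool {
    if len(pairs) != len(expected) {
        return false
    }
    for _, p := range pairs {
        if v, ok := expected[p.GetName()]; !ok || v != p.GetValue() {
            return false
        }
    }
    return true
}
```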
Problem
The `ev_metrics_submission_*` metrics were sometimes not displayed even when Celestia RPC endpoints appeared to be configured properly. This made it difficult to debug submission issues.

Solution
This PR adds comprehensive debugging metrics and logging to provide complete visibility into the DA submission process, with environment variable control for debug logging.
Changes Made
New Metrics
- `submission_attempts_total` - Counter tracking all submission attempts (successful + failed)
- `submission_failures_total` - Counter tracking only failed submission attempts
- `last_submission_attempt_time` - Gauge recording the timestamp of the last attempt
- `last_successful_submission_time` - Gauge recording the timestamp of the last success
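As a rough illustration of how these four collectors could be declared and registered with the Prometheus Go client (the struct field names, `Help` strings, and the exact shape of `NewWithRegistry` are assumptions; the `chain_id`/`type` labels match the tests in this PR):

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Metrics groups the new submission-debugging collectors (field names illustrative).
type Metrics struct {
    SubmissionAttempts           *prometheus.CounterVec
    SubmissionFailures           *prometheus.CounterVec
    LastSubmissionAttemptTime    *prometheus.GaugeVec
    LastSuccessfulSubmissionTime *prometheus.GaugeVec
}

// NewWithRegistry creates the collectors under the given namespace and registers
// them on reg (the real constructor presumably also sets up the exporter's
// pre-existing metrics, omitted here).
func NewWithRegistry(namespace string, reg *prometheus.Registry) *Metrics {
    labels := []string{"chain_id", "type"}
    m := &Metrics{
        SubmissionAttempts: prometheus.NewCounterVec(prometheus.CounterOpts{
            Namespace: namespace, Name: "submission_attempts_total",
            Help: "Total number of DA submission attempts (successful and failed).",
        }, labels),
        SubmissionFailures: prometheus.NewCounterVec(prometheus.CounterOpts{
            Namespace: namespace, Name: "submission_failures_total",
            Help: "Total number of failed DA submission attempts.",
        }, labels),
        LastSubmissionAttemptTime: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: namespace, Name: "last_submission_attempt_time",
            Help: "Unix timestamp of the last DA submission attempt.",
        }, labels),
        LastSuccessfulSubmissionTime: prometheus.NewGaugeVec(prometheus.GaugeOpts{
            Namespace: namespace, Name: "last_successful_submission_time",
            Help: "Unix timestamp of the last successful DA submission.",
        }, labels),
    }
    reg.MustRegister(m.SubmissionAttempts, m.SubmissionFailures,
        m.LastSubmissionAttemptTime, m.LastSuccessfulSubmissionTime)
    return m
}
```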
Environment Variable Control

- `EVMETRICS_DEBUG` environment variable to control debug logging
- `EVMETRICS_DEBUG=true` to enable verbose debug logging
- `EVMETRICS_DEBUG=false` or unset to disable debug logging
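A minimal sketch of the `isDebugEnabled()` helper listed under Code Changes below, assuming it simply reads the environment variable on each call (the exact accepted values are an assumption):

```go
package metrics

import (
    "os"
    "strings"
)

// isDebugEnabled reports whether verbose debug logging was requested via the
// EVMETRICS_DEBUG environment variable.
func isDebugEnabled() bool {
    return strings.EqualFold(os.Getenv("EVMETRICS_DEBUG"), "true")
}
```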
Enhanced Debug Logging

- Debug logging in the `RecordSubmissionAttempt` method
- Debug logging in the `RecordSubmissionDaHeight` method
- Gated by the `EVMETRICS_DEBUG` environment variable (see the sketch below)
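Putting the pieces together, a sketch of how `RecordSubmissionAttempt` could update the counters and gauges and gate its debug output on `isDebugEnabled()`. Only `LastSuccessfulSubmissionTime` and the method signature are confirmed by the diff and tests above; the other field names follow the earlier sketch, and the standard `log` and `time` packages are assumed to be imported:

```go
// RecordSubmissionAttempt records one DA submission attempt for the given chain
// and submission type, and emits a debug line only when EVMETRICS_DEBUG is enabled.
func (m *Metrics) RecordSubmissionAttempt(chainID, submissionType string, success bool) {
    now := time.Now()

    // Every call counts as an attempt and refreshes the last-attempt timestamp.
    m.SubmissionAttempts.WithLabelValues(chainID, submissionType).Inc()
    m.LastSubmissionAttemptTime.WithLabelValues(chainID, submissionType).Set(float64(now.Unix()))

    if success {
        m.LastSuccessfulSubmissionTime.WithLabelValues(chainID, submissionType).Set(float64(now.Unix()))
    } else {
        m.SubmissionFailures.WithLabelValues(chainID, submissionType).Inc()
    }

    if isDebugEnabled() {
        log.Printf("DEBUG: submission attempt - chain: %s, type: %s, success: %t, timestamp: %d",
            chainID, submissionType, success, now.Unix())
    }
}
```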
Code Changes

- `pkg/metrics/metrics.go`: new `isDebugEnabled()` helper function; new collectors on the `Metrics` struct, registered in `NewWithRegistry`; new `RecordSubmissionAttempt` method
- `pkg/exporters/verifier/verifier.go`: updated the `onVerified` method to call `RecordSubmissionAttempt`
- `pkg/metrics/metrics_test.go`: tests covering the `RecordSubmissionAttempt` functionality
Benefits

Before: when submissions failed, there was no visibility into whether attempts were happening, how often they failed, or when the last success occurred.

After: the new counters, timestamp gauges, and optional debug logs give complete visibility into every submission attempt and its outcome.
Usage
Enable debug logging:
```sh
export EVMETRICS_DEBUG=true
./ev-metrics
```

Disable debug logging (default):

```sh
unset EVMETRICS_DEBUG
./ev-metrics
```

Example Metrics
Working submissions:
Failing submissions:
Debug Log Examples (when EVMETRICS_DEBUG=true)
Testing
Breaking Changes
None - This is a purely additive change that enhances observability without modifying existing behavior.