Skip to content

Conversation

@wk989898
Copy link
Collaborator

@wk989898 wk989898 commented Dec 24, 2025

What problem does this PR solve?

Issue Number: close #3294 close #4184

What is changed and how it works?

  • Batch DML for Virtual Columns: Batch Data Manipulation Language (DML) operations are now supported for tables that include virtual columns, removing a previous limitation.
  • Updated Batch SQL Generation Logic: The shouldGenBatchSQL function has been modified to no longer consider the presence of virtual columns as a disqualifier for generating batch DML, streamlining the process.
  • Refined Generated Column Handling: Generated columns (including virtual ones) are now explicitly skipped when preparing SQL arguments and constructing WHERE clauses, ensuring correct DML execution without interference from computed values.
  • Simplified Batching Logic: The SameTypeTargetAndColumns function has been removed, which may indicate a simplification or refactoring of the logic used to determine if multiple row changes can be merged into a multi-value DML.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

Summary by CodeRabbit

  • Bug Fixes

    • Prevented writing nil-derived data during partition indexing.
    • Excluded virtual/generated columns from WHERE and index-key construction to avoid incorrect DMLs.
    • Batch DML generation now skips multi-row batches when handle-key columns are virtual.
    • Stricter handling of unexpected nil column values (now fails loudly via panic).
  • New Features

    • Improved handling for unique indexes involving stored and generated columns.
  • Refactor

    • Removed obsolete multi-row mergability logic.
  • Tests

    • Refactored and expanded tests to cover stored and virtual generated column scenarios.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note Denotes a PR that will be considered when it comes time to generate release notes. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 24, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces crucial enhancements to support batch DML operations on tables containing virtual columns. The changes primarily involve adjusting the internal logic that determines when batch DML can be utilized, specifically by removing the restriction imposed by virtual columns. Furthermore, the handling of generated columns during the construction of SQL statements has been refined to correctly exclude them, ensuring compatibility and preventing errors, thereby enabling more efficient data synchronization.

Highlights

  • Batch DML for Virtual Columns: Batch Data Manipulation Language (DML) operations are now supported for tables that include virtual columns, removing a previous limitation.
  • Updated Batch SQL Generation Logic: The shouldGenBatchSQL function has been modified to no longer consider the presence of virtual columns as a disqualifier for generating batch DML, streamlining the process.
  • Refined Generated Column Handling: Generated columns (including virtual ones) are now explicitly skipped when preparing SQL arguments and constructing WHERE clauses, ensuring correct DML execution without interference from computed values.
  • Simplified Batching Logic: The SameTypeTargetAndColumns function has been removed, which may indicate a simplification or refactoring of the logic used to determine if multiple row changes can be merged into a multi-value DML.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@wk989898
Copy link
Collaborator Author

/test pull-cdc-mysql-integration-heavy

@wk989898
Copy link
Collaborator Author

/test pull-cdc-mysql-integration-light

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to add support for batch DML with virtual columns. The main changes involve removing the restriction on batching for tables with virtual columns, and ensuring that generated columns are skipped when preparing DML arguments. My review has found a couple of critical issues in pkg/sink/sqlmodel/row_change.go that could lead to panics. There's an inconsistency in how generated columns are handled between ColumnCount and getArgsWithGeneratedColumn, and a bug in whereColumnsAndValues where column names are filtered but their corresponding values are not. I've also pointed out an outdated comment and a misleading test case name that should be addressed for clarity. Overall, the direction is good, but the implementation needs to be more careful about consistency and correctness when handling generated columns.

c := 0
for _, col := range r.sourceTableInfo.GetColumns() {
if !col.Hidden {
if !col.Hidden && !col.IsVirtualGenerated() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The method IsVirtualGenerated does not seem to exist on *model.ColumnInfo. This will likely cause a compilation error.

Also, there's a potential inconsistency. This function uses IsVirtualGenerated (assuming it checks for virtual generated columns), but getArgsWithGeneratedColumn in pkg/sink/mysql/sql_builder.go uses IsGenerated(), which checks for both virtual and stored generated columns. This can lead to a mismatch between the number of columns counted here and the number of arguments prepared, causing a panic in NewRowChange if the table has stored generated columns.

To handle all generated columns consistently, you should probably use IsGenerated() here.

Suggested change
if !col.Hidden && !col.IsVirtualGenerated() {
if !col.Hidden && !col.IsGenerated() {

Comment on lines 234 to 237
if !column.IsGenerated() {
columnNames = append(columnNames, column.Name.O)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There seems to be a bug here. You are filtering out generated columns when building columnNames, but the corresponding values in the values slice are not filtered. If columns contains any generated columns (for example, if a unique index used for the WHERE clause contains a generated column), len(columnNames) will be different from len(values). This will trigger the panic you've added on line 248. You should filter values in sync with columnNames.

@wk989898
Copy link
Collaborator Author

/test all

@wk989898
Copy link
Collaborator Author

/retest

@wk989898 wk989898 changed the title [WIP]: support batch dml with virtual column sink: support batch dml with virtual column Dec 29, 2025
@wk989898 wk989898 marked this pull request as ready for review December 29, 2025 07:10
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 29, 2025
@@ -230,8 +230,12 @@ func (r *RowChange) whereColumnsAndValues() ([]string, []interface{}) {
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems only for Update statment, how about insert statement?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The insert statement has filtered the generated column by generatedColumnsNameSet

@hongyunyan
Copy link
Collaborator

Do we have enough tests with virtual column scenarios? Do we need to add some related integration tests to ensure correctness?

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898
Copy link
Collaborator Author

Do we have enough tests with virtual column scenarios? Do we need to add some related integration tests to ensure correctness?

There exists a generated column test. https://github.com/pingcap/ticdc/blob/master/tests/integration_tests/generate_column/data/virtual.sql

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 21, 2026
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the comment here

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Feb 10, 2026
@ti-chi-bot
Copy link

ti-chi-bot bot commented Feb 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hongyunyan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot
Copy link

ti-chi-bot bot commented Feb 10, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-02-10 09:09:14.411895658 +0000 UTC m=+262370.106035508: ☑️ agreed by hongyunyan.

@ti-chi-bot ti-chi-bot bot added the approved label Feb 10, 2026
Signed-off-by: wk989898 <nhsmwk@gmail.com>
@coderabbitai
Copy link

coderabbitai bot commented Feb 10, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Per-table batch-SQL gating now uses TableInfo and inspects handle-key columns; virtual/generated columns are excluded from WHERE/index construction. Nil handling for column values was tightened. Tests and integration fixtures for generated columns were updated.

Changes

Cohort / File(s) Summary
Batch SQL Generation
pkg/sink/mysql/mysql_writer_dml.go, pkg/sink/mysql/mysql_writer_dml_test.go
shouldGenBatchSQL signature changed to accept tableInfo *common.TableInfo; gating now inspects per-table handle-key columns (detects all-virtual handle-keys) and chooses batch vs normal SQL paths. Tests refactored to build TableInfo fixtures and isolate writer instances.
WHERE / Index Construction
pkg/sink/sqlmodel/row_change.go, pkg/sink/sqlmodel/utils.go, pkg/sink/sqlmodel/multi_row.go, pkg/sink/sqlmodel/where_handle.go, pkg/sink/mysql/sql_builder.go
Virtual/generated columns are excluded from WHERE clauses, index column/value lists, and handle-key computations. SameTypeTargetAndColumns removed. allColsNotNull treats virtual columns as not satisfying NOT NULL. Added a runtime consistency check for column/value lengths.
Common utilities & nil handling
pkg/common/column.go, downstreamadapter/sink/eventrouter/partition/index_value.go
ColumnValueString now panics on nil rather than returning "null". Partition index code adds a nil-check before extracting/writing column values to avoid writing nil-derived hashed values.
Integration test fixtures & scripts
tests/integration_tests/generate_column/data/virtual.sql, tests/integration_tests/generate_column/data/stored.sql, tests/integration_tests/generate_column/run.sh
Replaced/expanded test tables to cover multiple virtual/stored generated-column variants and unique indexes; updated run script to reference new table names and downstream checks.

Sequence Diagram(s)

sequenceDiagram
  participant Writer as Writer (w)
  participant TableInfo as TableInfo
  participant SQLBuilder as SQLBuilder
  participant DB as Database

  Writer->>TableInfo: Inspect columns & flags (isVirtual, isNotNull, has PK/UK)
  TableInfo-->>Writer: Return column metadata
  alt Not eligible for batch (no PK/UK, all handle-keys virtual, single-event, or safe-mode mismatch)
    Writer->>SQLBuilder: generateNormalSQLs(events)
  else Eligible for batch
    Writer->>SQLBuilder: generateBatchSQL(events)
  end
  SQLBuilder->>DB: Execute SQL(s)
  DB-->>Writer: Execution result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm

Suggested reviewers

  • hongyunyan
  • lidezhu
  • 3AceShowHand

Poem

🐰 I hopped through schemas, checked each column's state,
Skipped the sly generated bits before it grew late,
Batches now gather when the keys play fair,
Tests hum in order, tidy as my lair,
A carrot for CI — crunchy, brisk, and great.

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: enabling batch DML support for tables with virtual columns, which is the primary objective across both linked issues.
Description check ✅ Passed The description includes the required issue reference (close #3294 close #4184), explains what is changed and how it works with four key changes, and references both unit and integration tests.
Linked Issues check ✅ Passed The changes comprehensively address both linked issues: #3294 (enabling batch SQL for virtual columns via shouldGenBatchSQL refactoring) and #4184 (ensuring virtual columns are excluded from WHERE clauses and SQL arguments to maintain correctness).
Out of Scope Changes check ✅ Passed All changes are directly related to the stated objective of supporting batch DML with virtual columns, including generated column handling, batch SQL logic, tests, and integration test data updates.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
pkg/sink/mysql/mysql_writer_dml_test.go (2)

34-37: ⚠️ Potential issue | 🔴 Critical

Test struct fields hasPK and hasVirtualCols are stale after the signature change.

hasVirtualCols is declared (line 37) but never used in the test invocations. hasPK is used incorrectly as noted above. Both fields should be replaced with a tableInfo *common.TableInfo field that the test constructs per case.


553-560: ⚠️ Potential issue | 🟡 Minor

newDMLEvent creates events with nil TableInfo, which will cause nil-pointer dereference in the updated shouldGenBatchSQL.

The production code at line 139 accesses tableInfo.HasPKOrNotNullUK and at line 147 calls tableInfo.GetOrderedHandleKeyColumnIDs(). Since newDMLEvent doesn't set TableInfo, passing these events to shouldGenBatchSQL (even with a correct *common.TableInfo first arg) would work — but the mock function should be updated if TestShouldGenBatchSQL is fixed to pass *common.TableInfo.

🤖 Fix all issues with AI agents
In `@pkg/sink/mysql/mysql_writer_dml_test.go`:
- Around line 108-113: The test is passing a bool to shouldGenBatchSQL but the
function expects a *common.TableInfo; update the test table-driven cases to
build a *common.TableInfo for each tt (use the test's tt.hasPK to set
TableInfo.HasPKOrNotNullUK accordingly) and pass that pointer as the first arg
to shouldGenBatchSQL while leaving tt.events as the second arg; ensure the
constructed TableInfo is created inside the t.Run so parallel tests get their
own instance.

In `@pkg/sink/mysql/mysql_writer_dml.go`:
- Around line 146-159: The code sets allVirtualGenerated to true and checks
columns from tableInfo.GetOrderedHandleKeyColumnIDs(), but if that call returns
an empty slice the variable remains true and wrongly disables batch SQL; modify
the logic in mysql_writer_dml.go to explicitly handle an empty colIDs result:
after calling GetOrderedHandleKeyColumnIDs(), if len(colIDs) == 0 then treat as
"not all virtual-generated" (i.e., do not flip allVirtualGenerated to true) or
return early based on the intended semantics; update the block around
allVirtualGenerated, colIDs, and the loop over tableInfo.GetColumnInfo(colID) so
empty slices don't cause batch SQL to be disabled.
- Around line 139-160: The loop over tableInfo.GetOrderedHandleKeyColumnIDs()
currently silently continues when tableInfo.GetColumnInfo(colID) returns !exist,
which can mask bugs; change the behavior so that when a handle-key column is
missing you immediately return false (and optionally emit a warning including
the missing colID and table identifier) instead of continuing. Update the loop
in the block that sets allVirtualGenerated to check existence: if !exist ->
return false (or log + return false), otherwise proceed to check
info.IsVirtualGenerated(); keep the existing allVirtualGenerated logic unchanged
for present columns.

In `@pkg/sink/mysql/sql_builder.go`:
- Around line 283-291: The fallback loop over tableInfo.GetColumns() (in the
block that builds colNames/args) calls col.IsVirtualGenerated() without ensuring
col is non-nil, causing a potential nil-pointer panic; update the loop that
iterates tableInfo.GetColumns() to skip nil entries (e.g., if col == nil {
continue }) before calling col.IsVirtualGenerated(), and ensure
ExtractColVal(row, col, i) and col.Name.O are only used after that nil check so
col is always valid when referenced.
🧹 Nitpick comments (3)
pkg/sink/sqlmodel/row_change.go (1)

233-254: Double-filtering of virtual columns when uniqueIndex != nil.

When uniqueIndex is not nil, getColsAndValuesOfIdx (in utils.go) already filters out virtual generated columns. The subsequent filter at lines 235-238 is therefore redundant for that code path — it only has effect when uniqueIndex == nil and the full column set is used. This works correctly but is worth noting for clarity.

The consistency check at lines 250-253 is always true by construction since both slices are built in lockstep, but it's a reasonable defensive guard.

pkg/sink/mysql/mysql_writer_dml.go (2)

128-134: Doc comment is missing condition 3 (all-virtual-generated handle key).

The numbered list in the doc comment now only mentions conditions 1–4 but doesn't describe the new "all handle-key columns are virtual generated → no batch" rule. This is the key behavioral change of the PR and should be documented.


141-141: Unnecessary else after early return.

The else block after return false on line 140 is redundant. The code inside can be de-nested for readability.

Suggested improvement
 	if !tableInfo.HasPKOrNotNullUK {
 		return false
-	} else {
-		// if the table has pk or uk,
-		// but all the columns in the pk or uk are virtual generated columns,
-		// we also don't generate batch sql,
-		// because the virtual generated column can't be used to detect the row identity.
-		allVirtualGenerated := true
-		colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
-		for _, colID := range colIDs {
-			info, exist := tableInfo.GetColumnInfo(colID)
-			if !exist {
-				continue
-			}
-			if !info.IsVirtualGenerated() {
-				allVirtualGenerated = false
-			}
-		}
-		if allVirtualGenerated {
-			return false
-		}
 	}
+
+	// if the table has pk or uk,
+	// but all the columns in the pk or uk are virtual generated columns,
+	// we also don't generate batch sql,
+	// because the virtual generated column can't be used to detect the row identity.
+	allVirtualGenerated := true
+	colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
+	for _, colID := range colIDs {
+		info, exist := tableInfo.GetColumnInfo(colID)
+		if !exist {
+			continue
+		}
+		if !info.IsVirtualGenerated() {
+			allVirtualGenerated = false
+			break
+		}
+	}
+	if allVirtualGenerated {
+		return false
+	}

Note: also added break after setting allVirtualGenerated = false — once we find a non-virtual column, there's no need to continue iterating.

Comment on lines 139 to 160
if !tableInfo.HasPKOrNotNullUK {
return false
} else {
// if the table has pk or uk,
// but all the columns in the pk or uk are virtual generated columns,
// we also don't generate batch sql,
// because the virtual generated column can't be used to detect the row identity.
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Silent continue on missing column info may mask bugs or produce incorrect results.

On Line 151, when GetColumnInfo(colID) returns !exist, the loop continues without updating allVirtualGenerated. This treats a missing column the same as a virtual generated column — if all handle-key columns are missing from column info, batch SQL would be incorrectly disabled. More importantly, a missing handle-key column likely indicates a bug elsewhere, and silently skipping it could mask the root cause.

Consider logging a warning or returning false directly when a handle-key column is not found:

Suggested improvement
 		for _, colID := range colIDs {
 			info, exist := tableInfo.GetColumnInfo(colID)
 			if !exist {
-				continue
+				log.Warn("handle key column not found in table info",
+					zap.Int64("colID", colID))
+				continue
 			}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if !tableInfo.HasPKOrNotNullUK {
return false
} else {
// if the table has pk or uk,
// but all the columns in the pk or uk are virtual generated columns,
// we also don't generate batch sql,
// because the virtual generated column can't be used to detect the row identity.
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
}
if !tableInfo.HasPKOrNotNullUK {
return false
} else {
// if the table has pk or uk,
// but all the columns in the pk or uk are virtual generated columns,
// we also don't generate batch sql,
// because the virtual generated column can't be used to detect the row identity.
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
log.Warn("handle key column not found in table info",
zap.Int64("colID", colID))
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
}
🤖 Prompt for AI Agents
In `@pkg/sink/mysql/mysql_writer_dml.go` around lines 139 - 160, The loop over
tableInfo.GetOrderedHandleKeyColumnIDs() currently silently continues when
tableInfo.GetColumnInfo(colID) returns !exist, which can mask bugs; change the
behavior so that when a handle-key column is missing you immediately return
false (and optionally emit a warning including the missing colID and table
identifier) instead of continuing. Update the loop in the block that sets
allVirtualGenerated to check existence: if !exist -> return false (or log +
return false), otherwise proceed to check info.IsVirtualGenerated(); keep the
existing allVirtualGenerated logic unchanged for present columns.

Comment on lines 146 to 159
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Edge case: empty colIDs silently disables batch SQL.

If GetOrderedHandleKeyColumnIDs() returns an empty slice (even though HasPKOrNotNullUK is true), allVirtualGenerated stays true and batch SQL is incorrectly disabled. An explicit guard would make the intent clearer:

Suggested improvement
 		allVirtualGenerated := true
 		colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
+		if len(colIDs) == 0 {
+			return false
+		}
 		for _, colID := range colIDs {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
allVirtualGenerated := true
colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
if len(colIDs) == 0 {
return false
}
for _, colID := range colIDs {
info, exist := tableInfo.GetColumnInfo(colID)
if !exist {
continue
}
if !info.IsVirtualGenerated() {
allVirtualGenerated = false
}
}
if allVirtualGenerated {
return false
}
🤖 Prompt for AI Agents
In `@pkg/sink/mysql/mysql_writer_dml.go` around lines 146 - 159, The code sets
allVirtualGenerated to true and checks columns from
tableInfo.GetOrderedHandleKeyColumnIDs(), but if that call returns an empty
slice the variable remains true and wrongly disables batch SQL; modify the logic
in mysql_writer_dml.go to explicitly handle an empty colIDs result: after
calling GetOrderedHandleKeyColumnIDs(), if len(colIDs) == 0 then treat as "not
all virtual-generated" (i.e., do not flip allVirtualGenerated to true) or return
early based on the intended semantics; update the block around
allVirtualGenerated, colIDs, and the loop over tableInfo.GetColumnInfo(colID) so
empty slices don't cause batch SQL to be disabled.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/mysql/mysql_writer_dml.go (1)

115-120: ⚠️ Potential issue | 🔴 Critical

Build failure: genActiveActiveSQL still calls shouldGenBatchSQL with the old 3-argument signature.

The pipeline confirms this: too many arguments in call to w.shouldGenBatchSQL. Line 116 must be updated to match the new signature.

🐛 Proposed fix
 func (w *Writer) genActiveActiveSQL(tableInfo *common.TableInfo, eventsInGroup []*commonEvent.DMLEvent) ([]string, [][]interface{}) {
-	if !w.shouldGenBatchSQL(tableInfo.HasPKOrNotNullUK, tableInfo.HasVirtualColumns(), eventsInGroup) {
+	if !w.shouldGenBatchSQL(tableInfo, eventsInGroup) {
 		return w.generateActiveActiveNormalSQLs(eventsInGroup)
 	}
 	return w.generateActiveActiveBatchSQL(eventsInGroup)
 }
🧹 Nitpick comments (1)
pkg/sink/mysql/mysql_writer_dml.go (1)

133-154: Unnecessary else after early return — simplify control flow.

The else block on line 135 is redundant since the if branch on line 133 returns early. Removing the else reduces nesting and improves readability.

♻️ Suggested simplification
 	if !tableInfo.HasPKOrNotNullUK {
 		return false
-	} else {
-		// if the table has pk or uk,
-		// but all the columns in the pk or uk are virtual generated columns,
-		// we also don't generate batch sql,
-		// because the virtual generated column can't be used to detect the row identity.
-		allVirtualGenerated := true
-		colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
-		for _, colID := range colIDs {
-			info, exist := tableInfo.GetColumnInfo(colID)
-			if !exist {
-				continue
-			}
-			if !info.IsVirtualGenerated() {
-				allVirtualGenerated = false
-			}
-		}
-		if allVirtualGenerated {
-			return false
+	}
+	// if all handle-key columns are virtual generated,
+	// we can't use them for row identity in batch SQL.
+	allVirtualGenerated := true
+	colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
+	for _, colID := range colIDs {
+		info, exist := tableInfo.GetColumnInfo(colID)
+		if !exist {
+			continue
+		}
+		if !info.IsVirtualGenerated() {
+			allVirtualGenerated = false
+			break
 		}
 	}
+	if allVirtualGenerated {
+		return false
+	}

Note: also added break once a non-virtual column is found — no need to continue iterating.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 10, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/sink/mysql/mysql_writer_dml.go (1)

60-69: ⚠️ Potential issue | 🟡 Minor

Stale comment: still references "have virtual column" as a batching disqualifier.

Line 66 says "if the table doesn't have a handle key or have virtual column, we just generate the sqls for each event row". After this PR, virtual columns no longer unconditionally prevent batch SQL. The comment should be updated to reflect the new behavior (only all-virtual-generated handle keys disable batch).

📝 Suggested comment update
-//     if the table does't have a handle key or have virtual column, we just generate the sqls for each event row.
+//     if the table doesn't have a handle key, or all handle key columns are virtual generated, we just generate the sqls for each event row.
🤖 Fix all issues with AI agents
In `@pkg/sink/mysql/mysql_writer_dml_test.go`:
- Around line 72-163: Add unit tests for the virtual generated-column logic in
shouldGenBatchSQL: extend or add variants of buildTableInfo to create columns
with GeneratedExprString set and GeneratedStored=false (virtual generated
columns), then add two test cases in mysql_writer_dml_test.go that exercise
shouldGenBatchSQL — (1) table with a non-virtual PK plus a virtual generated
column expecting want=true, and (2) table whose handle-key columns are all
virtual generated expecting want=false; ensure tests use the existing
newDMLEvent and adjustWriter patterns so the tests hit the code path in
mysql_writer_dml.go that checks GeneratedExprString/GeneratedStored.
🧹 Nitpick comments (2)
pkg/sink/mysql/mysql_writer_dml.go (1)

133-154: Unnecessary else after early return; also has stale comment on Line 155.

The else block after return false on Line 134 adds unnecessary nesting. The logic can be flattened. Also, Line 155 contains a leftover comment (// if tableInfo.HasVirtualColumns()) that appears to be a remnant of the old implementation — it should be removed.

♻️ Flatten and clean up
 	if !tableInfo.HasPKOrNotNullUK {
 		return false
-	} else {
-		// if the table has pk or uk,
-		// but all the columns in the pk or uk are virtual generated columns,
-		// we also don't generate batch sql,
-		// because the virtual generated column can't be used to detect the row identity.
-		allVirtualGenerated := true
-		colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
-		for _, colID := range colIDs {
-			info, exist := tableInfo.GetColumnInfo(colID)
-			if !exist {
-				continue
-			}
-			if !info.IsVirtualGenerated() {
-				allVirtualGenerated = false
-			}
-		}
-		if allVirtualGenerated {
-			return false
+	}
+
+	// If all handle-key columns are virtual generated,
+	// they cannot identify a row, so skip batch SQL.
+	allVirtualGenerated := true
+	colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
+	for _, colID := range colIDs {
+		info, exist := tableInfo.GetColumnInfo(colID)
+		if !exist {
+			continue
+		}
+		if !info.IsVirtualGenerated() {
+			allVirtualGenerated = false
+			break
 		}
 	}
-	// if tableInfo.HasVirtualColumns()
+	if allVirtualGenerated {
+		return false
+	}
+
 	if len(events) == 1 && events[0].Len() == 1 {

Note: also added an early break once a non-virtual column is found, since there's no need to keep iterating.

pkg/sink/mysql/mysql_writer_dml_test.go (1)

36-67: buildTableInfo is a useful helper, but could be extended to cover virtual column scenarios.

The helper currently only builds tables with a plain id (int) and name (varchar). For testing the virtual-generated-column gating logic, consider adding a parameter (or a sibling helper) that can create a column with GeneratedExprString and GeneratedStored = false to represent a virtual generated column.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/sink/mysql/mysql_writer_dml_test.go`:
- Around line 226-235: Add a new unit test variant that mirrors the existing
"handle key contains virtual generated column should not use batch SQL" case but
uses a table schema whose primary key or not-null unique key is composed
entirely of virtual generated columns (so HasPKOrNotNullUK returns true and
allVirtualGenerated is exercised); create a new tableInfo (e.g.,
virtualOnlyHandleKeyTableInfo) with only virtual generated columns in the
PK/unique key, reuse newDMLEvent for events and the same adjustWriter to set
SafeMode=false and BatchDMLEnable=true, and assert want:false so the code path
in Writer that checks allVirtualGenerated (the branch that returns false for
batch SQL) is covered.
- Around line 72-133: The test currently misasserts shouldGenBatchSQL for
virtualHandleKeyTableInfo: update the existing expectation to want:true (the
current table has a NOT NULL normal column 'a' so batch SQL should be allowed),
and add a new test case that uses a builder (e.g., extend or add
buildTableInfoWithAllVirtualHandleKey) that makes all handle-index columns (both
'a' and 'b') virtual/generated (set GeneratedExprString and GeneratedStored
appropriately) and assert want:false to exercise the allVirtualGenerated code
path in shouldGenBatchSQL.
🧹 Nitpick comments (2)
pkg/sink/mysql/mysql_writer_dml.go (2)

136-155: Flatten the unnecessary else after early return.

Since !tableInfo.HasPKOrNotNullUK already returns, the else block at Line 136 can be un-indented. This improves readability and aligns with idiomatic Go style (avoid else after return).

Suggested refactor
 	if !tableInfo.HasPKOrNotNullUK {
 		return false
-	} else {
-		// if the table has pk or uk,
-		// but all the columns in the pk or uk are virtual generated columns,
-		// we also don't generate batch sql,
-		// because the virtual generated column can't be used to detect the row identity.
-		allVirtualGenerated := true
-		colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
-		for _, colID := range colIDs {
-			info, exist := tableInfo.GetColumnInfo(colID)
-			if !exist {
-				continue
-			}
-			if !info.IsVirtualGenerated() {
-				allVirtualGenerated = false
-			}
-		}
-		if allVirtualGenerated {
-			return false
-		}
 	}
+
+	// If all handle-key columns are virtual generated,
+	// batch SQL cannot be used because virtual columns can't identify rows.
+	allVirtualGenerated := true
+	colIDs := tableInfo.GetOrderedHandleKeyColumnIDs()
+	for _, colID := range colIDs {
+		info, exist := tableInfo.GetColumnInfo(colID)
+		if !exist {
+			continue
+		}
+		if !info.IsVirtualGenerated() {
+			allVirtualGenerated = false
+			break
+		}
+	}
+	if allVirtualGenerated {
+		return false
+	}

Also added break on Line 149 once a non-virtual column is found — no need to keep iterating.


156-156: Remove stale comment.

// if tableInfo.HasVirtualColumns() appears to be a leftover from removed code. It doesn't serve as documentation and could confuse future readers. Remove it.

Comment on lines 72 to 133
buildTableInfoWithVirtualGeneratedHandleKey := func(tableName string) *common.TableInfo {
aFieldType := types.NewFieldType(tidbmysql.TypeLong)
aFieldType.AddFlag(tidbmysql.NotNullFlag)
bFieldType := types.NewFieldType(tidbmysql.TypeLong)
bFieldType.AddFlag(tidbmysql.NotNullFlag)
cFieldType := types.NewFieldType(tidbmysql.TypeLong)
cFieldType.AddFlag(tidbmysql.NotNullFlag)

ti := &tidbmodel.TableInfo{
ID: 101,
Name: ast.NewCIStr(tableName),
Columns: []*tidbmodel.ColumnInfo{
{
ID: 1,
Name: ast.NewCIStr("a"),
Offset: 0,
FieldType: *aFieldType,
State: tidbmodel.StatePublic,
},
{
ID: 2,
Name: ast.NewCIStr("b"),
Offset: 1,
FieldType: *bFieldType,
State: tidbmodel.StatePublic,
GeneratedExprString: "a + c",
GeneratedStored: false,
},
{
ID: 3,
Name: ast.NewCIStr("c"),
Offset: 2,
FieldType: *cFieldType,
State: tidbmodel.StatePublic,
},
},
Indices: []*tidbmodel.IndexInfo{
{
ID: 1,
Name: ast.NewCIStr("idx1"),
Unique: true,
State: tidbmodel.StatePublic,
Columns: []*tidbmodel.IndexColumn{
{
Name: ast.NewCIStr("a"),
Offset: 0,
Length: types.UnspecifiedLength,
},
{
Name: ast.NewCIStr("b"),
Offset: 1,
Length: types.UnspecifiedLength,
},
},
},
},
}
info := common.NewTableInfo4Decoder("test", ti)
require.NotNil(t, info)
return info
}
virtualHandleKeyTableInfo := buildTableInfoWithVirtualGeneratedHandleKey("t_virtual_uk")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Check HasPKOrNotNullUK computation in common/table_info.go ==="
rg -n "HasPKOrNotNullUK\s*=" pkg/common/table_info.go -B 5 -A 5

echo ""
echo "=== Check GetOrderedHandleKeyColumnIDs implementation ==="
rg -n "func.*GetOrderedHandleKeyColumnIDs" pkg/common/ -A 25

echo ""
echo "=== Check NewTableInfo4Decoder for virtual column handling ==="
rg -n "func NewTableInfo4Decoder" pkg/common/ -A 30

Repository: pingcap/ticdc

Length of output: 4332


🏁 Script executed:

#!/bin/bash

echo "=== Search for newColumnSchema4Decoder implementation ==="
rg -n "func newColumnSchema4Decoder" pkg/common/ -A 40

echo ""
echo "=== Search for HandleKeyIDList assignment ==="
rg -n "HandleKeyIDList\s*=" pkg/common/ -B 3 -A 3

echo ""
echo "=== Check mysql_writer_dml.go lines 130-160 for the actual logic ==="
sed -n '130,160p' pkg/sink/mysql/mysql_writer_dml.go

echo ""
echo "=== Check if virtual columns are excluded from HandleKeyIDList ==="
rg -n "IsVirtualGenerated" pkg/common/table_info_helper.go -B 5 -A 5

Repository: pingcap/ticdc

Length of output: 6833


🏁 Script executed:

#!/bin/bash

echo "=== Search for HasPKOrNotNullUK definition and computation ==="
rg -n "HasPKOrNotNullUK" pkg/common/ -B 3 -A 3

echo ""
echo "=== Get full newColumnSchema function ==="
sed -n '489,750p' pkg/common/table_info_helper.go | head -150

Repository: pingcap/ticdc

Length of output: 6952


🏁 Script executed:

#!/bin/bash

echo "=== Get full OriginalHasPKOrNotNullUK function ==="
sed -n '406,450p' pkg/common/table_info.go

echo ""
echo "=== Check if GetPrimaryKey is used and what happens with virtual columns ==="
rg -n "GetPrimaryKey" pkg/common/table_info.go -B 2 -A 5

echo ""
echo "=== Specifically check the implicit PK logic for virtual columns ==="
sed -n '559,620p' pkg/common/table_info_helper.go

Repository: pingcap/ticdc

Length of output: 3957


🏁 Script executed:

#!/bin/bash

echo "=== Get the full shouldGenBatchSQL function ==="
sed -n '126,225p' pkg/sink/mysql/mysql_writer_dml.go

Repository: pingcap/ticdc

Length of output: 2229


🏁 Script executed:

#!/bin/bash

echo "=== Check the test case expectations for virtual column ==="
sed -n '220,245p' pkg/sink/mysql/mysql_writer_dml_test.go

echo ""
echo "=== Check how newDMLEvent creates events and what Len() returns ==="
rg -n "func newDMLEvent" pkg/sink/mysql/ -A 15

echo ""
echo "=== Verify the test parameters for the virtual column case ==="
sed -n '225,240p' pkg/sink/mysql/mysql_writer_dml_test.go

Repository: pingcap/ticdc

Length of output: 2523


🏁 Script executed:

#!/bin/bash

echo "=== Check IsVirtualGenerated implementation ==="
rg -n "func.*IsVirtualGenerated" -A 5

echo ""
echo "=== Verify GetColumnInfo works correctly ==="
rg -n "func.*GetColumnInfo" pkg/common/table_info.go -A 5

Repository: pingcap/ticdc

Length of output: 878


🏁 Script executed:

#!/bin/bash

echo "=== Search for IsVirtualGenerated in all files ==="
rg -n "IsVirtualGenerated" . --type go 2>/dev/null | head -20

echo ""
echo "=== Check model.ColumnInfo structure (from TiDB import) ==="
grep -r "type ColumnInfo" . --include="*.go" | grep -v "TableColumnInfo\|RowColumnInfo" | head -5

echo ""
echo "=== Verify test table structure - check if column b is correctly marked virtual ==="
sed -n '72,105p' pkg/sink/mysql/mysql_writer_dml_test.go

Repository: pingcap/ticdc

Length of output: 3301


Test expectation is incorrect; test case does not exercise the all-virtual-columns code path.

The test at line 234 expects want: false, but the production logic should return true for this table. The test table has a unique index on columns (a, b) where column a is normal (not virtual) and column b is virtual generated. When shouldGenBatchSQL evaluates this:

  1. HasPKOrNotNullUK is true (unique index on NOT NULL columns)
  2. The loop at lines 141–150 sets allVirtualGenerated = false because column a is not virtual
  3. The condition at line 152 is not triggered
  4. With 2 events and both having the same safe mode (CommitTs > ReplicatingTs), the function returns true

The test name suggests it validates behavior when a handle key contains virtual columns, but the code only disables batch SQL when all handle-key columns are virtual. To properly test lines 141–154, add a case with a unique index where all columns are virtual generated.

🤖 Prompt for AI Agents
In `@pkg/sink/mysql/mysql_writer_dml_test.go` around lines 72 - 133, The test
currently misasserts shouldGenBatchSQL for virtualHandleKeyTableInfo: update the
existing expectation to want:true (the current table has a NOT NULL normal
column 'a' so batch SQL should be allowed), and add a new test case that uses a
builder (e.g., extend or add buildTableInfoWithAllVirtualHandleKey) that makes
all handle-index columns (both 'a' and 'b') virtual/generated (set
GeneratedExprString and GeneratedStored appropriately) and assert want:false to
exercise the allVirtualGenerated code path in shouldGenBatchSQL.

Comment on lines 226 to 235
{
name: "batch dml is disabled",
hasPK: true,
hasVirtualCols: false,
events: []*commonEvent.DMLEvent{newDMLEvent(t, 1, 1, 1)},
config: &Config{SafeMode: false, BatchDMLEnable: false},
want: false,
name: "handle key contains virtual generated column should not use batch SQL",
tableInfo: virtualHandleKeyTableInfo,
events: []*commonEvent.DMLEvent{newDMLEvent(t, 2, 1, 1), newDMLEvent(t, 3, 1, 1)},
adjustWriter: func(w *Writer) {
w.cfg.SafeMode = false
w.cfg.BatchDMLEnable = true
},
want: false,
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Test coverage gap: no case exercises the all-virtual-generated handle-key path.

The virtual column test case here has a mix of virtual and non-virtual columns in the unique index. If HasPKOrNotNullUK ends up false for this table (see comment above), then Lines 141–154 of the production code — the core new logic — are never tested.

Consider adding a test where the table has a primary key or not-null unique key composed entirely of virtual generated columns, so the allVirtualGeneratedreturn false path at Line 152 is exercised.

🤖 Prompt for AI Agents
In `@pkg/sink/mysql/mysql_writer_dml_test.go` around lines 226 - 235, Add a new
unit test variant that mirrors the existing "handle key contains virtual
generated column should not use batch SQL" case but uses a table schema whose
primary key or not-null unique key is composed entirely of virtual generated
columns (so HasPKOrNotNullUK returns true and allVirtualGenerated is exercised);
create a new tableInfo (e.g., virtualOnlyHandleKeyTableInfo) with only virtual
generated columns in the PK/unique key, reuse newDMLEvent for events and the
same adjustWriter to set SafeMode=false and BatchDMLEnable=true, and assert
want:false so the code path in Writer that checks allVirtualGenerated (the
branch that returns false for batch SQL) is covered.

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@wk989898
Copy link
Collaborator Author

/test mysql

Signed-off-by: wk989898 <nhsmwk@gmail.com>
@ti-chi-bot
Copy link

ti-chi-bot bot commented Feb 11, 2026

@wk989898: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cdc-mysql-integration-heavy 6e2d832 link true /test pull-cdc-mysql-integration-heavy
pull-error-log-review 692b30a link true /test pull-error-log-review

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved needs-1-more-lgtm Indicates a PR needs 1 more LGTM. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

data inconsistency when the virtual column is not null unique index Support Batch in MysqlSink when the table has virtual column

2 participants