@JNSimba
Member

JNSimba commented Dec 10, 2025

What problem does this PR solve?

Issue Number: close #58896

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Dec 10, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it changed, and what the possible impact is.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@JNSimba changed the title from "[Proposal] Extend streaming job to support MySQL synchronization" to "[Feature] Extend streaming job to support MySQL synchronization" on Dec 10, 2025
@JNSimba changed the title from "[Feature] Extend streaming job to support MySQL synchronization" to "[Feature](Streaming Job) Extend streaming job to support MySQL synchronization" on Dec 10, 2025
@JNSimba
Member Author

JNSimba commented Dec 10, 2025

run buildall

Copilot AI left a comment

Pull request overview

This PR extends streaming jobs to support MySQL synchronization via CDC (Change Data Capture), so that users can sync data from MySQL databases to Doris in real time. The implementation includes a new CDC client service and modifications to the streaming job framework.

Key Changes:

  • Introduces a CDC client Spring Boot application that interfaces with MySQL using Flink CDC connectors
  • Adds support for FROM MySQL TO Database syntax in job creation
  • Implements split-based data reading for both snapshot and binlog phases
  • Adds RPC endpoints for BE-FE communication to handle CDC operations
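
The split-based reading mentioned in the list above can be pictured roughly as follows. This is a minimal sketch, not the PR's implementation (which, per the overview, relies on Flink CDC connectors): the class names `SnapshotSplit` and `BinlogSplit`, the function `plan_splits`, the integer primary key, and the sample binlog file name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class SnapshotSplit:
    """A bounded chunk of one table: start <= pk < end (None means open-ended)."""
    table: str
    start: Optional[int]
    end: Optional[int]


@dataclass(frozen=True)
class BinlogSplit:
    """An unbounded split that tails the binlog from a recorded position."""
    binlog_file: str
    position: int


def plan_splits(table: str, min_pk: int, max_pk: int, chunk_size: int,
                binlog_file: str, position: int
                ) -> List[Union[SnapshotSplit, BinlogSplit]]:
    """Return snapshot chunks covering the key range, then one binlog split."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    splits: List[Union[SnapshotSplit, BinlogSplit]] = []
    start: Optional[int] = None  # first chunk is open on the low side
    boundary = min_pk + chunk_size
    while boundary <= max_pk:
        splits.append(SnapshotSplit(table, start, boundary))
        start = boundary
        boundary += chunk_size
    # Last chunk is open on the high side so late inserts are not missed.
    splits.append(SnapshotSplit(table, start, None))
    # After all snapshot chunks, changes are read from the binlog position
    # that was recorded before the snapshot started.
    splits.append(BinlogSplit(binlog_file, position))
    return splits


if __name__ == "__main__":
    for split in plan_splits("orders", 1, 10, 4, "mysql-bin.000003", 154):
        print(split)
```

Leaving the first and last chunks open-ended is one common way to avoid dropping rows whose keys fall outside the range sampled at planning time; whether the PR does the same is not stated in this conversation.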

Reviewed changes

Copilot reviewed 85 out of 85 changed files in this pull request and generated no comments.

File	Description
regression-test/suites/job_p0/streaming_job/cdc/test_streaming_mysql_job.groovy	Regression test for MySQL streaming job with CDC
gensrc/proto/internal_service.proto	Adds RPC interface for CDC client communication
fs_brokers/cdc_client/**	Complete CDC client implementation using Spring Boot
fe/fe-core/.../streaming/**	Extends streaming job framework with multi-table task support
fe/fe-core/.../offset/jdbc/**	JDBC offset provider for tracking MySQL binlog positions
fe/fe-core/.../util/StreamingJobUtils.java	Utility functions for streaming job management
docker/thirdparties/docker-compose/mysql/my.cnf	Enables MySQL binlog for CDC
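
The `my.cnf` change listed last is not shown in this conversation. As a rough illustration of what "enabling binlog for CDC" usually involves, the sketch below checks a config text for the settings that row-based CDC readers generally expect (`log_bin`, `server_id`, `binlog_format=ROW`). The sample file content and the helper name `missing_cdc_settings` are assumptions, not taken from the PR. The check only looks at explicitly written options, so a server that relies on defaults (recent MySQL versions enable row-based binary logging by default) would be reported as incomplete even though it may work.

```python
import configparser

# Hypothetical my.cnf content, for illustration only.
SAMPLE_MY_CNF = """
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog_format = ROW
binlog_row_image = FULL
"""


def missing_cdc_settings(cnf_text: str) -> list:
    """Return the binlog-related settings not explicitly set in [mysqld]."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as `log-bin`
        strict=False,          # tolerate repeated options
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["[mysqld] section"]
    # MySQL accepts '-' and '_' interchangeably in option names.
    opts = {
        key.replace("-", "_").lower(): (value or "")
        for key, value in parser.items("mysqld")
    }
    problems = []
    if "log_bin" not in opts:
        problems.append("log_bin")
    if "server_id" not in opts:
        problems.append("server_id")
    if opts.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format=ROW")
    return problems


if __name__ == "__main__":
    print(missing_cdc_settings(SAMPLE_MY_CNF))
```

Note that `configparser` does not understand MySQL directives such as `!includedir`, so a real file may need those lines stripped before parsing.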

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.82% (1728/2165)
Line Coverage 65.92% (30578/46390)
Region Coverage 66.58% (15247/22900)
Branch Coverage 56.95% (8116/14250)
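
Each row in these coverage tables pairs a percentage with a `(covered/total)` count. A small sketch of the arithmetic, assuming the percentage is simply the ratio rounded to two decimals (the helper name `coverage_percent` is hypothetical), reproduces the figures reported above:

```python
def coverage_percent(covered: int, total: int) -> float:
    """Return covered/total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * covered / total, 2)


if __name__ == "__main__":
    # Counts copied from the Cloud UT Coverage Report above.
    report = {
        "Function": (1728, 2165),
        "Line": (30578, 46390),
        "Region": (15247, 22900),
        "Branch": (8116, 14250),
    }
    for category, (covered, total) in report.items():
        print(f"{category} Coverage {coverage_percent(covered, total):.2f}%")
```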

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 1.32% (13/984) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Member Author

JNSimba commented Dec 10, 2025

run external

@JNSimba
Member Author

JNSimba commented Dec 10, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 35374 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit f4e1a9eceb169b2b2201482d22410bc26c1e1db9, data reload: false

------ Round 1 ----------------------------------
q1	17687	4424	4080	4080
q2	2029	361	231	231
q3	10196	1334	761	761
q4	10217	884	313	313
q5	7556	2177	1898	1898
q6	184	175	139	139
q7	1025	891	717	717
q8	9380	1403	1295	1295
q9	7069	5325	5353	5325
q10	6785	2386	1971	1971
q11	540	315	294	294
q12	666	747	571	571
q13	17777	3753	3059	3059
q14	290	298	278	278
q15	580	517	510	510
q16	959	954	870	870
q17	698	865	551	551
q18	7355	7090	6945	6945
q19	907	971	595	595
q20	425	372	242	242
q21	4213	3947	3754	3754
q22	1056	993	975	975
Total cold run time: 107594 ms
Total hot run time: 35374 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4120	4067	4058	4058
q2	326	387	321	321
q3	2177	2680	2309	2309
q4	1334	1774	1313	1313
q5	4254	4756	4767	4756
q6	236	176	131	131
q7	2079	1989	1860	1860
q8	2752	2541	2630	2541
q9	7753	7626	7521	7521
q10	3044	3477	2822	2822
q11	611	511	495	495
q12	634	714	589	589
q13	3706	4022	3454	3454
q14	283	301	282	282
q15	543	513	504	504
q16	905	911	877	877
q17	1250	1484	1446	1446
q18	7951	7565	7505	7505
q19	856	880	868	868
q20	2046	2040	1974	1974
q21	5048	4772	4539	4539
q22	1158	1052	971	971
Total cold run time: 53066 ms
Total hot run time: 51136 ms
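
The TPC-H rows carry no column headers. Judging from the numbers, the first value appears to be the cold run, the next two are hot runs, and the last is the faster of the two hot runs. The sketch below, with a hypothetical helper `totals`, recomputes the Round 1 totals under that reading, using the first three timing columns copied from the report above.

```python
# Round 1 rows from the report: query, run 1 (cold), run 2, run 3 (ms).
ROUND_1 = """
q1 17687 4424 4080
q2 2029 361 231
q3 10196 1334 761
q4 10217 884 313
q5 7556 2177 1898
q6 184 175 139
q7 1025 891 717
q8 9380 1403 1295
q9 7069 5325 5353
q10 6785 2386 1971
q11 540 315 294
q12 666 747 571
q13 17777 3753 3059
q14 290 298 278
q15 580 517 510
q16 959 954 870
q17 698 865 551
q18 7355 7090 6945
q19 907 971 595
q20 425 372 242
q21 4213 3947 3754
q22 1056 993 975
"""


def totals(report: str) -> tuple:
    """Return (cold_total, hot_total) in ms for a block of report rows."""
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        cold, hot_a, hot_b = (int(value) for value in fields[1:4])
        cold_total += cold                # first run counts as the cold run
        hot_total += min(hot_a, hot_b)    # best of the two later runs
    return cold_total, hot_total


if __name__ == "__main__":
    cold, hot = totals(ROUND_1)
    print(f"Total cold run time: {cold} ms")
    print(f"Total hot run time: {hot} ms")
```

Under this interpretation the sums match the Round 1 totals reported by the bot (107594 ms cold, 35374 ms hot).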

@doris-robot

TPC-DS: Total hot run time: 181298 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f4e1a9eceb169b2b2201482d22410bc26c1e1db9, data reload: false

query5	5498	648	495	495
query6	330	235	229	229
query7	4219	479	286	286
query8	306	253	243	243
query9	8776	2577	2587	2577
query10	558	394	332	332
query11	15472	14761	14583	14583
query12	188	120	118	118
query13	1273	514	414	414
query14	6664	3422	3065	3065
query14_1	2960	2936	2939	2936
query15	208	198	184	184
query16	966	480	488	480
query17	1167	720	602	602
query18	2724	450	361	361
query19	234	238	211	211
query20	120	116	112	112
query21	225	147	119	119
query22	3986	4002	3855	3855
query23	16602	16188	15911	15911
query23_1	16001	16017	16055	16017
query24	7292	1679	1259	1259
query24_1	1270	1261	1244	1244
query25	588	497	442	442
query26	1262	265	166	166
query27	2746	483	322	322
query28	4456	2195	2169	2169
query29	832	576	471	471
query30	322	252	223	223
query31	838	703	606	606
query32	88	71	70	70
query33	548	350	303	303
query34	940	926	561	561
query35	790	816	748	748
query36	838	910	827	827
query37	146	93	77	77
query38	3869	3870	3817	3817
query39	757	742	707	707
query39_1	698	705	682	682
query40	228	133	124	124
query41	78	61	61	61
query42	108	107	107	107
query43	424	439	412	412
query44	1344	757	789	757
query45	195	187	183	183
query46	888	979	631	631
query47	1646	1677	1589	1589
query48	318	358	249	249
query49	632	433	359	359
query50	670	313	224	224
query51	3820	3829	3847	3829
query52	110	111	98	98
query53	323	354	294	294
query54	289	264	256	256
query55	78	79	71	71
query56	305	307	297	297
query57	1120	1141	1062	1062
query58	313	260	248	248
query59	2416	2454	2346	2346
query60	316	310	292	292
query61	166	159	157	157
query62	702	645	634	634
query63	329	295	305	295
query64	4906	1297	1007	1007
query65	3987	3955	3984	3955
query66	1363	440	327	327
query67	15221	14928	14831	14831
query68	8524	1037	751	751
query69	492	349	316	316
query70	1099	991	1021	991
query71	399	316	286	286
query72	6039	5013	5240	5013
query73	730	689	318	318
query74	8929	8953	8646	8646
query75	3598	3532	3156	3156
query76	4045	1161	760	760
query77	578	451	299	299
query78	9383	9690	8813	8813
query79	1629	886	620	620
query80	710	654	564	564
query81	535	269	244	244
query82	224	129	106	106
query83	260	250	239	239
query84	261	117	106	106
query85	885	501	454	454
query86	355	273	305	273
query87	4043	4099	3990	3990
query88	4373	2327	2295	2295
query89	466	431	389	389
query90	2206	157	154	154
query91	172	163	136	136
query92	77	69	60	60
query93	1638	934	584	584
query94	463	300	269	269
query95	543	327	361	327
query96	585	482	217	217
query97	2571	2657	2566	2566
query98	224	192	195	192
query99	1323	1284	1211	1211
Total cold run time: 266440 ms
Total hot run time: 181298 ms

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.82% (1728/2165)
Line Coverage 65.91% (30577/46390)
Region Coverage 66.60% (15251/22900)
Branch Coverage 56.93% (8112/14250)

@doris-robot

ClickBench: Total hot run time: 27.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f4e1a9eceb169b2b2201482d22410bc26c1e1db9, data reload: false

query1	0.05	0.06	0.05
query2	0.10	0.05	0.04
query3	0.27	0.09	0.08
query4	1.64	0.11	0.11
query5	0.26	0.25	0.26
query6	1.16	0.64	0.64
query7	0.03	0.03	0.02
query8	0.06	0.04	0.04
query9	0.57	0.51	0.52
query10	0.56	0.56	0.56
query11	0.15	0.10	0.11
query12	0.14	0.10	0.11
query13	0.62	0.61	0.60
query14	0.97	0.97	0.98
query15	0.83	0.82	0.82
query16	0.38	0.41	0.40
query17	1.07	1.10	1.01
query18	0.22	0.21	0.21
query19	1.88	1.90	1.79
query20	0.02	0.01	0.01
query21	15.45	0.31	0.13
query22	4.85	0.05	0.05
query23	16.03	0.30	0.10
query24	2.08	0.74	0.74
query25	0.09	0.07	0.05
query26	0.13	0.13	0.14
query27	0.06	0.08	0.05
query28	5.39	1.21	1.02
query29	12.60	4.02	3.30
query30	0.28	0.14	0.11
query31	2.82	0.63	0.39
query32	3.24	0.55	0.46
query33	3.04	3.00	3.00
query34	16.94	5.25	4.56
query35	4.53	4.55	4.60
query36	0.65	0.49	0.51
query37	0.10	0.06	0.06
query38	0.07	0.05	0.05
query39	0.04	0.03	0.03
query40	0.17	0.14	0.14
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 99.71 s
Total hot run time: 27.81 s

Contributor

dataroaring left a comment

some comments.

@JNSimba
Member Author

JNSimba commented Dec 11, 2025

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.82% (1728/2165)
Line Coverage 65.92% (30581/46390)
Region Coverage 66.62% (15256/22900)
Branch Coverage 56.95% (8115/14250)

@JNSimba
Member Author

JNSimba commented Dec 11, 2025

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.82% (1728/2165)
Line Coverage 65.93% (30586/46390)
Region Coverage 66.62% (15255/22900)
Branch Coverage 56.91% (8110/14250)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 1.30% (13/1002) 🎉
Increment coverage report
Complete coverage report

@JNSimba
Member Author

JNSimba commented Dec 11, 2025

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.83% (1730/2167)
Line Coverage 65.89% (30622/46476)
Region Coverage 66.59% (15272/22934)
Branch Coverage 56.91% (8119/14266)

@JNSimba
Member Author

JNSimba commented Dec 11, 2025

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.83% (1730/2167)
Line Coverage 65.90% (30626/46476)
Region Coverage 66.63% (15280/22934)
Branch Coverage 56.94% (8123/14266)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 61.47% (584/950) 🎉
Increment coverage report
Complete coverage report

Development

Successfully merging this pull request may close these issues.

[Proposal] Extend streaming job to support MySQL synchronization

5 participants