
Conversation

@naoyam (Collaborator) commented Jan 31, 2026

Gather allows the non-gathered dimensions of the index tensor to be smaller than the corresponding input dimensions, which shrinks those output dimensions. This complicates indexing and is not yet supported by TensorIndexer. Note that takeAlongAxis, which is a restricted form of gather, is supported.

One way to support the general case would be to decompose it into takeAlongAxis and slice ops. For now, this PR disables codegen of gather and delegates it to ExprEval.
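The decomposition idea can be sketched outside nvFuser. The NumPy snippet below is an illustrative analogue, not nvFuser code: it shows that a non-exact gather, where non-gathered index dimensions are smaller than the input's, is equivalent to slicing the input down to the index extents and then applying take-along-axis.

```python
import numpy as np

def non_exact_gather(x, dim, idx):
    # Reference semantics (as in torch.gather): out.shape == idx.shape,
    # out[pos] = x[pos with pos[dim] replaced by idx[pos]]
    out = np.empty(idx.shape, dtype=x.dtype)
    for pos in np.ndindex(*idx.shape):
        src = list(pos)
        src[dim] = idx[pos]
        out[pos] = x[tuple(src)]
    return out

def decomposed(x, dim, idx):
    # Slice non-gathered dims of x down to the index extents, then the
    # remaining gather is exactly a take-along-axis.
    slices = tuple(
        slice(None) if d == dim else slice(0, idx.shape[d])
        for d in range(x.ndim)
    )
    return np.take_along_axis(x[slices], idx, axis=dim)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 6, 7))
# Non-gathered dims 0 and 2 are smaller than the input's (3 < 5, 2 < 7)
idx = rng.integers(0, 6, size=(3, 4, 2))
assert np.array_equal(non_exact_gather(x, 1, idx), decomposed(x, 1, idx))
```

Whether nvFuser would express the slice before or after the takeAlongAxis is a scheduling question this sketch does not address.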

@naoyam (Collaborator, Author) commented Jan 31, 2026

!test

@github-actions bot commented Jan 31, 2026

Description

  • Dropped codegen support for non-exact gather operations

  • Added validation in TensorIndexer to ensure fusion support

  • Updated scheduler to reject fusions with non-exact gather ops

  • Modified tests to use ExprEval scheduler or disabled non-exact gather tests

Changes walkthrough

Relevant files

Enhancement

csrc/id_model/indexing.cpp: Add validation check for fusion support (+2/-0)

  • Added NVF_ERROR validation check in the TensorIndexer constructor
  • Ensures the fusion is supported before building the loop index map

csrc/scheduler/expr_eval_sched.cpp: Block GatherOp from ExprEval scheduling (+1/-0)

  • Added GatherOp to the list of operations unsupported by ExprEvalScheduler
  • Prevents gather operations from using the expression evaluation scheduler

csrc/scheduler/registry.cpp: Add non-exact gather validation in scheduler (+10/-0)

  • Added a check for non-exact gather operations in scheduler validation
  • Rejects fusions with non-exact gather ops with a specific error message
  • Maintains support for exact gather operations

Tests

tests/cpp/test_gather.cpp: Update and disable gather-related tests (+5/-3)

  • Updated test segmentation validation to use the ExprEval scheduler
  • Disabled two tests for non-exact gather operations
  • Added explanatory comments about dropped codegen support

PR Reviewer Guide

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Gather validation logic

The check for non-exact gather operations uses the gather->exactSizes() method. Verify that this method correctly identifies every case where a gather operation has non-gathered index dimensions smaller than the input's, as described in the PR description.

    if (std::ranges::any_of(
            ir_utils::getOpsOfType<GatherOp>(fusion),
            [](GatherOp* gather) { return !gather->exactSizes(); })) {
      scheduler_debug_utils::canScheduleRejectReason(
          scheduler_type, "Non-exact gather ops");
      return false;
    }
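The registry check above hinges on what exactSizes() returns. Assuming, per the PR description, that a gather is "exact" precisely when every non-gathered index extent matches the input extent (the takeAlongAxis case), the predicate can be mirrored in Python. The helper name and shapes here are illustrative only, not nvFuser's actual implementation.

```python
def exact_sizes(input_shape, dim, index_shape):
    # Assumption from the PR description: "exact" means every non-gathered
    # dimension of the index matches the corresponding input dimension.
    assert len(input_shape) == len(index_shape)
    return all(
        d == dim or input_shape[d] == index_shape[d]
        for d in range(len(input_shape))
    )

# takeAlongAxis-style gather keeps non-gathered extents: exact
assert exact_sizes((8, 16), 1, (8, 4))
# A shrunken non-gathered dimension makes it non-exact: rejected by the
# scheduler check above and delegated to ExprEval
assert not exact_sizes((8, 16), 1, (5, 4))
```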
Test coverage for disabled gather tests

Two tests are disabled (DISABLED_GatherIterGoupedReduction and DISABLED_SameTvUsedAsLookupAndIndex). Ensure these tests are properly documented and that alternative tests or validation mechanisms verify the functionality still works through ExprEval delegation.

    TEST_F(GatherTest, DISABLED_GatherIterGoupedReduction) {
      const int max_dim_size = 128;
      auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA, 0);
      auto options_i = at::TensorOptions().dtype(at::kLong).device(at::kCUDA, 0);
    
      int rank = 3;
      int dim = 2;
    
      auto fusion_ptr = std::make_unique<Fusion>();
      Fusion& fusion = *fusion_ptr.get();
      FusionGuard fg(&fusion);
    
      TensorView* tv1 = makeContigTensor(rank);
      TensorView* tv_idx = makeContigTensor(rank, DataType::Int);
      fusion.addInput(tv1);
      fusion.addInput(tv_idx);
      auto tv_gather = gather(tv1, dim, tv_idx);
      auto tv_sum = sum(tv_gather, {0}, false);
      fusion.addOutput(tv_sum);
    
      // simply gather all elements
      auto input_dims =
          std::vector<int64_t>({max_dim_size, max_dim_size, max_dim_size});
      auto index_dims = input_dims;
      std::vector<int64_t> input2_dims(rank - 1, 0);
      for (int idim = 0; idim < rank - 1; ++idim) {
        input2_dims[idim] = index_dims[idim + 1];
      }
    
      at::Tensor t0 = at::randn(input_dims, options);
      at::Tensor idx = at::randint(0, input_dims[dim], index_dims, options_i);
    
      auto reduction_scheduler =
          SchedulerEntry::makeSchedulerInstance(SchedulerType::Reduction);
      SchedulerRuntimeInfo runtime_info(&fusion, {t0, idx});
      auto heuristic_params =
          reduction_scheduler->computeHeuristics(&fusion, runtime_info);
      auto rparams = heuristic_params->as<ReductionParams>();
    
      // Enforce vectorization so we can group them
      const int vect_factor = 2;
      rparams->vectorize_iter_dom = true;
      rparams->unroll_factor_iter_dom = vect_factor;
      // Enforce grid reduction, which requires a determined BIDy
      // If the heuristic does not have a BIDy, bind it to 2
      rparams->cross_grid_inner_reduction = true;
      rparams->split_grid_dim_inner_reduction = true;
      rparams->grid_dim_inner_reduction = ParallelType::BIDy;
      if (!rparams->lparams.hasDim(ParallelType::BIDy)) {
        rparams->lparams.bind(2L, ParallelType::BIDy);
      }
    
      reduction_scheduler->schedule(&fusion, rparams);
    
      // lowering & check iteration grouped reductions
      GpuLower gpulw(&fusion);
      gpulw.run();
      NVF_CHECK(
          gpulw.kernel()->summary().has_iter_grouped_reductions,
          "There must be iter domain grouped reductions.");
      NVF_CHECK(
          gpulw.kernel()->summary().num_grouped_iterations == vect_factor,
          "Expected ",
          vect_factor,
          " grouped iterations, found ",
          gpulw.kernel()->summary().num_grouped_iterations);
    
      KernelExecutor ke;
      auto lparams = rparams->lparams;
      ke.compile(&fusion, {t0, idx}, lparams);
      auto cg_outputs = ke.run({t0, idx}, {}, lparams);
    
      auto t_gather = at::gather(t0, dim, idx);
      testValidate(
          &fusion,
          cg_outputs,
          {t0, idx},
          {t_gather.sum(0)},
          __LINE__,
          __FILE__,
          "",
          lparams);
    }
    
    // Codegen support of non-exact gather dropped
    TEST_F(GatherTest, DISABLED_SameTvUsedAsLookupAndIndex) {

Test failures

• (Medium, 6) NVFuser internal assert (TensorIndexer isSupported) on A100, GB200, and H100:
  • PersistentBufferTest.BufferGatherLookupTv
  • ReductionTest.CrossEntropyGatherPattern
• (Medium, 2) Thunder nvFuser nanoGPT autograd returns a zero scalar on CUDA (A100, GB200):
  • thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32
