feat: GPU support for building DISKANN based indexes. #1460

Open: liorf95 wants to merge 4 commits into zilliztech:main from liorf95:add_gpu

Conversation

@liorf95 commented Feb 8, 2026

See issue #1422.

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
@sre-ci-robot (Collaborator) commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: liorf95
To complete the pull request process, please assign marcelo-cjl after the PR has been reviewed.
You can assign the PR to them by writing /assign @marcelo-cjl in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify bot commented Feb 8, 2026

@liorf95 🔍 Important: PR Classification Needed!

For efficient project management and a seamless review process, it's essential to classify your PR correctly. Here's how:

  1. If you're fixing a bug, label it as kind/bug.
  2. For small tweaks (less than 20 lines without altering any functionality), please use kind/improvement.
  3. Significant changes that don't modify existing functionalities should be tagged as kind/enhancement.
  4. Adjusting APIs or changing functionality? Go with kind/feature.

For any PR outside the kind/improvement category, ensure you link to the associated issue using the format: “issue: #”.

Thanks for your efforts and contribution to the community!

constexpr uint32_t kK = 10;
constexpr uint32_t kK = 64;
#ifdef KNOWHERE_WITH_CUVS
constexpr uint32_t defaultMaxDegree = 64;

Collaborator:
Why is there a difference in the value of defaultMaxDegree between the CUVS and non-CUVS versions of the tests? Any particular reason?

Author:
The default value of max_degree for DiskANN-based indexes is 56, so we did not want to modify the unit tests for those basic scenarios.
However, cuVS-related limitations demand that max_degree be a power of 2 (e.g., 32, 64, 128, etc.).
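
A minimal sketch of how the test constant could be split between the two builds, assuming the CPU-side default stays at the DiskANN default of 56 mentioned above (the exact test code in the PR may differ):

#ifdef KNOWHERE_WITH_CUVS
// cuVS Vamana requires max_degree to be a power of 2
constexpr uint32_t defaultMaxDegree = 64;
#else
// regular DiskANN default
constexpr uint32_t defaultMaxDegree = 56;
#endif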

constexpr uint32_t kLargeDim = 256;
constexpr uint32_t kK = 10;
constexpr uint32_t kK = 64;
#ifdef KNOWHERE_WITH_CUVS

Collaborator:
What are the GPU memory requirements for a cuVS-based version of this test? It would be nice to have it as a comment at the beginning of the file, including a minimum sufficient GPU, if appropriate. Something like: "This test is expected to run on a GPU with at least 2 GB of RAM, such as an NVIDIA A10 or NVIDIA T4".
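
A sketch of the kind of file-header note being requested (the 2 GB figure and GPU models are the reviewer's examples, not measured requirements):

// NOTE: when built with KNOWHERE_WITH_CUVS, this test is expected to run on a GPU
// with at least 2 GB of memory, such as an NVIDIA A10 or NVIDIA T4.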

diskann::get_bin_metadata(base_file, base_num, base_dim);
#ifdef KNOWHERE_WITH_CUVS
raft::device_resources dev_resources;
if(compareMetric == diskann::L2 && is_gpu_available()) {

Collaborator:
And what is going to happen if (compareMetric != diskann::L2)?

Author:
Currently, the cuVS Vamana build only supports the L2Expanded metric, so in that case the index will be built with the CPU as usual.

" Gib free memory out of " << gpu_total_mem/(1024*1024*1024L) << " Gib total";
ram_budget = std::min<double>(ram_budget,(double)0.9*(gpu_free_mem/(1024*1024*1024)));
shard_r = std::max<uint32_t>((uint32_t)32,(uint32_t)R/2);
}

Collaborator:
} else {
    // add to log why GPU is not going to be used
}
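
A hedged sketch of what such an else branch could look like (the log wording is illustrative, not from the PR):

} else {
    // fall back to the CPU build and record why the GPU path was skipped
    LOG_KNOWHERE_INFO_ << "GPU build skipped: metric is not L2 or no GPU is available, "
                       << "building DiskANN shards on the CPU";
}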

bool built_with_gpu=false;
#ifdef KNOWHERE_WITH_CUVS
//currently cuvs vamana build only supports L2Expanded metric
if (compareMetric == diskann::L2 && is_gpu_available() &&

Collaborator:
This and the following calls imply that if the metric is right, the library is built with #define KNOWHERE_WITH_CUVS, and is_gpu_available() returns true, then the index will ALWAYS be built using a GPU.
Is it possible to introduce a configurable boolean flag that indicates whether a GPU is expected to be used by a user? Say, the default value is true. At least, this would be helpful for running unit tests, so that it would be possible to validate both the GPU and non-GPU branches of the code on the same machine without tricks.

Contributor:
You can already control that with the environment variable NVIDIA_VISIBLE_DEVICES - set it to none and the index will be built with the CPU.
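
A minimal sketch of how such a check could be implemented on top of the CUDA runtime, which is what makes NVIDIA_VISIBLE_DEVICES/CUDA_VISIBLE_DEVICES effective (the PR's actual is_gpu_available() may differ):

#include <cuda_runtime_api.h>

// Returns true only when at least one CUDA device is visible to the process.
// With no visible devices (e.g. NVIDIA_VISIBLE_DEVICES=none in a container),
// the call fails or reports zero devices and the CPU build path is taken.
bool is_gpu_available() {
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    return err == cudaSuccess && device_count > 0;
}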

1);
}
built_with_gpu=true;
}

Collaborator:
} else {
     // add to log why GPU is not going to be used
}

The same comment applies to other possible else branches, if appropriate.

#ifdef KNOWHERE_WITH_CUVS
if(is_gpu_available()) {
if (R != 32 && R != 64 && R != 128) {
LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";

Collaborator:
LOG_KNOWHERE_ERROR_ << "Invalid R value (" << R << ") for cuvs - should be only 32 or 64 or 128";

The same comment applies to the following lines as well.

int iters);
bool is_gpu_available();

void kmeans_gpu(

Collaborator:
Personally, I would prefer FAISS-based names here. So, that would be train (kmeans_gpu) and assign (predict_gpu).
Alternatively, please add comments on what these functions do, because it looks confusing for those who are not aware.

Author:
We will add comments on what these functions do.
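
A hedged sketch of the kind of header comment being asked for, using the argument order visible in the call below (parameter names and exact types are illustrative; the actual declaration in the PR may differ):

// FAISS-style "train" step: runs k-means on the GPU over a num_train x dim
// row-major float matrix and writes the k x dim centroids to centroids_out.
// iters is the number of k-means iterations.
void kmeans_gpu(raft::resources& res, const float* train_data, size_t num_train,
                size_t dim, uint32_t k, int iters, float* centroids_out);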

raft::resources res;
LOG_KNOWHERE_INFO_ << "Running k-means with " << k << " clusters...using GPU!";
kmeans_gpu(res,train_data_float.get(), num_train, train_dim,
k, 10, centroids.data());

Collaborator:
Would it be possible to extract 10 (the number of iterations?) as a constexpr value nearby, please? Same for the CPU branch.
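
A minimal sketch of the suggested extraction (the constant name is illustrative):

constexpr int kKMeansIters = 10;
...
kmeans_gpu(res, train_data_float.get(), num_train, train_dim,
           k, kKMeansIters, centroids.data());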


void gpu_get_mem_info(raft::resources &dev_resources, size_t &gpu_free_mem,size_t &gpu_total_mem);

#endif

Collaborator:
What about selecting the right GPU to deal with? It is expected to be done by a caller using a raft::resources object, correct?

Contributor:

The current implementation doesn't handle multi-GPU logic. It is assumed that each process uses one GPU. However, with a multi-GPU platform in k8s with Milvus, you can assign each datanode a different GPU exclusively and accelerate the build further.
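
For reference, a plausible sketch of what gpu_get_mem_info could wrap for the single GPU bound to the process (assuming a thin layer over the CUDA runtime; the PR's actual implementation may differ):

#include <cuda_runtime_api.h>

// Queries free and total memory of the currently selected GPU.
// dev_resources is unused in this sketch.
void gpu_get_mem_info(raft::resources& dev_resources, size_t& gpu_free_mem, size_t& gpu_total_mem) {
    cudaMemGetInfo(&gpu_free_mem, &gpu_total_mem);
}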

@alexanderguzhva (Collaborator) commented:

/kind improvement

@alexanderguzhva (Collaborator) commented:

issue: #1422

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
mergify bot removed the ci-passed label Feb 10, 2026
LOG_KNOWHERE_ERROR_ << "Invalid R value for cuvs - should be only 32 or 64 or 128";
return -1;
}
if (L != 32 && L != 64 && L != 128 && L != 256) {

Contributor:
What is the specific reason to choose these seven magic numbers? Can it be more flexible?

Contributor:
With cuVS 25.10, which is the version used, cuVS has validation for the visited-list size, which must be a power of 2.
This list is reasonable considering this constraint, as we see that an L higher than 256 doesn't benefit too much.

Contributor:
Maybe restricting it to 2^k with a larger boundary like L ∈ [32, 2048], instead of some magic numbers, would be better. Some larger or harder datasets, or higher recall requirements, may need a higher parameter :)
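
A small sketch of the relaxed check being suggested (the bounds are taken from the comment above, not from the PR):

// Accept any power-of-2 L in [32, 2048] instead of an explicit list.
auto is_pow2 = [](uint32_t x) { return x != 0 && (x & (x - 1)) == 0; };
if (L < 32 || L > 2048 || !is_pow2(L)) {
    LOG_KNOWHERE_ERROR_ << "Invalid L value (" << L << ") for cuvs - must be a power of 2 in [32, 2048]";
    return -1;
}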

Signed-off-by: Lior Friedman <lior.friedman@il.kioxia.com>
Signed-off-by: Hiroshi Murayama <hiroshi4.murayama@kioxia.com>
