Skip to content

[WIP] feat: filter files on mount#27

Open
aagumin wants to merge 2 commits intomodelpack:mainfrom
aagumin:feature/super-exclude
Open

[WIP] feat: filter files on mount#27
aagumin wants to merge 2 commits intomodelpack:mainfrom
aagumin:feature/super-exclude

Conversation

@aagumin
Copy link
Contributor

@aagumin aagumin commented Feb 11, 2026

implement #17

Signed-off-by: Arsen Gumin <gumin@live.ru>
@gemini-code-assist
Copy link

Summary of Changes

Hello @aagumin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the model CSI driver by introducing a file exclusion mechanism. Users can now specify patterns to prevent certain files or directories from being pulled and mounted with a model, providing greater control over storage and potentially optimizing resource usage. This feature is integrated into the CSI volume creation and node publishing workflows, ensuring that only relevant model assets are deployed.

Highlights

  • File Exclusion Feature: Introduced a new capability to exclude specific files or patterns during model mounting, similar to .gitignore functionality.
  • Configuration and API Updates: Added a new configuration parameter exclude-files and updated the MountRequest API to accept an array of exclusion patterns.
  • Pattern Matching Logic: Implemented a FilePatternMatcher utility using go-git/go-git/v5/plumbing/format/gitignore for robust pattern matching and a filterFilesByPatterns function to remove excluded files and empty directories.
  • Pull Process Integration: Modified the model pulling logic to incorporate file exclusion, both by pre-filtering files to fetch and by post-processing to remove any remaining excluded files.
  • Security Validations: Added validation to prevent absolute paths and parent directory references in exclusion patterns, enhancing security.
  • Testing: Included new unit and integration tests to ensure the correctness and robustness of the file exclusion feature.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog
  • pkg/config/config.go
    • Added ParameterKeyExcludeFiles function to retrieve the configuration key for file exclusion patterns.
  • pkg/server/server_test.go
    • Updated the mockPuller.Pull method signature to include excludeFilePatterns.
    • Modified the service.NewPuller mock to pass excludeFilePatterns.
  • pkg/service/controller_local.go
    • Imported encoding/json for JSON parsing.
    • Implemented logic to parse and validate the exclude-files parameter from CSI requests.
    • Passed the parsed excludeFilePatterns to the worker.PullModel function.
  • pkg/service/dynamic_server_handler.go
    • Imported encoding/json for JSON marshaling.
    • Added validation for ExcludeFilePatterns in the CreateVolume handler to prevent absolute paths and parent directory references.
    • Marshaled ExcludeFilePatterns to a JSON string and included it in the CSI CreateVolumeRequest parameters.
  • pkg/service/node.go
    • Imported encoding/json for JSON parsing.
    • Added logic to parse and validate exclude-files parameter from volume attributes.
    • Passed the parsed excludeFilePatterns to nodePublishVolumeStaticInlineVolume.
  • pkg/service/node_static_inline.go
    • Updated the nodePublishVolumeStaticInlineVolume function signature to accept excludeFilePatterns.
    • Passed excludeFilePatterns to the worker.PullModel function.
  • pkg/service/patterns.go
    • Added a new file defining FilePatternMatcher for gitignore-style pattern matching.
    • Implemented NewFilePatternMatcher with security validations for patterns.
    • Provided Match method to check if a path matches exclusion patterns.
    • Included filterFilesByPatterns to walk a directory, remove matched files, and clean up empty directories.
    • Added helper functions removeEmptyDirectories and isDirEmpty.
  • pkg/service/patterns_integration_test.go
    • Added a new integration test TestPullModel_WithExcludeFilePatterns to verify the end-to-end file exclusion functionality during model pulling.
  • pkg/service/patterns_test.go
    • Added a new file with comprehensive unit tests for FilePatternMatcher and filterFilesByPatterns functions, covering various pattern types and edge cases.
  • pkg/service/puller.go
    • Removed unused io import.
    • Updated the Puller interface and puller struct to include excludeFilePatterns.
    • Modified the Pull method to integrate file exclusion logic: it now pre-filters files to fetch based on patterns and performs a post-fetch cleanup to remove any remaining excluded files.
  • pkg/service/request.go
    • Added ExcludeFilePatterns field (as a slice of strings) to the MountRequest struct.
  • pkg/service/testdata/config.yaml
    • Added a new test data configuration file.
  • pkg/service/worker.go
    • Updated the newPuller field in the Worker struct to accept excludeFilePatterns.
    • Modified PullModel and pullModel function signatures to include excludeFilePatterns.
    • Passed excludeFilePatterns to the newPuller factory and the puller.Pull method.
Activity
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@aagumin aagumin changed the title feat: filter files on mount [WIP] feat: filter files on mount Feb 11, 2026
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a feature to filter files during model mounting using gitignore-style patterns provided via an exclude-files parameter. The implementation adds logic to parse, validate, and apply these patterns. The core pattern matching is handled in a new patterns.go file. While the feature is a good addition, my review identifies several areas for improvement, including code duplication in validation logic, potential bugs in directory pattern matching, and some inefficiencies. Addressing these points will enhance the robustness and maintainability of the new feature.

path = filepath.ToSlash(path)
pathParts := strings.Split(path, "/")

return m.matcher.Match(pathParts, false)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The isDir argument to m.matcher.Match is hardcoded to false. This will cause incorrect behavior for gitignore patterns that are meant to match directories (e.g., logs/). The FilePatternMatcher.Match function should accept a boolean isDir argument and pass it to the underlying go-git matcher.

This will also require changes in filterFilesByPatterns to pass info.IsDir() and to correctly handle directory matches (e.g., by using os.RemoveAll and returning filepath.SkipDir). A similar issue exists in puller.go where Match is called without directory information.

Suggested change
return m.matcher.Match(pathParts, false)
return m.matcher.Match(pathParts, isDir)

Comment on lines +79 to +88
// Validate patterns for security
for _, p := range excludeFilePatterns {
if strings.HasPrefix(p, "/") && len(p) > 1 {
return nil, isStaticVolume, status.Errorf(codes.InvalidArgument, "invalid parameter:%s: absolute paths not allowed: %s", s.cfg.Get().ParameterKeyExcludeFiles(), p)
}
if strings.Contains(p, "..") {
return nil, isStaticVolume, status.Errorf(codes.InvalidArgument, "invalid parameter:%s: parent directory reference not allowed: %s", s.cfg.Get().ParameterKeyExcludeFiles(), p)
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This validation logic for excludeFilePatterns is duplicated in pkg/service/dynamic_server_handler.go, pkg/service/node.go, and pkg/service/patterns.go. To improve maintainability and ensure consistency, this logic should be extracted into a single, reusable function (e.g., ValidateExcludePatterns in the service package). This new function can then be called from all the places where validation is needed.

}

if info.IsDir() && path != targetDir {
isEmpty, _ := isDirEmpty(path)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The error returned from isDirEmpty is being ignored. This function can return errors other than io.EOF (e.g., permission errors when opening the directory), which would be silently ignored. This could lead to incorrect behavior, such as not removing a directory that should be removed. The error should be checked and handled, for example by logging it and continuing the walk.

isEmpty, err := isDirEmpty(path)
if err != nil {
    logger.Logger().WithError(err).Warnf("Failed to check if directory is empty: %s", path)
    return nil
}
if isEmpty {

Comment on lines +164 to +166
if err.Error() == "EOF" {
return true, nil // Directory is empty
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Comparing the error string with "EOF" is brittle and not idiomatic Go. It's better to use errors.Is(err, io.EOF) or err == io.EOF for this check. This will make the code more robust against changes in the error message. Note that this will require importing the io package.

if err == io.EOF {
		return true, nil // Directory is empty
	}

Comment on lines +128 to +131
matcher, err := NewFilePatternMatcher(excludeFilePatterns)
if err != nil {
return errors.Wrap(err, "create file pattern matcher")
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

NewFilePatternMatcher is called here and also earlier in this function (line 78) with the same excludeFilePatterns. This is inefficient. The FilePatternMatcher instance should be created once at the beginning of the function (if excludeFilePatterns is not empty) and then reused in both places.

Signed-off-by: Arsen Gumin <gumin@live.ru>
}

func (cfg *RawConfig) ParameterKeyExcludeFiles() string {
return cfg.ServiceName + "/exclude-files"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe return cfg.ServiceName + "/exclude-file-patterns".

excludeFilePatternsParam := strings.TrimSpace(parameters[s.cfg.Get().ParameterKeyExcludeFiles()])
var excludeFilePatterns []string
if excludeFilePatternsParam != "" {
if err := json.Unmarshal([]byte(excludeFilePatternsParam), &excludeFilePatterns); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a comma-separated format for the parameter value? For example: /foo,/bar.

}

// Validate exclude_file_patterns
for _, p := range req.ExcludeFilePatterns {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this path security check is unnecessary because, after using gitignore to match the results of modelArtifact.GetPatterns, a secure list of absolute paths can be obtained.

excludeFilePatternsParam := volumeAttributes[s.cfg.Get().ParameterKeyExcludeFiles()]
var excludeFilePatterns []string
if excludeFilePatternsParam != "" {
if err := json.Unmarshal([]byte(excludeFilePatternsParam), &excludeFilePatterns); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.


// NewFilePatternMatcher creates a new pattern matcher from a list of gitignore-compatible patterns
func NewFilePatternMatcher(patterns []string) (*FilePatternMatcher, error) {
// Validate patterns for security
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.


logger.WithContext(ctx).Infof("Applying file exclusion patterns: %v", excludeFilePatterns)

_, err = filterFilesByPatterns(targetDir, matcher)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file system operation should not be introduced, after we set up fetchPatterns, the files pulled by b.Fetch should already be what the end user needs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants