@gy-mate
Contributor

@gy-mate gy-mate commented Aug 25, 2025

Describe your changes

  • Add --image/-i flag to include image files in prompts
  • Support common formats: PNG, JPEG, GIF, WebP (max 5MB, max 10 images)
  • OpenAI: use content arrays with text and image_url parts
  • Ollama: use native Images field for vision models like LLaVA
  • Error gracefully for non-vision APIs (Anthropic, Google, Cohere)
  • Validate file existence, format, and size limits
  • Works with any OpenAI-compatible endpoint in config

Authored-By: @claude, @anuramat, @gy-mate

Related issue

Resolves #364.

Checklist before requesting a review

  • I have read CONTRIBUTING.md
  • I have performed a self-review of my code. It works!

If this is a feature

  • I have created a discussion
  • A project maintainer has approved this feature request. Link to comment:

@gy-mate gy-mate requested a review from caarlos0 as a code owner August 25, 2025 10:50
@gy-mate gy-mate marked this pull request as draft August 25, 2025 10:53
@gy-mate gy-mate force-pushed the image-input branch 2 times, most recently from 71bb937 to 5a9d0d1 Compare August 25, 2025 19:39
@gy-mate gy-mate marked this pull request as ready for review August 25, 2025 20:01
@gy-mate
Contributor Author

gy-mate commented Aug 25, 2025

The linter error refers to a line that is left unchanged by this PR.

@gy-mate
Contributor Author

gy-mate commented Nov 7, 2025

@caarlos0 Could you please review this PR?

Many thanks in advance! :)

@caarlos0
Member

caarlos0 commented Jan 5, 2026

i wonder how much of this is really needed, and how much of it is needed here.

afaik fantasy and crush already support passing image attachments, and we already have code handling mime types and stuff like that.

shouldn't we maybe have a new API for image models in fantasy? and probably eventually another one for audio etc? maybe @kujtimiihoxha and @andreynering have more thoughts on this

@gy-mate
Contributor Author

gy-mate commented Jan 5, 2026

I'd like to use a CLI for this purpose. As far as I understand, crush only has a basic CLI with no piping or follow-up options and fantasy doesn't have one at all. That's why I'd love to see this feature in mods.

@andreynering
Member

Hey @gy-mate,

Can you let us know what you miss from Crush that doesn't fit your use case? What do you mean by "follow-up options"?

We basically plan to retire Mods in favor of crush run, but we know we still have work to do before it has all the meaningful features.

@gy-mate
Contributor Author

gy-mate commented Jan 8, 2026

Hi @andreynering! :)

Can you let us know what you miss from Crush that doesn't fit your use case? What do you mean by "follow-up options"?

I meant mods -c and mods -C, which continue a given conversation and the previous conversation, respectively. (Although I would swap their syntax if this gets reimplemented in crush, because -C is probably used more often than -c.)

@gy-mate
Contributor Author

gy-mate commented Jan 21, 2026

@caarlos0 @andreynering Could you please review my PR in light of the above? Many thanks in advance! :)

@andreynering
Member

Hi @gy-mate,

We plan to sunset Mods really soon and archive this repository.

From now on, Crush is our focus, and we do acknowledge how important non-interactive mode is! In fact, yesterday we pushed a release with --model flag support and we want to continue to make progress in that area.

If you want to contribute to this feature on Crush, that would be wonderful. Otherwise, we'll eventually do that ourselves.

In the meantime, if your implementation on Mods works well, you can use your fork.

@gy-mate
Contributor Author

gy-mate commented Jan 23, 2026

We plan to sunset Mods really soon and archive this repository.

Oh, I see. Thanks for the info! :)

From now on, Crush is our focus, and we do acknowledge how important non-interactive mode is! In fact, yesterday we pushed a release with --model flag support and we want to continue to make progress in that area.

Awesome, thank you! :)

If you want to contribute to this feature on Crush, that would be wonderful. Otherwise, we'll eventually do that ourselves.

Shall I open relevant issues in the Crush repo? Or do they already exist?

@andreynering
Member

andreynering commented Jan 23, 2026

Shall I open relevant issues in the Crush repo? Or do they already exist?

I'm not sure. It's worth searching to see whether they already exist. Otherwise, feel free to open new issues.

@gy-mate
Contributor Author

gy-mate commented Jan 26, 2026

Great, thanks! I've opened charmbracelet/crush#1982 and charmbracelet/crush#1983.

@andreynering
Member

Awesome, thank you!


Development

Successfully merging this pull request may close these issues.

Support OpenAI vision models
