[Feature Request] Support Gemini’s New Multimodal Output #15814

laipz8200 · 2025-03-14T06:35:16Z

Self Checks

I have searched for existing issues, including closed ones.
I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
[FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

Background

Google recently released the Gemini 2.0 Flash Experimental API, which includes support for generating images. This is an important step toward broader multimodal capabilities, and we anticipate future support for audio, video, and documents as well.

To keep Dify aligned with these advancements, we propose updating the backend to support Gemini’s new output formats while rendering them in Markdown format on the frontend.

Feature Scope

Backend Update Only: No UI/UX changes are required; the frontend will render responses using Markdown.
Support for Image Generation (Initial Focus): Since Gemini 2.0 Flash Exp already supports image generation, we should prioritize integrating this feature first.
Extendable for Future Media Outputs: The implementation should be flexible enough to accommodate audio, video, and documents once Gemini enables them.
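
To make the scope above concrete, here is a minimal sketch of how a backend might turn mixed text and image output into Markdown that the existing frontend can render. It is not Dify's or Gemini's actual API: the `Part` class, `part_to_markdown`, and `render_markdown` are hypothetical names, and embedding media as base64 data URIs is an assumption (a real implementation might upload the file and link to its URL instead).

```python
import base64
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical stand-in for one piece of a multimodal model response."""
    mime_type: str
    data: bytes  # UTF-8 encoded text for text/* parts, raw bytes otherwise


def part_to_markdown(part: Part) -> str:
    """Convert a single response part into a Markdown fragment."""
    if part.mime_type.startswith("text/"):
        return part.data.decode("utf-8")

    # Assumption: media is embedded inline as a data URI.
    encoded = base64.b64encode(part.data).decode("ascii")
    data_uri = f"data:{part.mime_type};base64,{encoded}"

    if part.mime_type.startswith("image/"):
        return f"![generated image]({data_uri})"

    # Fallback for future media types (audio, video, documents): a plain link.
    return f"[generated {part.mime_type} output]({data_uri})"


def render_markdown(parts: list[Part]) -> str:
    """Join all parts into one Markdown document, one block per part."""
    return "\n\n".join(part_to_markdown(p) for p in parts)
```

Dispatching on the MIME type keeps the image path small while leaving one obvious place to add audio, video, or document handling later. For example, `render_markdown([Part("text/plain", b"Here is a cat:"), Part("image/png", png_bytes)])` yields the text followed by a Markdown image tag.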

2. Additional context or comments

No response

3. Can you help us with this feature?

I am interested in contributing to this feature.

zhangever · 2025-03-27T11:29:29Z

I need this feature, too. How's the CR going?

laipz8200 · 2025-04-01T03:34:56Z

@zhangever Thanks for your comment! We're currently testing this feature, and it shouldn't be too long before it's ready for you. 😌

JacoBezuidenhout · 2025-04-02T16:08:57Z

Heyy everyone! Thanks sooo much for the effort you guys put into this! Going to be amazing!

laipz8200 self-assigned this Mar 14, 2025

dosubot bot added the 💪 enhancement label Mar 14, 2025

laipz8200 linked a pull request Mar 18, 2025 that will close this issue

feat: support image generate for gemini #16085

Draft

5 tasks

QuantumGhost assigned QuantumGhost and unassigned laipz8200 Mar 20, 2025

QuantumGhost added the 🌊 feat:workflow label Mar 20, 2025 — with Linear

QuantumGhost linked a pull request Apr 3, 2025 that will close this issue

feat(api): Add image multimodal support for LLMNode #17372

Draft

5 tasks
