
feat(api): Add image multimodal support for LLMNode #17372

Draft · wants to merge 7 commits into `main` from `feat/support-image-generate-for-gemini`

Conversation

QuantumGhost (Collaborator) commented Apr 3, 2025

Summary

Enhance `LLMNode` with multimodal capability, introducing support for
image outputs.

This implementation extracts base64-encoded images from LLM responses,
saves them to the storage service, and records the file metadata in the
`ToolFile` table. In conversations, these images are rendered as
markdown inline images.
Additionally, the images are included in the `LLMNode`'s output as
file variables, enabling subsequent nodes in the workflow to use them.
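
For reference, a minimal sketch of that flow in Python. All names here (the `ToolFile` fields, the `storage.save` interface, the key layout) are illustrative assumptions, not the actual Dify APIs:

```python
import base64
import mimetypes
import uuid
from dataclasses import dataclass


@dataclass
class ToolFile:
    """Minimal stand-in for the real `ToolFile` ORM model; the actual
    schema has more columns than shown here."""
    file_key: str
    mime_type: str
    size: int


def save_multimodal_image(base64_data: str, mime_type: str, storage) -> ToolFile:
    """Decode a base64 image emitted by the LLM, persist it via the
    storage service, and return the metadata destined for `ToolFile`.

    `storage` is any object exposing a `save(key, data)` method, a
    hypothetical interface rather than the exact Dify storage API.
    """
    data = base64.b64decode(base64_data)
    extension = mimetypes.guess_extension(mime_type) or ".bin"
    # Store the decoded bytes under a unique key.
    file_key = f"tools/{uuid.uuid4()}{extension}"
    storage.save(file_key, data)
    # The caller records this metadata in the `ToolFile` table and also
    # surfaces the file as an output variable of the `LLMNode`, so later
    # workflow nodes can consume it. In conversations the file is rendered
    # as a markdown inline image, e.g. ![image](<signed file url>).
    return ToolFile(file_key=file_key, mime_type=mime_type, size=len(data))
```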

To integrate file outputs into workflows, adjustments to the frontend code
are necessary.

For multimodal output functionality, updates to related model configurations
are required. Currently, this capability has been applied exclusively to
Google's Gemini models.

Closes #15814.

Screenshots

Before / After (screenshots omitted)

The image is shown twice; I don't know why (possibly an issue in the frontend code?).

To use the multimodal output capability, the Gemini model configurations must be updated. A related PR will be submitted later.

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran `dev/reformat` (backend) and `cd web && npx lint-staged` (frontend) to appease the lint gods

laipz8200 and others added 7 commits March 27, 2025 11:23
Signed-off-by: -LAN- <[email protected]>

# Conflicts:
#	api/core/model_runtime/entities/message_entities.py
Enhance `LLMNode` with multimodal capability, introducing support for
image outputs.

This implementation extracts base64-encoded images from LLM responses,
saves them to the storage service, and records the file metadata in the
`ToolFile` table. In conversations, these images are rendered as
markdown-based inline images.
Additionally, the images are included in the LLMNode's output as
file variables, enabling subsequent nodes in the workflow to utilize them.

To integrate file outputs into workflows, adjustments to the frontend code
are necessary.

For multimodal output functionality, updates to related model configurations
are required. Currently, this capability has been applied exclusively to
Google's Gemini models.
Add a detailed notice to guide contributors on avoiding direct usage
of the global variable `models.engine.db`. Instead, they are encouraged
to apply dependency injection to improve code readability, testability,
and maintainability.
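
A sketch of the pattern this commit asks for, assuming SQLAlchemy-style sessions; the class and method names are hypothetical, and only the injection pattern is the point:

```python
class ToolFileRepository:
    """Receives a SQLAlchemy session factory through its constructor
    instead of importing the global `models.engine.db`."""

    def __init__(self, session_factory):
        self._session_factory = session_factory

    def add(self, tool_file) -> None:
        # Open a short-lived session from the injected factory rather
        # than reaching for module-level global state.
        with self._session_factory() as session:
            session.add(tool_file)
            session.commit()


# Production wiring might look like:
#   from sqlalchemy import create_engine
#   from sqlalchemy.orm import Session
#   repo = ToolFileRepository(lambda: Session(create_engine("postgresql://...")))
# A test can instead pass a factory bound to an in-memory SQLite engine.
```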
Clarify that `ToolFile` now stores not only metadata for files generated
by agents but also metadata for files produced by various nodes in a workflow.

For instance, it includes metadata for multimodal output files generated
by an `LLMNode`.
@QuantumGhost force-pushed the feat/support-image-generate-for-gemini branch 2 times, most recently from f78a7db to b8672a6 on April 9, 2025 at 11:51
Development

Successfully merging this pull request may close these issues.

[Feature Request] Support Gemini’s New Multimodal Output