.Net: Add support for audio, pdf, doc, and docx to chat prompt parser #11919

glorious-beard · 2025-05-06T20:38:40Z

Motivation and Context

Why is this change required?

This allows template parsers like the YAML parser to embed content types other than just text and images for LLMs that support additional content types, like PDFs for OpenAI and DOCXs for Claude. Without this capability, functions whose prompts have attachments would have to build their chat history manually in code.

What problem does it solve?

See above

What scenario does it contribute to?

Use of additional content types beyond visuals and audio in user messages

Open Issues Addressed

Fixes Expanding ChatPromptParser to handle other content types #11044

Description

Chat Prompt Parser

To preserve backward compatibility, I chose to add new content types rather than consolidate the existing binary content types, so that LLM chat service providers can opt in to the new types. This also reduces the chance of breaking existing code.

3 new content types are created:

PdfContent for PDF files. Uses the tag "<pdf>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.
DocContent for MS Word .doc files. Uses the tag "<doc>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.
DocxContent for MS Word .docx files. Uses the tag "<docx>". Allows for Base64 data URIs or standard URIs, similar to ImageContent.

(NOTE: DocContent and DocxContent are separate mainly because they have different MIME types and different content formats, though they could easily be consolidated into a single tag, letting the LLM provider distinguish between "doc" and "docx" files. Alternatively, I could also see the case for dropping ".doc" support and requiring the caller to use only ".docx".)

In addition, the following 2 contents are now parsed from the XML:

AudioContent - Parses the tag "<audio>" with either Base64 data URIs or standard URIs, similar to ImageContent.
BinaryContent - Parses the tag "<file>" with either Base64 data URIs or standard URIs, similar to ImageContent.

Here is a sample:

            
<message role='user'>
  This part will be discarded upon parsing
  <text>Make sense of this random assortment of stuff.</text>
  <image>https://fake-link-to-image/</image>
  <audio>data:audio/wav;base64,UklGRiQAAABXQVZFZm10IBAAAAABAAEAIlYAAACABAAZGF0YVgAAAAA</audio>
  <pdf>data:application/pdf;base64,JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9UeXBlL1hSZWYvUGFnZXMgNiAwIFIKL1R5cGUvUGFnZS9NZWRpYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9GMiA8PC9GMyA8PC9GNCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GNSA8PC9GNiA8PC9GNyBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GOCAvPj4KZW5kb2JqCjEwIDAgb2JqCjw8L1R5cGUvUGFnZS9NYWRlYUJveCBbMCAwIDQ4MCA1MF0KL0NvbnRlbnRzIDw8L0V4dEdTdGF0ZSA8PC9JRCBbPDwvTGVuZ3RoIDQ4XQovRm9udCA8PC9GMSA8PC9G</pdf>
  <pdf>https://fake-link-to-pdf/</pdf>
  <doc>data:application/msword;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</doc>
  <doc>https://fake-link-to-doc/</doc>
  <docx>data:application/vnd.openxmlformats-officedocument.wordprocessingml.document;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</docx>
  <docx>https://fake-link-to-docx/</docx>
  <file>data:application/octet-stream;base64,UEsDBBQAAAAIAI+Q1k5a2gAAABQAAAAIAAAAbmFtZS5kb2N4VVQJAAD9AAAACwAAAB4AAAAAA==</file>
  <file>https://fake-link-to-binary/</file>
  This part will also be discarded upon parsing
</message>

Amazon Bedrock

Modified the Converse API request generator to handle the subset of binary content supported by Amazon Bedrock (PDF, DOC, DOCX, and Image), as documented here.

OpenAI

Modified the client to handle PDF content, audio content, and file references when generating a request to an OpenAI (or OpenAI-compatible) client.

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄


RogerBarreto · 2025-05-30T09:45:52Z

@glorious-beard I updated the proposal to be abstract, as this applies to the SemanticKernel.Abstractions package.

Since we will have many different types of documents and binary files, it is better to stay broad rather than specific: avoid introducing any special content types and use the existing ones, which already work.

Given that, I updated the logic to accept a mimetype attribute, as in <binary mimetype="type/subtype"/>, to cover the scenarios where you provide a URI.

For data URI content, the MIME type is picked up automatically from the data:mimeType prefix.

RogerBarreto · 2025-05-30T10:05:52Z

Updated PR Description

Motivation and Context

Enhance the Chat Prompt XML parsing capability to also support audio and documents.

Fixes Expanding ChatPromptParser to handle other content types #11044

Description

The following two content types are now supported in the Chat Prompt XML:

AudioContent - Parses the tag <audio mimetype="type/subtype"> with either Base64 data URIs or standard URIs, similar to ImageContent.
BinaryContent - Parses the tag <binary mimetype="type/subtype"> with either Base64 data URIs or standard URIs, similar to ImageContent.

The mimetype attribute is optional, and can be omitted for Base64 data URIs.

Here is a sample:

<message role='user'>
  This part will be discarded upon parsing
  <text>Summarize all the contents I provided in this message.</text>
  <image mimetype="image/png">https://fake-link-to-image/</image>
  <audio>data:audio/wav;base64,UklGRiQAAAB...</audio>
  <binary>data:application/pdf;base64,UklGRiQAAAB...</binary>
  <binary mimetype="application/pdf">https://fake-link-to-pdf/</binary>
  <binary>data:application/msword;base64,UklGRiQAAAB...</binary>
  <binary mimetype="application/octet-stream">https://fake-link-to-binary/</binary>
</message>
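
To illustrate the MIME type rule described above, here is a minimal standalone sketch. It is not the code in ChatPromptParser.cs: the class name MimeTypeSketch and the method Resolve are hypothetical, and the choice to let a data URI's own media type take precedence over the mimetype attribute is an assumption based on the description, not confirmed behavior.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper for illustration only; not part of Semantic Kernel.
public static class MimeTypeSketch
{
    // Returns the MIME type for a single content element such as <binary> or <audio>,
    // or null when none can be determined.
    public static string? Resolve(XElement element)
    {
        const string Prefix = "data:";
        string body = element.Value.Trim();

        if (body.StartsWith(Prefix, StringComparison.OrdinalIgnoreCase))
        {
            // Data URI layout: data:[<mediatype>][;base64],<data>
            int comma = body.IndexOf(',');
            if (comma < 0)
            {
                return null; // malformed data URI, no payload separator
            }

            // Everything between "data:" and the comma, e.g. "application/pdf;base64".
            string header = body.Substring(Prefix.Length, comma - Prefix.Length);
            string mediaType = header.Split(';')[0];
            return mediaType.Length > 0 ? mediaType : null;
        }

        // Plain URI: fall back to the optional mimetype attribute (assumed behavior).
        return element.Attribute("mimetype")?.Value;
    }
}
```

With this sketch, a `<binary>` element holding `data:application/pdf;base64,...` resolves to `application/pdf` without any attribute, while a plain link needs `mimetype="application/pdf"` to carry the same information.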

Contribution Checklist

The code builds clean without any errors or warnings
The PR follows the SK Contribution Guidelines and the pre-submission formatting script raises no violations
All unit tests pass, and I have added new tests where possible
I didn't break anyone 😄

SergeyMenshykh · 2025-05-30T13:32:01Z

dotnet/src/SemanticKernel.Abstractions/AI/ChatCompletion/ChatPromptParser.cs

+    /// <param name="content">Base64 encoded content or URI.</param>
+    /// <param name="mimeType">Optional MIME type of the content.</param>
+    /// <returns>A new instance of <typeparamref name="T"/> with <paramref name="content"/></returns>
+    private static T CreateBinaryContent<T>(string content, string? mimeType) where T : BinaryContent, new()


nit: move this private method down the file after the public one so that all private methods are grouped together.

glorious-beard added 2 commits May 6, 2025 13:10

feat: added handling for audio, pdf, docx, doc, and generic binary co…

32d48eb

…ntent This is to support microsoft#11044.

fix: switched to factory function for creating kernel content.

3e7be45

glorious-beard requested a review from a team as a code owner May 6, 2025 20:38

glorious-beard changed the title ~~Glorious-beard/11044-expand-chat-prompt-parser~~ .Net: Add support for audio, pdf, doc, and docx to chat prompt parser May 6, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

b40bc84

RogerBarreto added the ai connector (Anything related to AI connectors) and needs discussion (Issues that require discussion by the internal Semantic Kernel team before proceeding) labels May 8, 2025

RogerBarreto assigned RogerBarreto and glorious-beard May 8, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

15d1751

markwallace-microsoft added the .NET (Issue or Pull requests regarding .NET code), kernel (Issues or pull requests impacting the core kernel), and kernel.core labels May 8, 2025

glorious-beard and others added 4 commits May 9, 2025 11:45

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

b02ac5e

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

d0f7f2b

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

f7384d8

Abstract chat parser update

ab7def9

RogerBarreto removed the needs discussion (Issues that require discussion by the internal Semantic Kernel team before proceeding) label May 30, 2025

Merge branch 'main' into glorious-beard/11044-expand-chat-prompt-parser

4380df4

RogerBarreto temporarily deployed to integration May 30, 2025 09:41 — with GitHub Actions Inactive

SergeyMenshykh approved these changes May 30, 2025
