aws-samples · tbrand · Apr 12, 2025 · Apr 13, 2025 · Apr 13, 2025 · Apr 14, 2025
diff --git a/README.md b/README.md
@@ -77,6 +77,10 @@ GenU provides a variety of standard use cases leveraging generative AI. These us
         <td>Diagram Generation</td>
         <td>Diagram generation visualizes text and content on any topic using optimal diagrams. It allows for easy text-based diagram creation, enabling efficient creation of flowcharts and other diagrams even for non-programmers and non-designers.</td>
       </tr>
+      <tr>
+        <td>Voice Chat</td>
+        <td>Voice Chat lets you hold a bidirectional voice conversation with generative AI. As in natural conversation, you can interrupt and speak while the AI is talking. By setting a system prompt, you can also talk with an AI that plays a specific role.</td>
+      </tr>
     </tbody>
   </table>
 </details>

diff --git a/README_ja.md b/README_ja.md
@@ -75,6 +75,10 @@ GenU は生成 AI を活用した多様なユースケースを標準で提供
         <td>ダイアグラム生成</td>
         <td>ダイアグラム生成は、あらゆるトピックに関する文章や内容を最適な図を用いて視覚化します。 テキストベースで簡単に図を生成でき、プログラマーやデザイナーでなくても効率的にフローチャートなどの図を作成できます。</td>
       </tr>
+      <tr>
+        <td>音声チャット</td>
+        <td>音声チャットでは生成 AI と双方向の音声によるチャットが可能です。自然な会話と同様、AI の発言中に割り込んで話すこともできます。また、システムプロンプトを設定することで、特定の役割を持った AI と音声で会話することもできます。</td>
+      </tr>
     </tbody>
   </table>
 </details>

diff --git a/docs/assets/images/usecase_voice_chat.gif b/docs/assets/images/usecase_voice_chat.gif
diff --git a/docs/en/DEPLOY_ON_CLOUDSHELL.md b/docs/en/DEPLOY_ON_CLOUDSHELL.md
@@ -79,4 +79,4 @@ When deployment is complete, the CloudFront URL will be displayed. You can acces
 2. cdk.json settings are applied next
 
 Note that to execute these steps, you also need to enable the models from [Amazon Bedrock Model access](https://console.aws.amazon.com/bedrock/home#/modelaccess).
-Confirm that the models specified in modelIds, imageGenerationModelIds, and videoGenerationModelIds in the modelRegion of the configuration file (parameter.ts or cdk.json) are enabled.
+Confirm that the models specified in modelIds, imageGenerationModelIds, videoGenerationModelIds, and speechToSpeechModelIds in the modelRegion of the configuration file (parameter.ts or cdk.json) are enabled.
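
Review note: the sentence above now covers four model lists. As a rough aid for checking model access, the sketch below groups every configured model by the region where it would be invoked. The `ModelSettings` type and the `modelsToEnable` helper are hypothetical and written only for illustration. It also assumes that an entry with an explicit `region` needs model access in that region rather than in `modelRegion`.

```typescript
// Hypothetical shapes for illustration; not the project's actual types.
type ModelEntry = string | { modelId: string; region: string };

interface ModelSettings {
  modelRegion: string;
  modelIds?: ModelEntry[];
  imageGenerationModelIds?: ModelEntry[];
  videoGenerationModelIds?: ModelEntry[];
  speechToSpeechModelIds?: ModelEntry[];
}

// Group all configured model IDs by the region they would be called in.
// Plain string entries fall back to modelRegion.
function modelsToEnable(settings: ModelSettings): Map<string, string[]> {
  const entries: ModelEntry[] = [
    ...(settings.modelIds ?? []),
    ...(settings.imageGenerationModelIds ?? []),
    ...(settings.videoGenerationModelIds ?? []),
    ...(settings.speechToSpeechModelIds ?? []),
  ];
  const byRegion = new Map<string, string[]>();
  for (const entry of entries) {
    const region = typeof entry === 'string' ? settings.modelRegion : entry.region;
    const modelId = typeof entry === 'string' ? entry : entry.modelId;
    const ids = byRegion.get(region) ?? [];
    if (!ids.includes(modelId)) {
      ids.push(modelId); // skip duplicates listed more than once
    }
    byRegion.set(region, ids);
  }
  return byRegion;
}
```

Calling `modelsToEnable` with the values from `parameter.ts` or `cdk.json` yields one list per region to compare against the Amazon Bedrock Model access page.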
diff --git a/docs/en/DEPLOY_OPTION.md b/docs/en/DEPLOY_OPTION.md
@@ -611,6 +611,15 @@ const envs: Record<string, Partial<StackInput>> = {
 }
 ```
 
+### Enabling Voice Chat Use Case
+
+> [!NOTE]
+> The response speed of voice chat is greatly affected by the application's region (the region where GenerativeAiUseCasesStack is deployed). If there is a delay in response, please check if the user is physically located close to the application's region.
+
+This is enabled when you define one or more models in `speechToSpeechModelIds`.
+For `speechToSpeechModelIds`, please refer to [Changing Amazon Bedrock Models](#change-amazon-bedrock-models).
+For default values, please refer to [packages/cdk/lib/stack-input.ts](/packages/cdk/lib/stack-input.ts).
+
 ### Enabling Image Generation Use Case
 
 This is enabled when you define one or more models in `imageGenerationModelIds`.
@@ -717,6 +726,7 @@ const envs: Record<string, Partial<StackInput>> = {
       video: true, // Hide video generation
       videoAnalyzer: true, // Hide video analysis
       diagram: true, // Hide diagram generation
+      voiceChat: true, // Hide voice chat
     },
   },
 };
@@ -737,7 +747,8 @@ const envs: Record<string, Partial<StackInput>> = {
       "image": true,
       "video": true,
       "videoAnalyzer": true,
-      "diagram": true
+      "diagram": true,
+      "voiceChat": true
     }
   }
 }
@@ -771,7 +782,7 @@ const envs: Record<string, Partial<StackInput>> = {
 
 ## Change Amazon Bedrock Models
 
-Specify the model region and models in `parameter.ts` or `cdk.json` using `modelRegion`, `modelIds`, `imageGenerationModelIds`, and `videoGenerationModelIds`. For `modelIds`, `imageGenerationModelIds`, and `videoGenerationModelIds`, specify a list of models you want to use from those available in the specified region. AWS documentation provides a [list of models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html) and [model support by region](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html).
+Specify the model region and models in `parameter.ts` or `cdk.json` using `modelRegion`, `modelIds`, `imageGenerationModelIds`, `videoGenerationModelIds`, and `speechToSpeechModelIds`. For `modelIds`, `imageGenerationModelIds`, `videoGenerationModelIds`, and `speechToSpeechModelIds`, specify a list of models you want to use from those available in the specified region. AWS documentation provides a [list of models](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html) and [model support by region](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html).
 
 The solution also supports [cross-region inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html) models. Cross-region inference models are represented as `{us|eu|apac}.{model-provider}.{model-name}` and must match the `{us|eu|apac}` prefix with the region specified in modelRegion.
 
@@ -838,6 +849,12 @@ This solution supports the following text generation models:
 "apac.amazon.nova-micro-v1:0"
 ```
 
+This solution supports the following speech-to-speech models:
+
+```
+amazon.nova-sonic-v1:0
+```
+
 This solution supports the following image generation models:
 
 ```
@@ -865,7 +882,7 @@ This solution supports the following video generation models:
 
 ### Using Models from Multiple Regions Simultaneously
 
-By default, GenU uses models from the `modelRegion`. If you want to use the latest models that are only available in certain regions, you can specify `{modelId: '<model name>', region: '<region code>'}` in `modelIds`, `imageGenerationModelIds`, or `videoGenerationModelIds` to call that specific model from the specified region.
+By default, GenU uses models from the `modelRegion`. If you want to use the latest models that are only available in certain regions, you can specify `{modelId: '<model name>', region: '<region code>'}` in `modelIds`, `imageGenerationModelIds`, `videoGenerationModelIds`, or `speechToSpeechModelIds` to call that specific model from the specified region.
 
 > [!NOTE]
 > When using both the [monitoring dashboard](#enabling-monitoring-dashboard) and models from multiple regions, the default dashboard settings will not display prompt logs for models outside the primary region (`modelRegion`).
@@ -913,6 +930,9 @@ const envs: Record<string, Partial<StackInput>> = {
       'amazon.nova-reel-v1:0',
       { modelId: 'luma.ray-v2:0', region: 'us-west-2' },
     ],
+    speechToSpeechModelIds: [
+      { modelId: 'amazon.nova-sonic-v1:0', region: 'us-east-1' },
+    ],
   },
 };
 ```
@@ -976,6 +996,12 @@ const envs: Record<string, Partial<StackInput>> = {
         "region": "us-west-2"
       }
+    ],
+    "speechToSpeechModelIds": [
+      {
+        "modelId": "amazon.nova-sonic-v1:0",
+        "region": "us-east-1"
+      }
     ]
   }
 }
 ```
@@ -1011,6 +1037,7 @@ const envs: Record<string, Partial<StackInput>> = {
       'stability.stable-diffusion-xl-v1',
     ],
     videoGenerationModelIds: ['amazon.nova-reel-v1:1'],
+    speechToSpeechModelIds: ['amazon.nova-sonic-v1:0'],
   },
 };
 ```
@@ -1042,7 +1069,8 @@ const envs: Record<string, Partial<StackInput>> = {
       "amazon.titan-image-generator-v1",
       "stability.stable-diffusion-xl-v1"
     ],
-    "videoGenerationModelIds": ["amazon.nova-reel-v1:1"]
+    "videoGenerationModelIds": ["amazon.nova-reel-v1:1"],
+    "speechToSpeechModelIds": ["amazon.nova-sonic-v1:0"]
   }
 }
 ```
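
Review note: the new "Enabling Voice Chat Use Case" section says the use case is enabled when `speechToSpeechModelIds` contains at least one model, and the hide flags add `voiceChat: true`. A minimal sketch of how those two settings might combine is shown below. The `VoiceChatSettings` type, the `hiddenUseCases` property name, and the `isVoiceChatVisible` helper are assumptions made for illustration, not the project's actual implementation.

```typescript
// Illustrative types only; the real StackInput in
// packages/cdk/lib/stack-input.ts may be shaped differently.
type ModelEntry = string | { modelId: string; region: string };

interface VoiceChatSettings {
  speechToSpeechModelIds: ModelEntry[];
  // Assumed name for the object that holds the "Hide ..." flags.
  hiddenUseCases?: { voiceChat?: boolean };
}

// Hypothetical helper: voice chat is visible only when at least one
// speech-to-speech model is configured and the use case is not hidden.
function isVoiceChatVisible(settings: VoiceChatSettings): boolean {
  const hasModel = settings.speechToSpeechModelIds.length > 0;
  const hidden = settings.hiddenUseCases?.voiceChat === true;
  return hasModel && !hidden;
}

// One model configured, nothing hidden.
console.log(isVoiceChatVisible({ speechToSpeechModelIds: ['amazon.nova-sonic-v1:0'] })); // true

// Model configured, but the use case is hidden.
console.log(
  isVoiceChatVisible({
    speechToSpeechModelIds: [{ modelId: 'amazon.nova-sonic-v1:0', region: 'us-east-1' }],
    hiddenUseCases: { voiceChat: true },
  })
); // false
```

Under these assumptions an empty `speechToSpeechModelIds` list disables the use case regardless of the hide flag.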

diff --git a/docs/ja/DEPLOY_ON_CLOUDSHELL.md b/docs/ja/DEPLOY_ON_CLOUDSHELL.md
@@ -79,4 +79,4 @@ deploy.sh は以下のオプションをサポートしています：
 2. cdk.json の設定が次に適用されます
 
 なお、これらの手順を実行する場合も [Amazon Bedrock の Model access](https://console.aws.amazon.com/bedrock/home#/modelaccess) から利用するモデルの有効化が必要です。
-使用する設定ファイル（parameter.ts または cdk.json）の modelRegion において modelIds と imageGenerationModelIds と videoGenerationModelIds で指定されたモデルが有効化されているかを確認してください。
+使用する設定ファイル（parameter.ts または cdk.json）の modelRegion において modelIds と imageGenerationModelIds と videoGenerationModelIds と speechToSpeechModelIds で指定されたモデルが有効化されているかを確認してください。
diff --git a/docs/ja/DEPLOY_OPTION.md b/docs/ja/DEPLOY_OPTION.md
@@ -626,6 +626,15 @@ const envs: Record<string, Partial<StackInput>> = {
 }
 ```
 
+### 音声チャットユースケースの有効化
+
+> [!NOTE]
+> 音声チャットの反応速度はアプリケーションのリージョン (GenerativeAiUseCasesStack がデプロイされたリージョン) に大きく影響を受けます。反応が遅延する場合は、ユーザーがアプリケーションのリージョンと物理的に近い距離にいるかを確認してください。
+
+`speechToSpeechModelIds` にモデルを 1 つ以上定義すると有効化されます。
+`speechToSpeechModelIds` に関しては [Amazon Bedrock のモデルを変更する](#amazon-bedrock-のモデルを変更する) をご参照ください。
+デフォルト値は [packages/cdk/lib/stack-input.ts](/packages/cdk/lib/stack-input.ts) をご参照ください。
+
 ### 画像生成ユースケースの有効化
 
 `imageGenerationModelIds` にモデルを 1 つ以上定義すると有効化されます。
@@ -732,6 +741,7 @@ const envs: Record<string, Partial<StackInput>> = {
       video: true, // 動画生成を非表示
       videoAnalyzer: true, // 映像分析を非表示
       diagram: true, // ダイアグラム生成を非表示
+      voiceChat: true, // 音声チャットを非表示
     },
   },
 };
@@ -752,7 +762,8 @@ const envs: Record<string, Partial<StackInput>> = {
       "image": true,
       "video": true,
       "videoAnalyzer": true,
-      "diagram": true
+      "diagram": true,
+      "voiceChat": true
     }
   }
 }
@@ -786,7 +797,7 @@ const envs: Record<string, Partial<StackInput>> = {
 
 ## Amazon Bedrock のモデルを変更する
 
-`parameter.ts` もしくは `cdk.json` の `modelRegion`, `modelIds`, `imageGenerationModelIds`, `videoGenerationModelIds` でモデルとモデルのリージョンを指定します。`modelIds` と `imageGenerationModelIds` と `videoGenerationModelIds` は指定したリージョンで利用できるモデルの中から利用したいモデルのリストで指定してください。AWS ドキュメントに、[モデルの一覧](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html)と[リージョン別のモデルサポート一覧](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html)があります。
+`parameter.ts` もしくは `cdk.json` の `modelRegion`, `modelIds`, `imageGenerationModelIds`, `videoGenerationModelIds`, `speechToSpeechModelIds` でモデルとモデルのリージョンを指定します。`modelIds` と `imageGenerationModelIds` と `videoGenerationModelIds` と `speechToSpeechModelIds` は指定したリージョンで利用できるモデルの中から利用したいモデルのリストで指定してください。AWS ドキュメントに、[モデルの一覧](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html)と[リージョン別のモデルサポート一覧](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html)があります。
 
 また、[cross-region inference](https://docs.aws.amazon.com/bedrock/latest/userguide/cross-region-inference-support.html)のモデルに対応しています。cross-region inference のモデルは `{us|eu|apac}.{model-provider}.{model-name}` で表されるモデルで、設定した modelRegion で指定したリージョンの `{us|eu|apac}` と一致している必要があります。
 
@@ -853,6 +864,12 @@ const envs: Record<string, Partial<StackInput>> = {
 "apac.amazon.nova-micro-v1:0"
 ```
 
+このソリューションが対応している speech-to-speech モデルは以下です。
+
+```
+amazon.nova-sonic-v1:0
+```
+
 このソリューションが対応している画像生成モデルは以下です。
 
 ```
@@ -880,7 +897,7 @@ const envs: Record<string, Partial<StackInput>> = {
 
 ### 複数のリージョンのモデルを同時に利用する
 
-GenU では、特に指定がない限り`modelRegion`のモデルを使用します。一部リージョンのみで利用可能な最新モデル等を使いたい場合、`modelIds`または`imageGenerationModelIds`または`videoGenerationModelIds`に`{modelId: '<モデル名>', region: '<リージョンコード>'}`を指定することで、そのモデルのみ指定したリージョンから呼び出すことができます。
+GenU では、特に指定がない限り`modelRegion`のモデルを使用します。一部リージョンのみで利用可能な最新モデル等を使いたい場合、`modelIds`または`imageGenerationModelIds`または`videoGenerationModelIds`または`speechToSpeechModelIds`に`{modelId: '<モデル名>', region: '<リージョンコード>'}`を指定することで、そのモデルのみ指定したリージョンから呼び出すことができます。
 
 > [!NOTE] > [モニタリング用ダッシュボード](#モニタリング用のダッシュボードの有効化)と複数リージョンのモデル利用を併用する場合、デフォルトのダッシュボード設定では主リージョン（`modelRegion`で指定したリージョン）以外のモデルのプロンプトログが表示されません。
 >
@@ -927,6 +944,9 @@ const envs: Record<string, Partial<StackInput>> = {
       'amazon.nova-reel-v1:0',
       { modelId: 'luma.ray-v2:0', region: 'us-west-2' },
     ],
+    speechToSpeechModelIds: [
+      { modelId: 'amazon.nova-sonic-v1:0', region: 'us-east-1' },
+    ],
   },
 };
 ```
@@ -985,6 +1005,12 @@ const envs: Record<string, Partial<StackInput>> = {
         "modelId": "luma.ray-v2:0",
         "region": "us-west-2"
       }
+    ],
+    "speechToSpeechModelIds": [
+      {
+        "modelId": "amazon.nova-sonic-v1:0",
+        "region": "us-east-1"
+      }
     ]
   }
 }
@@ -1020,6 +1046,7 @@ const envs: Record<string, Partial<StackInput>> = {
       'stability.stable-diffusion-xl-v1',
     ],
     videoGenerationModelIds: ['amazon.nova-reel-v1:1'],
+    speechToSpeechModelIds: ['amazon.nova-sonic-v1:0'],
   },
 };
 ```
@@ -1051,7 +1078,8 @@ const envs: Record<string, Partial<StackInput>> = {
       "amazon.titan-image-generator-v1",
       "stability.stable-diffusion-xl-v1"
     ],
-    "videoGenerationModelIds": ["amazon.nova-reel-v1:1"]
+    "videoGenerationModelIds": ["amazon.nova-reel-v1:1"],
+    "speechToSpeechModelIds": ["amazon.nova-sonic-v1:0"]
   }
 }
 ```

diff --git a/docs/overrides/home_en.html b/docs/overrides/home_en.html
@@ -202,6 +202,22 @@ <h3 class="mb-2 text-xl font-semibold">Flow Chat</h3>
             </p>
           </div>
         </div>
+        <div class="swiper-slide">
+          <div class="rounded-lg bg-white p-6 shadow-lg">
+            <img
+              src="../assets/images/usecase_voice_chat.gif"
+              alt="Voice Chat"
+              class="mb-4 w-full rounded-lg" />
+            <h3 class="mb-2 text-xl font-semibold">Voice Chat</h3>
+            <p class="text-sm text-gray-600">
+              Voice Chat lets you hold a bidirectional voice conversation
+              with generative AI. As in natural conversation, you can
+              interrupt and speak while the AI is talking. By setting a
+              system prompt, you can also talk with an AI that plays a
+              specific role.
+            </p>
+          </div>
+        </div>
         <div class="swiper-slide">
           <div class="rounded-lg bg-white p-6 shadow-lg">
             <img

diff --git a/docs/overrides/home_ja.html b/docs/overrides/home_ja.html
@@ -193,6 +193,21 @@ <h3 class="mb-2 text-xl font-semibold">Flow チャット</h3>
             </p>
           </div>
         </div>
+        <div class="swiper-slide">
+          <div class="rounded-lg bg-white p-6 shadow-lg">
+            <img
+              src="../assets/images/usecase_voice_chat.gif"
+              alt="Voice Chat"
+              class="mb-4 w-full rounded-lg" />
+            <h3 class="mb-2 text-xl font-semibold">音声チャット</h3>
+            <p class="text-sm text-gray-600">
+              音声チャットでは生成 AI
+              と双方向の音声によるチャットが可能です。自然な会話と同様、AI
+              の発言中に割り込んで話すこともできます。また、システムプロンプトを設定することで、特定の役割を持った
+              AI と音声で会話することもできます。
+            </p>
+          </div>
+        </div>
         <div class="swiper-slide">
           <div class="rounded-lg bg-white p-6 shadow-lg">
             <img