Commit 79588c6

committed
2 parents 56d14a7 + 9e831b9 commit 79588c6

1 file changed

+62
-44
lines changed

main.ipynb

Lines changed: 62 additions & 44 deletions
@@ -6,7 +6,8 @@
 "provenance": [],
 "gpuType": "T4",
 "mount_file_id": "15KTDpG-Cy2JIQo_r4uFYGOYv3cuuySLE",
-"authorship_tag": "ABX9TyPcHpr7YSquXs5tHG6vgXBC"
+"authorship_tag": "ABX9TyOOFv7bxULf3jxYdyCciRs+",
+"include_colab_link": true
 },
 "kernelspec": {
 "name": "python3",
@@ -18,6 +19,16 @@
 "accelerator": "GPU"
 },
 "cells": [
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "view-in-github",
+"colab_type": "text"
+},
+"source": [
+"<a href=\"https://colab.research.google.com/github/therohitdas/Youtube-Transcript-Generator/blob/main/main.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
+]
+},
 {
 "cell_type": "markdown",
 "source": [
@@ -48,7 +59,11 @@
 "\n",
 "## Environment Variables 🌐\n",
 "\n",
-"- `GOOGLE_API_KEY`: Set up your Google API key for video information retrieval. You will need to create a Project in the google cloud for this and enable the YouTube v3 API. This is optional, if you don't add it, the chapters will not be added.\n",
+"- `YOUTUBE_API_KEY`: Set up your Google API key for video information retrieval. You will need to create a Project in the google cloud for this and enable the YouTube v3 API. This is optional, if you don't add it, the chapters will not be added.\n",
+"\n",
+"## Runtime\n",
+"Please go to `Runtime > Change runtime type > Select T4 GPU`\n",
+"This will ensure best performance. Without a gpu, the punctuation will be very slow and can take minutes.\n",
 "\n",
 "## Script Parameters 📜\n",
 "```python\n",
@@ -96,12 +111,53 @@
 "execution_count": null,
 "outputs": []
 },
+{
+"cell_type": "markdown",
+"source": [
+"**Example Usage:**\n",
+"```python\n",
+"url = 'https://www.youtube.com/watch?v=YOUR_VIDEO_ID' # youtu.be link works too\n",
+"language = 'en'\n",
+"punctuated = True # Default False, takes significantly more time when enabled on CPU, use T4 GPU type in google collab.\n",
+"output_dir = '.' # add /content/drive/MyDrive/ to save content in You Google Drive\n",
+"filename = \"\" # Leave empty for default filename: Video Title or Video Id\n",
+"punctuation_model = '' # More info down below\n",
+"verbose = True # To get logs\n",
+"```\n",
+"`language` use the language code to get the video. By default this module always picks manually created transcripts over automatically created ones, if a transcript in the requested language is available both manually created and generated.\n",
+"\n",
+"`punctuation_model` values can be found at https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large#languages\n",
+"\n",
+"After filling the cell below, press `CMD+F9` / `CTRL+F9` to run all cells."
+],
+"metadata": {
+"id": "U5fmwoG6UFDd"
+}
+},
+{
+"cell_type": "code",
+"source": [
+"url = 'https://www.youtube.com/watch?v=YOUR_VIDEO_ID'\n",
+"language = 'en'\n",
+"punctuated = True # Default False, takes significantly more time when enabled on CPU, use T4 GPU type in google collab.\n",
+"output_dir = '.' # add /content/drive/MyDrive/ to save content in You Google Drive, In the cell below, Uncomment the mount line\n",
+"filename = \"\" # Leave empty for default filename: Video Title or Video Id\n",
+"punctuation_model = ''\n",
+"verbose = True"
+],
+"metadata": {
+"id": "5CT6UxWtUYOn"
+},
+"execution_count": null,
+"outputs": []
+},
 {
 "cell_type": "code",
 "source": [
 "# Run this if you want to mount and store generated files in google drive.\n",
 "from google.colab import drive\n",
 "\n",
+"# Uncomment this:\n",
 "# drive.mount(\"/content/drive\")"
 ],
 "metadata": {
@@ -117,12 +173,12 @@
 "import logging\n",
 "import re\n",
 "import math\n",
+"import nltk\n",
 "import youtube_transcript_api\n",
 "from deepmultilingualpunctuation import PunctuationModel\n",
-"import nltk\n",
-"import warnings\n",
 "import googleapiclient.discovery\n",
 "import googleapiclient.errors\n",
+"\n",
 "from google.colab import userdata\n",
 "import warnings"
 ],
@@ -298,7 +354,7 @@
 "def getVideoInfo (video_id):\n",
 " try:\n",
 " # Set up Google API credentials using API key\n",
-" api_key = userdata.get('GOOGLE_API_KEY') # Replace with your actual API key\n",
+" api_key = userdata.get('YOUTUBE_API_KEY') # Replace with your actual API key\n",
 " youtube = googleapiclient.discovery.build(\"youtube\", \"v3\", developerKey=api_key)\n",
 " request = youtube.videos().list(part=\"id,snippet\",\n",
 " id = video_id\n",
@@ -318,44 +374,6 @@
 "execution_count": null,
 "outputs": []
 },
-{
-"cell_type": "markdown",
-"source": [
-"## Example Usage:\n",
-"```python\n",
-"url = 'https://www.youtube.com/watch?v=YOUR_VIDEO_ID' # youtu.be link works too\n",
-"language = 'en'\n",
-"punctuated = True # Default False, takes significantly more time when enabled on CPU, use T4 GPU type in google collab.\n",
-"output_dir = '.' # add /content/drive/MyDrive/ to save content in You Google Drive\n",
-"filename = \"\" # Leave empty for default filename: Video Title or Video Id\n",
-"punctuation_model = '' # More info down below\n",
-"verbose = True # To get logs\n",
-"```\n",
-"`language` use the language code to get the video. By default this module always picks manually created transcripts over automatically created ones, if a transcript in the requested language is available both manually created and generated.\n",
-"\n",
-"`punctuation_model` values can be found at https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large#languages"
-],
-"metadata": {
-"id": "U5fmwoG6UFDd"
-}
-},
-{
-"cell_type": "code",
-"source": [
-"url = 'https://www.youtube.com/watch?v=CBYhVcO4WgI'\n",
-"language = 'en'\n",
-"punctuated = True # Default False, takes significantly more time when enabled on CPU, use T4 GPU type in google collab.\n",
-"output_dir = '.' # add /content/drive/MyDrive/ to save content in You Google Drive\n",
-"filename = \"\" # Leave empty for default filename: Video Title or Video Id\n",
-"punctuation_model = ''\n",
-"verbose = True"
-],
-"metadata": {
-"id": "5CT6UxWtUYOn"
-},
-"execution_count": null,
-"outputs": []
-},
 {
 "cell_type": "code",
 "source": [
@@ -394,4 +412,4 @@
 "outputs": []
 }
 ]
-}
+}
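
Note on the `url` parameter: the example cell says a `youtu.be` link works as well as a `youtube.com/watch?v=...` link. The diff does not show how the notebook extracts the video ID, so the following is only a minimal sketch of one way to do it, assuming the ID sits either in the `v` query parameter or in the first path segment of a short link. The function name `extract_video_id` is hypothetical and not taken from the notebook.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube watch or youtu.be URL, else None.

    Hypothetical helper for illustration; not the notebook's own code.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in ("youtube.com", "www.youtube.com", "m.youtube.com"):
        # Watch links carry the ID in the "v" query parameter.
        values = parse_qs(parsed.query).get("v")
        return values[0] if values else None

    # Any other host is treated as unsupported.
    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=CBYhVcO4WgI"))
    print(extract_video_id("https://youtu.be/CBYhVcO4WgI"))
```

Returning `None` for unrecognized URLs lets the caller decide whether to raise an error or skip the video, rather than passing a malformed ID on to the transcript or YouTube Data API calls.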
