
Commit 23c144c

committed
Updated Intro to LLMs
1 parent 21dc20f commit 23c144c

File tree

8 files changed

+71
-19
lines changed


Module 9 - GenAI (LLMs and Prompt Engineering)/1. Intro to LLMs/Introduction to LLMs.ipynb renamed to Module 9 - GenAI (LLMs and Prompt Engineering)/2. Intro to LLMs/Introduction to LLMs.ipynb

+71-19
Original file line number · Diff line number · Diff line change
@@ -1,9 +1,8 @@
11
{
22
"cells": [
33
{
4-
"attachments": {},
54
"cell_type": "markdown",
6-
"id": "b323bf88-cf1d-43b4-abb0-0b7c35bb5f42",
5+
"id": "37b379a0-1a90-4495-8fb2-0daa640c8dff",
76
"metadata": {},
87
"source": [
98
"# **Introduction to LLMs**\n",
@@ -16,24 +15,42 @@
1615
"1. Sequence to Sequence Model.\n",
1716
"2. Has two main components: Encoder and Decoder\n",
1817
"3. An **encoder** which is tasked with taking in raw text, splitting them up into its core components, convert them into vectors and using **self-attention** to understand the context of the text.\n",
19-
"4. Transformer's self attention mechanism allows each word to \"attend to\" al other words in the sequence which enables it to capture long-term dependencies and contextual relationships between words. The goal is to understand each word as it relates to the other tokens in the input text.\n",
20-
"5. A **decoder** excels at generating text by using a modified type of attention (i.e. **cross attention**) to predict the next best token.\n",
21-
"6. Transformers are **trained** to solve a specific NLP task called as **Language Modeling**.\n",
22-
"7. **Limitation:** Transformers are still limited to an input context window (i.e. maximum length og text it can process at any given moment)\n",
23-
"\n",
18+
"4. A **decoder** excels at generating text by using a modified type of attention (i.e. **cross attention**) to predict the next best token.\n",
19+
"5. Transformers are **trained** to solve a specific NLP task called as **Language Modeling**.\n",
20+
"6. **Why not RNNs? -** Transformer's self attention mechanism allows each word to \"attend to\" all other words in the sequence which enables it to capture long-term dependencies and contextual relationships between words. The goal is to understand each word as it relates to the other tokens in the input text.\n",
21+
"7. **Limitation:** Transformers are still limited to an input context window (i.e. maximum length og text it can process at any given moment)"
22+
]
23+
},
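The context-window limitation mentioned in the cell above is easy to see directly from a tokenizer. A minimal sketch, assuming the Hugging Face `transformers` package is installed (`bert-base-uncased` is just a convenient small checkpoint):

```python
# Minimal sketch: tokenize a sentence and inspect the model's context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers split raw text into tokens before any attention is computed."
tokens = tokenizer.tokenize(text)   # sub-word tokens
ids = tokenizer.encode(text)        # token ids, with [CLS]/[SEP] added

print(tokens)
print(ids)
print("Context window (max tokens):", tokenizer.model_max_length)  # 512 for BERT
```

Anything longer than `model_max_length` tokens has to be truncated or chunked before it can be fed to the model.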
24+
{
25+
"cell_type": "markdown",
26+
"id": "7b00ce99-138d-4b96-ab80-f24928eb4029",
27+
"metadata": {},
28+
"source": [
2429
"### **Attention**\n",
2530
"1. It is a mechanism that assigns different weights to different parts of the input allowing the model to prioritize and emphasize the most important information while performing tasks like translation or summarization.\n",
26-
"2. Attention allows a model to focus on different parts of the input dynamically, leading to improved performance.\n",
27-
"\n",
31+
"2. Attention allows a model to focus on different parts of the input dynamically, leading to improved performance."
32+
]
33+
},
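Scaled dot-product attention, the weighting scheme described in the cell above, fits in a few lines of NumPy. A minimal sketch with toy matrices (the shapes and values here are made up purely for illustration):

```python
# Minimal NumPy sketch of scaled dot-product attention:
# attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # how similar each query is to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax -> attention weights
    return weights @ V, weights

# Toy example: 3 tokens, each represented by a 4-dimensional vector.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # each row sums to 1: how much each token attends to the others
```

Each row of `weights` is exactly the set of importance scores the text refers to: larger entries mean that token contributes more to the output representation.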
34+
{
35+
"cell_type": "markdown",
36+
"id": "d0f9ef33-5a42-4a32-8c70-d9016a509812",
37+
"metadata": {},
38+
"source": [
2839
"### **What is Language Modeling?**\n",
2940
"1. Language Modeling involves creation of statistical/deep learning models for predicting the likelyhood of a sequence of tokens in a specified vocabulary.\n",
3041
"2. Two types of Language Modeling Tasks are: \n",
3142
" a. Autoencoding Task \n",
3243
" b. Autoregressive Task \n",
3344
"3. **Autoregressive Language Models** are trained to predict the next token in a sentence, based on the previous tokens in the phrase. These models correspond to the **decoder** part of the transformer model. A mask is applied on the full sentence so that the attention head can only see the tokens that came before. These models are ideal for text generatation. For eg: **GPT**\n",
3445
"4. **Autoencoding Language Models** are trained to reconstruct the original sentence from a corrupted version of the input. These models correspond to the **encoder** part of the transformer model. Full input is passed. No mask is applied. Autoencoding models create a bidirectional representation of the whole sentence. They can be fine-tuned for a variety of tasks, but their main application is sentence classification or token classification. For eg: **BERT**\n",
35-
"5. **Combination of autoregressive and autoencoding language models** are more versatile and flexible in generating text. It has been shown that the combination models can generate more diverse and creative text in different context compared to pure decode-based autoregressive models due to their ability to capture additional context using the encoder. For eg: **T5**\n",
36-
"\n",
46+
"5. **Combination of autoregressive and autoencoding language models** are more versatile and flexible in generating text. It has been shown that the combination models can generate more diverse and creative text in different context compared to pure decode-based autoregressive models due to their ability to capture additional context using the encoder. For eg: **T5**"
47+
]
48+
},
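The autoregressive/autoencoding split above maps onto two ready-made Hugging Face pipelines. A hedged sketch (GPT-2 and BERT are the usual small examples; any comparable checkpoints would work):

```python
# Sketch: the two language-modeling flavours via Hugging Face pipelines.
from transformers import pipeline

# Autoregressive (decoder, GPT-style): predict the next tokens left to right.
generator = pipeline("text-generation", model="gpt2")
print(generator("Language models are trained to", max_new_tokens=10)[0]["generated_text"])

# Autoencoding (encoder, BERT-style): reconstruct a masked token using both sides.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
print(fill_mask("Language models are [MASK] on large text corpora.")[0]["token_str"])
```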
49+
{
50+
"cell_type": "markdown",
51+
"id": "e904d47f-152a-48c9-9911-5e28a2add988",
52+
"metadata": {},
53+
"source": [
3754
"### **LLMs are:**\n",
3855
"1. Usually derived from Transformer architecture (but nor necesserily) by training on large amount of text data.\n",
3956
"2. Designed to understand and generate human language, code, and much more.\n",
@@ -42,7 +59,15 @@
4259
"5. Techniques like: Stop word removal, stemming, and truncation are not used nor are they necessary for LLMs. LLMs are designed to handle the inherent complexity and variability of human language, including the use of stop words and variations in word forms like tenses and misspellings.\n",
4360
"6. Every LLM on the market has been **pre-trained** on a large corpus of the text data and on a specific language modeling related tasks.\n",
4461
"7. **Remember:** How an LLM is **pre-trained** and **fine-tuned** makes all the difference.\n",
45-
"\n",
62+
"8. **How to decide whether to train our own embeddings or use pre-trained embeddings?** - A good rule of thumb is to compute the vocabulary overlap. If the overlap between the vocabulary of our custom domain and that of pre-trained word embeddings is significant, pre-trained word embeddings tends to give good results.\n",
63+
"9. **One more important factor to consider while deploying models with embeddings-based feature extraction approach:** - Remember that learned or pre-trained embedding models have to be stored and loaded into memory while using these approaches. If the model itself is bulky, we need to factor this into our deployment needs."
64+
]
65+
},
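Point 8 above suggests checking vocabulary overlap before committing to pre-trained embeddings. A minimal sketch of that check, with a hypothetical domain corpus and a stand-in embedding vocabulary (in practice the vocabulary would come from the pre-trained embedding file):

```python
# Sketch: estimate vocabulary overlap between a custom corpus and a
# pre-trained embedding vocabulary (both are small stand-ins here).
import re

def vocabulary_overlap(corpus_texts, pretrained_vocab):
    corpus_vocab = set()
    for text in corpus_texts:
        corpus_vocab.update(re.findall(r"[a-z']+", text.lower()))
    covered = corpus_vocab & set(pretrained_vocab)
    return len(covered) / len(corpus_vocab)

corpus = ["The patient presented with acute myocarditis.",
          "Troponin levels were elevated on admission."]
pretrained_vocab = {"the", "patient", "presented", "with", "levels", "were", "on"}

print(f"Overlap: {vocabulary_overlap(corpus, pretrained_vocab):.0%}")
# A low overlap suggests training domain-specific embeddings instead.
```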
66+
{
67+
"cell_type": "markdown",
68+
"id": "c23e6806-74e3-4041-8aa4-a20f8df2b0a9",
69+
"metadata": {},
70+
"source": [
4671
"### **Pre-Training, Transfer Learning and Fine-Tuning**\n",
4772
"<img style=\"float: right;\" width=\"400\" height=\"400\" src=\"data/images/transfer_learning.jpeg\">\n",
4873
"\n",
@@ -56,13 +81,25 @@
5681
" **b.** Aggregate some training data. \n",
5782
" **c.** Compute loss and gradients. \n",
5883
" **d.** Update the model via backpropogation. \n",
59-
"4. The Transformers package from Hugging Face provides a neat and clean interface for training and fine-tuning LLMs.\n",
60-
"\n",
84+
"4. The Transformers package from Hugging Face provides a neat and clean interface for training and fine-tuning LLMs."
85+
]
86+
},
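The training-loop steps (a)-(d) above are what the Hugging Face `Trainer` wraps. A hedged sketch of fine-tuning a small classifier; the dataset, checkpoint, and hyperparameters below are placeholders rather than a recommended recipe:

```python
# Hedged sketch of fine-tuning with the Hugging Face Trainer
# (dataset, checkpoint, and hyperparameters are placeholders).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# b. Aggregate some training data (IMDB reviews as a stand-in dataset).
dataset = load_dataset("imdb")
tokenized = dataset.map(lambda x: tokenizer(x["text"], truncation=True), batched=True)

args = TrainingArguments(output_dir="finetune-out", num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args, tokenizer=tokenizer,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(1000)))

# c./d. Compute loss and gradients, then update the model via backpropagation.
trainer.train()
```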
87+
{
88+
"cell_type": "markdown",
89+
"id": "cc65c3cf-2acf-42d1-9307-17b35f38ea07",
90+
"metadata": {},
91+
"source": [
6192
"### **Alignment in LLMs**\n",
6293
"1. Alignment in Language Model refers to how well the model can respond to the input prompts that match the user's expectations. Put another way, an aligned LLM has an objective that matches a human's objective.\n",
6394
"2. A popular method of aligning language model is through the incorporation of Reinforcement Learning into the training loop.\n",
64-
"3. Reinforcement Learning with Human Feedback (RLHF) is a popular method of aligning pre-trained LLMs that uses human feedback to enhance their performance.\n",
65-
"\n",
95+
"3. Reinforcement Learning with Human Feedback (RLHF) is a popular method of aligning pre-trained LLMs that uses human feedback to enhance their performance."
96+
]
97+
},
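RLHF hinges on a reward model that scores candidate responses so that training can push the model toward the preferred ones. A toy sketch of just that scoring step, with an off-the-shelf sentiment classifier standing in for a learned reward model (this is not the full RL training loop):

```python
# Toy sketch of the reward-modelling idea behind RLHF: score candidate
# responses and prefer the highest-scoring one. A sentiment classifier is
# only a stand-in for a reward model trained on human preference data.
from transformers import pipeline

reward_model = pipeline("sentiment-analysis")

candidates = [
    "I can't help with that.",
    "Sure! Here is a clear, step-by-step explanation of the topic.",
]

def reward(text):
    result = reward_model(text)[0]
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

best = max(candidates, key=reward)
print("Preferred response:", best)
```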
98+
{
99+
"cell_type": "markdown",
100+
"id": "8795ba78-62dc-4569-b4c1-e9f78803e1da",
101+
"metadata": {},
102+
"source": [
66103
"### **Popular Modern LLMs**\n",
67104
"\n",
68105
"#### **1. BERT (Bidirectional Encoder Representation from Transformers)**\n",
@@ -120,13 +157,28 @@
120157
"1. Text Classification\n",
121158
"2. Text Summarization\n",
122159
"3. Chatbots\n",
123-
"4. Information Retreival\n",
124-
"\n",
160+
"4. Information Retreival"
161+
]
162+
},
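The applications listed above map directly onto off-the-shelf Hugging Face pipelines. A hedged sketch of two of them, using commonly used default checkpoints (swap in your own models as needed):

```python
# Sketch: two of the listed applications as ready-made pipelines.
from transformers import pipeline

# 1. Text classification (default sentiment model).
classifier = pipeline("text-classification")
print(classifier("This course finally makes LLMs click for me!"))

# 2. Text summarization (a commonly used distilled BART checkpoint).
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
article = ("Large language models are pre-trained on large corpora and then fine-tuned "
           "on smaller datasets for specific downstream tasks such as classification, "
           "summarization, chatbots, and information retrieval.")
print(summarizer(article, max_length=30, min_length=10)[0]["summary_text"])
```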
163+
{
164+
"cell_type": "markdown",
165+
"id": "4ded9146-c262-428f-9042-23f463695a03",
166+
"metadata": {},
167+
"source": [
125168
"### **Quick Summary**\n",
126169
"1. What really sets the Transformers appart from other deep learning architectures is its ability to capture long-term dependencies and relationships between tokens using attention mechanism.\n",
127170
"2. Attention is the crucial component of Transformer.\n",
128171
"3. Factor behind transformer's effectiveness as a language model is it is highly parallelizable, allowing for faster training and efficient processing of text.\n",
129-
"4. LLMs are pre-trained on large corpus and fine-tuned on smaller datasets for specific tasks.\n"
172+
"4. LLMs are pre-trained on large corpus and fine-tuned on smaller datasets for specific tasks."
173+
]
174+
},
175+
{
176+
"cell_type": "markdown",
177+
"id": "0781caf1-a87f-4217-b6cb-4906b0a6de55",
178+
"metadata": {},
179+
"source": [
180+
"### **Prompt Engineering**\n",
181+
"If you are wondering what is the best way to talk to ChatGPT and GPT-4 to get optimal results, we will cover that under Prompt Engineering."
130182
]
131183
},
132184
{
