|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": { |
| 6 | + "id": "JoWilTNGKcrO" |
| 7 | + }, |
| 8 | + "source": [ |
| 9 | + "**Generating Ground-truth FAQIR dataset for model training**" |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | + "cell_type": "code", |
| 14 | + "execution_count": 1, |
| 15 | + "metadata": { |
| 16 | + "id": "jop-QMVBD5a1" |
| 17 | + }, |
| 18 | + "outputs": [], |
| 19 | + "source": [ |
| 20 | + "import sys\n", |
| 21 | + "sys.path.insert(0, '../../../BERT-FAQ/')\n", |
| 22 | + "\n", |
| 23 | + "# import class to generate training data\n", |
| 24 | + "from training_data_generator import Training_Data_Generator\n", |
| 25 | + "\n", |
| 26 | + "# import utility functions\n", |
| 27 | + "from shared.utils import load_from_json\n", |
| 28 | + "from shared.utils import dump_to_json\n", |
| 29 | + "from shared.utils import make_dirs" |
| 30 | + ] |
| 31 | + }, |
| 32 | + { |
| 33 | + "cell_type": "code", |
| 34 | + "execution_count": 2, |
| 35 | + "metadata": { |
| 36 | + "id": "V3my3-bmcPF6" |
| 37 | + }, |
| 38 | + "outputs": [], |
| 39 | + "source": [ |
| 40 | + "# load the FAQIR query-answer pairs from JSON\n", |
| 41 | + "filepath = '../../../BERT-FAQ/data/FAQIR/query_answer_pairs.json'\n", |
| 42 | + "\n", |
| 43 | + "query_answer_pairs = load_from_json(filepath)" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "code", |
| 48 | + "execution_count": 3, |
| 49 | + "metadata": {}, |
| 50 | + "outputs": [ |
| 51 | + { |
| 52 | + "data": { |
| 53 | + "text/plain": [ |
| 54 | + "[{'answer': 'I have had good luck with commercial products available at auto supply stores for just a few dollars. Below are some additional suggestions if you want to try DIY:',\n", |
| 55 | + " 'id': 1,\n", |
| 56 | + " 'label': 1,\n", |
| 57 | + " 'query_type': 'faq',\n", |
| 58 | + " 'question': 'How to remove tree sap from car? Hi,I have some tree sap on my car, both older (pretty dry) and some newer from a redwood tree.Any tips on how to remove it?'},\n", |
| 59 | + " {'answer': \"Goo Gone is a fantastic product (available at Target and most any hardware store or drug store) and works amazingly well on even pesky sticker residue. Soak the area, and wipe it away. It might take repeated applications, but you can remove the sticker without scratching, and without leaving a gooey residue.Note that Goo Gone can affect some surfaces, including some plastics, so you should be careful, especially if it is clear plastic. I've used Goo Gone many times on plastic glassic and CD cases, so it will certainly work on many plastics, but you should be careful. In the event that Goo Gone seems too risky, you can frequently remove just the adhesive residue using plain old Scotch tape. Remove as much of the sticker as possible (the paper/plastic backing) and then apply tape to the sticker residue firmly. Pull the tape off sharply, and some of the residue will come with it. This is a slow process so only do it if the area is relatively small, however, I frequently use this tactic for price tags and the like and it is surprisingly effective.\",\n", |
| 60 | + " 'id': 2,\n", |
| 61 | + " 'label': 1,\n", |
| 62 | + " 'query_type': 'faq',\n", |
| 63 | + " 'question': 'How do I get a sticker off of a car window (plastic part)? Nail polish remover seems risky?'},\n", |
| 64 | + " {'answer': '1. Inspect the black rubber cover that fits in the disposal. Sometimes bits of garbage collect under the flaps and create a smell. 2. Clean the rubber cover, if necessary. If yours lifts out, remove it and clean away debris with a scrub brush and warm, soapy water. If your rubber cover is installed permanently, lift up each flap and clean it with soapy water and an old toothbrush. 3. Deodorize the disposal. Cut a lemon in half and drop the fruit and a handful of baking soda into the disposal. Turn on the cold water faucet and then the disposal. The unit will clean itself as it grinds up the mixture.',\n", |
| 65 | + " 'id': 3,\n", |
| 66 | + " 'label': 1,\n", |
| 67 | + " 'query_type': 'faq',\n", |
| 68 | + " 'question': 'How to get rid of garbage disposal odor?'},\n", |
| 69 | + " {'answer': \"To properly maintain your lawnmover, you'll need to do the things below. If you use use your lawnmover year round, you might need to do maintaince twice a year.1) Changing the oil (for non-electric)2) Removing dirt, grass, and grime from the underside3) Cleaning the air filter4) Cleaning the spark plugsYou'll need the following tools to do the job:1) Putty knife2) Socket wrench3) File or sharpening tool4) Wire brush5) 30-weight oilIt actually didn't sound so hard to maintain. The most important safety concern is to remember to remove the spark plug before working on the blade.You'll find the instructions on the home depot site.\",\n", |
| 70 | + " 'id': 4,\n", |
| 71 | + " 'label': 1,\n", |
| 72 | + " 'query_type': 'faq',\n", |
| 73 | + " 'question': 'How to maintain a lawn mower?'},\n", |
| 74 | + " {'answer': 'Click on the link and follow the steps.',\n", |
| 75 | + " 'id': 5,\n", |
| 76 | + " 'label': 1,\n", |
| 77 | + " 'query_type': 'faq',\n", |
| 78 | + " 'question': 'How to fix vertical blinds? I can no longer re-hang them because they are broken.'}]" |
| 79 | + ] |
| 80 | + }, |
| 81 | + "execution_count": 3, |
| 82 | + "metadata": {}, |
| 83 | + "output_type": "execute_result" |
| 84 | + } |
| 85 | + ], |
| 86 | + "source": [ |
| 87 | + "# check first 5 entries\n", |
| 88 | + "query_answer_pairs[:5]" |
| 89 | + ] |
| 90 | + }, |
| 91 | + { |
| 92 | + "cell_type": "code", |
| 93 | + "execution_count": 4, |
| 94 | + "metadata": {}, |
| 95 | + "outputs": [], |
| 96 | + "source": [ |
| 97 | + "####################### Triplet Loss ########################\n", |
| 98 | + "# simple negatives, FAQ queries\n", |
| 99 | + "tdg = Training_Data_Generator(\n", |
| 100 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 101 | + " neg_type='simple', query_type='faq', loss_type='triplet'\n", |
| 102 | + ")\n", |
| 103 | + "tdg.generate_triplet_dataset(\n", |
| 104 | + " query_answer_pairs=query_answer_pairs, \n", |
| 105 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 106 | + ")\n", |
| 107 | + "\n", |
| 108 | + "\n", |
| 109 | + "# hard negatives, FAQ queries\n", |
| 110 | + "tdg = Training_Data_Generator(\n", |
| 111 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 112 | + " neg_type='hard', query_type='faq', loss_type='triplet'\n", |
| 113 | + ")\n", |
| 114 | + "tdg.generate_triplet_dataset(\n", |
| 115 | + " query_answer_pairs=query_answer_pairs, \n", |
| 116 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 117 | + ")\n", |
| 118 | + "\n", |
| 119 | + "# simple negatives, user queries\n", |
| 120 | + "tdg = Training_Data_Generator(\n", |
| 121 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 122 | + " neg_type='simple', query_type='user_query', loss_type='triplet'\n", |
| 123 | + ")\n", |
| 124 | + "tdg.generate_triplet_dataset(\n", |
| 125 | + " query_answer_pairs=query_answer_pairs, \n", |
| 126 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 127 | + ")\n", |
| 128 | + "\n", |
| 129 | + "\n", |
| 130 | + "# hard negatives, user queries\n", |
| 131 | + "tdg = Training_Data_Generator(\n", |
| 132 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 133 | + " neg_type='hard', query_type='user_query', loss_type='triplet'\n", |
| 134 | + ")\n", |
| 135 | + "tdg.generate_triplet_dataset(\n", |
| 136 | + " query_answer_pairs=query_answer_pairs, \n", |
| 137 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 138 | + ")\n", |
| 139 | + "\n", |
| 140 | + "\n", |
| 141 | + "####################### Softmax Loss ########################\n", |
| 142 | + "# simple negatives, FAQ queries\n", |
| 143 | + "tdg = Training_Data_Generator(\n", |
| 144 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 145 | + " neg_type='simple', query_type='faq', loss_type='softmax'\n", |
| 146 | + ")\n", |
| 147 | + "tdg.generate_triplet_dataset(\n", |
| 148 | + " query_answer_pairs=query_answer_pairs, \n", |
| 149 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 150 | + ")\n", |
| 151 | + "\n", |
| 152 | + "# hard negatives, FAQ queries\n", |
| 153 | + "tdg = Training_Data_Generator(\n", |
| 154 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 155 | + " neg_type='hard', query_type='faq', loss_type='softmax'\n", |
| 156 | + ")\n", |
| 157 | + "tdg.generate_triplet_dataset(\n", |
| 158 | + " query_answer_pairs=query_answer_pairs, \n", |
| 159 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 160 | + ")\n", |
| 161 | + "\n", |
| 162 | + "# simple negatives, user queries\n", |
| 163 | + "tdg = Training_Data_Generator(\n", |
| 164 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 165 | + " neg_type='simple', query_type='user_query', loss_type='softmax'\n", |
| 166 | + ")\n", |
| 167 | + "tdg.generate_triplet_dataset(\n", |
| 168 | + " query_answer_pairs=query_answer_pairs, \n", |
| 169 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 170 | + ")\n", |
| 171 | + "\n", |
| 172 | + "# hard negatives, user queries\n", |
| 173 | + "tdg = Training_Data_Generator(\n", |
| 174 | + " random_seed=5, num_samples=24, hard_filepath='../../../BERT-FAQ/data/FAQIR/', \n", |
| 175 | + " neg_type='hard', query_type='user_query', loss_type='softmax'\n", |
| 176 | + ")\n", |
| 177 | + "tdg.generate_triplet_dataset(\n", |
| 178 | + " query_answer_pairs=query_answer_pairs, \n", |
| 179 | + " output_path='../../../BERT-FAQ/data/FAQIR/'\n", |
| 180 | + ")\n" |
| 181 | + ] |
| 182 | + } |
| 183 | + ], |
| 184 | + "metadata": { |
| 185 | + "accelerator": "GPU", |
| 186 | + "colab": { |
| 187 | + "collapsed_sections": [], |
| 188 | + "name": "01_Generating_Ground_Truth_Dataset_for_FAQIR.ipynb", |
| 189 | + "provenance": [] |
| 190 | + }, |
| 191 | + "kernelspec": { |
| 192 | + "display_name": "Python 3", |
| 193 | + "language": "python", |
| 194 | + "name": "python3" |
| 195 | + }, |
| 196 | + "language_info": { |
| 197 | + "codemirror_mode": { |
| 198 | + "name": "ipython", |
| 199 | + "version": 3 |
| 200 | + }, |
| 201 | + "file_extension": ".py", |
| 202 | + "mimetype": "text/x-python", |
| 203 | + "name": "python", |
| 204 | + "nbconvert_exporter": "python", |
| 205 | + "pygments_lexer": "ipython3", |
| 206 | + "version": "3.8.5" |
| 207 | + } |
| 208 | + }, |
| 209 | + "nbformat": 4, |
| 210 | + "nbformat_minor": 1 |
| 211 | +} |
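
The last code cell in the notebook repeats the same constructor and call for every combination of `loss_type`, `query_type`, and `neg_type`. As a possible simplification, the eight configurations could be generated with `itertools.product`. The sketch below is only an illustration: the helper `build_generator_configs` and the constants are hypothetical names, not part of BERT-FAQ, and the actual `Training_Data_Generator` calls are left as comments because that class is only available inside the repository.

```python
from itertools import product

# Values copied from the notebook cell; the constant names are hypothetical.
LOSS_TYPES = ("triplet", "softmax")
QUERY_TYPES = ("faq", "user_query")
NEG_TYPES = ("simple", "hard")
DATA_DIR = "../../../BERT-FAQ/data/FAQIR/"


def build_generator_configs(data_dir=DATA_DIR, random_seed=5, num_samples=24):
    """Return one keyword-argument dict per (loss, query, negative) combination."""
    return [
        dict(
            random_seed=random_seed,
            num_samples=num_samples,
            hard_filepath=data_dir,
            neg_type=neg_type,
            query_type=query_type,
            loss_type=loss_type,
        )
        # The rightmost iterable varies fastest, which reproduces the
        # ordering of the eight blocks in the notebook.
        for loss_type, query_type, neg_type in product(
            LOSS_TYPES, QUERY_TYPES, NEG_TYPES
        )
    ]


configs = build_generator_configs()
for cfg in configs:
    print(cfg["loss_type"], cfg["query_type"], cfg["neg_type"])
    # Inside the notebook, each config would drive the same two calls as the
    # original cell (assumes Training_Data_Generator is importable):
    # tdg = Training_Data_Generator(**cfg)
    # tdg.generate_triplet_dataset(
    #     query_answer_pairs=query_answer_pairs, output_path=DATA_DIR
    # )
```

Keeping the configurations in one list makes it harder to omit a combination and means the seed and sample count are set in a single place.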