
Commit dac06ee

Updated Lab 4 for new data editor
1 parent 90ff036 commit dac06ee

10 files changed (+1699, -29 lines)

04_normal_distribution/img/.gitignore

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
1+
transform_dq_cal_fat.paint
2+
summary_dialog.paint
3+
reshape.paint
4+
scatterplot_single.paint
5+
simulate_single.paint
6+
summary_data.paint
7+
all_data_wide.paint
8+
calculate_proportion.paint
9+
fastfood_data_view.paint
10+
hist_calories_each_level.paint
11+
hist_fle.paint
12+
merge.paint
13+
multiple_sim_generator.paint
14+
normal_probability_plot_GUI.paint
15+
npp_all.paint
16+
npp_all_graph.paint
17+
all_data_long.paint
18+
.DS_Store

04_normal_distribution/img/npp_all_graph.svg

Lines changed: 1632 additions & 0 deletions

04_normal_distribution/normal_distribution_rguroo.Rmd

Lines changed: 21 additions & 12 deletions
@@ -26,7 +26,7 @@ generate random numbers from a normal distribution.
2626
This week you'll be working with fast food data. This data set contains data on
2727
515 menu items from some of the most popular fast food restaurants worldwide.
2828

29-
As usual, find the *fastfood* dataset in the `OpenIntro` Repository, view the information about the dataset by clicking ![info](../icon_images/info.png), and then import the dataset to your **Data** toolbox. Then, view the **Dataset Summary** and **View** the data in Rguroo's Data Viewer.
29+
As usual, find the *fastfood* dataset in the `OpenIntro` Repository, view the information about the dataset by clicking ![info](../icon_images/info.png), and then import the dataset to your **Data** toolbox. Then, in the **Data** toolbox, double-click on the dataset to view it in the dataset editor. In the dataset editor, you can also view a summary of the data by clicking on the **Summary Statistic** icon ![data summary icon](../icon_images/data_summary_icon.png).
3030

3131
```{r fastfood_data_view, echo=FALSE, results="asis" , fig.align = "center", fig.cap = "*A portion of the fast food dataset*", out.width="90%"}
3232
knitr::include_graphics("img/fastfood_data_view.png")
@@ -186,11 +186,20 @@ knitr::include_graphics("img/scatterplot_single.png")
186186

187187
Now that we know how to create a normal probability plot in the **Scatterplot** function, let's go through the five steps.
188188

189-
In Step 1, we create a dataset with a single variable that consists of the calories from fat for the menu items at Dairy Queen restaurants. You can use the **Subset** function to do this. But to see an alternative method, let's use the **Data Editor** itself. In the **Data** toolbox, right-click the dataset *fastfood* and select *Open*. On the very right, select ![filters](../icon_images/filter.png) and open *restaurant*. Click *Select All* to de-select everything, then in the text box enter "Dairy" and check the *Dairy Queen* box that shows up. You should now see only the 42 menu items from the Dairy Queen restaurant. Then, on the very right, select ![columns](../icon_images/columns.png) to choose the columns in the dataset. Click the checkbox next to the *Search* function to de-select everything, then check the box next to *cal_fat*, as shown in the dialog below. You should now see only the 42 values of the *cal_fat* variable. Finally, click `Add Rows/Variables` ![add rows/variables](../icon_images/add_column.png) and in the dialog that pops up, select `Variable Properties`. Select the *cal_fat* variable and change its `Name` to *DQ*.
190-
**Save** your final dataset as *DQ_cal_fat*.
189+
In Step 1, we create a dataset with a single variable that consists of the calories from fat for the menu items at Dairy Queen restaurants. You can use the **Subset** function to do this. But to see an alternative method, let's use the **Dataset Editor** itself.
191190

192-
```{r transform_dq_cal_fa, echo=FALSE, results = "asis", fig.align = "center", fig.cap = "*Getting the values of cal_fat for only Dairy Queen restaurants*", out.width="75%"}
193-
knitr::include_graphics("img/transform_dq_cal_fat.png")
191+
- In the **Data** toolbox, right-click the dataset *fastfood* and select *Edit*. This opens the *fastfood* data in the Dataset Editor.
192+
193+
- On the very right, select ![filters](../icon_images/filters.png) and click the *restaurant* dropdown.
194+
195+
- Click `(Select All)` to de-select everything, then type "Dairy" in the search textbox and check the *Dairy Queen* box that appears. This filters out every restaurant except Dairy Queen, so you should now see only the 42 Dairy Queen menu items.
196+
197+
- On the very right, select ![columns](../icon_images/columns.png) to choose the columns in the dataset. Click the checkbox next to the *Search* textbox to de-select everything, then check the box next to *cal_fat*. You should now see only the 42 values of the *cal_fat* variable.
198+
199+
- Finally, enter *DQ_cal_fat* in the `Save As` textbox and click the `Save As` ![save as](../icon_images/save_as_button.png) button to save the dataset.
200+
201+
```{r transform_dq_cal_fa, echo=FALSE, results = "asis", fig.align = "center", fig.cap = "*A portion of the DQ_cal_fat dataset*", out.width="75%"}
202+
knitr::include_graphics("img/dq_cal_fat_dataset.png")
194203
```
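For comparison, Step 1 can be sketched in base R with `subset()`. The toy data frame `fastfood_toy` below is hypothetical, made up only so the sketch runs on its own; with the real data you would apply the same call to the *fastfood* data frame (for example, the copy distributed with the `openintro` R package, assuming it is installed).

```r
# Hypothetical toy rows standing in for the fastfood data (not real values).
fastfood_toy <- data.frame(
  restaurant = c("Dairy Queen", "Other", "Dairy Queen", "Other"),
  cal_fat    = c(200, 350, 410, 90)
)

# Keep only the Dairy Queen rows and only the cal_fat column,
# mirroring the filter and column choices made in the Dataset Editor.
DQ_cal_fat <- subset(fastfood_toy, restaurant == "Dairy Queen", select = cal_fat)
print(DQ_cal_fat)
```

The `subset` argument plays the role of the *restaurant* filter, and `select` plays the role of the column chooser.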
195204

196205
In Step 2, we simulate 8 samples of size 42 from a normal distribution with mean 260.48 and standard deviation 156.49. To do this, we use the **Multiple Distribution Generator** function the same way that we simulated a single sample, except here we change the value of 1 in the `Replications` box to 8. The dialog box is shown below. Click the `Preview` button ![eye](../icon_images/preview.png). You should see a dataset with 42 rows and 8 columns with variable names *sim_1*, *sim_2*, ..., *sim_8*. Each column is a sample of size 42 from the normal distribution with mean of 260.48 and standard deviation of 156.49. **Save** this dataset as *normal_sims*.
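The simulation in Step 2 can also be sketched in base R. This is a minimal sketch, not Rguroo's internal method: it uses `rnorm()` inside `replicate()` with the mean and standard deviation given above, and the seed value is an arbitrary choice for reproducibility.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_reps <- 8       # number of simulated samples (Replications in Rguroo)
n      <- 42      # sample size, matching the 42 Dairy Queen items
mu     <- 260.48  # mean used in the lab
sigma  <- 156.49  # standard deviation used in the lab

# replicate() returns a 42 x 8 matrix: one column per simulated sample.
normal_sims <- as.data.frame(replicate(n_reps, rnorm(n, mean = mu, sd = sigma)))
names(normal_sims) <- paste0("sim_", seq_len(n_reps))

dim(normal_sims)
head(normal_sims[, 1:3])
```

Because the values are random, your numbers will differ from those in Rguroo; only the shape (42 rows, 8 columns named *sim_1* to *sim_8*) should match.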
@@ -221,7 +230,7 @@ To change the data from a wide format to a long format, we use the **Reshape** f
221230
knitr::include_graphics("img/reshape.png")
222231
```
223232

224-
Portions of the *all_data_long* dataset are shown below. This dataset has 378 rows since we stacked 9 columns of size 42. The first 42 rows are the Dairy Queen data, identified by *DQ* in the identification variable *Sample*. Then rows 43 to 84 consist of data from the *sim_1* variable, rows 85 to 126 consist of data from the *sim_2* variable, and so on. The last 42 rows are values from the *sim_8* variable.
233+
Portions of the *all_data_long* dataset are shown below. This dataset has 378 rows since we stacked 9 columns of 42 values each. The first 42 rows hold the calories from fat for the 42 actual Dairy Queen menu items; they are identified by the value *cal_fat* in the identification variable *Sample*. Rows 43 to 84 correspond to the *sim_1* variable, rows 85 to 126 to the *sim_2* variable, and so on. The last 42 rows contain values from the *sim_8* variable.
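The wide-to-long reshape can be sketched in base R with `utils::stack()`, which stacks columns in order and records each value's source column. In this sketch the *cal_fat* column is simulated as a placeholder for the real Dairy Queen values, and the column names `value` and `Sample` are assumptions chosen to match the description above.

```r
set.seed(2)  # arbitrary seed; all values below are placeholders

# Placeholder wide data: 9 columns of 42 values each. The cal_fat column is
# simulated here only as a stand-in for the real Dairy Queen values.
all_data_wide <- as.data.frame(replicate(9, rnorm(42, mean = 260.48, sd = 156.49)))
names(all_data_wide) <- c("cal_fat", paste0("sim_", 1:8))

# stack() puts the columns one after another and stores the source column
# name in a factor; rename the two resulting columns.
all_data_long <- stack(all_data_wide)
names(all_data_long) <- c("value", "Sample")

nrow(all_data_long)  # 9 * 42 = 378 rows
table(all_data_long$Sample)
```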
225234

226235
```{r all_data_long, echo=FALSE, results="asis" , fig.align = "center", fig.cap = "*A portion of the all_data_long dataset*", out.width="65%"}
227236
knitr::include_graphics("img/all_data_long.png")
@@ -234,10 +243,10 @@ We are now ready to create the normal probability plots. Open the **Scatterplot*
234243
knitr::include_graphics("img/npp_all.png")
235244
```
236245

237-
The figure below shows the 9 normal probability plots. We have changed the dots' color for the Dairy Queen data in the **Factor Level Editor** so it stands out.
246+
The figure below displays the 9 normal probability plots. We changed the color of the Dairy Queen dots so that they stand out, and made a few adjustments to make the plot easier to read: in the **Details** section, we reduced the number of tick labels on the y-axis, and in the **Factor Level Editor**, we decreased the point size for all variables.
238247

239-
```{r npp_all_graph, echo=FALSE, results="asis" , fig.align = "center", fig.cap = "*Normal probability plots of Dairy Queen and eight simulated samples*", out.width="85%"}
240-
knitr::include_graphics("img/npp_all_graph.png")
248+
```{r npp_all_graph, echo=FALSE, results="asis" , fig.align = "center", fig.cap = "*Normal probability plots of fat calories for Dairy Queen and eight simulated samples*", out.width="85%"}
249+
knitr::include_graphics("img/npp_all_graph.svg")
241250
```
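A rough base R analogue of this 3 x 3 panel of normal probability plots uses `qqnorm()` and `qqline()`. This is a sketch under stated assumptions: the *cal_fat* sample is simulated as a placeholder for the real Dairy Queen data, and the red highlight only imitates the color change described above.

```r
set.seed(3)  # arbitrary seed; all data below are placeholders
mu <- 260.48
sigma <- 156.49

# Placeholder samples: cal_fat stands in for the 42 Dairy Queen values.
samples <- c(list(cal_fat = rnorm(42, mu, sigma)),
             setNames(lapply(1:8, function(i) rnorm(42, mu, sigma)),
                      paste0("sim_", 1:8)))

# Draw a 3 x 3 grid of normal probability plots, one per sample.
op <- par(mfrow = c(3, 3))
for (nm in names(samples)) {
  qqnorm(samples[[nm]], main = nm,
         col = if (nm == "cal_fat") "red" else "black")  # highlight the DQ panel
  qqline(samples[[nm]])  # reference line through the quartiles
}
par(op)  # restore the previous layout
```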
242251

243252
4. Does the normal probability plot for the calories from fat for the Dairy Queen restaurant look similar to the plots created for the simulated data? That is, do the plots provide evidence that the fat calories for Dairy Queen are nearly normal?
@@ -268,7 +277,7 @@ knitr::include_graphics("img/normalcalc1-1.png")
268277

269278
You can also see how the probability corresponds to the area under the normal density curve by checking the `Graph` box. When you `Preview` ![eye](../icon_images/preview.png) the output, you will see a graph showing the distribution of the variable, in which the gold shaded region visually displays the probability as an area under the density curve.
270279

271-
```{r pnorm 2, echo=FALSE, results = "asis", fig.cap = "*The theoretical probability that a Dairy Queen item has more than 600 calories from fat*"}
280+
```{r pnorm 2, echo=FALSE, fig.align = "center", results = "asis", fig.cap = "*The theoretical probability that a Dairy Queen item has more than 600 calories from fat*"}
272281
knitr::include_graphics("img/normalcalc1-output.png")
273282
```
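The same theoretical probability can be checked in R with `pnorm()`. This sketch assumes the normal model uses the mean 260.48 and standard deviation 156.49 from the simulation step; `lower.tail = FALSE` returns the upper-tail area, which corresponds to the gold shaded region.

```r
mu    <- 260.48  # mean assumed for Dairy Queen calories from fat
sigma <- 156.49  # standard deviation assumed for the same variable

# P(X > 600) for X ~ Normal(mu, sigma): the area under the density above 600.
p_theory <- pnorm(600, mean = mu, sd = sigma, lower.tail = FALSE)
p_theory
```

Equivalently, `1 - pnorm(600, mean = mu, sd = sigma)` gives the same area.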
274283

@@ -277,9 +286,9 @@ probability. If we want to calculate the probability empirically, we simply
277286
need to determine how many observations fall above 600 then divide this number
278287
by the total sample size.
279288

280-
There are a variety of ways to do this in Rguroo. Probably the easiest way to do this is with the **Transform** dialog. Recall that the fat calories for Dairy Queen were saved in the dataset *DQ_cal_fat* in the variable *DQ*. In the **Transform** dialog, select the *DQ_cal_fat* dataset; you should see the variable *DQ* on the left column. Click the ![plus](../icon_images/add.png) sign, and in the middle panel type ```sum(DQ > 600) / length(DQ)```. Note that here we add a logical variable. Rguroo interprets TRUE as 1 and FALSE as 0, so the statement ```sum(DQ > 600)``` is essentially counting the number of Dairy Queen items with more than 600 calories. The R code ```length(DQ)``` gives the number of Dairy Queen items. The ratio of these two values gives us the proportion of Dairy Queen items with more than 600 calories. Move the *DQ* variable to `Excluded Variable` section, as we don't need to see its values, and make sure to check `Complete Cases Only`. Otherwise, you will see a whole bunch of NA's below the proportion value. The figure below shows the dialog. Click the `Preview` button ![eye](../icon_images/preview.png), and you will see the result.
289+
There are a variety of ways to do this in Rguroo. Probably the easiest is the **Transform** dialog. Recall that the fat calories for Dairy Queen were saved in the variable *cal_fat* of the dataset *DQ_cal_fat*. In the **Transform** dialog, select the *DQ_cal_fat* dataset; you should see the variable *cal_fat* in the left column. Click the ![plus](../icon_images/add.png) sign, and in the middle panel type ```sum(cal_fat > 600) / length(cal_fat)```. Note that ```cal_fat > 600``` is a logical vector. Rguroo interprets TRUE as 1 and FALSE as 0, so ```sum(cal_fat > 600)``` counts the Dairy Queen items with more than 600 calories from fat. The R code ```length(cal_fat)``` gives the number of Dairy Queen items, and the ratio of these two values is the proportion of Dairy Queen items with more than 600 calories from fat. Move the *cal_fat* variable to the `Excluded Variable` section, since we don't need to see its values, and make sure to check `Complete Cases Only`; otherwise, you will see many NA values below the proportion value. The figure below shows the dialog. Click the `Preview` button ![eye](../icon_images/preview.png) to see the result.
281290

282-
```{r calculate_proportion, echo=FALSE, results = "asis", fig.cap = "*Calculating the empirical probability that a Dairy Queen menu item has over 600 calories from fat*"}
291+
```{r calculate_proportion, echo=FALSE, fig.align = "center", results = "asis", fig.cap = "*Calculating the empirical probability that a Dairy Queen menu item has over 600 calories from fat*"}
283292
knitr::include_graphics("img/calculate_proportion.png")
284293
```
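The counting trick behind ```sum(cal_fat > 600) / length(cal_fat)``` can be seen on a small vector. The five values below are hypothetical, made up only to illustrate how a logical vector is summed; they are not the Dairy Queen data.

```r
# Hypothetical cal_fat values (made up for illustration, not the real data).
cal_fat <- c(150, 620, 300, 710, 90)

over_600 <- cal_fat > 600             # logical vector: FALSE TRUE FALSE TRUE FALSE
n_over   <- sum(over_600)             # TRUE counts as 1, so this is 2
p_emp    <- n_over / length(cal_fat)  # 2 / 5 = 0.4

# mean() of a logical vector gives the same proportion in one step.
mean(cal_fat > 600)
```

Using `mean()` is a common R shorthand for the same ratio.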
285294

04_normal_distribution/normal_distribution_rguroo.html

Lines changed: 28 additions & 17 deletions
Large diffs are not rendered by default.
