Commit 8bdfdb0

some typos
1 parent 4b01996 commit 8bdfdb0

49 files changed: +38565 -7 lines changed

04-explore-categorical.Rmd

Lines changed: 1 addition & 1 deletion
@@ -630,7 +630,7 @@ county %>%
 Based on Figure \@ref(fig:countyIncomeRidgeMulti), what can you say about how median household income in counties vary depending on population gain/no gain, metropolitan area/not, and median degree?[^explore-categorical-6]
 :::

-[^explore-categorical-6]: The ridge plot give us a better sense of the shape, and especially modality, of the data.
+[^explore-categorical-6]: Regardless of the location (metropolitan or not) or change in population, it seems like there is an increase in median household income from individuals with only a HS diploma, to individuals with some college, to individuals with a Bachelor's degree.

 \vspace{20mm}

08-model-mlr.Rmd

Lines changed: 5 additions & 5 deletions
@@ -322,13 +322,13 @@ Interpret the coefficient of the variable `credit_checks`.[^model-mlr-4]
 [^model-mlr-4]: All else held constant, for each additional inquiry into the applicant's credit during the last 12 months, we would expect the interest rate for the loan to be higher, on average, by 0.23 points.

 ::: {.guidedpractice data-latex=""}
-Compute the residual of the first observation in Table \@ref(tab:loans-data-matrix) on page using the full model.[^model-mlr-5]
+Compute the residual of the first observation in Table \@ref(tab:loans-data-matrix) using the full model.[^model-mlr-5]
 :::

 [^model-mlr-5]: To compute the residual, we first need the predicted value, which we compute by plugging values into the equation from earlier.
-For example, $\texttt{verified_income}_{\texttt{Source Verified}}$ takes a value of 0, $\texttt{verified_income}_{\texttt{Verified}}$ takes a value of 1 (since the borrower's income source and amount were verified), was 18.01, and so on.
-This leads to a prediction of $\widehat{\texttt{interest_rate}}_1 = 18.09$.
-The observed interest rate was 14.07%, which leads to a residual of $e_1 = 14.07 - 18.09 = -4.02$.
+For example, $\texttt{verified_income}_{\texttt{Source Verified}}$ takes a value of 0, $\texttt{verified_income}_{\texttt{Verified}}$ takes a value of 1 (since the borrower's income source and amount were verified), $\texttt{debt_to_income}$ was 18.01, and so on.
+This leads to a prediction of $\widehat{\texttt{interest_rate}}_1 = 17.84$.
+The observed interest rate was 14.07%, which leads to a residual of $e_1 = 14.07 - 17.84 = -3.77$.

 ::: {.workedexample data-latex=""}
 We calculated a slope coefficient of 0.74 for `bankruptcy` in Section \@ref(ind-and-cat-predictors) while the coefficient is 0.39 here.
@@ -418,7 +418,7 @@ There were n = 10,000 auctions in the dataset and $k=9$ predictor variables in t
 Use $n$, $k$, and the variances from the earlier Guided Practice to calculate $R_{adj}^2$ for the interest rate model.[^model-mlr-8]
 :::

-[^model-mlr-8]: $R_{adj}^2 = 1 - \frac{18.53}{25.01}\times \frac{10000-1}{1000-9-1} = 0.2584$.
+[^model-mlr-8]: $R_{adj}^2 = 1 - \frac{18.53}{25.01}\times \frac{10000-1}{10000-9-1} = 0.2584$.
 While the difference is very small, it will be important when we fine tune the model in the next section.

 ::: {.guidedpractice data-latex=""}
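
Note on the `[^model-mlr-8]` fix: the corrected denominator ($10000-9-1$ rather than $1000-9-1$) can be checked by working through the arithmetic. The sketch below assumes 18.53 is the variance of the residuals and 25.01 is the variance of the outcome, as the phrase "the variances from the earlier Guided Practice" suggests. The symbols $s^2_{\text{residuals}}$ and $s^2_{\text{outcome}}$ are labels chosen here for illustration and do not appear in the source.

```latex
\begin{aligned}
R_{adj}^2
  &= 1 - \frac{s^2_{\text{residuals}}}{s^2_{\text{outcome}}} \times \frac{n-1}{n-k-1} \\
  &= 1 - \frac{18.53}{25.01} \times \frac{10000-1}{10000-9-1} \\
  &\approx 1 - 0.7409 \times 1.0009 \\
  &\approx 1 - 0.7416 = 0.2584
\end{aligned}
```

For comparison, the unadjusted value under the same assumption is $R^2 = 1 - \frac{18.53}{25.01} \approx 0.2591$, so the adjustment changes the result by less than 0.001, which matches the footnote's remark that the difference is very small.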

20-inference-two-means.Rmd

Lines changed: 1 addition & 1 deletion
@@ -446,7 +446,7 @@ Compute the standard error of the point estimate for the average difference betw

 ::: {.workedexample data-latex=""}
 Complete the hypothesis test started in the previous Example and Guided Practice on `births14` dataset and research question.
-Use a significance level of $\alpha=0.05.$ For reference, $\bar{x}_{n} - \bar{x}_{s} = `r xbar_difference`,$ $SE = `r se_difference`,$ and the sample sizes were $n_n = 100$ and $n_s = 50.$
+Use a significance level of $\alpha=0.05.$ For reference, $\bar{x}_{n} - \bar{x}_{s} = `r xbar_difference`,$ $SE = `r se_difference`,$ and the sample sizes were $n_n = `r n_nonsmoker`$ and $n_s = `r n_smoker`.$

 ------------------------------------------------------------------------

HWchp6.Rmd

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+title: "HW Chp6"
+author: "Jo Hardin"
+date: "3/19/2021"
+output: html_document
+---
+
+## Inv 4.1 Which samples of size 3 are "more extreme" than what we observed?
+
+(a) 53, 56, 64
+(b) 55, 55, 56
+(c) 55, 55, 64
+(d) 55, 56, 64
+
+## Look at Jessica's book, how does she frame her thought questions? Add questions like that to the begining of each case study.
+
+## Problem 1 (MDSR)
+
+A categorical variable is XXX.
+Show scatter plot with continuous x and y.
+how can you differentiate the points based on the categorical variable?
+
+## Problem 2 (MDSR - exactly copied): Find two graphs published in a newspaper or on the internet in the last two years.
+
+Identify a graphical display that you find compelling.
+What aspects of the display work well, and how do these relate to the principles established in this chapter?
+Include a screen shot of the display along with your solution.
+
+Identify a graphical display that you find less than compelling.
+What aspects of the display don't work well?
+Are there ways that the display might be improved?
+Include a screen shot of the display along with your solution.
+
+## Problem 3: WGOITG
+
+<https://www.nytimes.com/column/whats-going-on-in-this-graph>
+
+## Problem 4: steal some of Ben's viz links? All are too hard.
+
+<http://mdsr-book.github.io/exercises.html#exercise_25>
+
+## Problem 4: WEB Du Bois (please, please, let's include!) can we get the liscences???
+
+<https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge>
+
+- What do the colors tell us?
+- What do the labels add?
+
+faceted bar plots in challenge 2: <https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge/challenge02>

duke-house.R

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
+library(tidyverse)
+library(tidymodels)
+
+# read data --------------------------------------------------------------------
+
+df <- read_csv("duke-forest.csv")
+
+# visualise --------------------------------------------------------------------
+
+ggplot(df, aes(x = area, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 2825, color = "gray")
+
+ggplot(df, aes(x = lot, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 0.52, color = "gray")
+
+ggplot(df, aes(x = bed, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 3, color = "gray")
+
+ggplot(df, aes(x = price)) +
+  geom_histogram() +
+  geom_vline(xintercept = 400000, color = "gray")
+
+ggplot(df, aes(x = log10(price))) +
+  geom_histogram()
+
+# model ------------------------------------------------------------------------
+
+set.seed(4595)
+
+preds <- c("bed", "bath", "area", "year_built", "lot")
+
+df <- df %>%
+  select(price, all_of(preds)) %>%
+  na.omit()
+
+data_split <- initial_split(df, strata = "price", prop = 0.75)
+
+df_train <- training(data_split)
+df_test <- testing(data_split)
+
+rf_defaults <- rand_forest(mode = "regression")
+
+rf_mod <- rf_defaults %>%
+  set_engine("ranger")
+
+rf_fit <- rf_mod %>%
+  fit_xy(
+    x = df_train[, preds],
+    y = df_train$price
+  )
+
+rf_fit
+
+# predict ----------------------------------------------------------------------
+
+# training set predictions
+
+rf_training_pred <-
+  predict(rf_fit, df_train) %>%
+  # Add the true outcome data back in
+  bind_cols(df_train %>% select(price))
+
+rf_training_pred %>%
+  rmse(truth = price, .pred)
+
+# testing set predictions
+
+rf_testing_pred <-
+  predict(rf_fit, df_test) %>%
+  # Add the true outcome data back in
+  bind_cols(df_test %>% select(price))
+
+rf_testing_pred %>%
+  rmse(truth = price, .pred)
+
+harvey15 <- tibble(
+  bed = 3,
+  bath = 3.5,
+  area = 2825,
+  year_built = 1982,
+  lot = 0.52
+)
+
+predict(rf_fit, new_data = harvey15)
+
+# cross validation -------------------------------------------------------------
+
+set.seed(345)
+folds <- vfold_cv(df_train, v = 5)
+folds
+
+rf_wf <-
+  workflow() %>%
+  add_model(rf_mod) %>%
+  add_formula(price ~ .)
+
+set.seed(456)
+rf_fit_rs <-
+  rf_wf %>%
+  fit_resamples(folds)
+
+rf_fit_rs
+collect_metrics(rf_fit_rs, summarize = FALSE)
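
Note on `duke-house.R`: the script reports `rmse(truth = price, .pred)` for both the training and testing predictions. As a rough illustration of what that metric measures, the base R sketch below computes a root mean squared error by hand. The vectors `truth` and `pred` are hypothetical toy values made up for this note, not rows from `duke-forest.csv`, and `rmse_by_hand` is a name introduced here.

```r
# Hypothetical toy values for illustration only (not from duke-forest.csv)
truth <- c(300000, 450000, 520000)  # observed prices
pred  <- c(310000, 430000, 540000)  # predicted prices

# RMSE: square root of the mean of the squared prediction errors
rmse_by_hand <- sqrt(mean((truth - pred)^2))
rmse_by_hand
```

Because the result is in the same units as `price`, comparing the training and testing values from the script gives a direct sense of how much worse the random forest predicts on data it has not seen.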

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+^ims\.Rproj$
+^\.Rproj\.user$

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+.Rproj.user

Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+Version: 1.0
+
+RestoreWorkspace: No
+SaveWorkspace: No
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: knitr
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+LineEndingConversion: Posix
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
+PackageRoxygenize: rd,collate,namespace

images/boot1prop3.png

-89.2 KB

images/boot1prop3_old.png

374 KB

images/boot1propboth.png

-171 KB

images/boot1propboth_old.png

700 KB

libs/bootstrap-4.5.3/bootstrap.bundle.min.js

Lines changed: 7 additions & 0 deletions

libs/bootstrap-4.5.3/bootstrap.bundle.min.js.map

Lines changed: 1 addition & 0 deletions

libs/bootstrap-4.5.3/bootstrap.min.css

Lines changed: 1 addition & 0 deletions
Binary file not shown.
