OpenIntroStat
diff --git a/04-explore-categorical.Rmd b/04-explore-categorical.Rmd
Lines changed: 1 addition & 1 deletion
diff --git a/08-model-mlr.Rmd b/08-model-mlr.Rmd
Lines changed: 5 additions & 5 deletions
diff --git a/20-inference-two-means.Rmd b/20-inference-two-means.Rmd
Lines changed: 1 addition & 1 deletion
diff --git a/HWchp6.Rmd b/HWchp6.Rmd
Lines changed: 49 additions & 0 deletions
diff --git a/duke-house.R b/duke-house.R
Lines changed: 106 additions & 0 deletions
diff --git a/https:/github.com/hardin47/blogdownwebsite.git/.Rbuildignore b/https:/github.com/hardin47/blogdownwebsite.git/.Rbuildignore
Lines changed: 2 additions & 0 deletions
diff --git a/https:/github.com/hardin47/blogdownwebsite.git/.gitignore b/https:/github.com/hardin47/blogdownwebsite.git/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/https:/github.com/hardin47/blogdownwebsite.git/ims.Rproj b/https:/github.com/hardin47/blogdownwebsite.git/ims.Rproj
Lines changed: 22 additions & 0 deletions
diff --git a/images/boot1prop3.png b/images/boot1prop3.png
-89.2 KB
diff --git a/images/boot1prop3_old.png b/images/boot1prop3_old.png
374 KB
diff --git a/images/boot1propboth.png b/images/boot1propboth.png
-171 KB
diff --git a/images/boot1propboth_old.png b/images/boot1propboth_old.png
700 KB
diff --git a/libs/bootstrap-4.5.3/bootstrap.bundle.min.js b/libs/bootstrap-4.5.3/bootstrap.bundle.min.js
Lines changed: 7 additions & 0 deletions
diff --git a/libs/bootstrap-4.5.3/bootstrap.bundle.min.js.map b/libs/bootstrap-4.5.3/bootstrap.bundle.min.js.map
Lines changed: 1 addition & 0 deletions
diff --git a/libs/bootstrap-4.5.3/bootstrap.min.css b/libs/bootstrap-4.5.3/bootstrap.min.css
Lines changed: 1 addition & 0 deletions
diff --git a/libs/bootstrap-4.5.3/fonts/bootstrap/glyphicons-halflings-regular.eot b/libs/bootstrap-4.5.3/fonts/bootstrap/glyphicons-halflings-regular.eot
19.7 KB
@@ -630,7 +630,7 @@ county %>%
 Based on Figure \@ref(fig:countyIncomeRidgeMulti), what can you say about how median household income in counties vary depending on population gain/no gain, metropolitan area/not, and median degree?[^explore-categorical-6]
 :::
 
-[^explore-categorical-6]: The ridge plot give us a better sense of the shape, and especially modality, of the data.
+[^explore-categorical-6]: Regardless of the location (metropolitan or not) or change in population, it seems like there is an increase in median household income from individuals with only a HS diploma, to individuals with some college, to individuals with a Bachelor's degree.
 
 \vspace{20mm}
 
 
@@ -322,13 +322,13 @@ Interpret the coefficient of the variable `credit_checks`.[^model-mlr-4]
 [^model-mlr-4]: All else held constant, for each additional inquiry into the applicant's credit during the last 12 months, we would expect the interest rate for the loan to be higher, on average, by 0.23 points.
 
 ::: {.guidedpractice data-latex=""}
-Compute the residual of the first observation in Table \@ref(tab:loans-data-matrix) on page using the full model.[^model-mlr-5]
+Compute the residual of the first observation in Table \@ref(tab:loans-data-matrix) using the full model.[^model-mlr-5]
 :::
 
 [^model-mlr-5]: To compute the residual, we first need the predicted value, which we compute by plugging values into the equation from earlier.
-    For example, $\texttt{verified_income}_{\texttt{Source Verified}}$ takes a value of 0, $\texttt{verified_income}_{\texttt{Verified}}$ takes a value of 1 (since the borrower's income source and amount were verified), was 18.01, and so on.
-    This leads to a prediction of $\widehat{\texttt{interest_rate}}_1 = 18.09$.
-    The observed interest rate was 14.07%, which leads to a residual of $e_1 = 14.07 - 18.09 = -4.02$.
+    For example, $\texttt{verified_income}_{\texttt{Source Verified}}$ takes a value of 0, $\texttt{verified_income}_{\texttt{Verified}}$ takes a value of 1 (since the borrower's income source and amount were verified), $\texttt{debt_to_income}$ was 18.01, and so on.
+    This leads to a prediction of $\widehat{\texttt{interest_rate}}_1 = 17.84$.
+    The observed interest rate was 14.07%, which leads to a residual of $e_1 = 14.07 - 17.84 = -3.77$.
 
 ::: {.workedexample data-latex=""}
 We calculated a slope coefficient of 0.74 for `bankruptcy` in Section \@ref(ind-and-cat-predictors) while the coefficient is 0.39 here.
@@ -418,7 +418,7 @@ There were n = 10,000 auctions in the dataset and $k=9$ predictor variables in t
 Use $n$, $k$, and the variances from the earlier Guided Practice to calculate $R_{adj}^2$ for the interest rate model.[^model-mlr-8]
 :::
 
-[^model-mlr-8]: $R_{adj}^2 = 1 - \frac{18.53}{25.01}\times \frac{10000-1}{1000-9-1} = 0.2584$.
+[^model-mlr-8]: $R_{adj}^2 = 1 - \frac{18.53}{25.01}\times \frac{10000-1}{10000-9-1} = 0.2584$.
     While the difference is very small, it will be important when we fine tune the model in the next section.
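
As a check on the corrected footnote, the arithmetic can be worked through step by step. This is a sketch using only the values quoted in the footnote (variances 18.53 and 25.01, $n = 10000$, $k = 9$); the unadjusted $R^2$ on the last line is computed from the same two variances for comparison and is not quoted in this hunk.

```latex
\begin{aligned}
R_{adj}^2 &= 1 - \frac{18.53}{25.01}\times\frac{10000-1}{10000-9-1} \\
          &\approx 1 - 0.7409 \times 1.0009 \\
          &\approx 0.2584, \\
R^2       &= 1 - \frac{18.53}{25.01} \approx 0.2591.
\end{aligned}
```

With the earlier denominator $1000-9-1$, the second factor would be about 10.1 and the expression would evaluate to roughly $-6.48$, which does not match the stated 0.2584.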
 
 ::: {.guidedpractice data-latex=""}
 
@@ -446,7 +446,7 @@ Compute the standard error of the point estimate for the average difference betw
 
 ::: {.workedexample data-latex=""}
 Complete the hypothesis test started in the previous Example and Guided Practice on `births14` dataset and research question.
-Use a significance level of $\alpha=0.05.$ For reference, $\bar{x}_{n} - \bar{x}_{s} = `r xbar_difference`,$ $SE = `r se_difference`,$ and the sample sizes were $n_n = 100$ and $n_s = 50.$
+Use a significance level of $\alpha=0.05.$ For reference, $\bar{x}_{n} - \bar{x}_{s} = `r xbar_difference`,$ $SE = `r se_difference`,$ and the sample sizes were $n_n = `r n_nonsmoker`$ and $n_s = `r n_smoker`.$
 
 ------------------------------------------------------------------------
 
 
@@ -0,0 +1,49 @@
+---
+title: "HW Chp6"
+author: "Jo Hardin"
+date: "3/19/2021"
+output: html_document
+---
+
+## Inv 4.1 Which samples of size 3 are "more extreme" than what we observed?
+
+(a) 53, 56, 64
+(b) 55, 55, 56
+(c) 55, 55, 64
+(d) 55, 56, 64
+
+## Look at Jessica's book: how does she frame her thought questions? Add questions like that to the beginning of each case study.
+
+## Problem 1 (MDSR)
+
+A categorical variable is XXX.
+Show a scatterplot with continuous x and y.
+How can you differentiate the points based on the categorical variable?
+
+## Problem 2 (MDSR - exactly copied): Find two graphs published in a newspaper or on the internet in the last two years.
+
+Identify a graphical display that you find compelling.
+What aspects of the display work well, and how do these relate to the principles established in this chapter?
+Include a screen shot of the display along with your solution.
+
+Identify a graphical display that you find less than compelling.
+What aspects of the display don't work well?
+Are there ways that the display might be improved?
+Include a screen shot of the display along with your solution.
+
+## Problem 3: WGOITG
+
+<https://www.nytimes.com/column/whats-going-on-in-this-graph>
+
+## Problem 4: steal some of Ben's viz links? All are too hard.
+
+<http://mdsr-book.github.io/exercises.html#exercise_25>
+
+## Problem 5: WEB Du Bois (please, please, let's include!) Can we get the licenses?
+
+<https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge>
+
+-   What do the colors tell us?
+-   What do the labels add?
+
+Faceted bar plots in challenge 2: <https://github.com/ajstarks/dubois-data-portraits/tree/master/challenge/challenge02>
@@ -0,0 +1,106 @@
+library(tidyverse)
+library(tidymodels)
+
+# read data --------------------------------------------------------------------
+
+df <- read_csv("duke-forest.csv")
+
+# visualise --------------------------------------------------------------------
+
+ggplot(df, aes(x = area, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 2825, color = "gray")
+
+ggplot(df, aes(x = lot, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 0.52, color = "gray")
+
+ggplot(df, aes(x = bed, y = price)) +
+  geom_point() +
+  geom_vline(xintercept = 3, color = "gray")
+
+ggplot(df, aes(x = price)) +
+  geom_histogram() +
+  geom_vline(xintercept = 400000, color = "gray")
+
+ggplot(df, aes(x = log10(price))) +
+  geom_histogram()
+
+# model ------------------------------------------------------------------------
+
+set.seed(4595)
+
+preds <- c("bed", "bath", "area", "year_built", "lot")
+
+df <- df %>%
+  select(price, all_of(preds)) %>%
+  na.omit()
+
+data_split <- initial_split(df, strata = "price", prop = 0.75)
+
+df_train <- training(data_split)
+df_test  <- testing(data_split)
+
+rf_defaults <- rand_forest(mode = "regression")
+
+rf_mod <- rf_defaults %>%
+  set_engine("ranger")
+
+rf_fit <- rf_mod %>%
+  fit_xy(
+    x = df_train[, preds],
+    y = df_train$price
+  )
+
+rf_fit
+
+# predict ----------------------------------------------------------------------
+
+# training set predictions
+
+rf_training_pred <-
+  predict(rf_fit, df_train) %>%
+  # Add the true outcome data back in
+  bind_cols(df_train %>% select(price))
+
+rf_training_pred %>%
+  rmse(truth = price, estimate = .pred)
+
+# testing set predictions
+
+rf_testing_pred <-
+  predict(rf_fit, df_test) %>%
+  # Add the true outcome data back in
+  bind_cols(df_test %>% select(price))
+
+rf_testing_pred %>%
+  rmse(truth = price, estimate = .pred)
+
+harvey15 <- tibble(
+  bed  = 3,
+  bath = 3.5,
+  area = 2825,
+  year_built = 1982,
+  lot  = 0.52
+)
+
+predict(rf_fit, new_data = harvey15)
+
+# cross validation -------------------------------------------------------------
+
+set.seed(345)
+folds <- vfold_cv(df_train, v = 5)
+folds
+
+rf_wf <-
+  workflow() %>%
+  add_model(rf_mod) %>%
+  add_formula(price ~ .)
+
+set.seed(456)
+rf_fit_rs <-
+  rf_wf %>%
+  fit_resamples(folds)
+
+rf_fit_rs
+collect_metrics(rf_fit_rs, summarize = FALSE)
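
For reference, the `rmse()` metric used in the script above is root mean squared error. In the notation below (symbols chosen here for illustration, not taken from the script), $y_i$ is the observed `price`, $\hat{y}_i$ is the matching `.pred` value, and $n$ is the number of rows in the set being scored.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Because the result is in the same units as `price`, the training-set and testing-set values can be compared directly; a training value far below the testing value would suggest overfitting.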
@@ -0,0 +1,2 @@
+^ims\.Rproj$
+^\.Rproj\.user$
@@ -0,0 +1 @@
+.Rproj.user
@@ -0,0 +1,22 @@
+Version: 1.0
+
+RestoreWorkspace: No
+SaveWorkspace: No
+AlwaysSaveHistory: Default
+
+EnableCodeIndexing: Yes
+UseSpacesForTab: Yes
+NumSpacesForTab: 2
+Encoding: UTF-8
+
+RnwWeave: knitr
+LaTeX: pdfLaTeX
+
+AutoAppendNewline: Yes
+StripTrailingWhitespace: Yes
+LineEndingConversion: Posix
+
+BuildType: Package
+PackageUseDevtools: Yes
+PackageInstallArgs: --no-multiarch --with-keep.source
+PackageRoxygenize: rd,collate,namespace