Commit 1ecf19c

Fixed Chapter 7 lab
1 parent 5e4ea04 commit 1ecf19c

3 files changed: +151 -38 lines changed


07_inf_for_numerical_data/inf_for_numerical_data_rguroo.Rmd

Lines changed: 34 additions & 12 deletions
@@ -73,42 +73,64 @@ knitr::include_graphics("img/summarystat1-1.png")
7373
There is an observed difference, but is this difference statistically
7474
significant? In order to answer this question we will conduct a hypothesis test.
7575

76-
## Inference
76+
## Hypothesis Test Step-by-Step
77+
78+
### Step 0: Ensure all necessary conditions for performing inference are met
79+
80+
Before doing the hypothesis test, we need to look at our graphical and numerical summaries to determine which conditions for performing our desired inference are satisfied. If all conditions are satisfied, then we can proceed. If not, then we need to find a different type of hypothesis test that only requires the conditions that are satisfied.
7781

7882
1. Are all conditions necessary for inference satisfied? Comment on each. You can
7983
see the group sizes in the **Summary Statistic** output that you created above.
8084

81-
1. Write the (null and alternative) hypotheses for testing if the average weights are different between those
82-
who exercise at least times a week and those who don't.
85+
### Step 1: Specify the null hypothesis, alternative hypothesis, and significance level
86+
87+
We write each hypothesis as a complete sentence and also, if possible, as a "mathematical sentence" (equation or inequality) about the parameter(s) of interest.
88+
89+
1. Write the (null and alternative) hypotheses for testing if the average weights are different between those who exercise at least three times a week and those who don't.
90+
91+
We also specify our significance level $\alpha$ in this step. This removes the temptation to define our $\alpha$ level after seeing the results of the test. For this lab example, we will use $\alpha = 0.05$, a typical default significance level.
92+
93+
### Step 2: Choose the appropriate procedure in your statistical software
94+
95+
Since our null and alternative hypotheses concern population means, we use Rguroo's **Mean Inference** dialog from the **Analytics** toolbox to conduct this hypothesis test. Since our null and alternative hypotheses discuss a difference between two groups, we select the **One and Two Population** option under **Mean Inference**.
8396

84-
Next, we will use the **Mean Inference** dialog from the **Analytics** toolbox to conduct hypothesis tests. Since we are looking at a difference between two groups, we select the **One and Two Population** option under **Mean Inference**.
97+
### Step 3: Set up the test in your statistical software
8598

86-
In the **One & Two Population Mean Inference** dialog, select the *yrbss_transformed* `Dataset`. We want to look at the values of a numerical `Variable`, *weight*, grouped `By Factor`, *physical_3plus*. Select either "yes" or "no" as the `Level` in the `Population 1` section and the other value as the `Level` in the `Population 2` section. In the screenshot below, Population 1 represents the people who are physically active at least 3 days a week ("yes") and Population 2 represents the people who are not ("no").
99+
To fill out the **One & Two Population Mean Inference** dialog for this test, select the *yrbss_transformed* `Dataset`. We want to look at the values of a numerical `Variable`, *weight*, grouped `By Factor`, *physical_3plus*. Select either "yes" or "no" as the `Level` in the `Population 1` section and the other value as the `Level` in the `Population 2` section. In the screenshot below, Population 1 represents the people who are physically active at least 3 days a week ("yes") and Population 2 represents the people who are not ("no").
87100

88101
```{r HT1, echo = FALSE, results = "asis", fig.align = "center", fig.cap = "*Testing a hypothesis about difference of two population means*", out.width="80%"}
89102
knitr::include_graphics("img/hypothesistest1-1.png")
90103
```
91104

92-
In the tabs below, click `Population 1-2` to indicate that we are comparing two populations, then click the `Test of Hypothesis` tab. The default option in Rguroo is to do a test using a `z-statistic`; uncheck that box and instead check the `Permutation Unscaled` box to do a simulation-based test for the difference of means. Fill in the appopriate alternative hypothesis and view the results.
105+
In the tabs below, click `Population 1-2` to indicate that we are comparing two populations, then click the `Test of Hypothesis` tab. The default option in Rguroo is to do a test using a `z-statistic`; uncheck that box and instead check the `Permutation Unscaled` box to do a simulation-based test for the difference of means. Fill in the appropriate alternative hypothesis.
93106

94107
```{r HT2, echo = FALSE, results = "asis", fig.align = "center", fig.cap = "*Specifying the method and the alternative hypothesis*", out.width="80%"}
95108
knitr::include_graphics("img/hypothesistest1-2b.png")
96109
```
97110

98-
We can visualize the null distribution by finding in the output the graph labeled "Distribution of Permutation Replicates".
111+
### Step 4: Run the test and identify the p-value in the output
112+
113+
Now, we can `Preview` ![eye](../icon_images/preview.png) the Rguroo output for our hypothesis test. We can visualize the null distribution (the sampling distribution of the test statistic, assuming the null hypothesis is true) by finding in the output the graph labeled "Distribution of Permutation Replicates."
99114

100115
1. According to the graph, what is the p-value? How many replicates were generated, and how many of them produced a difference at least as great as the observed difference in sample means?
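
For readers who want to see the idea behind the simulation-based test outside the Rguroo dialog, the following is a minimal R sketch of a permutation test for a difference in means. It is not Rguroo's implementation and is not part of the lab file; the data are simulated stand-ins for *weight* and *physical_3plus*, and the names `diff_means`, `n_reps`, and `replicates` are illustrative assumptions.

```r
# Simulated stand-in data (assumption): the lab itself uses weight and
# physical_3plus from the yrbss_transformed dataset inside Rguroo.
set.seed(1)
weight <- c(rnorm(40, mean = 68, sd = 15), rnorm(30, mean = 66, sd = 17))
group  <- rep(c("yes", "no"), times = c(40, 30))

# Difference in sample means, "yes" group minus "no" group.
diff_means <- function(y, g) mean(y[g == "yes"]) - mean(y[g == "no"])
observed <- diff_means(weight, group)

# Shuffle the group labels repeatedly. Under the null hypothesis the labels
# are exchangeable, so these replicates approximate the null distribution.
n_reps <- 1000
replicates <- replicate(n_reps, diff_means(weight, sample(group)))

# Two-sided p-value: proportion of replicates at least as far from zero
# as the observed difference.
p_value <- mean(abs(replicates) >= abs(observed))
p_value
```

The count asked for in the exercise corresponds to `sum(abs(replicates) >= abs(observed))` in this sketch, and the p-value is that count divided by `n_reps`.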
101116

102-
This the standard workflow for performing hypothesis tests.
117+
### Step 5: Make a conclusion in context
118+
119+
A conclusion requires two steps. First, we must decide whether to reject the null hypothesis. Second, we must explain what that decision means in the context of our original question.
120+
121+
1. Using our $\alpha = 0.05$ significance level, do we have enough statistical evidence to reject the null hypothesis? What does this suggest about the difference in average weight between those who exercise at least three times a week and those who don't?
122+
123+
This process is the standard workflow for performing hypothesis tests.
103124

104-
1. Construct and record a confidence interval for the difference between the
105-
weights of those who exercise at least three times a week and those who don't, and
106-
interpret this interval in context of the data. To construct a bootstrap confidence interval, select the `Confidence Interval` tab, and select the option `Bootstrap Percentile`.
107125

108126
* * *
109127

110128
## More Practice
111129

130+
1. Construct and record a 95% confidence interval for the difference between the average
131+
weights of those who exercise at least three times a week and those who don't, and
132+
interpret this interval in context of the data. This exercise uses the same data as in the previous section. To construct a bootstrap confidence interval using this data, click the ![Basics](../icon_images/basics2_selected.png) button to reopen the dialog box, select the `Confidence Interval` tab, and select the option `Bootstrap Percentile`.
133+
112134
1. Calculate a 95% confidence interval for the average height in meters (*height*)
113135
and interpret it in context.
114136

@@ -117,7 +139,7 @@ confidence level. Comment on the width of this interval versus
117139
the one obtained in the previous exercise.
118140

119141
1. Conduct a hypothesis test evaluating whether the average height is different
120-
for those who exercise at least three times a week and those who don't.
142+
for those who exercise at least three times a week and those who don't. Follow all steps in the workflow.
121143

122144
1. Now, a non-inference task: Determine the number of different options there
123145
are in the dataset for the variable *hours_tv_per_school_day*.
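
The bootstrap percentile interval requested in the first More Practice exercise can be sketched in R as follows. This is a hedged illustration of the general percentile method under simulated data, not a description of how Rguroo's `Bootstrap Percentile` option is implemented; `yes`, `no`, `boot_diffs`, and `n_reps` are hypothetical names.

```r
# Simulated stand-in data (assumption): replace with the real weight values
# for the two physical_3plus groups when working with the lab data.
set.seed(2)
yes <- rnorm(40, mean = 68, sd = 15)  # exercise at least three times a week
no  <- rnorm(30, mean = 66, sd = 17)  # do not

# Resample each group with replacement and record the difference in means.
n_reps <- 2000
boot_diffs <- replicate(
  n_reps,
  mean(sample(yes, replace = TRUE)) - mean(sample(no, replace = TRUE))
)

# 95% percentile interval: the middle 95% of the bootstrap differences.
ci <- quantile(boot_diffs, probs = c(0.025, 0.975))
ci
```

Changing `probs` to `c(0.05, 0.95)` would give a 90% interval, which is narrower because it covers a smaller share of the bootstrap distribution.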

07_inf_for_numerical_data/inf_for_numerical_data_rguroo.html

Lines changed: 117 additions & 26 deletions
Large diffs are not rendered by default.
