There is an observed difference, but is this difference statistically
significant? In order to answer this question we will conduct a hypothesis test.

## Hypothesis Test Step-by-Step

### Step 0: Ensure all necessary conditions for performing inference are met

Before doing the hypothesis test, we need to look at our graphical and numerical summaries to determine which conditions for performing our desired inference are satisfied. If all conditions are satisfied, then we can proceed. If not, then we need to find a different type of hypothesis test that only requires the conditions that are satisfied.

1. Are all conditions necessary for inference satisfied? Comment on each. You can
see the group sizes in the **Summary Statistic** output that you created above.

### Step 1: Specify the null hypothesis, alternative hypothesis, and significance level

We write each hypothesis as a complete sentence and also, if possible, as a "mathematical sentence" (equation or inequality) about the parameter(s) of interest.

1. Write the (null and alternative) hypotheses for testing if the average weights are different between those who exercise at least three times a week and those who don't.

We also specify our significance level $\alpha$ in this step. This removes the temptation to define our $\alpha$ level after seeing the results of the test. For this lab example, we will use $\alpha = 0.05$, a typical default significance level.
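The general shape of these "mathematical sentences" can be written as below. Here $\theta$ is a generic placeholder for the parameter of interest and $\theta_0$ for the value it takes if the null hypothesis is true; both symbols are introduced only for illustration. In this style the null hypothesis is written as an equality, the alternative takes one of three forms depending on the research question, and $\alpha$ is fixed before the test is run.

$$
H_0: \theta = \theta_0
\qquad \text{versus} \qquad
H_A: \theta \neq \theta_0 \ \ (\text{or } \theta < \theta_0, \text{ or } \theta > \theta_0)
$$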

### Step 2: Choose the appropriate procedure in your statistical software

Because our null and alternative hypotheses concern population means, we use Rguroo's **Mean Inference** dialog from the **Analytics** toolbox to conduct this hypothesis test. Because these hypotheses compare two groups, we select the **One and Two Population** option under **Mean Inference**.

### Step 3: Set up the test in your statistical software

To fill out the **One & Two Population Mean Inference** dialog for this test, select the *yrbss_transformed* `Dataset`. We want to look at the values of a numerical `Variable`, *weight*, grouped `By Factor`, *physical_3plus*. Select either "yes" or "no" as the `Level` in the `Population 1` section and the other value as the `Level` in the `Population 2` section. In the screenshot below, Population 1 represents the people who are physically active at least 3 days a week ("yes") and Population 2 represents the people who are not ("no").

```{r HT1, echo = FALSE, results = "asis", fig.align = "center", fig.cap = "*Testing a hypothesis about difference of two population means*", out.width="80%"}
In the tabs below, click `Population 1-2` to indicate that we are comparing two populations, then click the `Test of Hypothesis` tab. The default option in Rguroo is to do a test using a `z-statistic`; uncheck that box and instead check the `Permutation Unscaled` box to do a simulation-based test for the difference of means. Fill in the appropriate alternative hypothesis.

```{r HT2, echo = FALSE, results = "asis", fig.align = "center", fig.cap = "*Specifying the method and the alternative hypothesis*", out.width="80%"}
### Step 4: Run the test and identify the p-value in the output

Now, we can `Preview` the Rguroo output for our hypothesis test. We can visualize the null distribution (the sampling distribution of the test statistic, assuming the null hypothesis is true) by finding in the output the graph labeled "Distribution of Permutation Replicates."

1. According to the graph, what is the p-value? How many replicates were generated, and how many of them produced a difference at least as great as the observed difference in sample means?
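To make the idea behind the "Distribution of Permutation Replicates" graph concrete, here is a minimal R sketch of a two-sample permutation test for a difference in means. It assumes that Rguroo's `Permutation Unscaled` option uses the raw (unscaled) difference in sample means as its statistic and that a two-sided p-value is the proportion of replicates at least as far from zero as the observed difference; Rguroo's exact computation may differ. The function names `permutation_replicates()` and `two_sided_p_value()` are hypothetical, and the example runs on simulated numbers rather than the *yrbss_transformed* data.

```r
# Minimal sketch of a two-sample permutation test for a difference in means.
# Assumptions: the statistic is the unscaled difference in sample means, and
# the helper names below are hypothetical (not part of Rguroo or base R).

# Null distribution: if H0 is true, the group labels are interchangeable,
# so shuffle the pooled values and recompute the difference each time.
permutation_replicates <- function(x1, x2, reps = 1000) {
  pooled <- c(x1, x2)
  n1 <- length(x1)
  replicate(reps, {
    shuffled <- pooled[sample.int(length(pooled))]  # random relabeling
    mean(shuffled[1:n1]) - mean(shuffled[-(1:n1)])  # "group 1" minus "group 2"
  })
}

# Two-sided p-value: proportion of replicates at least as far from 0
# as the observed difference.
two_sided_p_value <- function(replicates, observed) {
  mean(abs(replicates) >= abs(observed))
}

# Usage with simulated numbers (not the lab's data).
set.seed(1)
group1 <- rnorm(40, mean = 70, sd = 12)
group2 <- rnorm(40, mean = 66, sd = 12)
observed_diff <- mean(group1) - mean(group2)
null_dist <- permutation_replicates(group1, group2, reps = 1000)
n_extreme <- sum(abs(null_dist) >= abs(observed_diff))
p_value <- two_sided_p_value(null_dist, observed_diff)
cat("replicates:", length(null_dist),
    "| at least as extreme:", n_extreme,
    "| p-value:", p_value, "\n")
```

In this sketch the p-value is simply the number of extreme replicates divided by the total number of replicates, which is the relationship between the counts the exercise asks you to read off Rguroo's graph.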

### Step 5: Make a conclusion in context

A conclusion requires two steps. First, we must decide whether to reject the null hypothesis. Second, we must explain what that decision means in the context of our original question.
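In symbols, a common convention for the first step compares the p-value with the significance level $\alpha$ chosen in Step 1:

$$
\text{p-value} \le \alpha \;\Longrightarrow\; \text{reject } H_0,
\qquad
\text{p-value} > \alpha \;\Longrightarrow\; \text{fail to reject } H_0.
$$

Failing to reject $H_0$ does not show that $H_0$ is true; it only means the data do not provide strong enough evidence against it.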

1. Using our $\alpha = 0.05$ significance level, do we have enough statistical evidence to reject the null hypothesis? What does this suggest about the difference in average weight between those who exercise at least three times a week and those who don't?

This process is the standard workflow for performing hypothesis tests.

* * *

## More Practice

1. Construct and record a 95% confidence interval for the difference between the average
weights of those who exercise at least three times a week and those who don't, and
interpret this interval in the context of the data. This exercise uses the same data as in the previous section. To construct a bootstrap confidence interval using this data, click the  button to reopen the dialog box, select the `Confidence Interval` tab, and select the option `Bootstrap Percentile`.

1. Calculate a 95% confidence interval for the average height in meters (*height*)
and interpret it in context.

... confidence level. Comment on the width of this interval versus
the one obtained in the previous exercise.

1. Conduct a hypothesis test evaluating whether the average height is different
for those who exercise at least three times a week and those who don't. Follow all steps in the workflow.

1. Now, a non-inference task: Determine the number of different options there
are in the dataset for the variable *hours_tv_per_school_day*.
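For the confidence interval exercises above, the idea behind the `Bootstrap Percentile` option can be sketched in R as follows. This is a minimal illustration under stated assumptions, not Rguroo's implementation: it resamples each group separately with replacement (keeping the group sizes), records the difference in sample means for each bootstrap sample, and takes the middle 95% of those differences as the interval. The function name `bootstrap_percentile_ci()` is hypothetical, and the example uses simulated numbers rather than the lab's data.

```r
# Minimal sketch of a bootstrap percentile confidence interval for the
# difference in two population means (hypothetical helper name).
bootstrap_percentile_ci <- function(x1, x2, reps = 2000, level = 0.95) {
  boot_diffs <- replicate(reps, {
    # Resample each group with replacement, keeping its original size.
    resample1 <- x1[sample.int(length(x1), replace = TRUE)]
    resample2 <- x2[sample.int(length(x2), replace = TRUE)]
    mean(resample1) - mean(resample2)
  })
  tail_prob <- (1 - level) / 2
  # Lower and upper percentiles of the bootstrap distribution.
  quantile(boot_diffs, probs = c(tail_prob, 1 - tail_prob), names = FALSE)
}

# Usage with simulated numbers (not the lab's data).
set.seed(2)
group1 <- rnorm(50, mean = 70, sd = 10)
group2 <- rnorm(50, mean = 67, sd = 10)
ci <- bootstrap_percentile_ci(group1, group2)
print(ci)
```

Resampling within each group, rather than from the pooled data, keeps the two samples separate, which is what an interval for $\mu_1 - \mu_2$ needs; pooling is only appropriate when simulating under a null hypothesis, as in a permutation test.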