Commit 6f1ff83

author
Carlos E Hernández R
committed
Project finished
1 parent 066f7ef commit 6f1ff83

11 files changed

+3611
-120
lines changed

Analysis.Rmd

Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
1+
---
2+
title: "Analysis of ToothGrowth data in the R datasets package"
3+
author: "Carlos Hernández"
4+
date: "25/9/2020"
5+
output:
6+
pdf_document:
7+
latex_engine: xelatex
8+
highlight: espresso
9+
toc: true
10+
toc_depth: 4
11+
---
12+
13+
```{r setup, include=FALSE}
14+
knitr::opts_chunk$set(echo = TRUE)
15+
```
16+
17+
## ToothGrowth Dataset
18+
19+
The ToothGrowth data set contains the results of an experiment studying the effect of vitamin C on tooth growth in 60 guinea pigs. Each animal received one of three dose levels of vitamin C (0.5, 1, or 2 mg/day) by one of two delivery methods: orange juice (coded as OJ) or ascorbic acid (a form of vitamin C, coded as VC).
20+
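As a quick check of the design described above, the following sketch (not part of the commit) counts the animals in each supplement/dose cell using only base R. The name `cell_counts` is an illustrative choice.

```r
# ToothGrowth ships with R's built-in datasets package, so nothing needs loading.
str(ToothGrowth)

# Cross-tabulate supplement by dose to see how many animals fall in each cell.
cell_counts <- table(ToothGrowth$supp, ToothGrowth$dose)
cell_counts
```

The design is balanced: the 60 observations split evenly into ten animals per cell.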
21+
```{r ToothGrowth }
22+
library(kableExtra)
23+
head(ToothGrowth,5) %>%
24+
kbl() %>%
25+
kable_material(c("striped", "hover"))
26+
27+
```
28+
29+
1. len: numeric, tooth length.
30+
2. supp: factor, supplement type (VC or OJ).
31+
3. dose: numeric, dose in milligrams/day.
32+
33+
## Basic exploratory data analysis
34+
35+
```{r}
36+
library(ggplot2)
37+
ggplot(ToothGrowth, aes(x = dose, y = len, fill = supp)) +
38+
geom_col() +
39+
facet_grid(~supp, scales = "free")
40+
```
41+
42+
At first glance, orange juice appears to be the more effective delivery method: tooth growth is greater when the dose is given via orange juice than via ascorbic acid. At a dose of 2 mg/day, however, growth appears to be about the same for both delivery methods.
43+
44+
But these are only initial impressions, which we can confirm or reject with a hypothesis test.
45+
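The visual impression can be backed by a small numeric summary. The sketch below is an addition, not part of the commit, and `group_means` is an illustrative name. Note that `geom_col()` stacks the individual observations, so each bar in the plot shows the sum of the lengths in that group rather than the mean. Because every group has the same size, the ordering of the bars is the same, but the means are easier to read directly.

```r
# Mean tooth length for every supplement/dose combination (base R only).
group_means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
group_means

# Difference OJ minus VC at each dose; positive values favour orange juice.
oj <- group_means$len[group_means$supp == "OJ"]
vc <- group_means$len[group_means$supp == "VC"]
mean_diff <- setNames(oj - vc, sort(unique(group_means$dose)))
mean_diff
```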
46+
47+
## Hypothesis Testing
48+
49+
### Assumptions
50+
51+
- The observations are independent and identically distributed (i.i.d.) within each group.
52+
- The variances of tooth growth may differ across supplements and doses, so the tests use `var.equal = FALSE` (Welch's t-test).
53+
- Tooth length is approximately normally distributed within each group.
54+
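These assumptions can be probed informally before testing. The following is a minimal sketch, not something the commit does: it applies a Shapiro-Wilk test per supplement group and compares the sample variances per supplement/dose cell. The names `normality_p` and `cell_var` are illustrative.

```r
# Shapiro-Wilk p-value for tooth length within each supplement group.
# A small p-value would cast doubt on the normality assumption.
normality_p <- tapply(ToothGrowth$len, ToothGrowth$supp,
                      function(x) shapiro.test(x)$p.value)
normality_p

# Sample variance of tooth length in each supplement/dose cell.
# Clearly unequal variances support using var.equal = FALSE.
cell_var <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = var)
cell_var
```

With only ten animals per cell these checks have little power, so they are a sanity check rather than proof that the assumptions hold.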
55+
### Hypothesis 1: Variation of tooth length when using OJ or VC
56+
57+
Our null hypothesis is that mean tooth length is the same for both delivery methods (VC and OJ).
58+
59+
Because the test below is one-sided (`alternative = "greater"`), our alternative hypothesis is that mean tooth length is greater with OJ than with VC.
60+
61+
```{r}
62+
oj_len <- ToothGrowth[ToothGrowth$supp=="OJ",]$len
63+
vc_len <- ToothGrowth[ToothGrowth$supp=="VC",]$len
64+
65+
t.test(oj_len,vc_len, paired = FALSE, var.equal = FALSE, alternative = "greater")
66+
```
67+
68+
As we can see, the p-value is less than 0.05, so we reject the null hypothesis and conclude that mean tooth length differs by delivery method.
69+
70+
Furthermore, on average tooth length is greater with OJ than with VC.
71+
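The decision rule can also be applied in code instead of being read off the printed output. This is a sketch added for illustration, not part of the commit: the significance level `alpha` and the names `h1` and `reject_h0` are assumptions, while the `t.test()` call mirrors the one used above.

```r
alpha <- 0.05  # assumed significance level

oj_len <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
vc_len <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# One-sided Welch t-test: is mean length greater with OJ than with VC?
h1 <- t.test(oj_len, vc_len, paired = FALSE, var.equal = FALSE,
             alternative = "greater")

# Reject the null hypothesis only when the p-value falls below alpha.
reject_h0 <- h1$p.value < alpha
reject_h0

# For comparison, the two-sided version of the same test.
h1_two_sided <- t.test(oj_len, vc_len, var.equal = FALSE)
h1_two_sided$p.value
```

The one-sided p-value is half the two-sided one here because the observed difference points in the direction of the alternative, so the choice of alternative matters when the result is close to the threshold.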
72+
### Hypothesis 2: Variation of tooth length when using different doses
73+
74+
Our null hypothesis is that, at a given dose, mean tooth length is the same for both delivery methods.
75+
76+
Our alternative hypothesis is that, at that dose, mean tooth length is greater with OJ than with VC.
77+
78+
```{r}
79+
OJDoseHalf <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==0.5,]$len
80+
OJDoseOne <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==1.0,]$len
81+
OJDoseTwo <- ToothGrowth[ToothGrowth$supp=="OJ" & ToothGrowth$dose==2.0,]$len
82+
83+
VCDoseHalf <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==0.5,]$len
84+
VCDoseOne <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==1.0,]$len
85+
VCDoseTwo <- ToothGrowth[ToothGrowth$supp=="VC" & ToothGrowth$dose==2.0,]$len
86+
```
87+
88+
For dose equal to 0.5 mg:
89+
90+
```{r}
91+
t.test(OJDoseHalf, VCDoseHalf, paired = FALSE, var.equal = FALSE, alternative = "greater")
92+
```
93+
94+
For dose equal to 1 mg:
95+
96+
```{r}
97+
t.test(OJDoseOne,VCDoseOne, paired = FALSE, var.equal = FALSE, alternative = "greater")
98+
99+
```
100+
101+
For dose equal to 2 mg:
102+
103+
```{r}
104+
t.test(OJDoseTwo, VCDoseTwo, paired = FALSE, var.equal = FALSE, alternative = "greater")
105+
```
106+
107+
As we can see, for doses of 0.5 mg and 1 mg we obtain results similar to those for hypothesis 1. In both cases the p-value is less than 0.05, so we reject the null hypothesis and conclude that tooth length depends on the delivery method at these doses, with greater lengths obtained when the dose is administered with OJ.
108+
109+
However, for doses of 2 mg, we obtain a p-value greater than 0.05, so we fail to reject the null hypothesis. This means there is no evidence that the delivery method (VC or OJ) affects tooth length at a dose of 2 mg.
110+
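The three per-dose tests can be run in one pass, which makes the p-values easy to compare side by side. The sketch below is an illustration rather than the commit's code: `alpha`, `p_by_dose`, and `p_holm` are assumed names, and the Holm adjustment via `p.adjust()` is an extra step the original analysis does not perform, added because three tests are run on the same data set.

```r
alpha <- 0.05  # assumed significance level
doses <- sort(unique(ToothGrowth$dose))

# One-sided Welch t-test (OJ greater than VC) at each dose level.
p_by_dose <- vapply(doses, function(d) {
  at_dose <- ToothGrowth[ToothGrowth$dose == d, ]
  t.test(at_dose$len[at_dose$supp == "OJ"],
         at_dose$len[at_dose$supp == "VC"],
         paired = FALSE, var.equal = FALSE,
         alternative = "greater")$p.value
}, numeric(1))
names(p_by_dose) <- doses
p_by_dose

# Holm correction for running three tests (not done in the original analysis).
p_holm <- p.adjust(p_by_dose, method = "holm")
p_holm < alpha
```

In this data set the adjustment does not change the decisions: the tests at 0.5 mg and 1 mg remain significant and the test at 2 mg does not.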
111+
## Conclusion
112+
113+
In conclusion, this brief analysis suggests that at doses of 0.5 mg and 1 mg, orange juice results in greater tooth length than ascorbic acid. At a dose of 2 mg, however, we found no evidence of a difference in tooth length between OJ and VC.
114+
115+
116+
117+
118+
119+

Analysis.html

Lines changed: 2066 additions & 0 deletions
Large diffs are not rendered by default.

Analysis.pdf

45.7 KB
Binary file not shown.

Report.Rmd

Lines changed: 37 additions & 20 deletions
@@ -1,8 +1,14 @@
11
---
22
title: "Simulation of Exponential Distribution using R"
3+
34
author: "Carlos Hernández"
45
date: "25/9/2020"
5-
output: html_document
6+
output:
7+
pdf_document:
8+
latex_engine: xelatex
9+
highlight: espresso
10+
toc: true
11+
toc_depth: 4
612
---
713

814
```{r setup, include=FALSE}
@@ -50,8 +56,9 @@ data <- data.frame(value = c(t(rexp(1000, rate = 1))))
5056
5157
ggplot(data, aes(x=value)) +
5258
geom_histogram(aes(y=..density..),binwidth=.25, col="black", fill="lightblue")+
53-
labs(title= "Exponential distribution with mean = 1", caption="Produced by Carlos Hernández") +
54-
xlab("x") +
59+
labs(title= "Exponential distribution with mean = 1",
60+
caption="Produced by Carlos Hernández") +
61+
xlab("x") +
5562
ylab("y")
5663
5764
```
@@ -73,7 +80,9 @@ expData <- data.frame(value = c(t(expData))) # convert to data frame
7380
# plot
7481
ggplot(expData, aes(x=value)) +
7582
geom_histogram(aes(y=..density..), binwidth=.8,colour="black", fill="lightblue") +
76-
labs(title= "Exponential distribution with lambda = 0.2 and 40 observations", subtitle = "Replicated 1000 times", caption="Produced by Carlos Hernández") +
83+
labs(title= "Exponential distribution with lambda = 0.2 and 40 observations",
84+
subtitle = "Replicated 1000 times",
85+
caption="Produced by Carlos Hernández") +
7786
xlab("x") +
7887
ylab("exp(x)")
7988
@@ -89,7 +98,9 @@ data <- data.frame(value = c(t(data)), size = 40)
8998
9099
ggplot(data, aes(x=value)) +
91100
geom_histogram(aes(y=..density..),binwidth=.25, col="black", fill="lightblue") +
92-
labs(title= "Average of 40 random exponential distribution", subtitle = "Replicated 1000 times", caption="Produced by Carlos Hernández") +
101+
labs(title= "Average of 40 random exponential distribution",
102+
subtitle = "Replicated 1000 times",
103+
caption="Produced by Carlos Hernández") +
93104
xlab("x") +
94105
ylab("mean")
95106
@@ -105,15 +116,19 @@ theoretical_mu <- 1/lambda # calculate theoretical mean
105116
sample_mu <-mean(data$value) # calculate experimental mean
106117
107118
ggplot(data, aes(x=value)) +
108-
stat_function(fun=dnorm,
109-
color="black",
110-
args=list(mean=mean(data$value),
111-
sd=sd(data$value)))+
119+
stat_function(fun=dnorm,
120+
color="black",
121+
args=list(mean=mean(data$value),
122+
sd=sd(data$value)))+
112123
geom_vline(xintercept = theoretical_mu, colour="red") +
113-
geom_text(aes(x=theoretical_mu-.25, label="\nTheoretical mean", y=.2), colour="red", angle=90, text=element_text(size=11)) +
124+
geom_text(aes(x=theoretical_mu-.25,
125+
label="\nTheoretical mean", y=.2),
126+
colour="red", angle=90, text=element_text(size=11)) +
114127
geom_vline(xintercept = sample_mu, colour="green")+
115-
geom_text(aes(x=sample_mu+.05, label="\nSample mean", y=.2), colour="green", angle=90, text=element_text(size=11)) +
116-
labs(title= "Theoretical mean vs sample mean", caption="Produced by Carlos Hernández") +
128+
geom_text(aes(x=sample_mu+.05, label="\nSample mean", y=.2),
129+
colour="green", angle=90) +
130+
labs(title= "Theoretical mean vs sample mean",
131+
caption="Produced by Carlos Hernández") +
117132
xlab("x") +
118133
ylab("y")
119134
@@ -131,15 +146,15 @@ theoretical_variance <- 1/(n * lambda^2)
131146
sample_variance <- round(var(data$value),3)
132147
133148
ggplot(data, aes(x=value)) +
134-
stat_function(fun=dnorm,
135-
color="black",
136-
args=list(mean=mean(data$value),
137-
sd=sd(data$value)))+
149+
stat_function(fun=dnorm, color="black", args=list(mean=mean(data$value), sd=sd(data$value)))+
138150
geom_vline(xintercept = sample_mu, colour="gray", linetype="dashed")+
139151
geom_vline(xintercept = theoretical_mu, colour="gray", linetype="dashed")+
140-
geom_segment(aes(x = sample_mu, y = 0.36, xend =sample_mu + sample_variance, yend = 0.36), colour="green") +
141-
geom_segment(aes(x = theoretical_mu - theoretical_variance, y = 0.35, xend =theoretical_mu, yend = 0.35), colour="red") +
142-
labs(title= "Theoretical variance vs sample variance", caption="Produced by Carlos Hernández") +
152+
geom_segment(aes(x = sample_mu, y = 0.36, xend =sample_mu +
153+
sample_variance, yend = 0.36), colour="green") +
154+
geom_segment(aes(x = theoretical_mu - theoretical_variance, y = 0.35,
155+
xend =theoretical_mu, yend = 0.35), colour="red") +
156+
labs(title= "Theoretical variance vs sample variance",
157+
caption="Produced by Carlos Hernández") +
143158
geom_text(aes(x=sample_mu+.55, label="\nSample variance", y=.42), colour="green") +
144159
geom_text(aes(x=theoretical_mu-.65, label="\nTheoretical variance", y=.33), colour="red") +
145160
xlab("x") +
@@ -160,7 +175,9 @@ ggplot(data, aes(x=value)) +
160175
color="blue",
161176
args=list(mean=mean(data$value),
162177
sd=sd(data$value)))+
163-
labs(title= "Average of 40 random exponential distribution", subtitle = "Replicated 1000 times", caption="Produced by Carlos Hernández") +
178+
labs(title= "Average of 40 random exponential distribution",
179+
subtitle = "Replicated 1000 times",
180+
caption="Produced by Carlos Hernández") +
164181
xlab("x") +
165182
ylab("y")
166183
```

Report.html

Lines changed: 1368 additions & 99 deletions
Large diffs are not rendered by default.

Report.pdf

80.2 KB
Binary file not shown.
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
name: Document
2+
title:
3+
username:
4+
account: rpubs
5+
server: rpubs.com
6+
hostUrl: rpubs.com
7+
appId: https://api.rpubs.com/api/v1/document/666390/732703cd577a43f6bc78b51c6ea04308
8+
bundleId: https://api.rpubs.com/api/v1/document/666390/732703cd577a43f6bc78b51c6ea04308
9+
url: http://rpubs.com/publish/claim/666390/1284b297f0dd4bf78e340a41469d0358
10+
when: 1601097026.03967
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
name: Document
2+
title:
3+
username:
4+
account: rpubs
5+
server: rpubs.com
6+
hostUrl: rpubs.com
7+
appId: https://api.rpubs.com/api/v1/document/666328/609244398d1340bf99c4cfae58bb879a
8+
bundleId: https://api.rpubs.com/api/v1/document/666328/609244398d1340bf99c4cfae58bb879a
9+
url: http://rpubs.com/publish/claim/666328/7f37464851904714b83eea439bb2e2f7
10+
when: 1601083212.76761

statistical-inference.Rproj

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -10,4 +10,4 @@ NumSpacesForTab: 2
1010
Encoding: UTF-8
1111

1212
RnwWeave: Sweave
13-
LaTeX: pdfLaTeX
13+
LaTeX: XeLaTeX
