Commit 9dbbe1a

source commit: f9cc1bc
0 parents  commit 9dbbe1a

55 files changed

+9132
-0
lines changed

.Rhistory

Whitespace-only changes.

01-rstudio-intro.md

+724
Large diffs are not rendered by default.

02-project-intro.md

+243
@@ -0,0 +1,243 @@
1+
---
2+
title: Project Management With RStudio
3+
teaching: 10
4+
exercises: 5
5+
source: Rmd
6+
---
7+
8+
::::::::::::::::::::::::::::::::::::::: objectives
9+
10+
- Create self-contained projects in RStudio
11+
12+
::::::::::::::::::::::::::::::::::::::::::::::::::
13+
14+
:::::::::::::::::::::::::::::::::::::::: questions
15+
16+
- How can I manage my projects in R?
17+
18+
::::::::::::::::::::::::::::::::::::::::::::::::::
19+
20+
21+
22+
## Introduction
23+
24+
The scientific process is naturally incremental, and many projects start life as
25+
random notes, some code, then a manuscript, and eventually everything is a bit
26+
mixed together. Organising a project involving spatial data is no different from
27+
any other data analysis project, although you may require more disk space than
28+
usual.
29+
30+
<div class="text-center">
31+
32+
<blockquote class="twitter-tweet"><p>Managing your projects in a reproducible fashion doesn't just make your science reproducible, it makes your life easier.</p>— Vince Buffalo (@vsbuffalo) <a href="https://twitter.com/vsbuffalo/status/323638476153167872">April 15, 2013</a></blockquote>
33+
34+
<script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script>
35+
36+
</div>
37+
38+
Most people tend to organize their projects like this:
39+
40+
![](fig/bad_layout.png){alt='A screenshot of a project folder containing multiple versions of data, analysis scripts, figures, and results files'}
41+
42+
There are many reasons why we should *ALWAYS* avoid this:
43+
44+
1. It is really hard to tell which version of your data is
45+
the original and which has been modified;
46+
2. It gets really messy because it mixes files with various
47+
extensions together;
48+
3. It probably takes you a lot of time to actually find
49+
things, and relate the correct figures to the exact code
50+
that was used to generate them.
51+
52+
A good project layout will ultimately make your life easier:
53+
54+
- It will help ensure the integrity of your data;
55+
- It makes it simpler to share your code with someone else
56+
(a lab-mate, collaborator, or supervisor);
57+
- It allows you to easily upload your code with your manuscript submission;
58+
- It makes it easier to pick the project back up after a break.
59+
60+
## A possible solution
61+
62+
Fortunately, there are tools and packages which can help you manage your work effectively.
63+
64+
One of the most powerful and useful aspects of RStudio is its project management
65+
functionality. We'll be using this today to create a self-contained, reproducible
66+
project.
67+
68+
::::::::::::::::::::::::::::::::::::::: instructor
69+
70+
Make sure learners download the data files in Challenge 1 and move those files
71+
to their `data/` directory.
72+
73+
When learners load an RStudio project, their R session's working directory should
74+
automatically be set to the same folder as the `.Rproj` file. We'll be using relative
75+
paths throughout the lesson to refer to files, so it's important to make sure that
76+
learners have loaded the right project and are in the right directory! You may also
77+
want to introduce other ways to make file paths, such as the `here` package, after
78+
creating the project.
79+
80+
:::::::::::::::::::::::::::::::::::::::
81+
82+
::::::::::::::::::::::::::::::::::::::: challenge
83+
84+
## Challenge: Creating a self-contained project
85+
86+
We're going to create a new project in RStudio:
87+
88+
1. Click the "File" menu button, then "New Project".
89+
2. Click "New Directory".
90+
3. Click "Empty Project".
91+
4. Type in "r-geospatial" as the name of the directory.
92+
5. Click the "Create Project" button.
93+
94+
95+
::::::::::::::::::::::::::::::::::::::::::::::::::
96+
97+
A key advantage of an RStudio Project is that whenever we open this project in
98+
subsequent RStudio sessions our working directory will *always* be set to the
99+
folder `r-geospatial`.
100+
Let's check our working directory by entering the following into the R console:
101+
102+
```r
103+
getwd()
104+
```
105+
106+
R should return `your/path/r-geospatial` as the working directory.
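
Because the working directory is the project root, files can be referred to with relative paths. A minimal sketch, assuming the `data/` folder and the `nordic-data.csv` file that are set up later in this episode:

```r
# Build a path relative to the project root (the working directory).
# "data" and "nordic-data.csv" are the folder and file used later in this episode.
nordic_path <- file.path("data", "nordic-data.csv")
nordic_path

# Check whether the file is in place yet; this is FALSE until it has been downloaded.
file.exists(nordic_path)
```

The same relative path works on any computer where the project is opened, which is what makes the project self-contained.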
107+
108+
## Best practices for project organization
109+
110+
Although there is no "best" way to lay out a project, there are some general
111+
principles to adhere to that will make project management easier:
112+
113+
### Treat data as read only
114+
115+
This is probably the most important goal of setting up a project. Data is
116+
typically time consuming and/or expensive to collect. Working with it
116+
interactively (e.g., in Excel) where it can be modified means you are never
118+
sure of where the data came from, or how it has been modified since collection.
119+
It is therefore a good idea to treat your data as "read-only".
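
One way to enforce this is to remove write permission from raw files once they are in place. The sketch below is only an illustration: `make_read_only()` is a hypothetical helper, the demonstration file is a throwaway temporary file, and the permission bits behave as described on Unix-like systems only (on Windows the effect of `Sys.chmod()` is more limited).

```r
# Hypothetical helper: mark one or more files as read-only.
make_read_only <- function(paths) {
  # "0444" grants read permission to owner, group, and others, and no write permission.
  Sys.chmod(paths, mode = "0444", use_umask = FALSE)
  invisible(paths)
}

# Demonstration on a throwaway file (not real lesson data).
demo_file <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10"), demo_file)
make_read_only(demo_file)

# Inspect the resulting permissions.
file.mode(demo_file)
```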
120+
121+
### Data Cleaning
122+
123+
In many cases your data will be "dirty": it will need significant preprocessing
124+
to get into a format R (or any other programming language) will find useful. This
125+
task is sometimes called "data munging". I find it useful to store these scripts
126+
in a separate folder, and create a second "read-only" data folder to hold the
127+
"cleaned" data sets.
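
As a rough sketch of that workflow, a cleaning script can read from the raw data folder and write to a separate folder without ever touching the original. The function name `clean_raw_file()`, the folder names, and the cleaning step (dropping incomplete rows) are all assumptions made for illustration, and the example data are invented.

```r
# Hypothetical helper: read a raw CSV, clean it, and save the result elsewhere.
clean_raw_file <- function(raw_path, clean_dir) {
  raw <- read.csv(raw_path, stringsAsFactors = FALSE)
  # Example cleaning step only: drop rows that contain any missing value.
  cleaned <- raw[complete.cases(raw), , drop = FALSE]
  dir.create(clean_dir, recursive = TRUE, showWarnings = FALSE)
  # Keep the original file name so the link to the raw file stays obvious.
  out_path <- file.path(clean_dir, basename(raw_path))
  write.csv(cleaned, out_path, row.names = FALSE)
  invisible(out_path)
}

# Demonstration with invented data in a temporary directory.
demo_dir <- tempfile("cleaning-demo")
dir.create(file.path(demo_dir, "data"), recursive = TRUE)
raw_file <- file.path(demo_dir, "data", "example.csv")
writeLines(c("id,value", "1,10", "2,NA", "3,30"), raw_file)

clean_file <- clean_raw_file(raw_file, file.path(demo_dir, "data_clean"))
read.csv(clean_file)
```

The raw file is left unchanged, and the cleaned copy can be deleted and regenerated at any time by rerunning the script.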
128+
129+
### Treat generated output as disposable
130+
131+
Anything generated by your scripts should be treated as disposable: it should
132+
be possible to regenerate all of it from your scripts.
133+
134+
There are lots of different ways to manage this output. I find it useful to
135+
have an output folder with different sub-directories for each separate
136+
analysis. This makes it easier later, as many of my analyses are exploratory
137+
and don't end up being used in the final project, and some of the analyses
138+
get shared between projects.
139+
140+
### Keep related data together
141+
142+
Some GIS file formats are really 3-6 files that need to be kept together and share the same base name,
143+
e.g. shapefiles. It may be tempting to store those components separately,
144+
but your spatial data will be unusable if you do that.
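
A quick check can catch a shapefile that has been separated from its companions. The sketch below assumes the three mandatory components of a shapefile (`.shp`, `.shx`, and `.dbf`) with lower-case extensions; `missing_shapefile_parts()` and the file name `roads` are hypothetical.

```r
# Hypothetical helper: report which mandatory shapefile components are missing.
missing_shapefile_parts <- function(shp_path, required = c("shp", "shx", "dbf")) {
  # Strip the extension to get the shared base name, e.g. "path/to/roads".
  base <- tools::file_path_sans_ext(shp_path)
  candidates <- paste0(base, ".", required)
  required[!file.exists(candidates)]
}

# Demonstration with empty placeholder files: the .shx component is left out on purpose.
shp_dir <- tempfile("shapefile-demo")
dir.create(shp_dir)
file.create(file.path(shp_dir, c("roads.shp", "roads.dbf")))

missing_parts <- missing_shapefile_parts(file.path(shp_dir, "roads.shp"))
missing_parts
```

An empty result means all required components sit next to each other under the same base name.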
145+
146+
### Keep a consistent naming scheme
147+
148+
It is generally best to avoid renaming downloaded spatial data,
149+
so that a clear connection is maintained with the point of truth.
150+
You may otherwise find yourself wondering whether `file_A` really is just a copy of `Official_file_on_website` or not.
151+
152+
For datasets you generate, it's worth taking the time to come up with a naming convention that works for your project,
153+
and sticking to it. File names don't have to be long, they just have to be long enough that you can tell what the file
154+
is about. Date generated, topic, and whether a product is intermediate or final are good bits of information to keep
155+
in a file name. For more tips on naming files, check out [the slides from Jenny Bryan's talk "Naming things" at the 2015 Reproducible Science Workshop](https://speakerdeck.com/jennybc/how-to-name-files).
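
A naming convention is easier to stick to when a small function builds the names. This is one possible scheme, not a recommendation from the talk: `make_file_name()` is a hypothetical helper that combines the date generated, the topic, and whether the product is intermediate or final.

```r
# Hypothetical helper: build a file name of the form <date>_<topic>_<stage>.<ext>.
make_file_name <- function(topic, stage = c("intermediate", "final"),
                           date = Sys.Date(), ext = "csv") {
  # Only the two stages listed in the argument default are accepted.
  stage <- match.arg(stage)
  sprintf("%s_%s_%s.%s", format(date, "%Y-%m-%d"), topic, stage, ext)
}

example_name <- make_file_name("nordic-summary", "final", date = as.Date("2024-01-15"))
example_name
# → "2024-01-15_nordic-summary_final.csv"
```

Putting the date first in ISO 8601 format (`YYYY-MM-DD`) means that sorting the files alphabetically also sorts them chronologically.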
156+
157+
::::::::::::::::::::::::::::::::::::::::: callout
158+
159+
## Tip: Good Enough Practices for Scientific Computing
160+
161+
[Good Enough Practices for Scientific Computing](https://github.com/swcarpentry/good-enough-practices-in-scientific-computing/blob/gh-pages/good-enough-practices-for-scientific-computing.pdf) gives the following recommendations for project organization:
162+
163+
1. Put each project in its own directory, which is named after the project.
164+
2. Put text documents associated with the project in the `doc` directory.
165+
3. Put raw data and metadata in the `data` directory, and files generated during cleanup and analysis in a `results` directory.
166+
4. Put source for the project's scripts and programs in the `src` directory, and programs brought in from elsewhere or compiled locally in the `bin` directory.
167+
5. Name all files to reflect their content or function.
168+
169+
::::::::::::::::::::::::::::::::::::::::::::::::::
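
The directory layout recommended in the callout above can be created from R. A minimal sketch, assuming the five folder names listed there; `create_project_skeleton()` is a hypothetical helper and the demonstration runs in a temporary directory rather than in your project.

```r
# Hypothetical helper: create the recommended sub-directories under a project root.
create_project_skeleton <- function(root,
                                    subdirs = c("doc", "data", "results", "src", "bin")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # showWarnings = FALSE keeps this quiet if a directory already exists.
    dir.create(p, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(paths)
}

# Demonstration in a temporary location.
demo_root <- file.path(tempdir(), "r-geospatial-demo")
create_project_skeleton(demo_root)
list.files(demo_root)
```

Inside an RStudio project, calling `create_project_skeleton(".")` would create the folders in the project root.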
170+
171+
### Save the data in the data directory
172+
173+
Now that we have a good directory structure, we will save our data files in the `data/` directory.
174+
175+
::::::::::::::::::::::::::::::::::::::: challenge
176+
177+
## Challenge 1
178+
179+
1\. Download each of the data files listed below (<kbd>Ctrl</kbd>\+<kbd>S</kbd>, right mouse click -> "Save as", or File -> "Save page as")
180+
181+
- [nordic country data](https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv)
182+
- [nordic country data (version 2)](https://datacarpentry.org/r-intro-geospatial/data/nordic-data-2.csv)
183+
- [gapminder data](https://datacarpentry.org/r-intro-geospatial/data/gapminder_data.csv)
184+
185+
2\. Make sure the files have the following names:
186+
187+
- `nordic-data.csv`
188+
- `nordic-data-2.csv`
189+
- `gapminder_data.csv`
190+
191+
3\. Save the files in the `data/` folder within your project.
192+
193+
We will load and inspect these data later.
194+
195+
196+
::::::::::::::::::::::::::::::::::::::::::::::::::
197+
198+
::::::::::::::::::::::::::::::::::::::: challenge
199+
200+
## Challenge 2
201+
202+
We also want to move the data that we downloaded from the [data page](https://datacarpentry.org/geospatial-workshop/data/) into a subdirectory
203+
inside `r-geospatial`. If you haven't already downloaded the data, you can do so by clicking
204+
[this download link](https://ndownloader.figshare.com/articles/2009586/versions/10).
205+
206+
1. Move the downloaded zip file to the `data` directory.
207+
2. Once the data have been moved, unzip all files.
208+
209+
210+
::::::::::::::::::::::::::::::::::::::::::::::::::
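
The unzipping step in Challenge 2 can also be done from R rather than by hand. The sketch below is an assumption about how you might do it: `unzip_all()` is a hypothetical helper that extracts every `.zip` file it finds in a directory into that same directory. Whether each archive unpacks into its own folder depends on how the archive was built.

```r
# Hypothetical helper: extract every zip archive found in a directory.
unzip_all <- function(dir) {
  zips <- list.files(dir, pattern = "\\.zip$", full.names = TRUE)
  for (z in zips) {
    # Extract next to the archive; the archive itself is left in place.
    utils::unzip(z, exdir = dir)
  }
  invisible(zips)
}

# From the project root, once the zip files are in data/:
# unzip_all("data")
```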
211+
212+
Once you have completed moving the data across to the new folder,
213+
your data directory should look as follows:
214+
215+
```
216+
data/
217+
  gapminder_data.csv
218+
  NEON-DS-Airborne-Remote-Sensing/
219+
  NEON-DS-Landsat-NDVI/
220+
  NEON-DS-Met-Time-Series/
221+
  NEON-DS-Site-Layout-Files/
222+
  NEON-DS-Airborne-Remote-Sensing.zip
223+
  NEON-DS-Landsat-NDVI.zip
224+
  NEON-DS-Met-Time-Series.zip
225+
  NEON-DS-Site-Layout-Files.zip
226+
  nordic-data.csv
227+
  nordic-data-2.csv
228+
```
229+
230+
### Stage your scripts
231+
232+
Creating separate R scripts or R Markdown documents for different stages of a project keeps each stage independent, so you can rerun one without repeating the others.
233+
For instance, separating data download commands into their own file means that you won't re-download data unnecessarily.
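
A download script can also guard against repeat downloads by checking whether the file already exists. This is a sketch under stated assumptions: `download_if_missing()` is a hypothetical helper, and its `downloader` argument exists only so the download step can be swapped out when trying the function offline.

```r
# Hypothetical helper: download a file only if it is not already present.
download_if_missing <- function(url, dest, downloader = utils::download.file) {
  if (file.exists(dest)) {
    message("Skipping, already present: ", dest)
    return(invisible(FALSE))
  }
  dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
  # mode = "wb" avoids corrupting binary files such as zip archives on Windows.
  downloader(url, dest, mode = "wb")
  invisible(TRUE)
}

# From the project root, using one of the files from Challenge 1:
# download_if_missing(
#   "https://datacarpentry.org/r-intro-geospatial/data/nordic-data.csv",
#   file.path("data", "nordic-data.csv")
# )
```

The function returns `TRUE` when it downloaded the file and `FALSE` when it skipped the download, so a script can report what it actually did.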
234+
235+
:::::::::::::::::::::::::::::::::::::::: keypoints
236+
237+
- Use RStudio to create and manage projects with a consistent layout.
238+
- Treat raw data as read-only.
239+
- Treat generated output as disposable.
240+
241+
::::::::::::::::::::::::::::::::::::::::::::::::::
242+
243+
