Companion Website — Insights from Data with R
Chapter 10 Workflow demonstration R scripts
Here we provide the R scripts for each of the workflow demonstrations.
Bat diets - Part 1
Bat diets - Part 2
Polity - Food diversity
Prey diversity - predator stability
Fish dietary restriction