QUESTION 1
- Rainforest species survey
A team of researchers surveyed the plant species at a number of sites within a rainforest. They also collected soil and elevation data for these sites, and recorded whether each site was in a valley, on a slope or on a ridge. Their research question was whether soil characteristics, elevation and/or landscape position could predict the vegetation communities at different sites.
How many sites were there?
1 point
QUESTION 2
- How many species were recorded?
1 point
QUESTION 3
- Which distance measure is most appropriate for calculating a distance matrix between the sites from the log-transformed species count data?
Bray-Curtis
Euclidean
1 point
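To see why the choice of distance matters, here is a minimal Python sketch. The assignment itself works in R, so this only illustrates the arithmetic. It compares the two measures on three hypothetical sites and assumes the log transform is log(x + 1), so that zero counts stay zero. All counts are made up. Site B shares no species with site A, while site C has the same species as A at higher abundance.

```python
import math

def log1p_counts(counts):
    """Log-transform species counts as log(x + 1), so absent species stay at 0."""
    return [math.log1p(x) for x in counts]

def euclidean(a, b):
    """Straight-line distance; driven by abundance differences, not by which species are shared."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def bray_curtis(a, b):
    """Sum of absolute differences over total abundance: 0 = identical, 1 = no species shared."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

# Hypothetical counts of three species at three sites.
site_a = log1p_counts([0, 1, 1])
site_b = log1p_counts([1, 0, 0])  # no species in common with A
site_c = log1p_counts([0, 4, 8])  # same species as A, more individuals

print("Euclidean   A-B:", round(euclidean(site_a, site_b), 3), "A-C:", round(euclidean(site_a, site_c), 3))
print("Bray-Curtis A-B:", round(bray_curtis(site_a, site_b), 3), "A-C:", round(bray_curtis(site_a, site_c), 3))
```

With these numbers, Euclidean distance places B closer to A than C is, even though B shares none of A's species. Bray-Curtis gives A-B its maximum value of 1.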
QUESTION 4
- Now investigate whether sites in different landscape positions differ in their vegetation composition. Construct and plot an nmMDS ordination of the sites. What is the stress of the MDS (to two decimal places)?
1 point
QUESTION 5
- Does this stress value mean that the MDS is a reasonable representation of the differences between sites? YES/NO
1 point
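Stress measures how far the rank order of distances in the ordination departs from the rank order of the original dissimilarities. The sketch below is in Python and assumes Kruskal's stress formula 1, the usual choice for non-metric MDS:

S = sqrt( sum (d − d̂)² / sum d² )

Here d are the distances in the ordination. The disparities d̂ come from a monotone (isotonic) regression of d on the original dissimilarities. A commonly quoted rule of thumb treats stress below about 0.2 as usable and below about 0.1 as good.

```python
import math

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def kruskal_stress1(dissimilarities, ordination_distances):
    """Stress-1 for paired lists of original dissimilarities and ordination distances."""
    order = sorted(range(len(dissimilarities)), key=lambda i: dissimilarities[i])
    fitted = monotone_fit([ordination_distances[i] for i in order])
    disparities = [0.0] * len(order)
    for rank, i in enumerate(order):
        disparities[i] = fitted[rank]
    misfit = sum((d - dh) ** 2 for d, dh in zip(ordination_distances, disparities))
    return math.sqrt(misfit / sum(d ** 2 for d in ordination_distances))

# Hypothetical values for four site pairs.
dissim = [0.2, 0.4, 0.6, 0.9]
print(kruskal_stress1(dissim, [1.0, 1.5, 2.0, 3.0]))  # same rank order: 0.0
print(kruskal_stress1(dissim, [1.5, 1.0, 2.0, 3.0]))  # first two pairs swapped: about 0.088
```

Only the rank order of the original dissimilarities enters the fit, which is what makes the method "non-metric".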
QUESTION 6
- Which of sites 1, 2 and 3 seems most different from all the other sites in terms of vegetation composition? 1/2/3
1 point
QUESTION 7
- Colour the sites by their position within the landscape (valley, slope or ridge). Is there a visual indication that the sites group by landscape position?
yes
no
1 point
QUESTION 8
- Does an ANOSIM show that this grouping is significant?
yes
no
1 point
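ANOSIM converts all pairwise distances to ranks. It then compares the mean rank of between-group distances (r_B) with the mean rank of within-group distances (r_W):

R = (r_B − r_W) / (M/2)

Here M = n(n − 1)/2 is the number of site pairs. R near 1 means the groups are well separated; R near 0 means no separation. Significance comes from reshuffling the group labels. The Python sketch below (not the R function used in the assignment) uses nine hypothetical sites placed along one axis.

```python
import random

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks

def anosim_r(dist, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    n = len(groups)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)

def anosim_test(dist, groups, n_perm=999, seed=1):
    """Observed R and a one-sided permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    observed = anosim_r(dist, groups)
    labels = list(groups)
    at_least_as_extreme = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(labels)
        if anosim_r(dist, labels) >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / (n_perm + 1)

# Hypothetical sites along a single axis, three per landscape position.
positions = [0, 1, 2, 10, 11, 12, 20, 21, 22]
groups = ["valley"] * 3 + ["slope"] * 3 + ["ridge"] * 3
dist = [[abs(a - b) for b in positions] for a in positions]
print(anosim_test(dist, groups))
```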
QUESTION 9
- Does a PERMANOVA show that this grouping is significant?
yes
no
1 point
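PERMANOVA partitions squared distances rather than ranks. The sketch below assumes the sums-of-squares form usually attributed to Anderson:

- SS_T = (1/N) Σ d²ᵢⱼ over all pairs.
- SS_W sums, for each group, (1/n_g) Σ d²ᵢⱼ over pairs within that group.
- The pseudo-F is [(SS_T − SS_W)/(a − 1)] / [SS_W/(N − a)], where a is the number of groups.

For Euclidean distances this equals the ordinary one-way ANOVA F. This Python version is illustrative only and uses six hypothetical one-dimensional site scores.

```python
import random
from collections import defaultdict

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F for a single grouping factor, from a full distance matrix."""
    n = len(groups)
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    a = len(members)
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = sum(
        sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:]) / len(idx)
        for idx in members.values()
    )
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova_test(dist, groups, n_perm=999, seed=1):
    """Observed pseudo-F and a permutation p-value from shuffled group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 1  # include the observed labelling
    for _ in range(n_perm):
        rng.shuffle(labels)
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

values = [1, 2, 3, 5, 6, 7]  # hypothetical site scores
groups = ["valley"] * 3 + ["ridge"] * 3
dist = [[abs(a - b) for b in values] for a in values]
print(permanova_test(dist, groups))
```

The groups here are perfectly separated, yet the p-value stays near 0.1. There are only 10 distinct ways to split six sites into two groups of three, so the permutation test cannot go lower. Small studies have little power whichever test is used.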
QUESTION 10
- Now turn your attention to the environmental (soil and elevation) data, which include soil pH, site elevation, levels of soil P, Ca, Mg, K and N (in ppm), and percentage soil organic matter. Is there evidence of correlations among these variables?
yes
no
1 point
QUESTION 11
- Perform a principal components analysis on this environmental data. First do this without scaling the variables. Plot the resulting PCA biplot. Which variable(s) is/are dominating the results?
Ca, Mg and pH
K
P
Elevation and Ca
N and soil organic matter
1 point
QUESTION 12
- Why?
Because these have the biggest values
Because these have the smallest values
Because they are the most strongly correlated
Because they are not strongly correlated
Because these have much bigger absolute variability than the others, because they have bigger absolute values
Because these have much bigger relative variability than the others, because they have bigger absolute values
1 point
QUESTION 13
- Now perform the principal components analysis on the environmental data with scaling, and plot the resulting PCA biplot. Which analysis is better if we want to give similar importance to all the environmental variables?
PCA with scaling
PCA without scaling
1 point
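The difference between the two PCAs comes down to the matrix being decomposed:

- Unscaled PCA works on the covariance matrix. A variable's influence therefore grows with its variance, and that variance depends on its units.
- Scaled PCA works on the correlation matrix, where every variable has variance 1. In R this is, for example, the ‘scale.’ argument of ‘prcomp’.

The Python sketch below finds PC1 by power iteration for three hypothetical soil variables on very different scales. The values are made up, not taken from the assignment data.

```python
import math

def covariance_matrix(data, scale=False):
    """Sample covariance matrix of the columns of `data`; correlation matrix if scale=True."""
    n, p = len(data), len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    means = [sum(c) / n for c in cols]
    centred = [[x - m for x in c] for c, m in zip(cols, means)]
    if scale:
        sds = [math.sqrt(sum(x * x for x in c) / (n - 1)) for c in centred]
        centred = [[x / s for x in c] for c, s in zip(centred, sds)]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centred] for ci in centred]

def first_pc(cov, iterations=200):
    """Loadings of PC1 and its share of total variance, found by power iteration."""
    p = len(cov)
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][k] * v[k] for i in range(p) for k in range(p))
    return v, eigenvalue / sum(cov[i][i] for i in range(p))

# Hypothetical sites: Ca (ppm), pH, soil organic matter (%).
sites = [
    [850, 5.1, 6.0],
    [1200, 5.6, 5.0],
    [400, 4.8, 8.2],
    [950, 5.3, 5.6],
    [300, 4.6, 9.0],
]
for scale in (False, True):
    loadings, share = first_pc(covariance_matrix(sites, scale))
    print("scaled" if scale else "unscaled", [round(x, 3) for x in loadings], round(share, 3))
```

Without scaling, PC1 is essentially the Ca axis, simply because Ca values (and so their variance) are numerically huge. With scaling, the three strongly correlated variables share PC1 roughly equally.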
QUESTION 14
- Consider the results of the PCA that gives similar importance to all the environmental variables. What percentage of the total variance do the first two PCs explain (as a number out of 100)?
____________________
1 point
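The proportion of variance explained by a PC is its variance divided by the total, and the variance of each PC is the square of its standard deviation. In R, ‘summary’ of a ‘prcomp’ object reports the running total as "Cumulative Proportion". A small Python sketch with hypothetical standard deviations:

```python
def cumulative_variance_percent(sdev, k):
    """Percent of total variance explained by the first k PCs, given PC standard deviations."""
    variances = [s ** 2 for s in sdev]
    return 100 * sum(variances[:k]) / sum(variances)

sdev = [1.6, 1.1, 0.7, 0.3]  # hypothetical PC standard deviations
print(round(cumulative_variance_percent(sdev, 2), 1))  # about 86.7
```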
QUESTION 15
- Does this mean that the 2-D biplot is a reasonable representation of the variation in the environmental data across the sites?
yes
no
1 point
QUESTION 16
- Variation in which of the environmental variables is least well represented in the 2-D biplot?
Ca
N
P
Mg
K
Elevation
pH
1 point
QUESTION 17
- According to the biplot, which site looks like it has the highest levels of P? (enter just the number of the site)
________
1 point
QUESTION 18
- Which variables appear to be strongly positively correlated with Ca?
K and elevation
Elevation and P
pH and Mg
N and soil organic matter
1 point
QUESTION 19
- Which variables appear to be strongly negatively correlated with elevation?
K and elevation
Elevation and P
pH and Mg
N and soil organic matter
1 point
QUESTION 20
- Which variable seems to be positively correlated with soil N but negatively correlated with soil Ca?
K
soil organic matter
P
pH
Mg
elevation
1 point
QUESTION 21
- Which variables appear to vary independently of elevation?
K and P
Ca and P
Just P
pH and Mg
N and soil organic matter
1 point
QUESTION 22
- How does soil N seem to be related to soil Ca?
Positively correlated
Negatively correlated
Not correlated
1 point
QUESTION 23
- PC1 is most strongly related to which variable?
K/soil organic matter/pH/Mg/elevation/P/N/Ca
- PC2 is most strongly related to which variable?
K/soil organic matter/pH/Mg/elevation/P/N/Ca
- PC3 is most strongly related to which variable?
K/soil organic matter/pH/Mg/elevation/P/N/Ca
3 points
QUESTION 24
- Is there a relationship between elevation and the Shannon diversity index?
Yes, a positive relationship
Yes, a negative relationship
No, no significant relationship
1 point
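For reference, the Shannon index of a site is H' = −Σ pᵢ ln pᵢ, where pᵢ is the proportion of individuals belonging to species i. The Python sketch below assumes the index is computed from raw counts with natural logarithms; all counts are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over species present at a site."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

print(shannon([10, 10]))    # two equally common species: ln(2), about 0.693
print(shannon([18, 1, 1]))  # one dominant species: lower diversity than three even species
```

H' increases both with the number of species and with how evenly individuals are spread among them.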
QUESTION 25
- Is there a relationship between PC1 and the Shannon diversity index?
Yes, a positive relationship
Yes, a negative relationship
No, no significant relationship
1 point
QUESTION 26
- Is there a relationship between PC2 and the Shannon diversity index?
Yes, a positive relationship
Yes, a negative relationship
No, no significant relationship
1 point
QUESTION 27
- Is there a relationship between PC3 and the Shannon diversity index?
Yes, a positive relationship
Yes, a negative relationship
No, no significant relationship
1 point
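One way to answer questions like these is a simple linear regression of the Shannon index on each predictor, for example with ‘lm’ and ‘summary’ in R. The sign of the slope gives the direction of the relationship. The test of "slope = 0" uses t = r·√((n − 2)/(1 − r²)) on n − 2 degrees of freedom. The Python sketch below computes these quantities for hypothetical PC scores and Shannon values.

```python
import math

def simple_regression(x, y):
    """Least-squares slope, Pearson r and t = r * sqrt((n - 2) / (1 - r^2)) for H0: slope = 0.
    Assumes the points are not perfectly collinear (|r| < 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return sxy / sxx, r, r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical PC scores and Shannon indices for five sites.
pc_scores = [1, 2, 3, 4, 5]
shannon_h = [2.1, 1.9, 1.5, 1.2, 0.8]
slope, r, t = simple_regression(pc_scores, shannon_h)
print(round(slope, 3), round(r, 3), round(t, 2))
```

With five sites there are 3 degrees of freedom, so |t| must exceed about 3.18 for significance at the 5% level.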
QUESTION 28
- Does the adonis function indicate a significant relationship between differences in soil P and differences in vegetation composition?
yes
no
1 point
QUESTION 29
- Does the adonis function indicate a significant relationship between differences in soil K and differences in vegetation composition?
yes
no
1 point
QUESTION 30
- Does the adonis function indicate a significant relationship between differences in soil N and differences in vegetation composition?
yes
no
1 point
QUESTION 31
- Does the adonis function indicate a significant relationship between differences in soil Ca and differences in vegetation composition?
yes
no
1 point
QUESTION 32
- Does the adonis function indicate a significant relationship between differences in elevation and differences in vegetation composition?
yes
no
1 point
QUESTION 33
Which PCs and interactions have a significant effect on vegetation composition?
pc1 yes/no
pc2 yes/no
pc3 yes/no
pc1:pc2 yes/no
pc1:pc3 yes/no
pc2:pc3 yes/no
pc1:pc2:pc3 yes/no
7 points
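The seven terms listed are exactly those produced by a full-factorial model formula such as pc1 * pc2 * pc3, assuming that is how the model was specified. It expands to three main effects, three two-way interactions and one three-way interaction (2³ − 1 terms in total). A short Python sketch of that expansion, using R's colon notation for interactions:

```python
from itertools import combinations

def expand_terms(variables):
    """All main effects and interactions implied by a formula like a * b * c."""
    terms = []
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append(":".join(combo))
    return terms

print(expand_terms(["pc1", "pc2", "pc3"]))
# ['pc1', 'pc2', 'pc3', 'pc1:pc2', 'pc1:pc3', 'pc2:pc3', 'pc1:pc2:pc3']
```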
QUESTION 34
- Does this match the results of the previous tests?
yes
no
1 point
QUESTION 35
Which of the following conclusions would you draw from the analysis above, taking into account the correlations among environmental variables that you observed?
- Elevation appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil N YES/NO
- Elevation appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil P YES/NO
- Elevation appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil pH YES/NO
- K appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil N YES/NO
- K appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil P YES/NO
- K appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil pH YES/NO
- P appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil N YES/NO
- P appears to have an effect on species composition, but this may actually be an indirect effect due to differences in elevation YES/NO
- P appears to have an effect on species composition, but this may actually be an indirect effect due to differences in soil pH YES/NO
9 points
QUESTION 36
- The most common species is species number:
___________________
1 point
QUESTION 37
This species prefers (relatively high/intermediate/relatively low) elevation sites with (relatively high/intermediate/relatively low) levels of P and (relatively high/intermediate/relatively low) pH levels.
3 points
QUESTION 38
- The second most common species is species number:
___________________
1 point
QUESTION 39
This species prefers (relatively high/intermediate/relatively low) elevation sites with (relatively high/intermediate/relatively low) levels of P and (relatively high/intermediate/relatively low) pH levels.
3 points
QUESTION 40
- More guinea pigs
My friend was so excited about the potential of measuring his baby guinea pigs to predict their adult head hair length, number of rosettes and chance of show success that he did the whole thing again the following year, with a larger number of babies.
The resulting data is in the file ‘guineapigs assignment.csv’. Open it in Excel and have a look, then read the data into R.
Has he measured the same variables as last time?
yes
no
1 point
QUESTION 41
- How many guinea pigs did he include in his study this time?
________________
1 point
QUESTION 42
- Fit a multiple regression model predicting adult head fur length with all the ‘baby’ variables as predictors, but no interactions (call this model fm, for ‘full model’).
How many variables are significant (p < 0.05) in the model summary?
__________________
1 point
QUESTION 43
- How many variables are very highly significant (p < 0.001) in the model summary?
__________________
1 point
QUESTION 44
- Look at the correlations among the predictors, visually and/or numerically. Do we see similar correlation patterns in the predictors to those we saw last time?
yes
no
1 point
QUESTION 45
- Carefully do step-wise model selection by repeatedly dropping the least significant term until only significant terms remain (call this sm1, for ‘simplified model 1’). Are you left with the same significant variables as in the full model?
yes
no
1 point
QUESTION 46
- Now do automated step-wise model reduction of the full model based on AIC (call this sm2). Are you left with any terms in simplified model 2 that were not retained in simplified model 1?
yes
no
1 point
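For linear models, R's ‘step’ compares candidate models using ‘extractAIC’. Up to an additive constant, that criterion is n·log(RSS/n) + k·edf, where edf is the number of estimated coefficients (including the intercept). The default k = 2 gives AIC, and setting k = log(n) gives BIC. The Python sketch below reproduces that formula for two hypothetical models fitted to the same 100 observations.

```python
import math

def step_criterion(rss, n, edf, criterion="AIC"):
    """n * log(RSS / n) + k * edf, with k = 2 for AIC or k = log(n) for BIC."""
    if criterion == "AIC":
        k = 2
    elif criterion == "BIC":
        k = math.log(n)
    else:
        raise ValueError("criterion must be 'AIC' or 'BIC'")
    return n * math.log(rss / n) + k * edf

small = dict(rss=50.0, n=100, edf=4)  # hypothetical smaller model
large = dict(rss=45.0, n=100, edf=7)  # hypothetical larger model, slightly better fit
for crit in ("AIC", "BIC"):
    print(crit, round(step_criterion(**small, criterion=crit), 2),
          round(step_criterion(**large, criterion=crit), 2))
```

With these made-up numbers, AIC prefers the larger model while BIC prefers the smaller one. Because log(n) exceeds 2 once n is 8 or more, BIC penalises extra terms more heavily and tends to select simpler models.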
QUESTION 47
- Now try to fit a multiple regression model predicting adult head fur length with all the ‘baby’ variables as predictors and all second-order interactions (call this fim, for ‘full interactions model’). Why can we do this now, when it didn’t work in the labs?
❏ Because there are more variables.
❏ Because there are fewer variables.
❏ Because there are more observations.
❏ Because there are fewer observations.
❏ Because we are using a different selection criterion.
1 point
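The number of coefficients grows quickly once two-way interactions are added. With p numeric predictors, a model with all second-order interactions has 1 + p + p(p − 1)/2 coefficients. Least squares can estimate all of them, and still leave residual degrees of freedom for significance tests, only when the number of observations exceeds that count. The Python sketch below counts coefficients; the values of p are hypothetical, not the number of baby variables in the spreadsheet.

```python
def n_coefficients(p, interactions=True):
    """Coefficients in a linear model with p numeric predictors:
    intercept + p main effects (+ p*(p-1)/2 two-way interactions)."""
    count = 1 + p
    if interactions:
        count += p * (p - 1) // 2
    return count

for p in (6, 8, 10):  # hypothetical numbers of predictors
    print(p, "predictors:", n_coefficients(p, interactions=False), "without,",
          n_coefficients(p), "with two-way interactions")
```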
QUESTION 48
- Do automated step-wise model reduction of this model based on AIC (call this fims, for ‘full interactions model, simplified’). Are you left with any terms in the simplified interactions model that were not retained in simplified model 2?
yes
no
1 point
QUESTION 49
- Now fit a null model (fmn). Then use the step function to do automated ‘both directions’ model selection based on AIC, allowing all baby variables and up to second-order interactions between them (call it fmbda1, with ‘bda’ for ‘both directions AIC’). Look at the summary of this model.
Now try the same thing (automated ‘both directions’ model selection based on AIC, allowing all baby variables and up to second-order interactions between them), but this time start from sm1 instead of the null model (call it fmbda2). Look at the summary of this model.
Now try automated ‘both directions’ model selection based on BIC, still allowing all baby variables and up to second-order interactions between them. Start first from the null model and then from sm1, and call the resulting models fmbdb1 and fmbdb2. Look at the summaries of these models.
Look at the AIC of the seven models that we have fitted using some kind of model/variable selection (ignore the full models, where no selection was applied; they must be worse).
Look at the BIC of these seven models.
Some of these models are actually the same model, i.e. different procedures have found the same ‘best model’. If we remove these duplicates, how many distinct simplified models are left to consider?
____________________
1 point
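‘Both directions’ selection is a greedy search. At each step it tries every single-term drop and every single-term addition, takes the move that lowers the criterion most, and stops when no move helps. Because it only ever looks one step ahead, it can end at a local optimum, so different starting models may end in different places.

The Python sketch below shows that behaviour on a toy table of hypothetical criterion values (standing in for AIC or BIC). It ignores the hierarchy of model terms, which R's ‘step’ also takes into account. In R the equivalent search is run with ‘step’, using its ‘scope’ and ‘direction’ arguments, plus k = log(n) for BIC.

```python
def step_both(start, candidates, score):
    """Greedy 'both directions' selection: try every single-term drop or add,
    take the move that lowers the score most, stop when no move improves it."""
    current = frozenset(start)
    best = score(current)
    while True:
        moves = [current - {t} for t in sorted(current)]
        moves += [current | {t} for t in sorted(candidates) if t not in current]
        if not moves:
            return current, best
        new_score, new_model = min(((score(m), m) for m in moves), key=lambda pair: pair[0])
        if new_score >= best:
            return current, best
        current, best = new_model, new_score

# Hypothetical criterion values (lower is better) for every subset of three terms.
TABLE = {
    frozenset(): 10, frozenset({"a"}): 4, frozenset({"b"}): 9, frozenset({"c"}): 9,
    frozenset({"a", "b"}): 8, frozenset({"a", "c"}): 8, frozenset({"b", "c"}): 7,
    frozenset({"a", "b", "c"}): 6,
}
terms = {"a", "b", "c"}
print(step_both(set(), terms, TABLE.__getitem__))  # from the null model: reaches {'a'} with 4
print(step_both(terms, terms, TABLE.__getitem__))  # from the full model: stuck at all three terms with 6
```

Starting from the full set, every single drop makes the score worse, so the search stops at a model that is far from the best one. This is the kind of ‘getting stuck’ discussed in the next question.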
QUESTION 50
Which one has the best AIC? ____________________
And which the best BIC? ______________________
(Make sure you write the model abbreviation exactly for these – if more than one model had equal best AIC or BIC, either abbreviation will do.)
One of the models is clearly not a good model to consider, as it has the worst BIC and is also really complicated. Which one? ______________________
This model resulted from the automated selection procedure ‘getting stuck’ and not simplifying enough. Never trust an automated model simplification procedure!! :) So we can leave that one out.
3 points
QUESTION 51
Look closely at the remaining models. We can conclude that the (more/less) hormone3 a baby has, the longer its head hair length is likely to be when adult.
1 point
QUESTION 52
- In addition, higher levels of which hormone number would most likely result in the baby having shorter head hair as an adult?
_____________
1 point
QUESTION 53
- The other variable that seems to be a strong predictor is: (Write all variable names exactly as they appear in the spreadsheets).
_________________
1 point
QUESTION 54
- This strong predictor variable has a (positive/negative) relationship with adult head fur length.
1 point
QUESTION 55
- There is some indication that the effect of hormone 3 might depend on: (Write all variable names exactly as they appear in the spreadsheets).
_____________
1 point
QUESTION 56
Higher levels of this variable (reduce/increase) the effect of hormone 3.
1 point
QUESTION 57
- Of these significant predictors, the one most likely to be confounded with other predictors is: (Write all variable names exactly as they appear in the spreadsheets).
____________
1 point
QUESTION 58
- Do the results agree exactly with the first analysis conducted in the lab?
yes
no
1 point
QUESTION 59
- Which variable seemed possibly important then but not here? (Write all variable names exactly as they appear in the spreadsheets).
______________________
1 point
QUESTION 60
- Which variable seems important here but not then? (Write all variable names exactly as they appear in the spreadsheets).
_____________________________
1 point
QUESTION 61
- Now, rosettes!
Create a new TRUE/FALSE variable recording whether the adult pig had 8 rosettes, as we did in the lab.
Try to find the best model predicting whether an adult guinea pig will have 8 rosettes, based on BIC. Include up to second-order interactions in the possible model terms. Try a few different starting points to make sure you haven’t got stuck somewhere far from the best model. (To make it a bit simpler, don’t worry about AIC this time.)
What would you conclude from this best model?
There seem to be two variables useful in predicting whether an adult pig will have 8 rosettes. One is hormone number:
____________________________
1 point
QUESTION 62
- and the other is a ‘fur variable’ i.e.:
❏ baby.fur.length.head
❏ baby.fur.length.stomach
❏ baby.fur.length.back
❏ baby.fur.lightness
❏ baby.fur.smoothness
❏ baby.fur.n.colours
1 point
QUESTION 63
- There (is/is not) a significant interaction between these two variables.
1 point
QUESTION 64
- This indicates that the effect of one (does/does not) depend on the level of the other.
1 point
QUESTION 65
- When the fur variable is relatively low, the hormone has a (strong positive/strong negative/medium positive/medium negative/weak positive/weak negative) effect on the chance that a pig will have 8 rosettes, while when the fur variable is relatively high, the hormone has a (strong positive/strong negative/medium positive/medium negative/weak positive/weak negative) effect on the chance that a pig will have 8 rosettes.
2 points
QUESTION 66
- The chance that a pig will have 8 rosettes exceeds 90% when the fur variable is at the lowest level it was measured at and the significant hormone level is higher than about:
1 point
QUESTION 67
- When the fur variable is at the highest level it was measured at, the chance that a pig will have 8 rosettes (increases/decreases) as the significant hormone level increases.
1 point
QUESTION 68
- When the hormone is at very low levels, the percentage chance is about:
______________
1 point
QUESTION 69
- When the hormone is at very high levels, the percentage chance is about:
_______________
1 point
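In a logistic regression (for example ‘glm’ with family = binomial in R), the predicted probability is the inverse logit of a linear predictor. With a hormone × fur interaction, the hormone's effect on the logit scale is b_h + b_hf × fur, so its direction and strength can change with the fur variable. The Python sketch below uses made-up coefficients, not the fitted values from the assignment, and illustrative variable names.

```python
import math

def prob_eight_rosettes(hormone, fur, b0, b_h, b_f, b_hf):
    """Inverse logit of b0 + b_h*hormone + b_f*fur + b_hf*hormone*fur."""
    eta = b0 + b_h * hormone + b_f * fur + b_hf * hormone * fur
    return 1 / (1 + math.exp(-eta))

# Hypothetical coefficients, NOT the assignment's fitted values.
coefs = dict(b0=-2.0, b_h=1.5, b_f=0.5, b_hf=-0.6)
for fur in (0.0, 5.0):
    probs = [round(prob_eight_rosettes(h, fur, **coefs), 2) for h in (0, 2, 4)]
    print(f"fur={fur}: P(8 rosettes) at hormone 0, 2, 4 -> {probs}")
```

With these numbers, the hormone slope on the logit scale is +1.5 at fur = 0 but 1.5 − 0.6 × 5 = −1.5 at fur = 5. The probability therefore rises with hormone when fur is low and falls when fur is high.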
QUESTION 70
- Now, my friend has five new guinea pig babies. Is it going to be a pampered life of indulgences to prepare for the show, or… the BBQ?!?
He has measured all the predictors on them. The data is in the file ‘guineapigs asignment year3.csv’.
Based on the model that had the lowest BIC, predict the adult head fur length of each of these five babies, and write down the five predictions (note these lengths are in cm! Long-haired guinea pigs! But just enter the numbers). Remember the ‘predict’ function!
Predictions are:
Guinea pig baby 1:
_______________
1 point
QUESTION 71
- Guinea pig baby 2:
____________________
1 point
QUESTION 72
- Guinea pig baby 3:
______________________
1 point
QUESTION 73
- Guinea pig baby 4:
______________________
1 point
QUESTION 74
- Guinea pig baby 5:
_________________________
1 point
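For a linear model, R's ‘predict’ with a ‘newdata’ argument plugs each new row into the fitted equation: the intercept plus each coefficient times its term. An interaction term is the product of its variables. The Python sketch below does the same for numeric predictors only (it does not handle factors). The coefficient names and values are hypothetical.

```python
def predict_linear(coefficients, newdata):
    """Intercept plus coefficient * term value for each row; 'x1:x2' terms multiply x1 and x2."""
    predictions = []
    for row in newdata:
        total = coefficients.get("(Intercept)", 0.0)
        for term, beta in coefficients.items():
            if term == "(Intercept)":
                continue
            value = 1.0
            for var in term.split(":"):
                value *= row[var]
            total += beta * value
        predictions.append(total)
    return predictions

# Hypothetical fitted coefficients and two hypothetical babies.
coefs = {"(Intercept)": 10.0, "var1": 2.0, "var2": -1.0, "var1:var2": 0.5}
babies = [{"var1": 1.0, "var2": 2.0}, {"var1": 3.0, "var2": 0.0}]
print(predict_linear(coefs, babies))  # [11.0, 16.0]
```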
QUESTION 75
- Based on the model that had the lowest BIC, what is the probability of having exactly eight rosettes for each of these five babies? Rank the babies from highest to lowest chance. The baby with the highest chance is number: (1,2,3,4,5) and the baby with the lowest chance is number: (1,2,3,4,5).
1 point
QUESTION 76
- Which baby is safest from the BBQ? Guinea pig baby number: (1,2,3,4,5)
1 point
QUESTION 77
- Which two are most likely to feel the heat? Guinea pig baby number: (1,2,3,4,5) and guinea pig baby number: (1,2,3,4,5).
- (Make sure you enter these in numerical order, so if it was 1 and 5, then enter 1 then 5, NOT 5 then 1).