By Vera Pfeiffer
Data for today
BadFungi<-read.csv("BadFun.csv", row.names=1)
head(BadFungi)
## Botrytis Cladosporium Monilinia ## X10orgedge 0.02131367 0.15160858 0.04504021 ## X10orgint 0.02654155 0.09718499 0.03900804 ## X11convedge 0.04959786 0.24664879 0.02560322 ## X11convint 0.03391421 0.32600536 0.04571046 ## X15softedge 0.10442359 0.17694370 0.05201072 ## X15softint 0.09195710 0.21796247 0.01300268 ## Mycosphaerella Penicillium ## X10orgedge 0.3230563 0 ## X10orgint 0.1786863 0 ## X11convedge 0.6823056 0 ## X11convint 0.5143432 0 ## X15softedge 0.3900804 0 ## X15softint 0.4044236 0
Envi<-read.csv("FunEnvi.csv", row.names=1)
head(Envi)
## PCNM1 PCNM2 Precip MinT VPDmin ## X10orgedge -0.11090592 -0.01442449 2.31 37.0 1.15 ## X10orgint -0.11092923 -0.01445247 2.31 37.0 1.15 ## X11convedge -0.11065756 -0.01294471 2.31 37.0 1.15 ## X11convint -0.08651768 0.69306694 2.31 37.0 1.15 ## X15softedge -0.11087064 -0.03984493 2.18 37.9 1.27 ## X15softint -0.11088600 -0.04012614 2.18 37.9 1.27 ## VPDmax PropPear PropFruit PropFor soft org ## X10orgedge 10.20 7.219268 0.1145916 5.7295780 0 1 ## X10orgint 10.20 7.219268 0.1145916 5.7295780 0 1 ## X11convedge 10.20 66.119330 1.6042818 0.3437747 0 0 ## X11convint 10.20 66.119330 1.6042818 0.3437747 0 0 ## X15softedge 11.28 14.782311 1.4896903 0.0000000 1 0 ## X15softint 11.28 14.782311 1.4896903 0.0000000 1 0
R packages for today
install.packages(“CCA”),
install.packages(“vegan”)
R functions
Everything that exists in R is an object, and everything that is done (data transformations, plotting, operations) in R is a function.
There are many functions supplied by the base R software package, as well as libraries of code accessible through open access packages available at R Cran repositories.
You can also write your own functions, and activate them by running them in your script. Or alternatively, you can to use source() to load them from a separate file in your working directory.
Writing functions
Abstracting your code into many small functions is a great way to improve your
organization, comfort, and efficiency in R. You can store your functions in .R
documents and source them from a script if they are long and obscure the flow of your working version of code. Try to avoid making very long or complex functions that do too much at once.
FUNCTION.NAME = function(PARAMETER.LIST) {
BODY
}
A function call proceeds as follows:
- Execution jumps to the first line of the function upon seeing the call FUNCTION.NAME(ARGUMENT.LIST)
- Function’s PARAMETER.LIST is copied from caller’s ARGUMENT.LIST by name or position, and from defaults specified as PARAMETER.NAME=DEFAULT in PARAMETER.LIST
- Assignment to function parameters and local variables doesn’t affect caller’s variables
- Code in function is executed until return(EXPRESSION), or until function’s closing }
- Execution returns to caller; if caller assigned a variable to function it gets EXPRESSION from function’s return() or last expression
a=4
b=5
square.a <- function(a = 1, b = 2) {
b=100
c=a*a
return(c)
}
square.a
## function(a = 1, b = 2) { ## b=100 ## c=a*a ## return(c) ## }
square.a()
## [1] 1
square.a(a=3)
## [1] 9
square.a(3)
## [1] 9
square.a(3,7)
## [1] 9
a
## [1] 4
b
## [1] 5
You don’t have to use a return statement…
sum.of.squares <- function(x,y) {
x^2 +y^2
}
sum.of.squares(5,6)
## [1] 61
x<-c(2:11)
y<-c(3:12)
sum.of.squares(x,y)
## [1] 13 25 41 61 85 113 145 181 221 265
Function goals:
- short
- performs one opperation
- intuitive name
Functions should primarily help you improve the readability of your code, and secondarily improve your efficiency and organization. It also helps you maintain your code by storing it in separate easily accessible files, rather than in super, super, long whole-project document.
Bigger function example
badfungiCC<-CCorA(Envi,BadFungi, stand.X=TRUE,stand.Y=TRUE)
summary(badfungiCC)
## Length Class Mode ## Pillai 1 -none- numeric ## Eigenvalues 5 -none- numeric ## CanCorr 5 -none- numeric ## Mat.ranks 2 -none- numeric ## RDA.Rsquares 2 -none- numeric ## RDA.adj.Rsq 2 -none- numeric ## nperm 1 -none- numeric ## p.Pillai 1 -none- numeric ## p.perm 1 -none- logical ## Cy 150 -none- numeric ## Cx 150 -none- numeric ## corr.Y.Cy 55 -none- numeric ## corr.X.Cx 25 -none- numeric ## corr.Y.Cx 55 -none- numeric ## corr.X.Cy 25 -none- numeric ## control 12 how list ## call 5 -none- call
badfungiCC$CanCorr
## [1] 0.8717640 0.8035587 0.6797292 0.5107511 0.4393814
#0.87 0.80
#Eigenvalues are canonical correlations squared
#Pillai's trace is the sum of squared Canonical correlations
print(sum(badfungiCC$CanCorr^2))
## [1] 2.321634
print(sum(badfungiCC$Eigenvalues))
## [1] 2.321634
print(badfungiCC$Pillai)
## [1] 2.321634
#RDA R squares
print(badfungiCC$RDA.Rsquares)
## [1] 0.1611034 0.4759253
# ...tell you how much variance in the *dependent* variables is explained by the independent *canonical variate* (and vice versa)
# i.e. X|Y = variance in X variables explained by the Y canonical variate
# i.e. Y|X = variance in Y variables explained by the X canonical variate
# IMPORTANT
# corr.Y.Cy = Correlation of each variable in Y with its Canonical variate (similarly, corr.X.Cx)
# Also called Canonical loadings or canonical structure correlations, interpreted as factor loadings
# ...tells you how much do individual variables contribute to their own canonical variates
badfungiCC$corr.Y.Cy
## CanAxis1 CanAxis2 CanAxis3 CanAxis4 ## PCNM1 -0.32081197 -0.33933054 -0.21230081 -0.27697487 ## PCNM2 0.20369029 0.14892281 -0.22857132 0.48254574 ## Precip -0.18490434 -0.22563958 -0.08315862 -0.23894850 ## MinT 0.03450369 0.15228199 0.23966211 0.26635597 ## VPDmin -0.13952698 -0.21987239 0.03910030 0.14645347 ## VPDmax -0.12043970 0.25484033 -0.05257090 -0.13428438 ## PropPear 0.45021330 0.69746308 -0.10002690 0.25950088 ## PropFruit 0.20965421 0.37011541 0.14175772 0.33301858 ## PropFor -0.20070175 -0.34619064 -0.19023253 -0.33176122 ## soft 0.32626004 0.05835548 0.07913943 -0.59234743 ## org -0.21437285 -0.34322738 -0.06779914 -0.01384469 ## CanAxis5 ## PCNM1 0.16866057 ## PCNM2 0.12143280 ## Precip 0.26614803 ## MinT -0.23270298 ## VPDmin 0.13155815 ## VPDmax -0.41006701 ## PropPear -0.05107942 ## PropFruit -0.07686641 ## PropFor 0.26656009 ## soft -0.28857927 ## org 0.15500380
badfungiCC$corr.X.Cx
## CanAxis1 CanAxis2 CanAxis3 ## Botrytis -0.58500580 0.49284541 -0.40984921 ## Cladosporium -0.04194937 0.70546350 -0.46226576 ## Monilinia 0.66393123 0.18551419 0.44195394 ## Mycosphaerella -0.28847351 0.81947263 0.09200482 ## Penicillium 0.26637032 0.09997318 -0.14498754 ## CanAxis4 CanAxis5 ## Botrytis -0.4107330 0.27963134 ## Cladosporium 0.4506787 0.28941422 ## Monilinia -0.4621564 0.34039386 ## Mycosphaerella 0.4837810 0.05233234 ## Penicillium -0.2441335 -0.91565805
# So the Canonical axis 1 correlation is driven by the high influence of the first spatial distance vector and the proportion of pear in the surrounding area - on Botrytis and Monilinia
# The second Canonical axis 2 correlation shows an even stronger influence of the proportion of pear, as well as more distributed effects of the proportion of fruit, the first spatial vector, organic management and precipitation on Cladosporium and Mycosphaerella
# ...NOT THAT INFORMATIVE
# corr.Y.Cx = Correlation of each variable in Y with each Canonical variate in X (similarly corr.X.Cy)
# Also called canonical cross-loading. Correlation of each variabel with the opposite canonical variate
# ...tells you how strongly are individual variables correlated with the canonical variates of the dependent/independent set
badfungiCC$corr.Y.Cx
## CanAxis1 CanAxis2 CanAxis3 CanAxis4 ## PCNM1 -0.27967234 -0.27267200 -0.14430705 -0.141465227 ## PCNM2 0.17756987 0.11966822 -0.15536659 0.246460778 ## Precip -0.16119295 -0.18131464 -0.05652534 -0.122043214 ## MinT 0.03007908 0.12236752 0.16290533 0.136041613 ## VPDmin -0.12163460 -0.17668037 0.02657761 0.074801276 ## VPDmax -0.10499500 0.20477916 -0.03573398 -0.068585896 ## PropPear 0.39247977 0.56045251 -0.06799120 0.132540364 ## PropFruit 0.18276900 0.29740945 0.09635686 0.170089613 ## PropFor -0.17496457 -0.27818449 -0.12930660 -0.169447413 ## soft 0.28442177 0.04689206 0.05379338 -0.302542112 ## org -0.18688254 -0.27580334 -0.04608505 -0.007071191 ## CanAxis5 ## PCNM1 0.07410631 ## PCNM2 0.05335531 ## Precip 0.11694049 ## MinT -0.10224536 ## VPDmin 0.05780420 ## VPDmax -0.18017580 ## PropPear -0.02244335 ## PropFruit -0.03377367 ## PropFor 0.11712154 ## soft -0.12679636 ## org 0.06810578
badfungiCC$corr.X.Cy
## CanAxis1 CanAxis2 CanAxis3 ## Botrytis -0.50998702 0.39603021 -0.27858646 ## Cladosporium -0.03656995 0.56688132 -0.31421552 ## Monilinia 0.57879137 0.14907154 0.30040898 ## Mycosphaerella -0.25148083 0.65849435 0.06253836 ## Penicillium 0.23221207 0.08033432 -0.09855226 ## CanAxis4 CanAxis5 ## Botrytis -0.2097823 0.12286480 ## Cladosporium 0.2301846 0.12716321 ## Monilinia -0.2360469 0.14956272 ## Mycosphaerella 0.2470917 0.02299386 ## Penicillium -0.1246915 -0.40232309
biplot(badfungiCC)
Effect sizes and significant canonical variates for Bad Fungi
badfungiCCA<-cc(Envi,BadFungi)
#Effect sizes
badfungiCCA$ycoef
## [,1] [,2] [,3] [,4] ## Botrytis -22.438664 11.750585 -8.474913 32.178310 ## Cladosporium 12.718896 2.526191 -17.735366 -6.120606 ## Monilinia 2.291911 1.647968 1.267286 1.787275 ## Mycosphaerella -3.954103 4.631983 7.749042 -1.847856 ## Penicillium 1.862247 2.207945 -1.680171 1.763883 ## [,5] ## Botrytis 4.676756 ## Cladosporium 5.152196 ## Monilinia 1.122518 ## Mycosphaerella -2.364223 ## Penicillium -5.048976
badfungiCCA$xcoef
## [,1] [,2] [,3] ## PCNM1 20.20318698 -30.87925582 -3.54177300 ## PCNM2 1.22514085 -0.68045395 -2.74271617 ## Precip -30.26201821 40.71110573 24.93730815 ## MinT 1.26711128 -2.85800379 4.65692142 ## VPDmin -25.44970665 30.85066149 -11.54735182 ## VPDmax -5.64031688 6.12719196 -1.63367262 ## PropPear -0.01383938 0.07172806 -0.02568558 ## PropFruit 0.87891684 -0.85420311 0.48111233 ## PropFor 0.02406463 -0.04232783 -0.05020187 ## soft 2.06630668 -0.45726184 0.57663036 ## org 0.56344937 0.21743181 0.63884373 ## [,4] [,5] ## PCNM1 -16.952541694 -53.60023442 ## PCNM2 -1.837648383 2.73584459 ## Precip 19.380736253 38.64802445 ## MinT -3.515535279 -9.39025411 ## VPDmin 18.076110329 38.23013274 ## VPDmax 3.280948570 6.08024432 ## PropPear 0.001137222 -0.06726919 ## PropFruit 0.006018099 0.43610758 ## PropFor 0.005374205 0.12673231 ## soft 1.806271076 -1.39622883 ## org 0.126172135 -2.55539272
FIND WHICH CANONICAL DIMENSIONS ARE SIGNIFICANT
ev=(1 - badfungiCCA$cor^2)
n=dim(Envi)[1]
p=length(Envi)
q=length(BadFungi)
k=min(p, q)
m=n - 3/2 - (p + q)/2
w=rev(cumprod(rev(ev)))
# initialize
d1=d2=f=vector("numeric", k)
for (i in 1:k) {
s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
si=1/s
d1[i]=p * q
d2[i]=m * s - p * q/2 + 1
r=(1 - w[i]^si)/w[i]^si
f[i]=r * d2[i]/d1[i]
p=p - 1
q=q - 1
}
pv=pf(f, d1, d2, lower.tail = FALSE)
dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
print(dmat)
## WilksL F df1 df2 p ## [1,] 0.02728644 1.4637894 55 68.38983 0.06724978 ## [2,] 0.11368049 1.1369873 40 58.73361 0.32254775 ## [3,] 0.32086535 0.8348354 27 47.37057 0.68786933 ## [4,] 0.59643919 0.6265401 16 34.00000 0.83991561 ## [5,] 0.80694401 0.6151972 7 18.00000 0.73655072
As a function…
TestSignificance <- function(veganCCA,x,y) {
ev=(1 - veganCCA$cor^2)
n=dim(x)[1]
p=length(x)
q=length(y)
k=min(p, q)
m=n - 3/2 - (p + q)/2
w=rev(cumprod(rev(ev)))
# initialize
d1=d2=f=vector("numeric", k)
for (i in 1:k) {
s=sqrt((p^2 * q^2 - 4)/(p^2 + q^2 - 5))
si=1/s
d1[i]=p * q
d2[i]=m * s - p * q/2 + 1
r=(1 - w[i]^si)/w[i]^si
f[i]=r * d2[i]/d1[i]
p=p - 1
q=q - 1
}
pv=pf(f, d1, d2, lower.tail = FALSE)
dmat=cbind(WilksL = w, F = f, df1 = d1, df2 = d2, p = pv)
print(dmat)
}
TestSignificance(badfungiCCA, Envi, BadFungi)
## WilksL F df1 df2 p ## [1,] 0.02728644 1.4637894 55 68.38983 0.06724978 ## [2,] 0.11368049 1.1369873 40 58.73361 0.32254775 ## [3,] 0.32086535 0.8348354 27 47.37057 0.68786933 ## [4,] 0.59643919 0.6265401 16 34.00000 0.83991561 ## [5,] 0.80694401 0.6151972 7 18.00000 0.73655072
Apply function family
Apply functions are implicit loops that iterate over the elements in an object like a dataframe, list, matrix, or array (3-d matrix) and execute some set of functions.
Apply functions are a more efficient way to write a loop – which can help to keep your code organized and efficient. They are also more computationally efficient.
Depending on your inputs and outputs – there is a whole family of apply functions. The first four apply a function over all the elements in a single object.
lapply(X, FUN, ...)
list apply; returns a list of values obtatined by applying a function to a vector, list, or data frame.
lapply(Envi$Precip, sqrt)
## [[1]] ## [1] 1.519868 ## ## [[2]] ## [1] 1.519868 ## ## [[3]] ## [1] 1.519868 ## ## [[4]] ## [1] 1.519868 ## ## [[5]] ## [1] 1.476482 ## ## [[6]] ## [1] 1.476482 ## ## [[7]] ## [1] 1.476482 ## ## [[8]] ## [1] 1.476482 ## ## [[9]] ## [1] 1.486607 ## ## [[10]] ## [1] 1.486607 ## ## [[11]] ## [1] 1.486607 ## ## [[12]] ## [1] 1.486607 ## ## [[13]] ## [1] 1.593738 ## ## [[14]] ## [1] 1.593738 ## ## [[15]] ## [1] 1.593738 ## ## [[16]] ## [1] 1.593738 ## ## [[17]] ## [1] 1.593738 ## ## [[18]] ## [1] 1.593738 ## ## [[19]] ## [1] 1.479865 ## ## [[20]] ## [1] 1.479865 ## ## [[21]] ## [1] 1.479865 ## ## [[22]] ## [1] 1.479865 ## ## [[23]] ## [1] 1.519868 ## ## [[24]] ## [1] 1.519868 ## ## [[25]] ## [1] 1.584298 ## ## [[26]] ## [1] 1.584298 ## ## [[27]] ## [1] 1.486607 ## ## [[28]] ## [1] 1.486607 ## ## [[29]] ## [1] 1.486607 ## ## [[30]] ## [1] 1.486607
sapply(X, FUN, ...)
Like lapply, but simplifies the output to the more simple data type… may return a vector instead of a list.
sapply(Envi$Precip, sqrt)
## [1] 1.519868 1.519868 1.519868 1.519868 1.476482 1.476482 ## [7] 1.476482 1.476482 1.486607 1.486607 1.486607 1.486607 ## [13] 1.593738 1.593738 1.593738 1.593738 1.593738 1.593738 ## [19] 1.479865 1.479865 1.479865 1.479865 1.519868 1.519868 ## [25] 1.584298 1.584298 1.486607 1.486607 1.486607 1.486607
apply(X, MARGIN, FUN, ...)
returns a vector or array or list of values obtatined by applying a function to margins (1 = rows, 2 = columns) of a data frame, matrix or array.
apply(Envi, MARGIN=2, mean)
## PCNM1 PCNM2 Precip MinT ## -5.086263e-18 -6.604124e-17 2.309333e+00 3.710000e+01 ## VPDmin VPDmax PropPear PropFruit ## 1.216000e+00 1.051000e+01 2.711236e+01 1.443854e+00 ## PropFor soft org ## 9.862514e+00 3.333333e-01 3.333333e-01
tapply(X, INDEX, FUN = NULL)
Table apply, this apply statement requires an index values (object of the same length as X), and allows you to apply a function to a subset of your data.
tapply(Envi$MinT, Envi$Precip, mean)
## 2.18 2.19 2.21 2.31 2.51 2.54 ## 37.9 37.9 37.3 37.0 35.8 36.3
… and…
The multiple arguments apply function allows you to apply a function across multiple objects… takes several vectors, and applies the function to the first object of each, then the second object of each… can be different lengths
mapply(FUN, ...)
returns a vector or array or list of values obtatined by applying a function to margins of a matrix or array.
x= 1:4
y= 5:8
z= 9:12
mapply(sum, x, y, z)
## [1] 15 18 21 24
MAP(FUN, ...)
is apparently similar and often preferable version, that does not simplify data structures or break lazy argument loading norms.