10-11 Dec. 2015, breedR version: 0.11

Douglas dataset

Description of Douglas dataset

data("douglas")
str(douglas)
'data.frame':   9630 obs. of  15 variables:
 $ self : num  135 136 137 138 139 140 141 142 143 144 ...
 $ dad  : num  41 41 41 41 41 41 41 41 41 41 ...
 $ mum  : num  21 21 21 21 21 21 21 21 21 21 ...
 $ orig : Factor w/ 11 levels "pA","pB","pC",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ site : Factor w/ 3 levels "s1","s2","s3": 1 1 1 1 1 1 1 1 1 1 ...
 $ block: Factor w/ 127 levels "s1:1","s1:2",..: 11 44 24 28 13 35 8 40 3 15 ...
 $ x    : num  6 27 45 57 57 60 63 66 66 75 ...
 $ y    : num  81 135 90 45 327 474 450 21 234 483 ...
 $ H02  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H03  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H04  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ H05  : int  634 581 611 370 721 488 574 498 528 620 ...
 $ C13  : int  586 474 715 372 665 558 490 372 527 612 ...
 $ AN   : Factor w/ 5 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...
 $ BR   : Factor w/ 5 levels "1","2","3","4",..: NA NA NA NA NA NA NA NA NA NA ...

Origins by site

with(douglas, table(orig, site))
    site
orig   s1   s2   s3
  pA 1381 1326  635
  pB  870  815  111
  pC  930 1030   80
  pD   28    1    0
  pE  265    0    0
  pF  502  673  388
  pG    0  116   26
  pH    0   78   54
  pI    0   78   52
  pJ    0   78   55
  pK    0    0   58
  • Mixed design (neither factorial nor hierarchical)
  • Heavily unbalanced

Spatial arrangement

ggplot(douglas, aes(x, y, color = block)) +
  geom_point(show_guide = FALSE) +
  facet_wrap(~ site)

  • The colour scale highlights the blocks within each site

Multi-site modelling

Statistical model

Excercise: write the following model in breedR:

\[ \begin{aligned} \mathrm{CR13} = & \mathrm{site} + \mathrm{orig} + fam + bl_\mathrm{site} + fsi + \varepsilon\\ fam \sim & \mathcal{N}(0, \sigma_f^2) \\ bl_s \sim & \mathcal{N}(0, \sigma_{b(s)}^2), \quad s = 1, 2, 3 \\ fsi \sim & \mathcal{N}(0, \sigma_{i}^2) \\ \varepsilon \sim & \mathcal{N}(0, \sigma^2) \end{aligned} \]

  • Adjust for blocks within each site
  • (possibly with different variabilities)

Preparation of dataset

## create site-wise block variables (i.e. bl_i = bl * Ind(i))
dat <- transform(douglas, bl1 = block, bl2 = block, bl3 = block)
dat$bl1[dat$site != "s1"] <- NA
dat$bl2[dat$site != "s2"] <- NA
dat$bl3[dat$site != "s3"] <- NA
dat <- droplevels(dat)

## variable family (taken as a maternal effect)
dat$fam <- factor(dat$mum)

## family-site interaction variable
dat <- transform(dat, famxsite = factor(fam:site))

Basic Family-Site interaction model

Model with origin and site as fixed effects, and family, blocks and family-site interaction as random effects

reml.tree <- remlf90(fixed  = C13 ~ site + orig,
                     random = ~ fam + bl1 + bl2 + bl3 + famxsite,
                     data = dat)

Model summary

Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122 
   Data: dat 
    AIC    BIC logLik
 109427 109470 -54708

Parameters of special components:


Variance components:
         Estimated variances   S.E.
fam                  1339.70 235.07
bl1                   736.37 182.76
bl2                    66.46  50.84
bl3                  1233.80 385.47
famxsite              118.43  61.54
Residual            13653.00 211.56

Fixed effects:
           value    s.e.
site.s1 503.4873  7.4887
site.s2 550.2785  6.7923
site.s3 474.9609  9.6581
orig.pA  -8.9503  4.4835
orig.pB   0.0000  0.0000
orig.pC -42.4425  9.4899
orig.pD  52.2017 44.8236
orig.pE -26.2352 18.7055
orig.pF -37.0728 11.3151
orig.pG -47.1587 23.2942
orig.pH -14.7085 26.4218
orig.pI -49.8486 29.8747
orig.pJ -57.2313 26.3383
orig.pK -53.7943 28.7336

Minimal model diagnosis

Excercise: produce the following plots:

  • histogram of residuals
  • fitted vs. observations

Assessment of the interaction

Interaction effect

Excercise: produce the following interaction plot:

  • predicted family-site interaction by site
  • connect values for the same family with a line

Ecovalence

Consider only families present in all the 3 sites

famxsite.tbl <- table(dat$fam, dat$site) > 0
fam3.idx <- apply(famxsite.tbl, 1, all)
table(fam3.idx)
fam3.idx
FALSE  TRUE 
   64    45 

Compute ecovalence by family

Sum of squares of interaction effects by family (across sites) divided by total sum of squares (for all families)

  • (a bit of data-wrangling using dplyr)
(dat.ecov <- 
   dat.fsi %>%
   group_by(fam) %>%
   summarise(ssfam =
               sum(pred_fsi**2)) %>%
   filter(fam3.idx) %>% 
   mutate(Ecovalence=ssfam/sum(ssfam)))

## Check
stopifnot(
  all.equal(sum(dat.ecov$Ecovalence), 1)
)
Source: local data frame [45 x 3]

      fam      ssfam  Ecovalence
   (fctr)      (dbl)       (dbl)
1       1  18.447072 0.006589986
2       2   8.286972 0.002960417
3       3  58.993908 0.021074836
4       4 109.377567 0.039073769
5       5   4.886922 0.001745792
6       7  20.096726 0.007179304
7       8  33.322147 0.011903920
8       9  10.683869 0.003816679
9      10  18.413615 0.006578034
10     11  67.250109 0.024024261
..    ...        ...         ...

Determine most interactive families

threshold <- 0.05  # arbitrarily chosen
dat.ecov$hi <- dat.ecov$Ecovalence > threshold
ggplot(dat.ecov, aes(Ecovalence, fill = hi)) +
  geom_histogram(show_guide = FALSE)
Most interactive families:  24 28 38 48 105 106

Plot most interactive families

Represent the families which interact the most with the sites:

Correlations of genetic effects across sites

Approaches to genetic correlations

  1. Type-B genetic correlation
    • homogeneity of genetic variances across sites
    • equal correlations between any pair of sites
  2. Sample correlation of Predicted Breeding Values
    • with or without homogeneity of variances
  3. Formal inference about genetic correlations
    • (breedR's interface not supporting it yet)

Type-B genetic correlation

  • For \(k=3\) fixed environments, the average genetic correlation between observations of the same genetic group under two environments can be approximated by (Yamada 1962, eq. 38):

\[ \rho_{B} = \frac{\sigma_f^2 - \sigma_{fs}^2/3}{\sigma_f^2 + 2*\sigma_{fs}^2/3} \]

  • Underlying hypotheses:
    • equal genetic variances between environments
    • equal correlations between any two pairs of environments
  • High values indicate weak interaction, i.e. high consistency of genetic values accross sites

Type-B genetic correlation

inference

  • We can estimate functions of variance components by using the option se_covar_function in the argument progsf90.options.
r_B.fml <- "(G_3_3_1_1-G_7_7_1_1/3)/(G_3_3_1_1+2*G_7_7_1_1/3)"
reml.tree <- remlf90(
  fixed  = C13 ~ site + orig,
  random = ~ fam + bl1 + bl2 + bl3 + famxsite,
  progsf90.options = paste("se_covar_function r_B", r_B.fml),
  data = dat
)

Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122 
   Data: dat 
    AIC    BIC logLik
 109427 109470 -54708

Parameters of special components:


Variance components:
         Estimated variances   S.E.
fam                  1339.70 235.07
bl1                   736.37 182.76
bl2                    66.46  50.84
bl3                  1233.80 385.47
famxsite              118.43  61.54
Residual            13653.00 211.56

    Estimate    S.E.
r_B   0.9157 0.04774

Fixed effects:
           value    s.e.
site.s1 503.4873  7.4887
site.s2 550.2785  6.7923
site.s3 474.9609  9.6581
orig.pA  -8.9503  4.4835
orig.pB   0.0000  0.0000
orig.pC -42.4425  9.4899
orig.pD  52.2017 44.8236
orig.pE -26.2352 18.7055
orig.pF -37.0728 11.3151
orig.pG -47.1587 23.2942
orig.pH -14.7085 26.4218
orig.pI -49.8486 29.8747
orig.pJ -57.2313 26.3383
orig.pK -53.7943 28.7336

Correlation of Predicted Breeding Values

The predicted genetic effect is the sum of the family effect and the family-site interaction (Excercise).

Reshape predicted values

## reshape the genetic values
## by family

(dat.gen <- 
   
   dat.fsi %>% 
   
   select(fam, site, pred_gen) %>% 
   
   tidyr::spread(site, pred_gen) %>% 
   
   tbl_df()
)
Source: local data frame [109 x 4]

      fam          s1         s2         s3
   (fctr)       (dbl)      (dbl)      (dbl)
1       1   0.7708066   4.629670   6.755107
2       2 -54.7085993 -55.876654 -55.599421
3       3 -27.5767262 -17.541736 -19.202094
4       4  18.6370661  22.462160  32.831870
5       5  -9.5945562  -9.072929  -6.714122
6       6 -36.1778425 -31.010280         NA
7       7  44.9607625  39.364452  42.434652
8       8  41.1998076  44.446619  48.712357
9       9  40.6242870  36.905050  39.053858
10     10  14.6311835  10.742742   8.711473
..    ...         ...        ...        ...

Compute correlations

Rank correlations

  • Kendall (top triangle) and Spearman (bottom triangle)

    s1 s2 s3
    s1 1.000 0.875 0.834
    s2 0.970 1.000 0.851
    s3 0.951 0.959 1.000

Heterogeneous genetic variances

Diagnostic plots

  • The model fitted one single variance for the three sites.
  • Should we allow different variances?

Statistical model

  • Independent site-wise family effects
  • Analogous to the blocks effects

\[ \begin{aligned} \mathrm{CR13} = & \mathrm{site} + \mathrm{orig} + bl_\mathrm{site} + f_\mathrm{site} + \varepsilon\\ bl_s \sim & \mathcal{N}(0, \sigma_{b(s)}^2), \quad s = 1, 2, 3 \\ f_s \sim & \mathcal{N}(0, \sigma_{f(s)}^2), \quad s = 1, 2, 3 \\ \varepsilon \sim & \mathcal{N}(0, \sigma^2) \end{aligned} \]

Model summary

Things improve a bit wrt the previous model with a single variance. It seems that in site 2 there is not much variation, neither from blocks nor for GxE.

Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122 
   Data: dat 
    AIC    BIC logLik
 109520 109570 -54753

Parameters of special components:


Variance components:
         Estimated variances   S.E.
bl1                   741.74 184.06
bl2                    64.86  50.54
bl3                  1234.30 385.93
f1                   1550.50 283.46
f2                    907.78 201.97
f3                   1968.80 561.03
Residual            13658.00 211.81

Fixed effects:
           value    s.e.
site.s1 465.1287  8.1128
site.s2 512.7597  6.7871
site.s3 439.4925 10.7214
orig.pA  29.9486  7.3873
orig.pB  37.6115  7.7293
orig.pC  -2.1595  7.9720
orig.pD  90.4319 45.9454
orig.pE  12.7877 19.3840
orig.pF   0.0000  0.0000
orig.pG   8.5822 22.9116
orig.pH   9.4322 27.9432
orig.pI  -8.4477 27.9889
orig.pJ  -1.3922 28.0138
orig.pK   6.2548 47.8731

Minimal model diagnosis

Predicted genetic effects

Differences in model predictions

  • Full genetic merit comparison

Genetic correlations

independent interactions model

Prototype zone

formal inference for genetic correlations

Correlated effects

  • The most effective way is to introduce the (expected) correlation between interaction effects across sites into a model parameter.

  • Among other things, this allows us to predict the genetic value of a family in a site where it was not observed.

\[ \begin{aligned} \mathrm{CR13} = & \mathrm{site} + \mathrm{orig} + bl_\mathrm{site} + fsi + \varepsilon\\ bl_s \sim & \mathcal{N}(0, \sigma_{b(s)}^2), \quad s = 1, 2, 3 \\ fsi = \begin{pmatrix} f_1 \\ f_2 \\ f_3 \end{pmatrix} \sim & \mathcal{N}(0, \Sigma_i \otimes \mathrm{I}) \\ \varepsilon \sim & \mathcal{N}(0, \sigma^2) \end{aligned} \]

— {.columns-2 .smaller}

Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122 
   Data: dat 
    AIC    BIC logLik
 109026 109119 -54500

Parameters of special components:


Variance components:
$bl1
      [,1]
[1,] 737.6

$bl2
      [,1]
[1,] 67.39

$bl3
     [,1]
[1,] 1238

$fsi
     f1   f2   f3
f1 1611 1315 1552
f2 1315 1083 1353
f3 1552 1353 2268

$Residual
      [,1]
[1,] 13633


Fixed effects:
            value    s.e.
site.s1 464.26089 10.1003
site.s2 510.67266  9.1865
site.s3 435.06084 12.1328
orig.pA  30.82334 10.8268
orig.pB  38.63057 10.9586
orig.pC  -3.98224 11.3373
orig.pD  91.40232 46.9774
orig.pE  21.03562 20.4085
orig.pF   0.00000  0.0000
orig.pG  10.56924 21.5596
orig.pH  60.05614 24.4392
orig.pI   6.04216 27.6052
orig.pJ   0.38203 24.5737
orig.pK  -3.76085 38.3847
      [,1]  [,2]  [,3]
[1,] 1.000 0.996 0.812
[2,] 0.996 1.000 0.863
[3,] 0.812 0.863 1.000

Minimal model diagnosis

Predicted genetic effects

Differences in predictions

  • Full genetic merit comparison

Correlations

Bonus track

Spatial splines model on each site

Model hack

## Use breedR internal functions to compute splines models on each site
sp1 <- breedR:::breedr_splines(dat[dat$site == 's1', c('x', 'y')])
sp2 <- breedR:::breedr_splines(dat[dat$site == 's2' & !is.na(dat$x) & !is.na(dat$y), c('x', 'y')])
sp3 <- breedR:::breedr_splines(dat[dat$site == 's3', c('x', 'y')])

## Manually build the full incidence matrices with 0
## and the values computed before
mm1 <- model.matrix(sp1)
inc.sp1 <- Matrix::Matrix(0, nrow = nrow(dat), ncol = ncol(mm1))
inc.sp1[dat$site == 's1', ] <- mm1

mm2 <- model.matrix(sp2)
inc.sp2 <- Matrix::Matrix(0, nrow = nrow(dat), ncol = ncol(mm2))
inc.sp2[dat$site == 's2' & !is.na(dat$x) & !is.na(dat$y), ] <- mm2

mm3 <- model.matrix(sp3)
inc.sp3 <- Matrix::Matrix(0, nrow = nrow(dat), ncol = ncol(mm3))
inc.sp3[dat$site == 's3', ] <- mm3

reml.spl <- remlf90(
  fixed  = C13 ~ site + orig,
  random = ~ fam + famxsite,
  generic = list(sp1 = list(inc.sp1,
                            breedR:::get_structure(sp1)),
                 sp2 = list(inc.sp2,
                            breedR:::get_structure(sp2)),
                 sp3 = list(inc.sp3,
                            breedR:::get_structure(sp3))),
  data = dat,
  method = 'em'
)

Spatial effects by site

Yamada, Yukio. 1962. “Genotype by Environment Interaction and Genetic Correlation of the Same Trait Under Different Environments.” The Japanese Journal of Genetics 37 (6): 498–509. doi:10.1266/jjg.37.498.