vignettes/variancecomponents.Rmd
variancecomponents.Rmd
Here is a simple example taken from the lme4
documentation.
(confint.merMod(fm1 <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy, REML = TRUE))) #> Computing profile confidence intervals ... #> 2.5 % 97.5 % #> .sig01 14.3814012 37.7159906 #> .sig02 -0.4815008 0.6849863 #> .sig03 3.8011642 8.7533658 #> .sigma 22.8982669 28.8579964 #> (Intercept) 237.6806955 265.1295145 #> Days 7.3586533 13.5759188
There’s our output. Now suppose that we wanted the to see the interval estimates and functions of the variable “Days”, here’s how we would do it with curve_lmer()
. We use suppressMessages()
to avoid seeing the long list of profiling messages.
library(concurve) object1 <- suppressMessages(curve_lmer(object = fm1, parm = "Days", method = "profile", steps = 100)) sample1 <- suppressMessages(curve_lmer(object = fm1, parm = ".sig01", method = "profile", steps = 100)) ggcurve(data = sample1[[1]], type = "c", measure = "default")
Suppose we wanted to study the sources of variability in an experiment, which could be used for descriptive purposes or to better understand the sources so that they could be reduced in future experiment. This is vital in areas like industrial quality control because if one cannot take accurate measurements, then they have no hope of improving quality control.
Here we look at an example presented by John Lawson presented by Davies (1949).
A dye manufacturer wanted to know if there was an appreciable contribution to variability in dyestuff color yields owing to the quality of the intermediate acid batch used. It was a two- step process. Intermediate acid batches were produced in one step, and in a later step the acid was used to produce the dyestuff. The goal was to keep dyestuff color yields consistently high. If the majority of variation was caused by differences among the intermediate acid batches, then improvement efforts should be concentrated on the process that makes the acid batches. If the majority of variation was within preparations of the dyestuff made from the same acid batch, improvement efforts should be focused on the process step of making the dyestuff. A sampling experiment was run wherein six representative samples of H acid intermediate were taken from the step manufacturing process that produces it. From each acid sample, five preparations of the dyestuff Naphthalene 12B were made in a laboratory, and these were representative of the preparations that could be made with each sample. The data from the sampling experiment is shown in Table 5.2. The yields are given in grams of standard color.
We wish to estimate the variance components from the collected data above. A typical approach to estimate the variance components is to use analysis of variance (first proposed by Fisher), which uses the method of moments. However, in certain scenarios, this method is not desirable, and the restricted maximum likelihood approach (REML) is preferable.
Here we show how to do that.
fm1M <- lmer( yield ~ 1 + (1| sample), data = Naph, REML = TRUE) summary(fm1M) #> Linear mixed model fit by REML ['lmerMod'] #> Formula: yield ~ 1 + (1 | sample) #> Data: Naph #> #> REML criterion at convergence: 319.7 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -1.4117 -0.7634 0.1418 0.7792 1.8296 #> #> Random effects: #> Groups Name Variance Std.Dev. #> sample (Intercept) 1764 42.00 #> Residual 2451 49.51 #> Number of obs: 30, groups: sample, 6 #> #> Fixed effects: #> Estimate Std. Error t value #> (Intercept) 1527.50 19.38 78.8
Now we attempt to estimate the interval estimates for variance components.
sample2 <- suppressMessages(curve_lmer(object = fm1M, parm = ".sig01", method = "profile", steps = 100)) ggcurve(data = sample2[[1]], type = "c", measure = "default")
You can also embed plots, for example:
c1 <- c( .5, -.5) mod4 <- lmer( pl ~ 1 + Group + (1|Subject:Group) + Period + Treat, contrasts = list(Group = c1, Period = c1, Treat = c1), data = antifungal) summary(mod4) #> Linear mixed model fit by REML ['lmerMod'] #> Formula: pl ~ 1 + Group + (1 | Subject:Group) + Period + Treat #> Data: antifungal #> #> REML criterion at convergence: 148.2 #> #> Scaled residuals: #> Min 1Q Median 3Q Max #> -2.0564 -0.3291 -0.1490 0.5399 1.7021 #> #> Random effects: #> Groups Name Variance Std.Dev. #> Subject:Group (Intercept) 1.508 1.228 #> Residual 4.563 2.136 #> Number of obs: 34, groups: Subject:Group, 17 #> #> Fixed effects: #> Estimate Std. Error t value #> (Intercept) 13.1688 0.4729 27.845 #> Group1 0.3375 0.9459 0.357 #> Period1 -0.2944 0.7340 -0.401 #> Treat1 0.5944 0.7340 0.810 #> #> Correlation of Fixed Effects: #> (Intr) Group1 Perid1 #> Group1 0.059 #> Period1 0.000 0.000 #> Treat1 0.000 0.000 0.059
crossed <- suppressMessages(curve_lmer(object = mod4, parm = "Period1", method = "profile")) ggcurve(data = crossed[[1]], type = "c")
Note that the echo = FALSE
parameter was added to the code chunk to prevent printing of the R code that generated the plot.
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] daewr_1.2-5 lme4_1.1-23 Matrix_1.2-18 concurve_2.7.7
#>
#> loaded via a namespace (and not attached):
#> [1] nlme_3.1-149 fs_1.5.0 rprojroot_1.3-2
#> [4] tools_4.0.3 backports_1.1.10 R6_2.4.1
#> [7] metafor_2.4-0 colorspace_1.4-1 tidyselect_1.1.0
#> [10] gridExtra_2.3 curl_4.3 compiler_4.0.3
#> [13] textshaping_0.1.2 flextable_0.5.11 xml2_1.3.2
#> [16] desc_1.2.0 officer_0.3.14 sfsmisc_1.1-7
#> [19] scales_1.1.1 lmtest_0.9-38 survMisc_0.5.5
#> [22] partitions_1.9-22 pkgdown_1.6.1 systemfonts_0.3.2
#> [25] stringr_1.4.0 digest_0.6.25 foreign_0.8-80
#> [28] minqa_1.2.4 rmarkdown_2.4 rio_0.5.16
#> [31] base64enc_0.1-3 pkgconfig_2.0.3 htmltools_0.5.0
#> [34] rlang_0.4.8 readxl_1.3.1 numbers_0.7-5
#> [37] rstudioapi_0.11 farver_2.0.3 generics_0.0.2
#> [40] combinat_0.0-8 zoo_1.8-8 dplyr_1.0.2
#> [43] zip_2.1.1 car_3.0-10 magrittr_1.5
#> [46] polynom_1.4-0 Rcpp_1.0.5 munsell_0.5.0
#> [49] abind_1.4-5 gdtools_0.2.2 lifecycle_0.2.0
#> [52] scatterplot3d_0.3-41 stringi_1.5.3 yaml_2.2.1
#> [55] carData_3.0-4 MASS_7.3-53 grid_4.0.3
#> [58] parallel_4.0.3 forcats_0.5.0 crayon_1.3.4.9000
#> [61] survminer_0.4.8 lattice_0.20-41 haven_2.3.1
#> [64] splines_4.0.3 hms_0.5.3 knitr_1.30
#> [67] pillar_1.4.6 ProfileLikelihood_1.1 igraph_1.2.6
#> [70] ggpubr_0.4.0 uuid_0.1-4 boot_1.3-25
#> [73] ggsignif_0.6.0 conf.design_2.0.0 glue_1.4.2
#> [76] evaluate_0.14 data.table_1.13.0 vcd_1.4-8
#> [79] nloptr_1.2.2.2 vctrs_0.3.4 bcaboot_0.2-1
#> [82] FrF2_2.2-2 cellranger_1.1.0 gtable_0.3.0
#> [85] purrr_0.3.4 tidyr_1.1.2 km.ci_0.5-2
#> [88] assertthat_0.2.1 ggplot2_3.3.2 DoE.base_1.1-5
#> [91] xfun_0.18 openxlsx_4.2.2 xtable_1.8-4
#> [94] broom_0.7.1 rstatix_0.6.0 ragg_0.4.0
#> [97] survival_3.2-7 tibble_3.0.3 pbmcapply_1.5.0
#> [100] memoise_1.1.0 KMsurv_0.1-5 gmp_0.6-1
#> [103] statmod_1.4.34 ellipsis_0.3.1