If you encounter any issues with installation, try the following script, which will also install the other packages that `concurve` needs to function:

```r
install.packages("concurve", dep = TRUE)
```
If that doesn’t work, please try reinstalling R, and then installing the package again. You can also try installing the development version with:

```r
library(devtools)
install_github("zadrafi/concurve")
```
If you encounter an error such as “Error: ‘data’ must be a data frame from ‘concurve’.”, it is very likely that you are not providing `ggcurve()` the correct argument. If you used a function like `curve_gen()` to generate intervals and saved the result to an object called `object`, you need to give `ggcurve()` a data argument such as `object[[1]]` rather than `object` or `object[1]`. This is because although you saved your results to something called `object`, you actually ended up with a list with multiple components used for different purposes, and the first component of the list is usually the most commonly used part.
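Since the distinction between `[[` and `[` is what triggers this error, here is a minimal sketch of the difference, using a made-up list rather than actual `concurve` output:

```r
# A hypothetical list standing in for the output of a curve function
x <- list(a = data.frame(v = 1:3), b = list())

class(x[[1]])  # "data.frame" -- the component itself, which ggcurve() accepts
class(x[1])    # "list" -- a one-element sublist, which triggers the error above
```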
We can actually see these components one by one. Let’s generate some example data.

```r
library(concurve)
set.seed(1031)
GroupA <- rnorm(500)
GroupB <- rnorm(500)
RandomData <- data.frame(GroupA, GroupB)
object <- curve_mean(GroupA, GroupB, data = RandomData, method = "default")
```
As stated, the first part of the list, `object[[1]]`, contains what we usually want. (I’m restricting the output to the first 5 results using the `head()` function so that we don’t print a giant list with 1000 rows.)

```r
head(object[[1]], 5)
#>   lower.limit upper.limit intrvl.width intrvl.level     cdf pvalue       svalue
#> 1  -0.1125581  -0.1125581 0.000000e+00        0e+00 0.50000 1.0000 0.0000000000
#> 2  -0.1125658  -0.1125504 1.543412e-05        1e-04 0.50005 0.9999 0.0001442767
#> 3  -0.1125736  -0.1125427 3.086824e-05        2e-04 0.50010 0.9998 0.0002885679
#> 4  -0.1125813  -0.1125350 4.630236e-05        3e-04 0.50015 0.9997 0.0004328734
#> 5  -0.1125890  -0.1125273 6.173649e-05        4e-04 0.50020 0.9996 0.0005771935
```
while the second and third parts of the list contain data frames and lists used by other functions, such as those that generate density plots or produce tables with `curve_table()`.

```r
head(object[[2]], 5)
#>            x
#> 1 -0.1125581
#> 2 -0.1125658
#> 3 -0.1125736
#> 4 -0.1125813
#> 5 -0.1125890
```

```r
head(object[[3]], 5)
#>      Lower Limit Upper Limit Interval Width Interval Level (%)   CDF P-value
#> 2501      -0.132      -0.093          0.039                 25 0.625    0.75
#> 5001      -0.154      -0.071          0.083                 50 0.750    0.50
#> 7501      -0.183      -0.042          0.142                 75 0.875    0.25
#> 8001      -0.192      -0.034          0.158                 80 0.900    0.20
#> 8501      -0.201      -0.024          0.177                 85 0.925    0.15
#>      S-value (bits)
#> 2501          0.415
#> 5001          1.000
#> 7501          2.000
#> 8001          2.322
#> 8501          2.737
```
If you encounter issues when plotting, it is likely because a large number of points are being plotted, which can distort the graph slightly or prevent it from loading at all. The simplest solution is to refresh the plot and try the function again. This applies to the `ggcurve()`, `curve_compare()`, and `plot_compare()` functions.

I would also recommend saving plots using the `cowplot::save_plot()` function with the actual `ggcurve()` object, as it has better default settings than the `ggsave()` function.
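For example, a minimal sketch of saving a plot this way; the object and file name here are hypothetical:

```r
library(concurve)
library(cowplot)

# 'object' is assumed to be the saved result of a curve function, e.g. curve_mean()
p <- ggcurve(data = object[[1]])

# save_plot() from cowplot has sizing defaults better suited to ggplot objects
save_plot("consonance-curve.png", p, base_height = 4)
```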
Because this package computes thousands of interval estimates via iteration and bootstrapping, it requires a lot of computational power. Luckily, `concurve` supports parallelization, although it is disabled by default because some users, such as those on Windows, are unable to use it.

However, if you are able to use parallelization, you can enable it with the following script. The script will detect the number of cores on your machine via the `parallel` package and use them to speed up the computations, especially for bootstrapping.
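Based on the description above, and on the `mc.cores` option that the curve functions read later in this section, the enabling script is presumably:

```r
library(parallel)

# Set the mc.cores option to the number of cores detectCores() finds on this
# machine; concurve's curve functions read this option for parallel computation.
options(mc.cores = detectCores())
```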
However, if you would like to speed up the computations but are unable to use parallelization, you can reduce the number of `steps` in each of the `concurve` functions, which will drastically reduce the time it takes to complete the operation. By default, most of the `steps` arguments are set to 10000.
For example, here I changed the number of `steps` to 100, which is the minimum needed to plot a function, and the process is now much quicker. We can evaluate this using a microbenchmark; here I use the `bench` package and its `mark()` function. Because we are using parallelization, we must also set the `memory` argument to `FALSE`.

```r
library(bench)
library(parallel)
options(mc.cores = 1)
getOption("mc.cores", 1L)
set.seed(1031)

func1 <- mark(df1 <- curve_rev(
  point = 1.61, LL = 0.997, UL = 2.59,
  measure = "ratio", steps = 100
), memory = FALSE)

func2 <- mark(df1 <- curve_rev(
  point = 1.61, LL = 0.997, UL = 2.59,
  measure = "ratio", steps = 500000
), memory = FALSE)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
```
I’ll now enable parallelization by setting the `mc.cores` option to the maximum number of cores available, detected with `detectCores()`. The `mc.cores` option is the argument that almost all of the curve functions in the `concurve` package use for parallel computing.

```r
options(mc.cores = detectCores())
getOption("mc.cores", 1L)
set.seed(1031)

func3 <- mark(df1 <- curve_rev(
  point = 1.61, LL = 0.997, UL = 2.59,
  measure = "ratio", steps = 100
), memory = FALSE)

func4 <- mark(df1 <- curve_rev(
  point = 1.61, LL = 0.997, UL = 2.59,
  measure = "ratio", steps = 500000
), memory = FALSE)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
```
```r
func1$median
#> [1] 2.56ms
func2$median
#> [1] 2.86s
func3$median
#> [1] 2.56ms
func4$median
#> [1] 3.75s
```
When setting the number of iterations to 100, utilizing parallelization doesn’t seem to help much, but when setting the number of iterations to 500000 and using multiple cores, there seems to be a computational advantage.
If you encounter any other bugs, please report them at https://github.com/zadrafi/concurve/issues.
```
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> attached base packages:
#> [1] parallel  stats     graphics  grDevices utils     datasets  methods
#> [8] base
#>
#> other attached packages:
#> [1] bench_1.1.1    concurve_2.7.7
#>
#> loaded via a namespace (and not attached):
#>  [1] tidyr_1.1.2           splines_4.0.3         carData_3.0-4
#>  [4] ProfileLikelihood_1.1 assertthat_0.2.1      metafor_2.4-0
#>  [7] cellranger_1.1.0      yaml_2.2.1            bcaboot_0.2-1
#> [10] gdtools_0.2.2         pillar_1.4.6          backports_1.1.10
#> [13] lattice_0.20-41       glue_1.4.2            uuid_0.1-4
#> [16] digest_0.6.25         ggsignif_0.6.0        colorspace_1.4-1
#> [19] htmltools_0.5.0       Matrix_1.2-18         pkgconfig_2.0.3
#> [22] broom_0.7.1           haven_2.3.1           xtable_1.8-4
#> [25] purrr_0.3.4           scales_1.1.1          km.ci_0.5-2
#> [28] openxlsx_4.2.2        officer_0.3.14        rio_0.5.16
#> [31] KMsurv_0.1-5          tibble_3.0.3          generics_0.0.2
#> [34] car_3.0-10            ggplot2_3.3.2         ellipsis_0.3.1
#> [37] ggpubr_0.4.0          survival_3.2-7        magrittr_1.5
#> [40] crayon_1.3.4.9000     readxl_1.3.1          memoise_1.1.0
#> [43] evaluate_0.14         fs_1.5.0              nlme_3.1-149
#> [46] rstatix_0.6.0         forcats_0.5.0         xml2_1.3.2
#> [49] foreign_0.8-80        textshaping_0.1.2     tools_4.0.3
#> [52] data.table_1.13.0     hms_0.5.3             lifecycle_0.2.0
#> [55] stringr_1.4.0         flextable_0.5.11      munsell_0.5.0
#> [58] zip_2.1.1             compiler_4.0.3        pkgdown_1.6.1
#> [61] survminer_0.4.8       pbmcapply_1.5.0       systemfonts_0.3.2
#> [64] rlang_0.4.8           grid_4.0.3            rstudioapi_0.11
#> [67] base64enc_0.1-3       rmarkdown_2.4         boot_1.3-25
#> [70] gtable_0.3.0          abind_1.4-5           curl_4.3
#> [73] R6_2.4.1              zoo_1.8-8             gridExtra_2.3
#> [76] knitr_1.30            dplyr_1.0.2           survMisc_0.5.5
#> [79] rprojroot_1.3-2       ragg_0.4.0            desc_1.2.0
#> [82] stringi_1.5.3         Rcpp_1.0.5            vctrs_0.3.4
#> [85] tidyselect_1.1.0      xfun_0.18
```