R - How to apply big data to this p-value corrgram?
I am studying Didzis' p-value corrgram with different input data examples. A small p-value (p < 0.05) corresponds to a seemingly perfect curve fit, which I find strange; see Fig. 1-3.
Fig. 1 Output of "extreme" input data #1, Fig. 2 Output of minimum input data #2, Fig. 3 Output of Didzis' input data #3
Statistical inspection
- Fig. 1: p-values are high when r is small,
- Fig. 2: p-values are high and confidence intervals are wide; I am not sure if drawing the graph there is appropriate,
- Fig. 3: p-values are low when the curve fitting is perfect; this observation can be confusing
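The pattern in Fig. 1 (high p-values when r is small) follows from how the test works: for a Pearson correlation, the p-value depends only on r and the sample size n. Below is a minimal sketch, assuming the default Pearson method of `cor.test`. The helper name `p_from_r` and the chosen r values are illustrative assumptions and are not part of Didzis' code.

```r
# p-value of a two-sided Pearson correlation test, computed from r and n.
# Uses the t statistic t = r * sqrt((n - 2) / (1 - r^2)) with n - 2
# degrees of freedom. p_from_r is a hypothetical helper name.
p_from_r <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

n <- 1505                             # sample size used in example #2
r_values <- c(0.01, 0.05, 0.1, 0.3)   # illustrative correlations
data.frame(r = r_values, p = signif(p_from_r(r_values, n), 3))
```

For a fixed n the p-value falls quickly as |r| grows, so panels with r close to 0 show high p-values, while panels with a tight linear relation show p<0.001.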
Input data test cases
Real live data example #1, the "extreme" example, with application output in Fig. 1
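    ## 1 Make list of lists
    set.seed(24)
    a=541650
    m1 <- matrix(1:a, ncol=4, nrow=a)
    str(m1)
    a=360; b=1505; c=4;
    # pad m1 with NA up to a*b*c values and reshape into a 3-D array
    m2 <- array(`length<-`(m1, a*b*c), dim = c(a,b,c))
    # one correlation matrix per slice of the array
    res <- lapply(seq(dim(m2)[3]), function(i) cor(m2[,,i]))
    str(res)
    # first eigenvector of each correlation matrix, with NA treated as 0
    res <- lapply(res, function(x) eigen(replace(x, is.na(x), 0))$vectors[,1:1])
    str(res)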
Minimum example #2, with application output in Fig. 2
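    a <- 1505
    # rnorm() given a vector as its first argument draws length(vector) values,
    # so every element has a independent standard normal values
    res <- list(rnorm(a), rnorm(rnorm(a)), rnorm(rnorm(rnorm(a))), rnorm(rnorm(rnorm(rnorm(a)))))
    str(res)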
Standard input example #3, the USJudgeRatings data that Didzis used (lawyers' ratings of US Superior Court judges), with output in Fig. 3
    res <- USJudgeRatings[,c(2:3,6,1,7)]
To make the p-value corrgram
    ## 2 Didzis https://stackoverflow.com/a/15271627/54964
    # upper panels: absolute correlation and the p-value from cor.test
    panel.cor <- function(x, y, digits=2, cex.cor) {
      usr <- par("usr"); on.exit(par(usr))
      par(usr = c(0, 1, 0, 1))
      r <- abs(cor(x, y))
      txt <- format(c(r, 0.123456789), digits=digits)[1]
      test <- cor.test(x,y)
      signif <- ifelse(round(test$p.value,3)<0.001,"p<0.001",paste("p=",round(test$p.value,3)))
      text(0.5, 0.25, paste("r=",txt))
      text(.5, .75, signif)
    }
    # lower panels: scatter plot with a lowess smoother
    panel.smooth<-function (x, y, col = "blue", bg = NA, pch = 18, cex = 0.8,
                            col.smooth = "red", span = 2/3, iter = 3, ...) {
      points(x, y, pch = pch, col = col, bg = bg, cex = cex)
      ok <- is.finite(x) & is.finite(y)
      if (any(ok))
        lines(stats::lowess(x[ok], y[ok], f = span, iter = iter), col = col.smooth, ...)
    }
    # diagonal panels: histogram scaled to a maximum height of 1
    panel.hist <- function(x, ...) {
      usr <- par("usr"); on.exit(par(usr))
      par(usr = c(usr[1:2], 0, 1.5) )
      h <- hist(x, plot = FALSE)
      breaks <- h$breaks; nb <- length(breaks)
      y <- h$counts; y <- y/max(y)
      rect(breaks[-nb], 0, breaks[-1], y, col="cyan", ...)
    }
    data <- res
    str(data)
    pairs(data, lower.panel=panel.smooth, upper.panel=panel.cor,diag.panel=panel.hist)
About the significance upper bound
The source says that a study that is not statistically significant with 15k points may become significant with 2-3M points. My observation becomes significant with a data sample of 6-7M points.
study, data 541650 541650 6925867
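The effect of sample size on significance can be made concrete by asking how small |r| may be and still reach significance for a given n. This is a rough sketch, assuming a two-sided Pearson test and a 0.05 threshold. `critical_r` is a hypothetical helper name, and 2.5e6 is an arbitrary midpoint of the 2-3M range.

```r
# Smallest |r| that reaches two-sided significance at level alpha for n points.
# Obtained by inverting t = r * sqrt((n - 2) / (1 - r^2)), which gives
# r = t / sqrt(t^2 + n - 2). critical_r is a hypothetical helper name.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(t_crit^2 + n - 2)
}

n <- c(15e3, 2.5e6, 6925867)   # sample sizes mentioned in the text
data.frame(n = n, critical_r = signif(critical_r(n), 3))
```

The threshold shrinks roughly in proportion to 1/sqrt(n), so with millions of points even a very weak correlation is reported as significant.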
I think there is no problem, in theory, in plotting big data sets in Didzis' p-value corrgram. However, the algorithms may be making simplifications, or causing a clustering of points, such that many panels show only an increasing diagonal or a y=0 line.
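One possible way to keep the scatter panels readable with very large inputs is to plot only a random subsample of the rows. This is just a sketch under my own assumptions: `subsample_rows` and its `max_n` default are hypothetical and do not come from Didzis' answer.

```r
# Hypothetical helper: keep at most max_n randomly chosen rows for plotting.
subsample_rows <- function(data, max_n = 5000) {
  data <- as.data.frame(data)
  if (nrow(data) <= max_n) return(data)
  data[sample.int(nrow(data), max_n), , drop = FALSE]
}

set.seed(24)
big <- data.frame(x = rnorm(1e5), y = rnorm(1e5))   # simulated stand-in data
small <- subsample_rows(big, max_n = 5000)
dim(small)
```

Note that if the subsample is passed to `pairs`, the r and p-values in the upper panels are computed from the subsample too, so they will differ from those of the full data set.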
OS: Debian 8.5
R: 3.3.1