R - How to apply big data to this p-value corrgram?

I am studying Didzis' p-value corrgram with different input data examples. A low p-value (p < 0.05) seems to correspond to a perfect curve fit, which I find strange; see Fig. 1-3.

Fig. 1 Output of "extreme" input data #1
Fig. 2 Output of minimum input data #2
Fig. 3 Output of Didzis' input data #3

Statistical inspection

Fig. 1: p-values are high when r is small.
Fig. 2: p-values are high and confidence intervals are wide; I am not sure whether drawing a graph there is appropriate.
Fig. 3: p-values are low when the curve fit is perfect; this observation can be confusing.
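The three observations above all follow from how the Pearson test behind cor.test links r, the sample size n, and the p-value. The following is a minimal sketch, assuming two independent normal samples of the same length as in minimum example #2; the variable names are illustrative and not part of Didzis' code.

```r
# Assumption: two independent normal samples, similar to minimum example #2.
set.seed(24)
x <- rnorm(1505)
y <- rnorm(1505)

test <- cor.test(x, y)        # Pearson correlation test from the stats package

r  <- unname(test$estimate)   # sample correlation
p  <- test$p.value            # two-sided p-value for H0: rho = 0
ci <- test$conf.int           # 95% confidence interval for rho

# The p-value depends only on r and n: t = r * sqrt((n - 2) / (1 - r^2))
n <- length(x)
t_manual <- r * sqrt((n - 2) / (1 - r^2))
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

print(c(r = r, p = p, p_manual = p_manual))
print(ci)
```

Because the test statistic grows with both |r| and n, a small r at a fixed n gives a high p-value, while a strong linear relation gives a low one.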

Input data test cases

Real-life data example #1, the "extreme" example; application output in Fig. 1

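    ## 1. Make a list of lists
    set.seed(24)
    a = 541650
    m1 <- matrix(1:a, ncol=4, nrow=a)
    str(m1)

    # Reshape the matrix data into a 360 x 1505 x 4 array
    a = 360; b = 1505; c = 4
    m2 <- array(`length<-`(m1, a*b*c), dim = c(a,b,c))

    # One correlation matrix per slice of the third dimension
    res <- lapply(seq(dim(m2)[3]), function(i) cor(m2[,,i]))
    str(res)

    # First eigenvector of each correlation matrix (NA replaced by 0)
    res <- lapply(res, function(x) eigen(replace(x, is.na(x), 0))$vectors[,1:1])
    str(res)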

Minimum example #2; application output in Fig. 2

    a <- 1505
    res <- list(rnorm(a), rnorm(rnorm(a)), rnorm(rnorm(rnorm(a))), rnorm(rnorm(rnorm(rnorm(a)))))
    str(res)
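A note on the nested rnorm calls in this example: when the first argument of rnorm is a vector of length greater than one, its length is used as the number of values to draw, so every element of the list has the same length and is just an independent standard normal sample. A short check, with the name nested chosen only for illustration:

```r
set.seed(24)
a <- 1505

# The inner results are used only for their length, not their values.
nested <- rnorm(rnorm(rnorm(a)))
print(length(nested))
```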

Standard input example #3, the USJudgeRatings data that Didzis used; application output in Fig. 3

    res <- USJudgeRatings[,c(2:3,6,1,7)]

To make the p-value corrgram

    ## 2. Didzis https://stackoverflow.com/a/15271627/54964
    # Upper panel: absolute correlation and p-value as text
    panel.cor <- function(x, y, digits=2, cex.cor) {
      usr <- par("usr"); on.exit(par(usr))
      par(usr = c(0, 1, 0, 1))
      r <- abs(cor(x, y))
      txt <- format(c(r, 0.123456789), digits=digits)[1]
      test <- cor.test(x,y)
      signif <- ifelse(round(test$p.value,3)<0.001,"p<0.001",paste("p=",round(test$p.value,3)))
      text(0.5, 0.25, paste("r=",txt))
      text(.5, .75, signif)
    }

    # Lower panel: scatter plot with a lowess curve
    panel.smooth<-function (x, y, col = "blue", bg = NA, pch = 18,
                            cex = 0.8, col.smooth = "red", span = 2/3, iter = 3, ...) {
      points(x, y, pch = pch, col = col, bg = bg, cex = cex)
      ok <- is.finite(x) & is.finite(y)
      if (any(ok))
        lines(stats::lowess(x[ok], y[ok], f = span, iter = iter),
              col = col.smooth, ...)
    }

    # Diagonal panel: histogram scaled to the panel height
    panel.hist <- function(x, ...) {
      usr <- par("usr"); on.exit(par(usr))
      par(usr = c(usr[1:2], 0, 1.5) )
      h <- hist(x, plot = FALSE)
      breaks <- h$breaks; nb <- length(breaks)
      y <- h$counts; y <- y/max(y)
      rect(breaks[-nb], 0, breaks[-1], y, col="cyan", ...)
    }

    data <- res
    str(data)

    pairs(data,
          lower.panel=panel.smooth, upper.panel=panel.cor, diag.panel=panel.hist)

About the significance upper bound

The source says that a study that is not statistically significant with 15k points may become significant with 2-3M points. The observation becomes significant with a 6-7M data sample; in my study, the data: 541650 541650 6925867. I think there is, in theory, no problem in plotting big data sets in Didzis' p-value corrgram. The algorithms are possibly making simplifications, or causing clusterisation of points, such that many figures show an increasing diagonal or a y=0 line.
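The sample-size effect described above can be illustrated directly from the Pearson test statistic. This is a sketch under stated assumptions: p_from_r is a hypothetical helper, not part of Didzis' code, and r = 0.01 is an arbitrary weak correlation chosen only for illustration.

```r
# Hypothetical helper: two-sided p-value of the Pearson correlation test
# for a given sample correlation r and sample size n.
p_from_r <- function(r, n) {
  t <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t), df = n - 2)
}

r <- 0.01                            # assumed weak correlation
n <- c(15e3, 541650, 3e6, 6925867)   # sample sizes mentioned in the text
p <- p_from_r(r, n)

print(data.frame(n = n, p = signif(p, 3)))
```

With the same r, the p-value is above 0.05 at 15k points and far below 0.001 at the larger sizes, so a low p-value on a big data set says little about how strong the relation is.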

OS: Debian 8.5
R: 3.3.1
