cuda - Using cuBLAS-XT for large input size
This link says that cuBLAS-XT routines provide out-of-core operation: the size of operand data is limited only by system memory size, not by GPU on-board memory size. This means that as long as the input data can be stored in CPU memory, we can use cuBLAS-XT functions even if the size of the output is greater than the GPU memory size, right?
On the other hand, this link says "In the case of very large problems, cublasXt API offers the possibility to offload some of the computation to the host CPU" and "Currently, only the routine cublasXtgemm() supports this feature." Is this the case for problems where the input size is greater than the CPU memory size?
I don't get the difference between these two! I would appreciate it if someone could help me understand the difference.
The purpose of cublasXt is to allow operations to be automatically run on several GPUs. So, for example, a matrix multiply, or other supported operations, can be run on several GPUs.
The cublasXtgemm routine has a special capability: in addition to parallelizing a matrix multiply across 2 or more GPUs, it can parallelize across 2 or more GPUs plus also use the host CPU as an additional computation engine.
The matrix multiply problem is readily decomposable, as discussed here. If you run all the "chunks" of work on GPUs, that is the ordinary capability of cublasXtgemm (to use GPUs). If you run all but 1 of the chunks of work on the GPUs, and run 1 of the chunks of work on the CPU, that is the special capability.
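To make the "chunks" idea concrete, here is a minimal CPU-only sketch in C++. It is not cuBLAS-XT code and makes no claim about how cublasXtgemm actually schedules work. The names `Matrix`, `gemm_tile`, `tiled_gemm` and `gemm_reference` are hypothetical, and the round-robin assignment of chunks to workers is an assumption chosen only for illustration. The point is that each output tile of C depends only on a band of rows of A and a band of columns of B, so every tile can be computed independently by whichever engine (a GPU or the host CPU) it is handed to.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix stored in a flat vector (hypothetical helper type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Compute one output chunk: C[r0:r1, c0:c1] = A[r0:r1, :] * B[:, c0:c1].
// Each chunk writes a disjoint region of C, so chunks do not depend on each other.
void gemm_tile(const Matrix& A, const Matrix& B, Matrix& C,
               std::size_t r0, std::size_t r1, std::size_t c0, std::size_t c1) {
    for (std::size_t i = r0; i < r1; ++i) {
        for (std::size_t j = c0; j < c1; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < A.cols; ++k) sum += A.at(i, k) * B.at(k, j);
            C.at(i, j) = sum;
        }
    }
}

// Split C into tile x tile chunks and assign them round-robin to `workers`
// workers (assumed policy). Here every chunk is executed on the CPU; in a
// multi-GPU setup, worker w would be a GPU or, for the hybrid case, the host.
// Returns the number of chunks assigned to each worker.
std::vector<int> tiled_gemm(const Matrix& A, const Matrix& B, Matrix& C,
                            std::size_t tile, int workers) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols);
    assert(tile > 0 && workers > 0);
    std::vector<int> chunks_per_worker(workers, 0);
    int next = 0;
    for (std::size_t r0 = 0; r0 < C.rows; r0 += tile) {
        for (std::size_t c0 = 0; c0 < C.cols; c0 += tile) {
            std::size_t r1 = std::min(r0 + tile, C.rows);
            std::size_t c1 = std::min(c0 + tile, C.cols);
            gemm_tile(A, B, C, r0, r1, c0, c1);  // worker `next` would run this chunk
            ++chunks_per_worker[next];
            next = (next + 1) % workers;
        }
    }
    return chunks_per_worker;
}

// Reference result: the whole product computed as a single chunk.
Matrix gemm_reference(const Matrix& A, const Matrix& B) {
    Matrix C(A.rows, B.cols);
    gemm_tile(A, B, C, 0, C.rows, 0, C.cols);
    return C;
}
```

Calling `tiled_gemm(A, B, C, tile, workers)` gives the same C as `gemm_reference(A, B)` no matter how the chunks are distributed. That independence is why handing one chunk to the CPU instead of a GPU does not change the result.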