float2 matrix (as 1D array) and CUDA
I have to work with a float2 matrix stored as a 1D array. I wanted to check a few things, so I have written this code:
    #include <stdio.h>
    #include <stdlib.h>

    #define index(x,y) x+y*n

    __global__ void test(float2* matrix_cuda,int n)
    {
        int i,j;
        i=blockIdx.x*blockDim.x+threadIdx.x;
        j=blockIdx.y*blockDim.y+threadIdx.y;

        matrix_cuda[index(i,j)].x=i;
        matrix_cuda[index(i,j)].y=j;
    }

    int main()
    {
        int n=256;
        int i,j;
        //////////////////////////////////////////
        float2* matrix;
        matrix=(float2*)malloc(n*n*sizeof(float2));
        //////////////////////////////////////////
        float2* matrix_cuda;
        cudaMalloc((void**)&matrix_cuda,n*n*sizeof(float2));
        //////////////////////////////////////////
        dim3 block_dim(32,2,0);
        dim3 grid_dim(2,2,0);
        test <<< grid_dim,block_dim >>> (matrix_cuda,n);
        //////////////////////////////////////////
        cudaMemcpy(matrix,matrix_cuda,n*n*sizeof(float2),cudaMemcpyDeviceToHost);
        for(i=0;i<n;i++)
        {
            for(j=0;j<n;j++)
            {
                printf("%d %d, %f %f\n",i,j,matrix[index(i,j)].x,matrix[index(i,j)].y);
            }
        }
        return 0;
    }
I was expecting output like:
    0 0, 0 0
    0 1, 0 1
    0 2, 0 2
    0 3, 0 3
    ...
But what I get is:
    0 0, -nan 7.265723657
    0 1, -nan 152345
    0 2, 25.2135235 -nan
    0 3, 52354.324534 24.52354234523
    ...
That means I have some problem with the memory allocation (I suppose), but I can't find what is wrong with my code. Can someone help me?
Any time you are having trouble with CUDA code, you should use proper CUDA error checking and run your code with cuda-memcheck before asking for help.
Even if you don't understand the output, it will be useful to others trying to help you.
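As a rough sketch of what "proper CUDA error checking" can look like: wrap every runtime call in a macro that reports failures, and query cudaGetLastError() right after each kernel launch. The names CUDA_CHECK and fill are hypothetical, made up for this illustration and not taken from the question; the runtime calls themselves (cudaGetLastError, cudaGetErrorString, cudaDeviceSynchronize) are standard.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): abort with a
// readable message whenever a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel used only to have something to launch.
__global__ void fill(float2* m, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        m[i].x = (float)i;
        m[i].y = 0.0f;
    }
}

int main()
{
    const int n = 1024;
    float2* d_m = NULL;
    CUDA_CHECK(cudaMalloc((void**)&d_m, n * sizeof(float2)));

    fill<<<(n + 255) / 256, 256>>>(d_m, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch-configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(d_m));
    printf("done\n");
    return 0;
}
```

With a check like this after the launch, an invalid launch configuration is reported at the launch site instead of showing up later as garbage in the copied-back data.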
If you had run your code with cuda-memcheck, you would have gotten (amongst other output!) some output like this:
    $ cuda-memcheck ./t1273
    ========= CUDA-MEMCHECK
    ========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunch.
    =========     Saved host backtrace up to driver entry point at error
    =========     Host Frame:/lib64/libcuda.so.1 [0x2eea03]
    =========     Host Frame:./t1273 [0x3616e]
    =========     Host Frame:./t1273 [0x2bfd]
    =========     Host Frame:./t1273 [0x299a]
    =========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
    =========     Host Frame:./t1273 [0x2a5d]
    =========
    ========= ERROR SUMMARY: 1 error
    $
This means that something was wrong with the way you configured your kernel launch:
    dim3 block_dim(32,2,0);
    dim3 grid_dim(2,2,0);
    test <<< grid_dim,block_dim >>> (matrix_cuda,n);
             ^^^^^^^^^^^^^^^^^^
             kernel config arguments
Specifically, you should never select a dimension of 0 when creating a dim3 variable for a kernel launch. The minimum value of each dimension component is 1, not zero.
So use arguments like this:
    dim3 block_dim(32,2,1);
    dim3 grid_dim(2,2,1);
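To illustrate the rule about dimension components, here is a small host-only sketch. The helper dims_are_valid is hypothetical (it is not part of the CUDA API); it only checks that every component is at least 1. The sketch also relies on the fact that the dim3 constructor fills in 1 for any component you leave out, so omitting the unused dimension is a safe alternative to writing it explicitly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): true when every
// component of a launch dimension is at least 1.
static bool dims_are_valid(dim3 d)
{
    return d.x >= 1 && d.y >= 1 && d.z >= 1;
}

int main()
{
    dim3 bad(32, 2, 0);      // z == 0: a launch with this is rejected
    dim3 good(32, 2, 1);     // explicit 1 in the unused dimension
    dim3 short_form(32, 2);  // unspecified components default to 1

    printf("bad is valid:        %d\n", dims_are_valid(bad));
    printf("good is valid:       %d\n", dims_are_valid(good));
    printf("short_form is valid: %d (z = %u)\n",
           dims_are_valid(short_form), short_form.z);
    return 0;
}
```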
In addition, once you fix that, you will still find that many of your outputs are not touched by your code. To fix that, you'll need to increase the size of your thread array to match the size of your data array. Since you have a 1-D array, it's not clear to me why you are launching 2D threadblocks and 2D grids. Your data array should be "coverable" with a total of 65536 threads in a linear dimension, like this:
    dim3 block_dim(32,1,1);
    dim3 grid_dim(2048,1,1);
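With a 1D launch like that, blockIdx.y and threadIdx.y are always 0, so a kernel that wants both i and j has to recover them from the linear thread index. The following is one possible sketch, assuming one thread per element and the same i + j*n layout as the index macro in the question. The names fill_1d and linear_index are hypothetical, and the block count is computed by rounding up so the launch still covers the array if n changes.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same layout as the index(x,y) macro in the question: element (i, j)
// lives at i + j*n. Written as a function so that expressions passed as
// arguments are not affected by operator precedence.
__host__ __device__ inline int linear_index(int i, int j, int n)
{
    return i + j * n;
}

// One thread per element; idx is the position in the 1D array.
__global__ void fill_1d(float2* m, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n * n) return;  // guard threads past the end of the array
    int i = idx % n;           // recover i from idx = i + j*n
    int j = idx / n;           // recover j
    m[idx].x = (float)i;
    m[idx].y = (float)j;
}

int main()
{
    const int n = 256;
    const int total = n * n;                             // 65536 elements
    const int threads = 32;                              // block size from the answer
    const int blocks = (total + threads - 1) / threads;  // 2048 blocks

    float2* h = (float2*)malloc(total * sizeof(float2));
    float2* d = NULL;
    if (h == NULL ||
        cudaMalloc((void**)&d, total * sizeof(float2)) != cudaSuccess) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    fill_1d<<<blocks, threads>>>(d, n);
    cudaError_t err = cudaGetLastError();  // catches a bad launch configuration
    if (err == cudaSuccess)
        err = cudaMemcpy(h, d, total * sizeof(float2), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Spot-check two elements instead of printing all 65536 lines.
    printf("%d %d, %f %f\n", 0, 3,
           h[linear_index(0, 3, n)].x, h[linear_index(0, 3, n)].y);
    printf("%d %d, %f %f\n", 5, 7,
           h[linear_index(5, 7, n)].x, h[linear_index(5, 7, n)].y);

    cudaFree(d);
    free(h);
    return 0;
}
```

The bounds guard in the kernel matters whenever the element count is not an exact multiple of the block size; here 65536 / 32 divides evenly, so it is only a safety net.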