float2 matrix (as 1D array) and CUDA


I have to work with a float2 matrix stored as a 1D array. I wanted to check a few things, so I have written this code:

    #include <stdio.h>
    #include <stdlib.h>

    #define index(x,y) x+y*n

    __global__ void test(float2* matrix_cuda,int n)
    {
        int i,j;

        i=blockIdx.x*blockDim.x+threadIdx.x;
        j=blockIdx.y*blockDim.y+threadIdx.y;

        matrix_cuda[index(i,j)].x=i;
        matrix_cuda[index(i,j)].y=j;
    }

    int main()
    {
        int n=256;
        int i,j;

        //////////////////////////////////////////

        float2* matrix;
        matrix=(float2*)malloc(n*n*sizeof(float2));

        //////////////////////////////////////////

        float2* matrix_cuda;
        cudaMalloc((void**)&matrix_cuda,n*n*sizeof(float2));

        //////////////////////////////////////////

        dim3 block_dim(32,2,0);
        dim3 grid_dim(2,2,0);

        test <<< grid_dim,block_dim >>> (matrix_cuda,n);

        //////////////////////////////////////////

        cudaMemcpy(matrix,matrix_cuda,n*n*sizeof(float2),cudaMemcpyDeviceToHost);

        for(i=0;i<n;i++)
        {
            for(j=0;j<n;j++)
            {
                printf("%d %d, %f %f\n",i,j,matrix[index(i,j)].x,matrix[index(i,j)].y);
            }
        }

        return 0;
    }

I was expecting output like:

    0 0, 0 0
    0 1, 0 1
    0 2, 0 2
    0 3, 0 3
    ...

But what I actually get is:

    0 0, -nan 7.265723657
    0 1, -nan 152345
    0 2, 25.2135235 -nan
    0 3, 52354.324534 24.52354234523
    ...

That means I have a problem with memory allocation (I suppose), but I can't find what is wrong with my code. Can anyone help me?

Any time you are having trouble with CUDA code, you should use proper CUDA error checking and run your code with cuda-memcheck before asking for help.

Even if you don't understand the output, it will be useful to others who are trying to help you.

If you had run your code with cuda-memcheck, you would have gotten (amongst other output!) something like this:

    $ cuda-memcheck ./t1273
    ========= CUDA-MEMCHECK
    ========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunch.
    =========     Saved host backtrace up to driver entry point at error
    =========     Host Frame:/lib64/libcuda.so.1 [0x2eea03]
    =========     Host Frame:./t1273 [0x3616e]
    =========     Host Frame:./t1273 [0x2bfd]
    =========     Host Frame:./t1273 [0x299a]
    =========     Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x21b15]
    =========     Host Frame:./t1273 [0x2a5d]
    =========
    ========= ERROR SUMMARY: 1 error
    $

This means there is something wrong with the way you configured your kernel launch:

    dim3 block_dim(32,2,0);
    dim3 grid_dim(2,2,0);

    test <<< grid_dim,block_dim >>> (matrix_cuda,n);
         ^^^^^^^^^^^^^^^^^^^^^^^^^^
         kernel config arguments

Specifically, you should never select a dimension of 0 when creating a dim3 variable for a kernel launch. The minimum value of any dimension component is 1, not zero.

So use arguments like this:

    dim3 block_dim(32,2,1);
    dim3 grid_dim(2,2,1);
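The rule that no launch dimension may be zero can be checked on the host before the kernel is launched. The following is only a minimal sketch in plain C++ (it also compiles as host code under nvcc): `Dim3`, `all_nonzero`, `is_valid_launch` and `kMaxThreadsPerBlock` are hypothetical names introduced here for illustration, not part of the CUDA API, and the 1024-thread limit is an assumption that should be confirmed for the actual device.

```cpp
#include <cassert>  // used by the checks in the usage note below

// Hypothetical stand-in for CUDA's dim3, so the sketch runs without a GPU.
struct Dim3 { unsigned x, y, z; };

// Assumed per-block thread limit. It is 1024 on current GPUs, but on real
// hardware read cudaDeviceProp::maxThreadsPerBlock instead of hard-coding it.
constexpr unsigned long long kMaxThreadsPerBlock = 1024;

// True when every component is at least 1.
bool all_nonzero(Dim3 d) { return d.x > 0 && d.y > 0 && d.z > 0; }

// Rejects configurations that contain a zero component or that ask for
// more threads per block than the assumed limit.
bool is_valid_launch(Dim3 grid, Dim3 block) {
    if (!all_nonzero(grid) || !all_nonzero(block)) return false;
    unsigned long long threads = 1ULL * block.x * block.y * block.z;
    return threads <= kMaxThreadsPerBlock;
}
```

With this helper, `assert(!is_valid_launch(Dim3{2,2,0}, Dim3{32,2,0}))` holds for the configuration from the question, while `assert(is_valid_launch(Dim3{2,2,1}, Dim3{32,2,1}))` holds once the zeros are replaced by ones.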

In addition, once you fix that, you will still find that many of your outputs are not touched by your code. To fix that, you'll need to increase the size of your thread array to match the size of your data array. Since you have a 1-D array, it's not clear to me why you are launching 2D threadblocks and 2D grids. Your data array (256*256 = 65536 elements) should be "coverable" with a total of 65536 threads in a linear dimension, like this:

    dim3 block_dim(32,1,1);
    dim3 grid_dim(2048,1,1);
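To see why the size of the launch matters, the kernel's index arithmetic can be replayed on the host. The sketch below is plain C++ and only emulates the `x + y*n` indexing from the question; `Dim3`, `count_touched`, `Coord` and `unflatten` are hypothetical names introduced for illustration. It also shows one possible way, assumed here rather than taken from the answer, to recover a column and row from a linear thread index, since with a 1D launch the kernel's `j` is always 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's dim3, so the sketch runs without a GPU.
struct Dim3 { unsigned x, y, z; };

// Replays the kernel's indexing (i + j*n) for every emulated thread and
// returns how many distinct elements of the n*n array would be written.
std::size_t count_touched(Dim3 grid, Dim3 block, int n) {
    assert(n > 0);
    std::vector<bool> touched(static_cast<std::size_t>(n) * n, false);
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    std::size_t i = static_cast<std::size_t>(bx) * block.x + tx;
                    std::size_t j = static_cast<std::size_t>(by) * block.y + ty;
                    std::size_t idx = i + j * static_cast<std::size_t>(n);
                    if (idx < touched.size()) touched[idx] = true;
                }
    std::size_t count = 0;
    for (bool t : touched) count += t ? 1 : 0;
    return count;
}

// Column (x) and row (y) for a linear index into a row-major n*n array.
struct Coord { int x, y; };

Coord unflatten(std::size_t idx, int n) {
    assert(n > 0);
    return { static_cast<int>(idx % n), static_cast<int>(idx / n) };
}
```

For n = 256, `count_touched(Dim3{2,2,1}, Dim3{32,2,1}, 256)` gives 256, because only 64 columns by 4 rows of threads exist, whereas `count_touched(Dim3{2048,1,1}, Dim3{32,1,1}, 256)` gives 65536, the whole array. Note that with the 1D launch the unmodified kernel would store the linear index in `.x` and 0 in `.y`; something like `unflatten` inside the kernel would be needed to reproduce the column and row values the question expects.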
