OpenCL - Sum an array

davidholman's picture

Hi everyone,

New to OpenCL.

I have a 1-dimensional float4 array as input. I'd like to output the vector sum of the array.

I'm not sure how to declare a globally shared variable that each work-item can update (or if that is even the right approach).

Any input appreciated.

idlefon's picture
Re: OpenCL - Sum an array

I think you cannot output a single float4 from an OpenCL kernel. Only arrays and images are acceptable as outputs.

dust's picture
Re: OpenCL - Sum an array

even if you need just a scalar value like an average, a sum, or a distance, you have to output it as an array. then use a structure index member to pull your value out.

in a single threaded case it's best to keep your output / input sizes the same even if you only need one component for output. below returns the length or magnitude of a float4 as a float. it produces similar results to the average sum.

for sum just add each component up ?

sum[tid] = (temp[tid].x + temp[tid].y + temp[tid].z + temp[tid].w) / 4.0f;

then just divide the output by your structures total size to get average sum etc...

here are two ways to do this. both produce identical results.

__kernel void main(__global const float4 *src, __global float *sum, __global float4 *temp)
{
   int  tid = get_global_id(0);      // one work-item per array element
   temp[tid] = src[tid];
   sum[tid] = length(temp[tid]);     // magnitude of this element only
}

this may be more understandable but still produces the same as above and could be costly with a large data set.

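__kernel void main(__global const float4 *src, float size, __global float *sum, __global float4 *temp)
{
   int  tid = get_global_id(0);
   int  n = (int)size;   // element count arrives as a float input
   temp[tid] = src[tid];

   // read from src, not temp: other work-items may not have written temp[i] yet
   float all = 0.0f;
   for( int i = 0; i < n; i++)
   {all += length(src[i]);}
   sum[tid] = all/n;   // every work-item writes the same average length
}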
Attachment: sum.qtz (21.77 KB)

davidholman's picture
Re: OpenCL - Sum an array

Thanks for the snippets.

The second (more understandable) snippet is what I'll use. It's the only way I've been able to get correct results.

On my test data, the first two snippets are producing much smaller values than what should be expected when summing large arrays.

To put it in context, I've got a normal map. I'd like to compute the average normal of the array. Without using a loop, I haven't been able to get accurate results. To avoid the performance hit, I've started sub-sampling my input image (i.e. using 320x480 instead of 640x480).
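One common way to avoid having every work-item loop over the whole array is a two-stage reduction: each group sums its own slice into a partial result, and a second pass adds up the partials. The following is only a plain C sketch of that idea, not an OpenCL kernel. The names vec4, vec4_add, partial_sums and average_vec4 are hypothetical, vec4 stands in for float4, and the outer loop stands in for work-groups that would run in parallel on the device (where the inner sum would typically be a tree reduction in __local memory with barriers).

```c
#include <assert.h>
#include <stddef.h>

/* Plain C stand-in for OpenCL's float4 (hypothetical helper type). */
typedef struct { float x, y, z, w; } vec4;

/* Component-wise addition of two vec4 values. */
vec4 vec4_add(vec4 a, vec4 b)
{
    vec4 r = { a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w };
    return r;
}

/* Stage 1: each "group" sums its own slice of src into partial[g].
   partial must have room for ceil(n / group_size) entries. */
void partial_sums(const vec4 *src, size_t n, size_t group_size, vec4 *partial)
{
    assert(group_size > 0);
    size_t groups = (n + group_size - 1) / group_size;
    for (size_t g = 0; g < groups; ++g) {
        vec4 acc = { 0.0f, 0.0f, 0.0f, 0.0f };
        size_t end = (g + 1) * group_size;
        if (end > n) end = n;              /* last group may be short */
        for (size_t i = g * group_size; i < end; ++i)
            acc = vec4_add(acc, src[i]);
        partial[g] = acc;
    }
}

/* Stage 2: add up the partial sums and divide by n to get the average. */
vec4 average_vec4(const vec4 *src, size_t n, size_t group_size, vec4 *partial)
{
    vec4 total = { 0.0f, 0.0f, 0.0f, 0.0f };
    if (n == 0) return total;
    partial_sums(src, n, group_size, partial);
    size_t groups = (n + group_size - 1) / group_size;
    for (size_t g = 0; g < groups; ++g)
        total = vec4_add(total, partial[g]);
    total.x /= (float)n; total.y /= (float)n;
    total.z /= (float)n; total.w /= (float)n;
    return total;
}
```

For an average normal, the result would usually be renormalized afterwards, since the mean of unit vectors is generally shorter than unit length.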

dust's picture
Re: OpenCL - Sum an array

thought it was strange that the two snippets returned the same results, so i tested again. indeed, on the first run the numbers are different, but if you hit stop and play the numbers become the same. very strange. if these are for normals then you will want to use both the x and y work-group dimensions; the snippets above use only x. really, i'm not going crazy: run the attached and you will see they are the same if you stop and start once or twice.

Attachment: sum.zip (63.6 KB)

gtoledo3's picture
Re: OpenCL - Sum an array

What does one dimensional float4 array mean?

One index value that contains one float 4?

One dimensional suggests a structure of many indices that contain one float each. A one lane structure.

A float4 = 4 lane structure = XYZW or RGBA style stuff.

To get the sum of each object 4 lane structure (eg, what XYZW or RGBA adds up to for a given index), you can use the "length" method.
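It may help to separate two quantities here: the sum of the four lanes (x + y + z + w) and the Euclidean magnitude that length returns (the square root of x² + y² + z² + w²). They generally differ. The plain C sketch below uses hypothetical helpers (vec4, component_sum, length_squared) to show the difference. In OpenCL C the lane sum could probably be written as dot(v, (float4)(1.0f)), and length(v) would be the square root of what length_squared computes here.

```c
/* Plain C stand-in for OpenCL's float4 (hypothetical helper type). */
typedef struct { float x, y, z, w; } vec4;

/* Sum of the four lanes: x + y + z + w. */
float component_sum(vec4 v)
{
    return v.x + v.y + v.z + v.w;
}

/* Squared Euclidean magnitude: x*x + y*y + z*z + w*w.
   OpenCL's length(v) is the square root of this value. */
float length_squared(vec4 v)
{
    return v.x * v.x + v.y * v.y + v.z * v.z + v.w * v.w;
}
```

For v = (1, 2, 2, 4) the lane sum is 9 while the squared magnitude is 25, so length would give 5.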

The easiest way to get the average of a bunch of values is to write to image data and use the "Area Average" patch, which reduces all of the data to the average pixel - a 4 lane value / RGBA. You can extract those values using another CL kernel or "read pixels" and then get the numeric average.

There are probably some more performant ways... but I would guess that would work well.

You can also return the average of columns or rows using some of the other average patches. You just need to convert to image data out first.... the caveat.

I haven't found a method in CL 1.0 to return the average within one kernel, because of embargoes between flipping from structure to image, back to structure within one kernel. I think this is possible in CL 1.0+. Maybe there's some great way that I'm not seeing at the moment.