OpenCL : Array Size at run time

yanomano

Is there a way to force an OpenCL kernel to adapt its work size when the size of an input array changes? The function get_global_size() seems to be evaluated only once, at the start.


cybero
Re: OpenCL : Array Size at run time

I have found that this has to be handled carefully, but if you are referring to the global work size, it can be done.

The local work size invariably defaults to 0 though.

If you are working in 2D, an image-processing kernel will not accept any value for the z dimension of the global work size.

See attached example.

Attachment: OpenCLVaryingGlobalWorkspace (105.72 KB)
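In OpenCL C, get_global_size() returns 1 and get_global_id() returns 0 for any dimension index at or beyond the work_dim the kernel was enqueued with, which is why a z value has no meaning for a 2D image kernel. The C sketch below models that rule on the host side; NDRangeModel, model_global_size and model_total_items are hypothetical names for illustration, not part of QC or the OpenCL API, and exactly how QC treats a z value typed into the patch is left as an open assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of an NDRange launch (not a QC/OpenCL type). */
typedef struct {
    unsigned work_dim;   /* 1, 2 or 3 active dimensions */
    size_t   global[3];  /* requested global work size per dimension */
} NDRangeModel;

/* Mirrors the OpenCL C rule for get_global_size(dim): any dimension at or
   beyond work_dim reports a size of 1, whatever value was stored there. */
size_t model_global_size(const NDRangeModel *nd, unsigned dim) {
    assert(nd->work_dim >= 1 && nd->work_dim <= 3);
    return dim < nd->work_dim ? nd->global[dim] : 1;
}

/* Total number of work-items launched: product over all three dimensions,
   where inactive dimensions contribute a factor of 1. */
size_t model_total_items(const NDRangeModel *nd) {
    size_t total = 1;
    for (unsigned d = 0; d < 3; ++d)
        total *= model_global_size(nd, d);
    return total;
}
```

In this model, a 2D launch of 640 x 480 with a stray z entry of 7 still reports a z size of 1 and launches 640 x 480 work-items.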

toneburst
Re: OpenCL : Array Size at run time

The z size is for 3D textures, but these aren't supported by QC yet, so it doesn't do anything. When you create a new OpenCL Kernel, you might also have noticed that 3D textures are listed as an available input type. I got all excited about this when I first noticed it, but my excitement soon dissipated when I realised there was no way to actually get a 3D texture into an OpenCL Kernel in QC, at least not without some major behind-the-scenes surgery and/or a custom plugin.

a|x

toneburst
Re: OpenCL : Array Size at run time

cybero wrote:
The local work size invariably defaults to 0 though.

If I understand correctly, the global size refers to motherboard-side memory, and local to VRAM on the GPU (local to where the code is executed).

I might be completely wrong though. This has always been something that confused the hell out of me, whenever I dipped my toes into OpenCL.

a|x
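For reference, the global and local work sizes describe work-items rather than memory: the global work size is the total number of work-items in each dimension, and the local work size is how many of them are grouped into one work-group. Where data lives is a separate matter, set by the __global and __local address-space qualifiers. A minimal C sketch of how one dimension is divided, assuming no global offset and OpenCL 1.x rules (num_groups and global_id are hypothetical helper names):

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension. Under OpenCL 1.x the global
   size must be an exact multiple of the local (work-group) size. */
size_t num_groups(size_t global_size, size_t local_size) {
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* With no global offset:
   get_global_id(d) == get_group_id(d) * get_local_size(d) + get_local_id(d) */
size_t global_id(size_t group_id, size_t local_size, size_t local_id) {
    assert(local_id < local_size);
    return group_id * local_size + local_id;
}
```

With a global size of 1024 and a local size of 64 there are 16 work-groups, and work-item 5 of group 3 has global ID 3 * 64 + 5 = 197.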

cybero
Re: OpenCL : Array Size at run time

The first example file I posted with a varying global work size used an image-processing kernel; this one creates a mesh and still has varying global work dimensions.

Not a very efficient example, BTW, just a proof of concept.

Attachment: ParticleImageVaryingWorkDimensions.qtz (21.53 KB)

cybero
Re: OpenCL : Array Size at run time

I just re-read your post, yanomano.

If I've understood how the pipeline works with OpenCL, get_global_size() returns whatever global work size the kernel was launched with. That value is read once at the start of every run of the kernel, so it changes whenever you update the global size. For example:

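// QC's __rd shorthand is written here as the standard __read_only qualifier.
__kernel void grid(__read_only image2d_t srcImage,
                   __global float4 *Colors,
                   __global float4 *Vertices)
{
    // This work-item's position and the current global work size.
    // get_global_size() reflects the size the kernel was enqueued with on
    // this run, so changing Count X / Count Y changes sz_x / sz_y.
    int tid_x = get_global_id(0);
    int tid_y = get_global_id(1);
    int sz_x  = get_global_size(0);
    int sz_y  = get_global_size(1);

    // Row-major index into the output arrays.
    int index = mad24(tid_y, sz_x, tid_x);

    // Normalized grid coordinates in [0, 1]; max() avoids a division by
    // zero when a dimension has size 1.
    float u = tid_x / (float)max(sz_x - 1, 1);
    float v = tid_y / (float)max(sz_y - 1, 1);

    // Sample the source image at the matching pixel.
    int2 coords = (int2)((int)(get_image_width(srcImage) * u),
                         (int)(get_image_height(srcImage) * v));
    const sampler_t samplerA = CLK_NORMALIZED_COORDS_FALSE |
                               CLK_ADDRESS_CLAMP_TO_EDGE |
                               CLK_FILTER_NEAREST;
    float4 color = read_imagef(srcImage, samplerA, coords);

    // Map the grid onto [-1, 1] in x and y, flat in z (float literals only).
    float x = 2.0f * u - 1.0f;
    float y = 2.0f * v - 1.0f;
    float z = 0.0f;

    Colors[index]   = color;
    Vertices[index] = (float4)(x, y, z, 1.0f);
}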

produces a varying set of vertices whenever the inputs whose size the kernel reads (Count X and Count Y, for instance) change and update.

This is taken from the example I posted, and, I've just remembered, it does use a Signal patch to pump up the results. If you disconnect it, the varying work dimensions still kick in, though.

Hopefully I've fully understood what you're asking this time, Interpolations aside :-)

If you connect a structure count to the vertices, feed that to an instructions patch, and disconnect the Signal, you can see what varying the x and y size counts does: it updates the array size and changes the number of structure members.

Pardon the excruciating fps, BTW.
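To see why the grid kernel quoted above produces a different number of vertices when Count X or Count Y changes, here is a hedged CPU emulation of its vertex calculation in plain C: the two loops stand in for the work-items, and sz_x / sz_y stand in for get_global_size(0) and get_global_size(1), which are read afresh on each run. emulate_grid and Vec4 are hypothetical names, and the image lookup is left out because it needs a real image.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float x, y, z, w; } Vec4;

/* Emulates the vertex output of the grid kernel for an sz_x by sz_y launch.
   Returns a malloc'd array of sz_x * sz_y vertices (caller frees), or NULL. */
Vec4 *emulate_grid(int sz_x, int sz_y) {
    assert(sz_x > 0 && sz_y > 0);
    Vec4 *vertices = malloc((size_t)sz_x * (size_t)sz_y * sizeof *vertices);
    if (vertices == NULL)
        return NULL;
    for (int tid_y = 0; tid_y < sz_y; ++tid_y) {
        for (int tid_x = 0; tid_x < sz_x; ++tid_x) {
            int index = tid_y * sz_x + tid_x;        /* same as mad24() */
            float u = tid_x / (float)(sz_x > 1 ? sz_x - 1 : 1);
            float v = tid_y / (float)(sz_y > 1 ? sz_y - 1 : 1);
            vertices[index] = (Vec4){2.0f * u - 1.0f, 2.0f * v - 1.0f,
                                     0.0f, 1.0f};
        }
    }
    return vertices;
}
```

Calling emulate_grid(3, 2) yields 6 vertices running from (-1, -1) to (1, 1); calling it again with other sizes yields a correspondingly sized array, just as the kernel's output does when the global work size is updated.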

yanomano
Re: OpenCL : Array Size at run time

I should have provided an example at first, sorry ;) Here is a super basic one; as you can see, I'm starting OpenCL from the ground up...

Just try changing the pointsCount parameter and you'll see that the array size at the output of the OpenCL kernel always stays the same...

The only way to update the array size of the CL kernel output is to stop and restart the composition.

Edit: I've just noticed something very interesting: it is the length of the for loop in the JavaScript patch that doesn't update when it is connected to an OpenCL Kernel!

Attachment: OpenCL_ArraySize.qtz (4.22 KB)

gtoledo3
Re: OpenCL : Array Size at run time

I haven't looked at it much, but do note that your JavaScript isn't set up to change the array size, and that is the root of the problem.

cybero
Re: OpenCL : Array Size at run time

Actually, you've also missed another trick; see the attached rework. And kudos on self-correcting your JS.

You need to expose the output dimensions; then changing the number of points will update the structure's member count.

Attachments: OpenCL_ArraySize_Update.qtz (36.91 KB), ArrayUpdateOpenCL.jpg (82.47 KB)

yanomano
Re: OpenCL : Array Size at run time

Hmm :) Check this composition, with a HUD displaying the structure count...

As you can see, if you switch from "JS A" to "JS B" with the menu, the CL kernel updates the array length correctly. But when the menu is set to "JS A" and you change the point count manually, the count of points coming out of the JavaScript doesn't update...

Strange, no?

Attachment: OpenCL_ArraySize_ex2.qtz (15.46 KB)

yanomano
Re: OpenCL : Array Size at run time

OK, it makes sense! Exposing the output dimensions works (then connect "output point size X" where needed). Thanks, cybero :)

gtoledo3
Re: OpenCL : Array Size at run time

Forcing re-evaluation by making the OpenCL kernel reach back to the JavaScript patch may work, but it's not the best workaround, and I wouldn't count on it always working (it probably shouldn't be doing that... the fact that it does may actually be a bug).

Changing the JavaScript in the patch so that the length of "myPos" is compared against the "pointCount" input makes the script do what you want it to do. There are some other minor changes that aren't as crucial.

So this fixes the problem at its root and lets your pointCount be dynamic.

Attachment: OpenCL_ArraySize_Javascript Fix.qtz (5.07 KB)
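The composition's JavaScript isn't reproduced in the thread, so what follows is only a hypothetical C analogue of the fix gtoledo3 describes: re-check the length of the output array ("myPos") against the "pointCount" input on every evaluation and resize when they differ, instead of building the array once. PointBuffer and sync_point_count are illustrative names, and the placeholder contents stand in for whatever positions the real script computes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the script's output array ("myPos"). */
typedef struct {
    float  *pos;   /* one value per point, for brevity */
    size_t  len;   /* current number of points in pos */
} PointBuffer;

/* Compares the array length with the point count on every call and
   resizes/refills only when they differ. Returns true if the size changed. */
bool sync_point_count(PointBuffer *buf, size_t point_count) {
    assert(buf != NULL);
    if (buf->len == point_count)
        return false;                      /* size unchanged: nothing to do */
    if (point_count == 0) {                /* avoid realloc(ptr, 0) quirks */
        free(buf->pos);
        buf->pos = NULL;
        buf->len = 0;
        return true;
    }
    float *resized = realloc(buf->pos, point_count * sizeof *resized);
    if (resized == NULL)
        return false;                      /* keep the old array on failure */
    buf->pos = resized;
    buf->len = point_count;
    for (size_t i = 0; i < point_count; ++i)
        buf->pos[i] = (float)i;            /* placeholder positions */
    return true;
}
```

Calling sync_point_count() once per frame is cheap when the count is unchanged, because it returns early; it only reallocates when pointCount actually changes.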

cybero
Re: OpenCL : Array Size at run time - Implicit or Explicit ?

That's a really neat catch on yanomano's JS, gtoledo3.

It does indeed automatically update the affected global work size [x].

The thing is, it looks likely that on most OpenCL-capable cards, using implicit [auto-calculated] work dimensions results in a lower fps than using explicit work dimensions.

This doesn't seem to affect the nVidia card range the same way as the ATI range.

http://www.geeks3d.com/20100115/test-gpu-computing-geforce-and-radeon-op...

It's interesting to note how using transcendental functions makes processing more efficient.

See the attached example; both implicit and explicit seem capable of reaching 120 to 170 fps.

BTW, has OS X OpenCL support arrived for ATI cards yet?

I'm beginning to wonder just what difference it makes to have both the x and y work dimensions updating congruently [the most important point] :-).

For a structure or process that only reads the x value array, probably little or nothing.

However, Apple's site does say:

Quote:

Manually setting the work sizes or output dimensions is required when input arrays differ from one another in size or differ from the size of output arrays

Attachment: ImplicitOrExplicit.qtz (48.42 KB)
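When the arrays feeding a kernel differ in length, a single implicit size can't match all of them, which is the case the Apple quote covers. One common way to pick an explicit 1D size, sketched here in C as an assumption rather than a description of what QC does internally, is to size from the output array and round up to a multiple of the work-group size, leaving the kernel to skip the padding with a bounds check. explicit_global_size and idle_work_items are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds the output length up to a multiple of the work-group size, since
   OpenCL 1.x requires the global size to divide evenly by the local size.
   The kernel is then expected to guard the padding, e.g.:
       if (get_global_id(0) >= n) return;                                  */
size_t explicit_global_size(size_t output_len, size_t local_size) {
    assert(local_size > 0);
    return ((output_len + local_size - 1) / local_size) * local_size;
}

/* Number of padded work-items that the kernel's guard must ignore. */
size_t idle_work_items(size_t output_len, size_t local_size) {
    return explicit_global_size(output_len, local_size) - output_len;
}
```

For 100 output elements and a work-group size of 64, the global size becomes 128 and 28 work-items must return early via the guard.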

toneburst
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

cybero wrote:
Thing is it looks likely that for more OpenCL capable cards than less, using implicit [auto calculated work dimensions results in a slower fps than using explicit work dimensions.

Would this be indicative of CPU-fallback?

a|x

gtoledo3
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

I took that quote about manually setting the work size/output dimensions to apply when you have a kernel that takes in multiple structures (arrays) of varying length. One thing is for sure... it certainly isn't erroring out or causing problems when manipulating a single array now, and Apple's examples don't set it when operating on a single array (though that fact doesn't mean too much to me either).

There are a number of .qtz files in the framework that make up patches, and they all set the work size automatically, though... It certainly doesn't make it easy to learn OpenCL properly, or to know when we are being "babysat" and when we aren't.

All of these lurking .qtz patches that have wound up in SL are both a blessing and a real curse, insofar as they obscure what is happening behind the scenes.

BTW, it looks like if you run something through the Mesh Transform patch, it will manipulate the work size automatically (though I don't know whether it does the little workaround/kernel reset thing that was happening in the other scenario).

gtoledo3
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

On my MacBook Pro, the difference is negligible: auto-calculated spikes above "explicit" much of the time, and vice versa. I'm not getting any measurable difference at all (nVidia 9400/9600 combo).

Also, @cybero, keep your Y global work size at 0. In this case it doesn't need to be changed, and changing it makes the structure mis-render (a random dot in the middle when you do). As for the nuts and bolts of why, I'm not 100% sure; I just know that it does. This is something I've encountered regularly, and I'm not sure why the Y global work size doesn't need to be adjusted.

gtoledo3
Re: OpenCL : Array Size at run time

It's weird to me that doing that only exposes X dimensions.

Can we just throw all of this out and start from scratch? This was designed with a totally non-QC-centric style (the patch-set).

cybero
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

In a word, no.

It has more to do with how the different cards [nVidia & ATI] lay out their respective scalar units and compute units in parallel.

It looks like a side product of each card's specific design.

The current ATI cards look like they have a great range of OpenCL facilities, but they do seem to prefer explicit work dimension values (though not all ATI cards do).

Transcendental functions also soup things up.

cybero
Re: OpenCL : Array Size at run time

Scrap it and start from a blank OpenCL kernel.

Sounds like a good idea.

(I like erasing all the other example kernels, maybe keeping the input / output guide for ready reference.

Then the kernel name pops up automatically when blanking out the default OpenCL kernel name on the patch.

Nice for uniquely naming routines.)

gtoledo3
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

It looked like that article was about Windows systems, and the results might have some interplay with drivers (e.g., on a Mac system, and as drivers are updated, this may have no bearing).

gtoledo3
Re: OpenCL : Array Size at run time

No, I guess I meant the whole OpenCL/Mesh patch assortment. I feel like they need to be thrown out, and re-conceptualized, along with the Shadow engine, Feedback, and Interaction. I'm not needlessly downing on any of the patches, they are just all flawed from the get go. (Lest I be accused of needlessly criticizing, I could point out fundamental flaws with the whole OpenCL/Mesh implementation, and have indeed filed bug reports on the majority of the problems).

cybero
Re: OpenCL : Array Size at run time

Oh I see.

Far more radical.

LOL.

A real new-broom approach to the whole OpenCL / Mesh [& Shadow, Feedback & Interaction] assortment.

gtoledo3
Re: OpenCL : Array Size at run time

Here are some fun ones to ponder...

-Create an OpenCL Context Info patch. Now look for the OpenCL Device Info patch to hook it to ;-)

-Reconfigure a Mesh Creator to Volume mode, using Settings. Now, flip to Point Sprite Mode. Note the missing texture input. Now, flip to Volume again. Note no pixel inputs. Now, flip to any other mode. Note how now all you have is a Vertices input. Cute!

-How are you supposed to know how many textures there are, to set the "Get Texture" with full functionality?

-The OpenCL compiler doesn't compile in the background to show errors as advertised; you have to actually run the kernel. In addition, I feel it was introduced into QC too soon (as seen in the moving-target nature of what the OpenCL compiler accepts vs. doesn't accept, as its QC implementation changes to reflect the actual OpenCL language more accurately).

-The DAE loader doesn't even have as many options as viewing a DAE file in Preview (no scene/cam presets).

-Feedback can be coaxed into not restoring inputs properly, feedback can already be done via insert splitters (albeit, not always in a straightforward way), and Feedback patch needlessly configures input ports based on all published outs (not necessary).

-Shadows... oh where to start? They don't fall in the right places, don't work with an iterator, break GLSL, and texture the back sides of objects when they shouldn't. That's for starters.

-Interaction doesn't reset properly when stopping/starting a composition, has a problem with mouse down inside of iterators, paradigms break down entirely in 3 Dimensions (hey Apple, you bought a 3D ENGINE, NOT a 2D one), is implemented willy-nilly (Sprites, Billboards, and then Mesh renderers (a 3D object), but no Cubes or Spheres? Huh?) and is sort of gimmicky in that it could already be done with existing patches.

-In addition, the Interaction paradigm breaks down inside a Render In Image when rendering to objects that aren't the same size as the Render In Image, and/or that have non-2D geometry, like a Sphere. Applying CI effects also breaks the concept of Interaction.

I would rather see concepts that can't possibly work as is thrown out, or savagely reworked TO work. Introducing new features that don't work COMPLETELY, or break old functionality, can in no way be construed as logical, or positive.

I've given a really cursory rundown. I could get more in depth, but it's getting really OT, and I dislike the possibility of being seen as needlessly "downing on" QC, when, in fact, I simply want to see the technologies get tested and vetted a little better before getting implemented.

cybero
Re: OpenCL : Array Size at run time

Fair enough :-)

I'd have been well into an extended beta test of OpenCL / QC, with no OpenCL/QC by default on 10.6.

Still, no turning back the clock now.

{

totally unrelated

[Brazil v N.Korea - shades of South Park]

}

Actually, you haven't mentioned my favourite beef: the cache-related interface glitching and degradation. It seems partially related to, or aggravated by, Growl or menu bar items.

I've noted quite a few peculiarities with the Mesh Creator patch when switching between modes, going into and out of Volume: the patch can be reduced to about two inputs and one output. Funny.

yanomano
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

I have an ATI 3870, which is not supported by OpenCL, so computation happens only on the CPU...

cybero
Re: OpenCL : Array Size at run time - Implicit or Explicit ...

When will ATI be supported for OpenCL on OS X ?

How do the ATI SDK materials work for you on that ATI card?

ATI cards organize their compute units differently from nVidia's; I wonder how much of a stumbling block that is proving to be.

Ironic, if it is, when OpenCL is meant to be just that: an open specification.