OpenCL : output array length.

benoitlahoz's picture

Hello everybody,

As M.Oostrik and toneburst advised me, I'm trying to enter the world of OpenCL...

It's hard to manage, but I'm beginning to get some (small) results.

My main trouble is outputting an array whose length I don't know before the OpenCL processing.

I'm beginning to think that it is not possible, but if someone knows a way to do this without using JS after the output (I have a 20K-length array in output)...

BTW, I know there's an Objective-C plugin but I can't manage to find it anymore.

Thanks !


cybero's picture
Re: OpenCL : output array length.

Perhaps this will help

See attached.

Quote:

Changing the point count doesn't update the array size of the OpenCL kernel's output unless the Output Dimensions are exposed as a variable input; then they do update when fed new values.

cybero 2010

Attachment: OpenCL_ArraySize_Update.qtz (36.88 KB)

idlefon's picture
Re: OpenCL : output array length.

If you enable the "show advanced settings" in the settings tab and then enable the "work item dimensions" and "output dimensions" you'll have control of the output size and the input processing boundaries.

Normally the output size = X Global Work size * Y Global Work Size * Z Global Work Size ( X/Y/Z Global Work size>0 )

with the settings I mentioned you can control that as well.
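
As a rough illustration of the rule above (a sketch in plain C, not Quartz Composer's actual code; the function name `output_length` is made up for the example), the element count of the output array is just the product of the three global work sizes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of elements the kernel output holds when the
   Output Dimensions are not overridden. Each size must be greater than 0. */
size_t output_length(size_t gx, size_t gy, size_t gz)
{
    assert(gx > 0 && gy > 0 && gz > 0);
    return gx * gy * gz;
}
```

For example, a 640 x 480 x 1 global work size yields 307200 elements, one per work item.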

benoitlahoz's picture
Re: OpenCL : output array length.

Thanks cybero and idlefon,

My trouble is to generate an array size while running, like in Javascript : I'm testing if points in a moving BW pic are black or white, and I would like to output only the black points. So I want to fill my array only if points are black (for example)...

I would really like not to use Javascript or iterators as it results in a terrible framerate with the quantity of points I'm treating...

dust's picture
Re: OpenCL : output array length.

this will give you a float4 array marking where in the grid the black pixels are.

__kernel void test_rgbaFFFF(__rd image2d_t srcimg, __wr image2d_t dstimg, float depth, __global float4 *pixelS)
{
      int tid_x = get_global_id(0),
          tid_y = get_global_id(1),
          indx  = tid_y * get_global_size(0) + tid_x;

      int2   pos   = (int2)(tid_x, tid_y);
      float4 color = read_imagef(srcimg, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, pos);

      // An RGB average below the threshold counts as black (0), otherwise white (1).
      float4 mask;
      if (((color.x + color.y + color.z) / 3) < depth)
      {
            mask = (float4)(0.f);
      }
      else
      {
            mask = (float4)(1.f);
      }

      pixelS[indx] = mask;
      write_imagef(dstimg, pos, mask);
}

now if you want to fill an array with just the black pixels, you would make another output, let's call it outputBlack (__global float2 * outputBlack). it would look like the kernel above, but you would want a counter like in js. i would suggest keeping the image size really small to start with, since looping can bog down CL. normally you would just say: if pixel == black, assign it to the new array. that array, however, will be the same size as your input, pixel by pixel, like above.

you can however do something like

int count = 0;
 
outputBlack[indx] = (float2)(0.f,0.f);
 
if (((color.x+color.y+color.z )/3)<.1)
{
outputBlack[count++] = (float2)(pos.x,pos.y);
}

not really sure what you're trying to do once you get the black pixels. this is just packing the xy pos of the pixel into a float2.

if you prefer it js style, you can make a structure like this.

int tid = get_global_id(0), size = get_global_size(0);
outputArray[tid] = (float)(0.f);
// "!>" is not a valid operator; the guard is assumed to mean "tid is in range".
if (tid < size) { for (int i = 0; i < size; i++) { outputArray[i] += i + 0.5; } }

idlefon's picture
Re: OpenCL : output array length.

Dust I checked the "count" method and it doesn't work. You mean something like this, right?

int counter;

__kernel void test_rgbaFFFF(__rd image2d_t srcimg, float threshhold, __global float4 *mesh, __global float4 *col)

{ int2 pos = (int2)(get_global_id(0), get_global_id(1));

int tid_x=get_global_id(0); int tid_y=get_global_id(1); int size_x=get_global_size(0); int size_y=get_global_size(1);

if(tid_x+tid_y==0) { counter=0; } float2 coord= (float2)(tid_x/(size_x-1.),tid_y/(size_y-1.));

float4 color = read_imagef(srcimg, CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST, pos);

if(length(color)>threshhold) { mesh[counter]=(float4)(coord,0.,1.); col[counter]=(float4)(1.); counter ++; }

}

cybero's picture
Re: OpenCL : output array length.

that kernel produces some interesting glitchy results. :-), idlefon

M.Oostrik's picture
Re: OpenCL : output array length.

As far as I know what you want cannot be done.

You cannot set the size of the output structure from inside the OpenCL patch. Changing the global work size from the advanced settings while running often results in crashes (I experienced that on all my systems). Anyway, changing the work size won't help you in this case because you don't know the size beforehand.

I'm still brooding on a workaround...

benoitlahoz's picture
Re: OpenCL : output array length.

Thank you all,

Yes, I was dreaming of your PixelS but with a bigger image on my little MBP 13" :-)

Well, I'll try to treat the "not black" pixels from inside the same kernel. OpenCL seems quite a strange world !

Ben

idlefon's picture
Re: OpenCL : output array length.

That's right Cybero :D

I don't know why the counter method won't work. Is it not initializing the counter or the problem is somewhere else?
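
One likely reason, offered as a guess since the thread never settles it: all work items run the kernel concurrently, so a plain shared `counter++` is a race, and a variable declared inside the kernel (as in dust's snippet) is private to each work item and always starts at 0. As far as I know, OpenCL 1.x also expects program-scope variables to live in the `__constant` address space, so a writable program-scope `int counter;` is not something to rely on. A common parallel-friendly alternative is stream compaction: flag each element, take an exclusive prefix sum of the flags to get each kept element's output slot, then scatter. The sketch below shows the idea sequentially in plain C. The names `compact` and `point2` are invented for the example, and a real OpenCL version would need a parallel scan or atomic operations, neither of which is shown here.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } point2;   /* hypothetical stand-in for float2 */

/* Copies the points whose flag is non-zero to the front of `out`,
   preserving their order. `slot` is scratch space of length n.
   Returns the number of points kept. */
size_t compact(const point2 *in, const int *flag, size_t *slot,
               point2 *out, size_t n)
{
    assert(in && flag && slot && out);

    /* Pass 1: exclusive prefix sum of the flags. slot[i] is the number of
       kept elements before i, i.e. where element i lands in the output. */
    size_t running = 0;
    for (size_t i = 0; i < n; ++i) {
        slot[i] = running;
        running += (flag[i] != 0);
    }

    /* Pass 2: scatter. Each iteration is independent of the others, which
       is why this step maps onto one work item per element. */
    for (size_t i = 0; i < n; ++i) {
        if (flag[i])
            out[slot[i]] = in[i];
    }
    return running;
}
```

Note that `out` still has to be allocated at the worst-case size; only the count of valid entries is decided at run time.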

cybero's picture
Re: OpenCL : output array length.

I love the glitchiness it produces; who knew moving patches on the Editor stage could be so expressive upon the Viewer?

gtoledo3's picture
Re: OpenCL : output array length.

When that stuff starts happening, all bets of QC being "sane" are off. It might even take a restart of the computer to clear up problems.

gtoledo3's picture
Re: OpenCL : output array length.

The one thing you can do is to re-order... it's also possible to use the W column like a secondary index value. Basically, the idea is to do whatever makes most sense with the code, to push your needed value to index 0, or multiple indices to a known range.

The concept of parallel processing isn't too crazy or strange (there have been some mentions of that). One of the reasons things can happen quickly with OpenCL is because one is operating in parallel, and structure sizes aren't constantly being re-evaluated, etc.

Unfortunately, you can't do stuff like have the vector value contained within an index equal nan, and then have it not output. That will not work (at least when I tried), with openCL 1.0.
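
A minimal sketch of the re-ordering idea in plain C (not OpenCL; `vec4` and `pack_valid_first` are names made up here): the buffer keeps its fixed size, W marks whether an entry is meaningful, and the valid entries are moved to a known range at the front so a consumer only reads the first `count` of them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } vec4;  /* hypothetical stand-in for float4 */

/* Reorders a fixed-size buffer in place so that entries with w != 0 occupy
   indices [0, count) in their original order; the remaining entries are
   zeroed. The buffer length n never changes. Returns count. */
size_t pack_valid_first(vec4 *buf, size_t n)
{
    assert(buf != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (buf[i].w != 0.0f)
            buf[count++] = buf[i];  /* count <= i, so nothing unread is overwritten */
    }
    for (size_t i = count; i < n; ++i)
        buf[i] = (vec4){0.0f, 0.0f, 0.0f, 0.0f};
    return count;
}
```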

cybero's picture
Re: OpenCL : output array length.

Great for glitchiness though , GT. Makes me laugh.

dust's picture
Re: OpenCL : output array length.

here is an example of how to dynamically update your output size without changing your work group size. kernels have i/o, so you can do a feedback loop and change your output size to anything you want. i have this set to 640 x 480, so the total size is around 300 thousand, but the black pixel array is only 2-3 hundred in size. not sure if this is what you're trying to do.

the image is rendered and kept in cl via vertex.points, so to see what is going on i would de-res it down to 50 x 50 and then change the point size to match the grid. you could use length() or magnitude, i'm just using an average. since your black pixel positions are vertex points, not an image, you can pull out an x and y for each black pixel in qc units. with the smaller array you can do more processing with js etc...

Attachment: nector6.qtz (9.85 KB)

cybero's picture
Re: OpenCL : output array length.

That is really neat , dust :-). Just beginning to get my head around how this works and to what uses it can be put.

benoitlahoz's picture
Re: OpenCL : output array length.

Thank you everybody, and sorry for the time I took to reply. I was "disconnected" for a while.

Is there a way to sort the structure using ObjectiveC ? I had, some time ago, an Objective C plugin, but I can't manage to find it anymore...

cybero's picture
Re: OpenCL : output array length.

Objective-C plugin is available on quartzcomposer.com .

dust's picture
Re: OpenCL : output array length.

all you want is a structure of black pixels ? to do your shadow tracking with ? obj-c plugin will crunch a 20k array pretty fast. the thing with open cl is its best to keep everything in cl. although an obj-c plugin would be your second fastest option.

i was making a plugin to do this one day while helping make an xml exporter. the trick is that qc sends its array to a plugin as a dictionary, so you have to do some sorting to keep things in order, but sending the array back is easy: you can declare an output like @property(assign) NSArray * outputBlackPixels.

you don't have to keep the array in dictionary format... although you can use a c array but with c arrays you need to declare the size before hand so i would suggest using an NSMutableArray. with a mutable array you just init it with an arbitrary length then behind the scenes cocoa will either copy things to a new array and resize the old one then copy it back with a new size or some operation to those sort of effects.
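
To make the contrast with a fixed-size C array concrete, here is a small sketch of a growable array in plain C. It only illustrates the general grow-by-reallocation idea described above; it is not how Cocoa implements NSMutableArray, and the names `dyn_array`, `dyn_push` and `dyn_free` are invented for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    double *items;
    size_t  count;     /* elements in use */
    size_t  capacity;  /* elements allocated */
} dyn_array;

/* Appends one value, doubling the storage when it is full.
   Returns 0 on success, -1 if memory could not be allocated. */
int dyn_push(dyn_array *a, double value)
{
    assert(a != NULL);
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        double *grown = realloc(a->items, new_cap * sizeof *grown);
        if (grown == NULL)
            return -1;          /* the old block is still valid */
        a->items = grown;
        a->capacity = new_cap;
    }
    a->items[a->count++] = value;
    return 0;
}

/* Releases the storage and resets the array to empty. */
void dyn_free(dyn_array *a)
{
    assert(a != NULL);
    free(a->items);
    a->items = NULL;
    a->count = a->capacity = 0;
}
```

Start from a zero-initialized struct (`dyn_array a = {0};`), push as many values as the data turns out to need, and call `dyn_free` when done.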

once you do your processing you can return the structure back as an NSArray even if you're using a mutable array or a c array. just saying self.outputBlackPixels = c[i]. or you could do something multi dimensional like this, self.outputBlackPixels = [NSArray arrayWithObjects: x[i],y[i],z[i],w[i],nil]. so you see obj-c is very flexible in these regards. that is pseudo code though: to actually run it, x[i], which is a c array element, would need to be wrapped in an object, so x[i] would be replaced with [NSNumber numberWithDouble:x[i]]

i tested saving a float4 color structure down to xml file then re assemble all 300k plus indexes back to an image. not the ideal way to save an image but it is possible and much faster than anything js or a qc iterator will do.

obj-c is a bit wordy. i find it more elegant at times than other languages, and at other times it's very difficult to read, like if you have a method calling the same method or something. your code can get really long depending on the number of inputs your function has. in c it looks like js.

float method(float x, float y, float z, float w) {return x+y+z+w;}  

but with objc its a bit longer. arguably clearer

- (NSNumber *)method:(NSNumber *)x ypos:(NSNumber *)y zpos:(NSNumber *)z wpos:(NSNumber *)w { return [NSNumber numberWithDouble:[x doubleValue] + [y doubleValue] + [z doubleValue] + [w doubleValue]]; }

so sometimes i like to use c methods and mix them with cocoa. also renaming your .m to .mm lets you do c++ things.

so if you're dealing with images in a cl kernel you probably found that for-looping over an image is costly, at least in a 1 to 1 or for-i-in-thread-index context, so using a qc plugin to do your sorting, looping, merging type of things will be way faster than doing that stuff in cl.

although like GT mentioned if you keep your output to a known fixed size then cl will be the fastest still. no need to jump out of the pipeline.

you could however declare a local array in cl and use that dynamically by not declaring your output size. you would want to set up barriers so you don't go too large as you don't know how many black pixels there will be.

if you just want to turn pixelS into a vector field you could also in cl make some definitions.

#define LOOP_COUNT 100

then make a few different outputs at various init lengths.

#if LOOP_COUNT <= 100

you could make a series of these, or some other logic to fill array c when loop count == x;

benoitlahoz's picture
Re: OpenCL : output array length.

Wow ! Thanks dust for this long explanation ! I'll try this... It may take a while as I don't even know Objective C (seems much harder than C. Is it because it's "objective" ???).

dust's picture
Re: OpenCL : output array length.

obj-c at first was like a foreign language to me. they do not teach it at my school. my java teacher said smalltalk, the style of messaging obj-c uses, was developed so children could program. i kind of laughed when he said that but once you get the syntax and its wordiness it makes a lot of sense. plus you can do a lot of programming without using ivars because everything is a reference to an object. sort of like how java works.

your class object id references your ivar, like NSString * myString; so when you need something of the nsstring type you can just type @"myString". the @ symbol declares the string as an object. equally you could tell the class NSString to make a string and add a number to the string, all without declaring a variable: [NSString stringWithFormat:@"myStringPlusFloat: %f", 33.333]; obviously you can assign that to an ivar for multiple uses.

so needless to say it took some time for me to understand. it was the bracket smalltalk style messaging that was confusing at first, i think. with obj-c 2 you can use the dot syntax so it feels a little more like other languages. remember obj-c is c based, so you can do carbon style programming if you want.

i know people program without xcode, but i find xcode's code completion to be your best friend. when i'm not sure of a particular data type because it is buried so deep in the frameworks, a highlight and right click, go to definition, seems to help me out more than looking up docs. the docs and examples are equally important, it's just that apple's examples are either packed with every bell and whistle of a framework or so sparse they just get you started, so it's hit and miss with me. i find hitting the escape key, seeing the options i have available and exploring that way more beneficial.

what size image are you trying to pixelS ?

benoitlahoz's picture
Re: OpenCL : output array length.

Right now I'm fighting with accessing a multidimensional array (I want to test the value of something like myArray[i][3])... Can't get it... Argh !

Actually, I divide the size of my image by 4 and I get something like 390K points !!! But I would be glad to have 20K points, as I only want to use a few of them at the end (the shadow edges).

Thanks again for your explanation... My head is burning as tinkering with OpenCL + Objective C at the same time is... pfffff... But so enthusiastic ! :-)

[EDIT : Here is my code (obviously full of errors I guess !)

int count = 0;
 
   NSMutableArray *tempOut = [ [NSMutableArray alloc] init];
   NSMutableArray *tempIn = [NSArray arrayWithObjects: self.inputStructure, nil];
 
   for(int i = 0; i < sizeof(self.inputStructure); i++) {
 
      if ([tempIn getObjects:[(i) getObjects:(3)]] != 0) {
 
         tempOut[count] = tempIn[i];
         count++;
 
      }
 
   }
 
   self.outputStructure = [NSArray arrayWithObjects: tempOut, nil];
   return YES;

dust's picture
Re: OpenCL : output array length.

that looks like it will run. here is how i'm getting blobs and positions from cl to the plugin and back to cl. pretty fast, 60 fps. check out my cl centroid thread for the plugin source to build it. it will show you how to access float4 components to make a pixelS mono structure or, in this case, to track white blob pixels.

- (BOOL) execute:(id<QCPlugInContext>)context atTime:(NSTimeInterval)time withArguments:(NSDictionary*)arguments
{
   if (self.inputRun==0.0) {
 
 
   NSEnumerator *enumerator = [self.inputColor keyEnumerator];
 
   id key;
 
   NSMutableArray *color = [[NSMutableArray alloc] init];
   NSMutableArray *mesh = [[NSMutableArray alloc] init];
   NSMutableArray *pixelS = [[NSMutableArray alloc] init];
   NSMutableArray *blobs = [[NSMutableArray alloc] init];
 
 
   int i = 0;
 
   while ((key = [enumerator nextObject])) {
 
      NSNumber *iter = [NSNumber numberWithInt:i];
      [color addObject:[self.inputColor objectForKey:iter]];
      [mesh addObject:[self.inputMesh objectForKey:iter]];
 
      i++;
 
   }
 
 
   int index;
 
   pixelS = color;
   for (index = 0; index < i; index++)
   {
 
      pixelS = [color objectAtIndex:index];
 
      NSNumber * r = [pixelS objectAtIndex:0];
      NSNumber * g = [pixelS objectAtIndex:1];
      NSNumber * b = [pixelS objectAtIndex:2];
      NSNumber * a = [pixelS objectAtIndex:3];
 
      double averageMean = ([r doubleValue]+[g doubleValue]+[b doubleValue])/3;
 
      if (averageMean > self.inputThreshhold) {
         [blobs addObject:[mesh objectAtIndex:index]];
      }
 
   }
 
         self.outputBlobs = blobs;
 
   }
 
 
   return YES;
}

benoitlahoz's picture
Re: OpenCL : output array length.

Yes ! What a good job ! I can't wait for the polygons you're talking about in the thread.

It would be of great use for me and my shadows !

Here is the code I'm building to get the "interesting vertices", but it doesn't work.

- (BOOL) execute:(id<QCPlugInContext>)context atTime:(NSTimeInterval)time withArguments:(NSDictionary*)arguments
{
   /*
   Called by Quartz Composer whenever the plug-in instance needs to execute.
   Only read from the plug-in inputs and produce a result (by writing to the plug-in outputs or rendering to the destination OpenGL context) within that method and nowhere else.
   Return NO in case of failure during the execution (this will prevent rendering of the current frame to complete).
 
   The OpenGL context for rendering can be accessed and defined for CGL macros using:
   CGLContextObj cgl_ctx = [context CGLContextObj];
   */
 
 
   int count = 0;
 
   NSMutableArray *tempOut = [[NSMutableArray alloc] initWithCapacity:10];
   NSArray *tempIn = self.inputStructure;
 
   for(int i = 0; i < sizeof(tempIn); i++) {
 
      NSArray *tempI = [NSArray arrayWithObjects: [tempIn objectAtIndex:i], nil];
 
      if ([tempI objectAtIndex:3] != 0) {
 
         NSArray *tempX = [tempI objectAtIndex:0]; 
         NSArray *tempY = [tempI objectAtIndex:1];
         NSArray *tempZ = [tempI objectAtIndex:2];
         NSArray *tempW = [tempI objectAtIndex:3];
 
         NSArray *temp = [NSArray arrayWithObjects: tempX, tempY, tempZ, tempW, nil];
 
         [tempOut insertObject:temp atIndex:count];
 
         // [tempOut replaceObjectAtIndex:i withObject: objectToAdd]
         count++;
 
      }
 
   }
 
   //self.outputStructure = [NSArray arrayWithObjects: tempOut, nil];
 
   self.outputStructure = tempOut;
   return YES;
}

No errors but I have a message from Quartz Composer :

0x85d5ec21: -[QCContext nextExecutionTimeForPatch:time:arguments:]
0x85d5e9ca: -[QCGraphicsContext nextExecutionTimeForPatch:time:arguments:]
0x85d5e7ad: -[QCOpenGLContext nextExecutionTimeForPatch:time:arguments:]
0x0000d5f2
0x85da6731: -[QCView render:arguments:]
0x85da5e6c: -[QCView startRendering:]
0x0000cd80
0x843c4a66: _nsnote_callback
0x89072000: __CFXNotificationPost
0x8905e578: _CFXNotificationPostNotification
0x843bb9ce: -[NSNotificationCenter postNotificationName:object:userInfo:]
0x87f41b1b: -[NSWindow _reallyDoOrderWindow:relativeTo:findKey:forCounter:force:isModal:]
0x87f4177a: -[NSWindow orderWindow:relativeTo:]
0x87f3f100: -[NSWindow makeKeyAndOrderFront:]
0x88144335: -[NSWindowController showWindow:]
0x0000ca7f
0x1600f732
0x884e4fbf: -[NSToolbarButton sendAction:to:]
0x88156135: -[NSToolbarItemViewer mouseDown:]
0x8804334f: -[NSWindow sendEvent:]
0x87f78a86: -[NSApplication sendEvent:]
0x0000a994
0x87f0f4da: -[NSApplication run]

dust's picture
Re: OpenCL : output array length.

the first thing i do is make my input structure of type NSDictionary. i enumerate that into an NSMutableArray to keep the index order, then loop through. so if your ordering is messed up try that first. if the code below works you should be able to view the gl point structure or cl mesh.

NSMutableArray *tempOut = [[NSMutableArray alloc] initWithCapacity:10];
NSMutableArray *tempIn = [[NSMutableArray alloc] initWithCapacity:10];
NSMutableArray *tempI = [[NSMutableArray alloc] initWithCapacity:10];
 
tempIn = self.inputStructure;
 
for(int i = 0; i < [tempIn count]; i++) {
 
tempI = [tempIn objectAtIndex:i];
 
NSNumber * z = [tempI objectAtIndex:3];
 
       if ([z doubleValue] != 0) {
 
           NSNumber *tempX = [tempI objectAtIndex:0]; 
           NSNumber *tempY = [tempI objectAtIndex:1];
           NSNumber *tempZ = [tempI objectAtIndex:2];
           NSNumber *tempW = [tempI objectAtIndex:3];
 
           NSArray *temp = [NSArray arrayWithObjects: tempX, tempY, tempZ, tempW, nil];
 
           [tempOut addObject:temp];
 
       }
 
   }
 
  self.outputStructure = tempOut;

benoitlahoz's picture
Re: OpenCL : output array length.

Thank you Dust !!!! It is working with a slight modification, but I'm passing from 60 to 15 FPS, with a 10K points structure in input and a 130 points structure in output.

Is there a "deallocation" thing to do to keep FPS at a high rate ?

- (BOOL) execute:(id<QCPlugInContext>)context atTime:(NSTimeInterval)time withArguments:(NSDictionary*)arguments
{
   /*
   Called by Quartz Composer whenever the plug-in instance needs to execute.
   Only read from the plug-in inputs and produce a result (by writing to the plug-in outputs or rendering to the destination OpenGL context) within that method and nowhere else.
   Return NO in case of failure during the execution (this will prevent rendering of the current frame to complete).
 
   The OpenGL context for rendering can be accessed and defined for CGL macros using:
   CGLContextObj cgl_ctx = [context CGLContextObj];
   */
 
   NSMutableArray *tempOut = [[NSMutableArray alloc] initWithCapacity:10];
   // NSMutableArray *tempIn = [[NSMutableArray alloc] initWithCapacity:10];
   NSMutableArray *tempI = [[NSMutableArray alloc] initWithCapacity:10];
 
   NSArray *tempIn = self.inputStructure;
 
   for(int i = 0; i < [tempIn count]; i++) {
 
      tempI = [tempIn objectAtIndex:i];
 
      NSNumber * z = [tempI objectAtIndex:3];
 
      if ([z doubleValue] != 0) {
 
         NSNumber *tempX = [tempI objectAtIndex:0]; 
         NSNumber *tempY = [tempI objectAtIndex:1];
         NSNumber *tempZ = [tempI objectAtIndex:2];
         NSNumber *tempW = [tempI objectAtIndex:3];
 
         NSArray *temp = [NSArray arrayWithObjects: tempX, tempY, tempZ, tempW, nil];
 
         [tempOut addObject:temp];
 
      }
 
   }
 
   self.outputStructure = tempOut;
   return YES;
 
}

franz's picture
Re: OpenCL : output array length.

[tempOut release]

benoitlahoz's picture
Re: OpenCL : output array length.

Merci !!!

cwright's picture
Re: OpenCL : output array length.

Don't alloc tempI either -- you've got this:

NSMutableArray *tempI = [[NSMutableArray alloc] initWithCapacity:10];

followed by

tempI = [tempIn objectAtIndex:i];

which means you just leaked an array (and you're leaking per-execute, which is really bad). just have this:

NSMutableArray *tempI;
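
The same pattern can be reproduced in plain C, where it may be easier to see. This is only an illustrative sketch: `tracked_alloc` and `live_blocks` are made-up helpers that count outstanding allocations, standing in for alloc/init, and `leaky_first` leaks on purpose.

```c
#include <assert.h>
#include <stdlib.h>

int live_blocks = 0;   /* allocations that have not been freed */

/* Hypothetical wrapper around malloc that counts live allocations. */
void *tracked_alloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        ++live_blocks;
    return p;
}

/* Leaky on purpose: the fresh block is orphaned as soon as the pointer
   is reassigned, so it can never be freed. */
double leaky_first(double *existing)
{
    assert(existing != NULL);
    double *temp = tracked_alloc(10 * sizeof *temp);
    temp = existing;        /* last reference to the new block is lost here */
    return temp[0];
}

/* Correct: no allocation is needed just to point at existing data. */
double clean_first(double *existing)
{
    assert(existing != NULL);
    double *temp = existing;
    return temp[0];
}
```

Every call to `leaky_first` raises `live_blocks` by one and nothing can ever bring it back down, which is what happens once per frame in the execute method.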

dust's picture
Re: OpenCL : output array length.

nice note chris. you may also want to turn the local variables into instance variables: declare them in your .h with properties, then synthesize them like you do with plugin inputs and outputs, just with @synthesize instead of @dynamic. that way you can allocate your working arrays once in your init method, then release them in dealloc.

i think what chris is saying and i'm not one to pimp code like he does but on each execution a new array is getting allocated, which isn't being really used because your not adding to the allocated array, your assigning the component array to it which is just reallocating i suppose.

i do the memory leak things as the last step to finish up things. i suggest using an analyzer like http://clang-analyzer.llvm.org/ as not always can i see where things are leaking so clang will show you the spots it thinks there may be leaks. that doesn't mean its perfect but a good tool that i have needed a few times.

also like i mentioned before sometimes you don't even need to make local variables. like this

         NSNumber *tempX = [tempI objectAtIndex:0];
         NSNumber *tempY = [tempI objectAtIndex:1];
         NSNumber *tempZ = [tempI objectAtIndex:2];
         NSNumber *tempW = [tempI objectAtIndex:3];

         NSArray *temp = [NSArray arrayWithObjects: tempX, tempY, tempZ, tempW, nil];

could also be written like this

 [tempOut addObject: [NSArray arrayWithObjects: [tempI objectAtIndex:0], [tempI objectAtIndex:1], [tempI objectAtIndex:2], [tempI objectAtIndex:3], nil]];

didn't want something like that to be confusing.... so seeing things stored in local variables is sometimes easier to understand i guess, but you're not really using the tempX, tempY etc... locals.

 [tempI objectAtIndex:0]

is already an NSNumber so there is no need in this case to assign it to a local variable unless you need to process it, and even then you could do....

[[tempI objectAtIndex:0] doubleValue];

the latter being if you want to do a calculation with your xpos. didn't want to confuse you too much by not using many variables, so i thought declaring some locals would help you understand the NSObject thing.

cwright's picture
Re: OpenCL : output array length.

dust wrote:
i think what chris is saying and i'm not one to pimp code like he does but on each execution a new array is getting allocated, which isn't being really used because your not adding to the allocated array, your assigning the component array to it which is just reallocating i suppose.

No, totally wrong -- it's LEAKING. Let me be more explicit: Every time this patch executes, it chews up more memory that it NEVER RETURNS TO THE SYSTEM EVER. This means after a long time, performance will still degrade, the harddrive will fill up, and random applications (including potentially the kernel!) will crash -> bad news. This isn't a matter of style, this is a fundamental correctness issue.

It isn't "just reallocating" (which implies that the previous stuff is neatly dealloc'd or reused), it's leaking which is a programming error.

(sorry for the passion -- this is the type of mistake that causes all kinds of problems down the road, and I've seen enough JS crap lately that I suspect it's starting to cross over. In JS it doesn't leak, it's just idiotic and wasteful, but in ObjC/C it's permanently bad and dangerous.)

dust's picture
Re: OpenCL : output array length.

no need to apologize for the passion chris, i'm glad you joined the thread. i suppose i worded myself wrong, plus prefaced it with "i think", which implied i was a little unclear. but yes, reallocating implies you dealloc'd in the first place, which is a very good point, and which i believe would also solve the leak: releasing tempI inside the execute function, that is.

vade's picture
Re: OpenCL : output array length.

Err.

You are not re-allocating, you are changing the object reference (the pointer value) of tempI to another object, completely losing the reference to the original NSArray that was alloc/init'ed. That's the leak. The memory is used, and you no longer know where it is. You are losing the reference. A better (and more correct) thing to do would be to never alloc in the first place, which is what Chris said, i.e.:

NSArray * tempI = [tempIn objectAtIndex:i];

or probably better;

id tempI = [tempIn objectAtIndex:i];

since you may not know what's at a given index. You can then do runtime object introspection like:

if([tempI respondsToSelector:@selector(objectAtIndex:)])
// or
if([tempI isKindOfClass:[NSArray class]])

But isKindOfClass: and isMemberOfClass: can backfire if you have class clusters, I think. It gets nuanced fast. Generally speaking, ask the object if it accepts the method, because knowing what is a class cluster and what isn't is not always clear.

So, yea, drop the alloc init all together. Doing an alloc/init and then a dealloc is not just .. weird, its wasteful and slow, malloc literally locks the thread where malloc is being called while memory is being allocated. Its generally fast enough, but, something to avoid because cleaning up and tracking memory still takes its toll.

On that note of things to avoid, don't use Obj-C methods in tight loops, the method calling over-head can be a real performance killer. I found a nice tight loop of mine all of a sudden doubling CPU load from 40% to 80%, because of one Obj-C @synthesized getter being called in the loop. Really.

Knowing is half the battle.™

gtoledo3's picture
Re: OpenCL : output array length.

I've found that tight loops seem to be an issue in OpenCL as well...that's just my anecdotal experience. Night and day stuff. After some experiences with that, I keep my eyes open for it all the time...

Is it true that it doesn't make a difference in some languages? That surprises me.

cwright's picture
Re: OpenCL : output array length.

vade wrote:
So, yea, drop the alloc init all together. Doing an alloc/init and then a dealloc is not just .. weird, its wasteful and slow, malloc literally locks the thread where malloc is being called while memory is being allocated. Its generally fast enough, but, something to avoid because cleaning up and tracking memory still takes its toll.

I completely agree that it's wasteful and unnecessary (allocations should be done outside the execute method -- in the setup when possible, and never per-frame unless absolutely necessary), but it doesn't "lock the thread". 10.6's malloc implementation was revamped to typically have thread-local allocation pools (when the request can't be satisfied due to size/pressure, it then and only then falls back to locking the process-wide heap, and even then I think it's somewhat fine-grained as far as locking goes?). It's really quite slick.

(Also: glGet* isn't a synchronization point unless you're using the MT engine, which explains why QC with the MT GL engine is terrible -- unrelated to this thread, but it's a bit of "conventional wisdom" that's actually wrong)

vade wrote:
On that note of things to avoid, don't use Obj-C methods in tight loops, the method calling over-head can be a real performance killer. I found a nice tight loop of mine all of a sudden doubling CPU load from 40% to 80%, because of one Obj-C @synthesized getter being called in the loop. Really.

Did IMP-caching help any? (that's what I often resorted to for kineme stuff -- and when that was too slow, direct access where available).

And: @synchronized is wildly expensive -- virtually all other locking mechanisms (pthreads, NSLock, dispatch semaphores, etc.) run circles around it.

cwright's picture
Re: OpenCL : output array length.

gtoledo3 wrote:
Is it true that it doesn't make a difference in some languages?

It depends. In plain C, a tight loop, like this:

for (int i = 0; i < 16777216; ++i)
   x[i] = y[i] + z[i];

is literally 4 instructions (the add, the increment, the compare, and the jump). so this would take roughly 16777216 * 4 cycles (disregarding cache effects, which should be essentially zero since we're doing a trivial linear traversal, so prefetching is going to eat it up). That's ~64million cycles. On a 1GHz cpu, that's ~0.064 seconds (and probably less, because the cpu can do several ops simultaneously, even if you don't use SSE/SIMD techniques explicitly).
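
The estimate is easy to reproduce. The helper below is only a back-of-the-envelope sketch using the same assumptions as the paragraph above (4 instructions per iteration, one instruction per cycle, no cache stalls); `estimated_seconds` is a name made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Rough run time of a loop: iterations * instructions per iteration,
   assuming one instruction retires per cycle, divided by the clock rate in Hz. */
double estimated_seconds(uint64_t iterations, uint64_t instr_per_iter,
                         double clock_hz)
{
    assert(clock_hz > 0.0);
    uint64_t cycles = iterations * instr_per_iter;
    return (double)cycles / clock_hz;
}
```

With 16777216 iterations, 4 instructions each and a 1 GHz clock this gives 67108864 cycles (2^26, i.e. 64 Mi), or roughly 0.067 seconds.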

In objc, message sends cannot be optimized away, so something like this

for(int i = 0; i < [someArray count]; ++i)
   doStuff([someArray objectAtIndex:i]);

is going to have 2 messages per iteration (the -count, and the -objectAtIndex: -- assuming it doesn't issue more messages under the hood). Each message send costs ~5-12 clock cycles under ideal circumstances, so a 16777216 element NSArray in the above loop will take at least 3 times longer (if you were accessing 2 arrays and storing into a third, you'd have way more messages -- you can see how this gets ridiculous quickly). Messages also have the fun side effect of having a few branches (so you trash your branch prediction unit), and they do gathers from RAM, so you thrash the L1 cache a bit too (and the I-cache a bit too, due to the actual message send code itself).

Message sends are pretty cheap generally, but they aren't free and QC spends a lot of time doing them.

OpenCL's a weird beast: on the CPU, it shouldn't matter (because the CPU's good at doing loops and stuff, especially when it can predict the jumps well). A GPU doesn't like loops. It doesn't like branches. Doing loops will typically trash its caches and hammer its register file (Cliff Click gives some good talks on how CPUs work -- GPUs are kinda the same, but souped up in some areas, and woefully lacking in others. If you want a rundown, I just chatted with a few Intel engineers this week at GDC, so this stuff's particularly fresh in my mind ;)

So it's not so much the language as it is the evaluation hardware (there's no language that can do quick loops on the GPU, for example, because they just aren't designed that way). Languages on top can exploit tight loops (C, C++ usually), but not always (ObjC).

vade's picture
Re: OpenCL : output array length.

I wasn't using @synchronized object locking, I was just using the synthesized getter to get the values in a 640x480 loop.

static void depth_cb(freenect_device *dev, void *rgb, uint32_t timestamp)
{
    v002_Open_KinectPlugIn* self = freenect_get_user(dev);
 
    // Cache our Obj-C getter result for performance reasons.
    vImage_Buffer myDepthU = self.vDepthU16;
    vImage_Buffer myDepthF = self.vDepthF;
    float* finalDepth = self.finalDepthF;
 
    // This is technically a hack (the values are eyeballed), but they seem pretty damn correct
    vImageConvert_16UToF(&myDepthU, &myDepthF, -1.0, 1.0/1024 * 2.0, 0); 
 
    // invert our depth buffer.
    NSUInteger width = 640;
    NSUInteger height = 480;
 
    float* depthF = (float *)myDepthF.data;
    for(int i = 0; i < (width * height); i++)
    {
        float *value = depthF + i;
        *(finalDepth + i) = 1.0 - *value;
    }
 
}

changed from the old Obj-C:

static void depth_cb(freenect_device *dev, void *rgb, uint32_t timestamp)
{
    v002_Open_KinectPlugIn* self = freenect_get_user(dev);
 
    // This is technically a hack (the values are eyeballed), but they seem pretty damn correct
    vImageConvert_16UToF(&(self.vDepthU16), &(self.vDepthF), -1.0, 1.0/1024 * 2.0, 0); 
 
    // invert our depth buffer.
    NSUInteger width = 640;
    NSUInteger height = 480;
 
    float* depthF = (float *)self.vDepthF.data;
    for(int i = 0; i < (width * height); i++)
    {
        // Both getters below are called on every iteration; this is the slow part.
        float *value = (float*)self.vDepthF.data + i;
        *(self.finalDepthF + i) = 1.0 - *value;
    }
}

gtoledo3's picture
Re: OpenCL : output array length.

The example of the obj-c vs. C breakdown explains that clearly.

I understand the reasoning behind loops on the GPU not working well after wondering why some example code worked so badly, and also, why some CL example code was written in a way that avoided that. I wouldn't mind that rundown though, because I'm curious about more specifics.

Sometimes I don't really know if I think CL makes sense as a language in and of itself, using it on the CPU, but +1.0 updates have introduced more functionality, and made it more user friendly.

benoitlahoz's picture
Re: OpenCL : output array length.

Wow ! Getting back from a weekend without any computers, I discover your posts... They're sometimes a little hard for me to understand, but Chris's and vade's posts made me gain 5 fps :-)

I'm worse than a newbie, and I'm sorry for the stupid mistakes I'm making ! But I'm learning a lot from you all.

Thank you !

M.Oostrik's picture
Re: OpenCL : output array length: the Plugin

Nice discussion!

I wanted to see how much this would speed things up, so I recreated the plugin based on all the comments.

It took me some time (I'm not very experienced in the whole Objective-C thing), and I want to save other people the trouble, so here it is.

The plugin is about 10 times faster than a JavaScript patch with the same functionality (see example). It would be nice to see the plain C implementation George is talking about.

Cheers!

p.s. I changed the code a little bit so it would respond to 1 instead of non zero values, because this better suited my needs.
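The plugin's source isn't reproduced in the thread, so the following is only a guess at the general shape of such a filter in plain C, not the attached code. It assumes the input is a list of points that each carry a flag, and it keeps only those whose flag is exactly 1. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input element: a 2D point plus a flag (1 = keep, e.g. a black pixel). */
typedef struct FlaggedPoint {
    float x, y;
    int flag;
} FlaggedPoint;

typedef struct Point2D {
    float x, y;
} Point2D;

/* Copies the points whose flag is exactly 1 into `out`, preserving order.
   `out` must have room for `count` entries (the worst case).
   Returns the number of points written, which is the output length that
   is only known once the pass has finished. */
size_t compact_points(const FlaggedPoint *in, size_t count, Point2D *out)
{
    assert(count == 0 || (in != NULL && out != NULL));
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (in[i].flag == 1) {
            out[kept].x = in[i].x;
            out[kept].y = in[i].y;
            ++kept;
        }
    }
    return kept;
}
```

The caller allocates the worst-case size up front and then uses only the first `kept` entries. That sidesteps the original problem of not knowing the output length before the pass runs.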

Attachment: WisOne.zip (20.94 KB)

benoitlahoz's picture
Re: OpenCL : output array length: the Plugin

Wow ! Great and very elegant !

Thank you Mathias !

monobrau's picture
Re: OpenCL : output array length: the Plugin

Wow great stuff! Thanks for sharing!!!!