OpenCL Hair-Tearing

toneburst

I've been tearing my hair out all day trying to do something that should be very simple in OpenCL.

I need to write a kernel with 2 different image inputs and an array input. The only way I can get it to work is if I make all the inputs the same 'size' (i.e. both images have identical dimensions, and the array/structure has exactly the same number of members as the total pixel count of each image). One of the images doesn't need to be as big as the other, but padding it out as a pre-process isn't too much of an issue. The array is more of a problem, though, as I'm using a JS patch to generate it, and I really don't want or need it to have several hundred thousand members. Anyone have any ideas how/if I can fix this?
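Roughly, the shape of the kernel I'm after is something like this (names simplified, not my actual code; it assumes the array has one value per pixel, which is exactly the constraint I'd like to get rid of):

    __kernel void blendWithData(__read_only  image2d_t imageA,    // full-size input
                                __read_only  image2d_t imageB,    // ideally smaller, currently padded to match
                                __global const float  *jsValues,  // array from the JS patch
                                __write_only image2d_t outImage)
    {
        const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP_TO_EDGE |
                              CLK_FILTER_NEAREST;
        int2 pos = (int2)(get_global_id(0), get_global_id(1));
        int  idx = pos.y * get_global_size(0) + pos.x;   // flat index into the array

        float4 a = read_imagef(imageA, smp, pos);
        float4 b = read_imagef(imageB, smp, pos);
        float  v = jsValues[idx];                        // only safe with one array member per pixel

        write_imagef(outImage, pos, mix(a, b, v));
    }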

a|x

gtoledo3
Re: OpenCL Hair-Tearing

OpenCL processes naturally want to be computed in parallel. I totally sympathize with your scenario. OpenCL is one way (the only way?) in QC to do many kinds of manipulation on very large images/structures, etc., but it's also pretty natural to want to operate on structures of different sizes. I run into this all the time, and have had plenty of crashes and burns before figuring different principles out.

There's probably a few approaches that could work depending on the particulars of your scenario.

You may want to shift to thinking of having your process work in blocks of index ranges when the structures aren't the same size, and then adding in iterations of some dud value for the blocks where one of the structures has a remainder, or picking sizes that are multiples of one another. It's hard to say what the correct approach is; the description is concrete, but still vague enough that it's hard to give a definitely valid suggestion.
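As a very rough sketch of the dud-value idea (generic OpenCL C, not from a working composition; dataCount and dudValue are made-up names):

    __kernel void padWithDuds(__global const float *data,      // the shorter structure
                              int                   dataCount, // its real length
                              float                 dudValue,  // filler for the remainder
                              __global float       *result)    // sized to the full work range
    {
        int i = get_global_id(0);
        // past the end of the real structure, fall back to the dud value
        result[i] = (i < dataCount) ? data[i] : dudValue;
    }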

This isn't totally applicable to QC, but led to some "aha!" moments: http://www.mikeash.com/pyblog/friday-qa-2010-04-02-opencl-basics.html

If you have way more pixels than the JavaScript patch is outputting in its array, maybe you set up the JavaScript so that it outputs an index count that matches the pixel width of the image. That way the JavaScript patch is outputting far fewer values than the image has pixels overall, but its structure is equivalent to one row of the image in width, and you can apply the JavaScript structure across each row. Conversely, perhaps the best route is to go ahead and queue right off the top, to get your structure up to the same size.
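In kernel terms, something along these lines is what I'm picturing (just a sketch, assuming the JavaScript structure has one value per pixel column of the image):

    __kernel void applyRowValues(__read_only  image2d_t src,
                                 __global const float  *rowValues, // one value per pixel column
                                 __write_only image2d_t dst)
    {
        const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP_TO_EDGE |
                              CLK_FILTER_NEAREST;
        int2 pos = (int2)(get_global_id(0), get_global_id(1));
        float4 px = read_imagef(src, smp, pos);
        // the same value is reused down every row, so the JS array
        // only needs as many members as the image is wide
        write_imagef(dst, pos, px * rowValues[pos.x]);
    }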

Maybe the easiest way, depending on whether it handles what you need to accomplish, is to use a queue structure in parallel or queue number in parallel patch. Remember that if it's outputting a single-lane structure and you pass it to a float4 index input, the structure will convert to float4... so I think your first 4 index values will convert to one index/vector. I'm not 100% sure on that, but that's definitely what happens when you hit the mesh creator input with a single lane.
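A way to picture that packing (purely illustrative, and only if the conversion works the way I think it does):

    // if the single-lane structure is [v0, v1, v2, v3, v4, v5, ...],
    // a float4 input sees element i as (v[4i], v[4i+1], v[4i+2], v[4i+3])
    __kernel void showPacking(__global const float4 *packed,
                              __global float        *firstComponents)
    {
        int i = get_global_id(0);
        float4 v = packed[i];       // covers original indexes 4*i .. 4*i+3
        firstComponents[i] = v.x;   // v.x is original index 4*i, v.y is 4*i+1, etc.
    }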

I believe the queue structure in parallel should instantly make a queue populated with all of the values you need to kick off the kernel, without the "fill up" time of the normal queue, which would probably be a crash-and-burn affair. So, if you feed it a structure of 307,200 members with a queue size of 10, it should instantly kick off with a structure of 3,072,000 members, without any fill lag. That is pretty handy for getting your chunks to equal sizes, if that route helps solve things. I'm not sure exactly what you need to do, but thinking about it as if I had a smaller audio structure that had to work with an image... I would probably attempt to queue in parallel first, with the idea that each count of the JavaScript output structure would be equivalent to one row (X) of the image sample, just because of the ease of it.

Instead of JavaScript, maybe you can conceive of it as a CI image output that delivers pixel strips, and you read the color to get your values.
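Sketch of that idea, assuming the data arrives as a one-pixel-tall strip image (the names are made up):

    __kernel void valuesFromStrip(__read_only  image2d_t dataStrip, // 1 pixel tall, one column per value
                                  __read_only  image2d_t src,
                                  __write_only image2d_t dst)
    {
        const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |
                              CLK_ADDRESS_CLAMP_TO_EDGE |
                              CLK_FILTER_NEAREST;
        int2 pos = (int2)(get_global_id(0), get_global_id(1));
        // pull this column's value out of the red channel of the strip
        float v = read_imagef(dataStrip, smp, (int2)(pos.x, 0)).x;
        float4 px = read_imagef(src, smp, pos);
        write_imagef(dst, pos, px * v);
    }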

dust
Re: OpenCL Hair-Tearing

You can create a structure of any size in JS and feed it into a kernel with a different thread ID count. So if you take your global ID index, multiply by the global size, and assign your input array to the global output, your JavaScript array will be in the CL structure up until your input size is exceeded. If you're not sharing that thread, CL will just put 0 in all the other indexes; otherwise, if you're using that thread with another process, sometimes junk from the other process gets placed into your output structure.
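Something like this is what I mean (just a sketch; inputCount stands in for whatever size your JS structure actually is):

    __kernel void copyJSArray(__global const float *jsInput,    // structure from the JavaScript patch
                              int                   inputCount, // its real length
                              __global float       *output)     // sized to the full global work size
    {
        // flatten the 2D work-item position into one index, like I describe above
        int idx = get_global_id(1) * get_global_size(0) + get_global_id(0);
        // past the end of the JS structure, write 0 explicitly rather than
        // trusting whatever happens to be sitting in that memory
        output[idx] = (idx < inputCount) ? jsInput[idx] : 0.0f;
    }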

As far as image size is concerned, if you're just processing an image and writing it back out to an image, normally you should be able to resize your view, but definitely never change your global sizes during execution. For images of different sizes, you just have to store the small image's data locally and assign it to a separate process that is the same size as your large image. I like to throw a negative number into the null parts of the local structure if it's image data; that way it's easy, when you're processing the data further down the pipeline, to test for where your real data ends. Any number works, just something out of the bounds of real data.
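A sketch of that sentinel idea (it assumes the small image's pixels have already been copied into a buffer padded to the large image's size, with -1 as the out-of-bounds marker):

    __kernel void processPadded(__global const float4 *padded, // small image data, padded to the large size
                                __global float4       *result)
    {
        int i = get_global_id(0);
        float4 v = padded[i];
        // real pixel data is always >= 0, so a negative marker means "no data here"
        if (v.x < 0.0f) {
            result[i] = (float4)(0.0f);  // skip / zero out the padding
        } else {
            result[i] = v * 2.0f;        // stand-in for whatever the real processing is
        }
    }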

If you're trying to mash image data together with vertex data, like using a depth image to extrude a mesh, then I would say you should follow the convention of power-of-two sizes. Using images of the same size, resized with GL before going to CL, is what I do. Normally, when I work with images of different sizes, I split them into different kernels. And definitely, if you're doing an image-and-mesh process, never change the global size or render window size while executing, or you will go down with no warning.

If you are running out of VRAM or whatever, your frame rate will drop; never let it go below 3 fps. If you drop down to 4 fps, you still have time to force-quit QC before your whole system goes down.

Good question, though, in reference to different image sizes. I have been thinking about how I can upsample a fluid sim offline with a QC CL kernel, although I'm thinking it's possible in real time; I'll just have to not use my view size as the input to the fluid sim. I'm thinking I might be able to upsize by using Render In Image and feeding my small image back into my desired larger image size, sort of expanding the pixels.

I just want to do whatever it's called on the iPad where the iPhone app gets resized (pixel doubling?), because my GPU can't handle the simulation I'm running at the size I want it.

You do that kind of stuff with GLSL, tb. Do you think doing a CL-GL interop and then using GLSL to upsize would work?

Just some thoughts. If I had any hair left, I would definitely be pulling it out sometimes with CL. So those are some tips I have found that help keep my system from crashing.

toneburst
Re: OpenCL Hair-Tearing

Thanks for your replies, guys - plenty to chew on there! I've got around the issue for the moment, but my new approach has thrown up other issues, unfortunately.

George, it's funny you should mention using image strips to pass data between OpenCL kernels. I've actually ended up doing that, but this seems to be problematic. I have a feeling colorspace or color-management issues are rearing their ugly heads again.

Thanks again for the detailed info guys.

a|x

toneburst
Re: OpenCL Hair-Tearing

Just discovered how to change the pixel and colorspace format for an OpenCL kernel. Wish I'd discovered that last night - I've spent hours trying to work out why passing data from one kernel to another has been producing weird results. It's because, if you don't explicitly set the output colorspace of an OpenCL kernel to 'Absolute Linear', any output image comes out in 'DisplayRGB' format, which is useless if you're using the image to pass data..! Make a mental note, a|x...

gtoledo3
Re: OpenCL Hair-Tearing

Oh yeah, absolutely. That's particularly crucial in some fluid sim stuff I've had going. It's always so hard to make suggestions or know where the problem is without knowing the full scenario. I have another kernel that makes a normal map on the fly, and it wasn't right until I made it Absolute Linear.

What's funny to me is that I was suggesting you read image data to get your kernel going, and last night cwright and I were having a discussion on another thread where he was talking about using image data in a similar manner to what I'd suggested to you earlier in the day... but I totally didn't get what he was saying at first (as it was to solve a different problem, and a bit of a different scenario overall).

toneburst
Re: OpenCL Hair-Tearing

I probably should have given a bit more background info. The particular scenario is quite obscure: I was trying to work out a way to insert arbitrary text into the Apple ASCII Art filter. The idea was to use JavaScript and OpenCL to create a small texture, with each pixel representing a single character. This is then resized with Nearest Neighbour interpolation and overlaid on top of the input image. If the brightness, size and placement of these blocks are correct, you can basically insert whatever text you want into the output image. The only problem is that in character-colouring mode, the different chars end up different colours...
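For what it's worth, the character-to-pixel step boils down to something like this (simplified from what I actually have; charBrightness comes from the JavaScript patch, with one value per character, and the brightness mapping is just illustrative):

    __kernel void charsToPixels(__global const float  *charBrightness, // one brightness value per character
                                int                    charCount,
                                __write_only image2d_t tinyTexture)    // one pixel per character slot
    {
        int2 pos = (int2)(get_global_id(0), get_global_id(1));
        int  idx = pos.y * get_global_size(0) + pos.x;

        // pick the brightness that makes the ASCII Art filter choose the character I want;
        // slots past the end of the text just go black
        float b = (idx < charCount) ? charBrightness[idx] : 0.0f;
        write_imagef(tinyTexture, pos, (float4)(b, b, b, 1.0f));
    }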

I've actually rewritten the original Core Image filter in OpenCL, but in fact this would probably work just as well with the original CIFilter (and be more stable).

a|x

cybero
Re: OpenCL Hair-Tearing

http://www.macresearch.org/opencl is a good source of OpenCL information; it might be helpful to you.