swap file performance

psonice's picture

I'm about to do a fair bit of work on the core of my paint tool, and with the design I have in mind I'll hit memory issues fast.

Basically, I intend to store a series of images in a queue. The catch is that there will be 2 queues, both 2024*2024, one 8-bit and the other 16-bit. There will be a lot of images too. Each image gets stored at the end of a brush stroke, so I don't think RAM usage should affect painting performance much.

That's going to eat loads of ram, which means swapping to disk. How well is it likely to handle it? Anyone hit this kind of scenario before? Thought I'd better ask and redesign now if necessary rather than waste the effort!


cwright's picture
hard to tell

It's difficult to tell how swap performance will be due to a number of factors.

OS X allocates swap space as separate swap files; this means that if you span multiple files with your working set, you may have huge disk seek times as the heads ping-pong between the different files (this is related to disk fragmentation) -- not really possible to measure this with built-in tools that I'm aware of.

Swap performance overall depends on hard drive performance (read speed and seek time specifically), and each drive will have slightly different characteristics. Laptop drives will typically be slower.

The swap code in Tiger was absolutely terrible; beachballs were literally painful. Leopard did a lot to alleviate that (I was blown away at the usability difference between 10.4.10 and 10.5.0 when using 512MB of ram). That said, Apple can change and tweak their swap code further in the future, which may help or hinder performance, and none of that is under anyone's control.

OS X also appears to do aggressive working-set analysis (basically, it remembers which chunks are used with other chunks), so if you consistently use the same pieces from swap, it'll probably learn the patterns quickly, and do fairly well. If you're planning on randomly tromping through a massive amount of swap though, it'll not be able to figure anything out, and you'll have lower performance.

Do you have estimates on the number of images you're planning on using? Do you have any plans to compress the images (lossily or losslessly)? Depending on your plans, there are small tweaks you can make to help keep things running a bit more smoothly.

psonice's picture

I guess most of the memory usage will be write rather than read, and performance won't be so important for the read part anyway.

The way it works is that the artist paints a stroke, and on mouse off the whole canvas is stored as an image in the queue. The mouse/tablet position is sampled 15 times per second into a queue and that structure gets stored in a second queue on mouse off.

As the painting builds up you end up with a structure of images and one of corresponding brush stroke data. I need both because the brush data will be used to store and recreate the image (I'm aiming for 64k with high quality art). The image queue is used for undo as it takes a while to rebuild the image otherwise. I'll also use the image data for comparison at the end to work out what data can be discarded.

I've no idea how many images will be used; it depends on the artist. I can see it being a few hundred though. I take it on a 32bit mac it'll crash if I hit 4GB, and I'll get more space on 64bit?
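A quick back-of-envelope for the queue sizes, as a rough sketch only. The 2024*2024 canvas and the 8-bit/16-bit split come from the posts above; the channel count isn't stated anywhere, so the code tries both 1 and 4 channels, and it assumes the images are stored uncompressed. The function names are made up for illustration.

```python
# Rough memory estimate for the two undo queues.
# ASSUMPTIONS (not from the thread): channel counts, uncompressed storage.
WIDTH = HEIGHT = 2024  # canvas size quoted in the thread


def image_bytes(bits_per_channel: int, channels: int) -> int:
    """Uncompressed size in bytes of one WIDTH x HEIGHT image."""
    return WIDTH * HEIGHT * channels * bits_per_channel // 8


def queue_pair_bytes(strokes: int, channels: int) -> int:
    """Bytes needed for one 8-bit plus one 16-bit image per stroke."""
    per_stroke = image_bytes(8, channels) + image_bytes(16, channels)
    return strokes * per_stroke


GIB = 1024 ** 3
for channels in (1, 4):
    for strokes in (100, 300):
        total = queue_pair_bytes(strokes, channels)
        print(f"{channels} channel(s), {strokes} strokes: {total / GIB:.2f} GiB")
```

Under those assumptions, 300 strokes comes to roughly 3.4 GiB with one channel and about 13.7 GiB with four, so "a few hundred" images would be at or past what a 32bit process can realistically allocate either way.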

cwright's picture

You could store stroke data, couldn't you, with periodic checkpoints (every N strokes, or N seconds, or a heuristic of both). This would keep you from needing to store massive texture data, and wouldn't take too long to rebuild if you were to work backwards (just go to previous checkpoint image, and work forwards).
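The checkpoint idea could look something like the sketch below. It is a minimal illustration, not the paint tool's real code: `History`, `apply_stroke`, and the list-based `Canvas` are hypothetical stand-ins, and the checkpoint interval of 10 is arbitrary. Undo goes back to the nearest earlier checkpoint and replays strokes forward from there.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Canvas = List[int]  # stand-in for real pixel data (hypothetical)
Stroke = int        # stand-in for recorded brush samples (hypothetical)


@dataclass
class History:
    apply_stroke: Callable[[Canvas, Stroke], Canvas]  # must return a new canvas
    checkpoint_every: int = 10                        # the "N" above; arbitrary
    strokes: List[Stroke] = field(default_factory=list)
    # Checkpoint 0 is the blank canvas, so there is always something to replay from.
    checkpoints: Dict[int, Canvas] = field(default_factory=lambda: {0: []})
    canvas: Canvas = field(default_factory=list)

    def add(self, stroke: Stroke) -> None:
        """Apply a stroke and snapshot the canvas every N strokes."""
        self.canvas = self.apply_stroke(self.canvas, stroke)
        self.strokes.append(stroke)
        count = len(self.strokes)
        if count % self.checkpoint_every == 0:
            self.checkpoints[count] = list(self.canvas)

    def undo(self) -> None:
        """Drop the last stroke, then rebuild from the nearest checkpoint."""
        if not self.strokes:
            return
        self.strokes.pop()
        target = len(self.strokes)
        # Checkpoints taken after the target state are no longer valid.
        for key in [k for k in self.checkpoints if k > target]:
            del self.checkpoints[key]
        base = max(self.checkpoints)
        canvas = list(self.checkpoints[base])
        for stroke in self.strokes[base:target]:
            canvas = self.apply_stroke(canvas, stroke)
        self.canvas = canvas
```

With a toy `apply_stroke` such as `lambda canvas, stroke: canvas + [stroke]`, an undo never replays more than `checkpoint_every - 1` strokes, which is the trade being made against storing a full image per stroke.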

A 32bit app will die once it hits 2 or 3 GB (I've had problems allocating just 1GB of ram on a 32bit mac, but that was in Tiger, and I haven't tried it on Leopard; I just made the app 64bit and left it at that). 64bit gives you about 4 billion times as much address space (2^64 vs 2^32), so the OS can map several GB of addresses without collisions.

Overall, storing massive data to swap is probably not a good design. That's what tempfiles are for (not really possible in QC though, I totally understand)...

psonice's picture

Snapshots could be a good idea. Although this app will only run on a handful of macs.. if I'm lucky they'll all be 64bit, so I can be lazy ;)

It could be tricky to rework it later though, so maybe I should just go for snapshots now. It'll still mean a delay while the image is recreated, because the strokes have to be recreated in real time (it has to draw one brush texture per frame to be identical to the original), which will get annoying if there are a few really long strokes.

cwright's picture
time machine

To further fuzz the boundaries, you could use a non-linear step between snapshots, kind of like Time Machine (hourly for a day, daily for a month, weekly after that). You could store 1 snapshot per stroke for the past 10(*) or so 'frames' (and assume that going back more than 10 levels is extraordinarily unlikely), then toss out 9/10 or whatever past that, and then 99/100 after some other threshold. Redrawing 10 strokes shouldn't take an unreasonable amount of time (esp. in an unlimited-undo context), and even 100 strokes wouldn't be too crazy if you got that far back.

(*) 10 and 100 are arbitrary cut off points; feel free to use tuned parameters (or even adaptive ones based on available ram!) for a better user experience.
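A minimal sketch of that kind of thinning rule, under stated assumptions: snapshots are keyed by stroke number, the thresholds (10 and 100) and strides (every 10th, every 100th) are just the arbitrary figures from above, and `keep_snapshot` / `thin` are hypothetical names.

```python
from typing import Any, Dict


def keep_snapshot(index: int, latest: int, recent: int = 10, mid: int = 100) -> bool:
    """Decide whether the snapshot taken after stroke `index` is still worth keeping.

    All thresholds are tunable guesses, not measured values.
    """
    age = latest - index
    if age < recent:
        return True              # keep every snapshot for the newest strokes
    if age < mid:
        return index % 10 == 0   # toss 9 out of 10 in the middle band
    return index % 100 == 0      # toss 99 out of 100 beyond that


def thin(snapshots: Dict[int, Any], latest: int) -> Dict[int, Any]:
    """Return only the snapshots the policy says to keep."""
    return {i: img for i, img in snapshots.items() if keep_snapshot(i, latest)}
```

The strides test the stroke number rather than the age on purpose: every multiple of 100 is also a multiple of 10, so a snapshot that survives in the oldest band is guaranteed to have survived the middle band first. Testing the age instead would ask for snapshots that had already been thrown away.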

[edit: couldn't iterators be used with the stroke structure data to render a whole stroke (or even set of strokes) in a single frame?]

psonice's picture
sounds like a winner

I like the sound of that. I could store every frame for the last 10 strokes, which isn't unreasonable, and perhaps every 10th for the next 100, then every 50th. That should give a good mix of speed + memory usage.

Good idea on the iterator too - I hadn't thought of that. Can't see any reason why it wouldn't work :) The only catch would be that long strokes could choke the system and take several frames to render. That wouldn't normally be an issue, but it could be a problem with the way I intend to do the demo: I have no way of properly syncing the audio to the video, so it's essential everything takes a fixed amount of time (more bad design, but if I avoided bad design the whole project would be in the bin, it's one big dirty hack ;) Allowing a bit of extra time for these parts to complete should avoid that issue, although I also like the idea of replaying the painting live as a loading screen :)
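One way to keep replay time predictable is to cap how many brush stamps get drawn per frame, so the number of frames a stroke needs is known up front. The sketch below shows the batching logic only, in Python rather than Quartz Composer; the `(x, y)` sample layout, the function names, and the per-frame budget are all assumptions, and whether a QC iterator reproduces the original stroke exactly this way is untested.

```python
from typing import Iterable, Iterator, List, Tuple

Sample = Tuple[float, float]  # (x, y) brush position; hypothetical layout


def frames(samples: Iterable[Sample], stamps_per_frame: int) -> Iterator[List[Sample]]:
    """Yield a stroke's samples in batches, one batch per rendered frame."""
    if stamps_per_frame < 1:
        raise ValueError("stamps_per_frame must be at least 1")
    batch: List[Sample] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == stamps_per_frame:
            yield batch
            batch = []
    if batch:  # leftover samples form a final, shorter frame
        yield batch


def frames_needed(sample_count: int, stamps_per_frame: int) -> int:
    """Frames a stroke will take to replay (ceiling division)."""
    return -(-sample_count // stamps_per_frame)
```

Because `frames_needed` depends only on the sample count and the budget, the total replay length for a set of strokes can be added up ahead of time and the extra time reserved for it.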

tobyspark's picture
i'm shortly going to be working on a vector artwork patch...

...that might prove wonderful or a complete no-go. we'll see.

my interest in doing it that way is to then be able to manipulate them in space.

the canonical reference for why i'm doing it would be http://tagtool.org/