Revision as of 15:30, 4 August 2010 by John
4 August 2010
I've put up a tarball of the new Windows build system on the supported download area (http://www.vips.ecs.soton.ac.uk/supported/7.22/win32). The "Build on windows" page has some instructions.
This system is based on jhbuild (http://live.gnome.org/Jhbuild) and can build the whole of nip2 in a single command, including some pre-compiled dependencies, some that have to be built from source, and some that need patching. There are a couple of extra scripts which strip the build area down and make a setup.exe installer as well.
Hopefully, we'll have an OS X binary built with almost the same system very soon as well.
31 July 2010
vips-7.23 has picked up its first few features.
Tim Elliott has contributed im_vips2bufpng(), a function that can output a PNG image to a memory buffer. This is useful in web programming: a script can output an image directly to the client without having to go through the filesystem. He has a Ruby binding in preparation as well.
vips has a new open mode which opens via a disc file. At the moment, when you open a large image in a format which does not support random access (such as JPEG), the image is uncompressed to memory and then processed from there. So for example, this command:
 time vips im_rot180 wtc.jpg wtc90.png
 real 0m49.1s
 user 0m48.1s
 sys 0m0.5s
 peak RES 310mb
where wtc.jpg is a 10,000 x 10,000 pixel RGB image and im_rot180 does a 180 degree rotate, is processed like this:
- vips allocates a large memory buffer (300MB in this case) and runs im_jpeg2vips() into this buffer.
- vips creates a "p" virtual image and runs im_rot180() from the memory image into the virtual image.
- Finally, it runs im_vips2png() from the virtual image to the output file name.
The problem here is that memory is limited and images can be very large. vips-7.23 has a new open mode, "rd", which is used everywhere. This mode allocates a temporary disc file and uncompresses to that rather than to memory.
Here's what you get now:
 time vips im_rot180 wtc.jpg wtc90.png
 real 0m51.8s
 user 0m48.1s
 sys 0m1.1s
 peak RES 10mb
So systime is higher, because of all the extra disc IO, but now there is very little memory use and there is no longer any filesize limit. You can use the --vips-disc-threshold command-line flag and the IM_DISC_THRESHOLD environment variable to turn the temp file feature on and off; see the docs for details.
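The memory-or-disc decision can be sketched in a few lines of C. This is only an illustration, not libvips code: the names decompressed_bytes() and choose_backing() are hypothetical, 8-bit bands are assumed, and the rule "strictly below the threshold goes to memory" is an assumption (the docs define the real behaviour).

```c
#include <stdint.h>

/* Hypothetical names: an illustration of the decision, not libvips code. */
typedef enum { BACKING_MEMORY, BACKING_DISC } backing_t;

/* Size in bytes of an uncompressed image, assuming 8-bit bands. */
uint64_t
decompressed_bytes(uint64_t width, uint64_t height, uint64_t bands)
{
    return width * height * bands;
}

/* Assumed rule: decompress to memory when the image is strictly below the
 * threshold, otherwise decompress to a temporary disc file.
 */
backing_t
choose_backing(uint64_t image_bytes, uint64_t threshold_bytes)
{
    return image_bytes < threshold_bytes ? BACKING_MEMORY : BACKING_DISC;
}
```

For the 10,000 x 10,000 pixel RGB image above, decompressed_bytes(10000, 10000, 3) is 300,000,000 bytes, which matches the 300MB buffer in the first run.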
26 May 2010
vips-7.22 is finally being prepared. A late addition has snuck in: we've been sent a translation to German by Chris Leick. We've fixed some problems in the i18n system and we now get:
 $ LANG=de_DE.utf8 vips --help
 Aufruf:
   vips [OPTION …] - VIPS-Treiberprogramm
 Hilfeoptionen:
   -?, --help     Hilfeoptionen anzeigen
   --help-all     Alle Hilfeoptionen anzeigen
   --help-vips    VIPS-Optionen anzeigen
 ......
19 April 2010
The new threading system is now everywhere in SVN trunk and the old one has been removed.
It really helps screen painting in nip2. The old repaint system could not scale beyond 4 processors and in practice never got more than about 3x faster. The new system should scale just as well as whole image calculation. If you have 4 or more processors (for example, a 2-core chip with hyperthreading), you should see a good improvement.
22 March 2010
libvips has a new thread scheduler which should help scalability on very-many-way machines.
The current version uses a conventional threadpool system. When libvips generates an image it creates a pool of worker threads and a manager thread. The manager loops over the tiles in an image assigning tasks to workers as they become idle. As each section of tiles completes it sends that batch of pixels off to disc (actually, it's a bit more complicated than this: the manager is also sending evaluation progress messages, and the workers are sending tile-complete messages to an extra background write thread).
This system is simple and flexible, but rather inefficient if you consider the sequence of synchronisation operations needed to keep the threads in step. For each tile we do something like this:
- the idle list is empty 99% of the time ... then a worker finishes a task
- the worker locks the idle list, adds itself, and unlocks idle
- the worker raises the 'idle' semaphore to wake up the manager thread
- the worker blocks on its 'go' semaphore
- the manager wakes up, locks the idle list, gets the worker, and unlocks idle again
- the manager assigns a task, then raises the thread's 'go' semaphore to set it working
- the manager sleeps again on the 'idle' semaphore
A semaphore operation involves a lock/unlock pair and either a wait or a signal on a condition variable, so in total the above list is 12 mutex operations and 4 condition variable operations (in fact the true picture is more complex than this). We have quite a complicated dance between workers and the manager.
Rather than having the manager pick tasks, what if workers did it themselves? Here's how the new system works (thanks to Christian Blenia for the idea):
- a worker finishes a task
- worker locks the assign-task mutex, runs a function to set new parameters, and unlocks
- the task can be 'generate a tile' or 'job done, you can quit', or 'there has been an error, abort' or anything really
- worker starts on the next thing
So that's two mutex operations per tile and no context switches, much simpler! (Again, reality is more complex: workers actually send off two messages per tile as well, one to update progress feedback, the other to trigger the background buffer write.)
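The worker-driven scheme can be sketched with POSIX threads. This is a minimal illustration under stated assumptions, not the libvips implementation: pool_t, allocate_tile(), worker() and run_pool() are hypothetical names, the only task type is 'generate a tile', and tile generation is replaced by a counter update.

```c
#include <pthread.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the libvips API. */
typedef struct {
    pthread_mutex_t allocate_lock; /* the single assign-task mutex */
    int next_tile;                 /* next tile index to hand out */
    int n_tiles;                   /* total number of tiles */
    int *done;                     /* done[i] counts generations of tile i */
} pool_t;

/* Called with allocate_lock held: pick the next task, or return 0 for
 * 'job done, you can quit'.
 */
static int
allocate_tile(pool_t *pool, int *tile)
{
    if (pool->next_tile >= pool->n_tiles)
        return 0;
    *tile = pool->next_tile++;
    return 1;
}

static void *
worker(void *arg)
{
    pool_t *pool = (pool_t *) arg;

    for (;;) {
        int tile = 0;
        int more;

        /* Two mutex operations per tile: one lock, one unlock. */
        pthread_mutex_lock(&pool->allocate_lock);
        more = allocate_tile(pool, &tile);
        pthread_mutex_unlock(&pool->allocate_lock);

        if (!more)
            break;

        /* Stand-in for 'generate a tile'; no lock is held here, and each
         * tile index is handed to exactly one worker.
         */
        pool->done[tile] += 1;
    }

    return NULL;
}

/* Run n_threads workers over n_tiles tiles. Returns 0 on success, -1 on error. */
int
run_pool(int n_threads, int n_tiles, int *done)
{
    pool_t pool;
    pthread_t *threads;
    int i, started = 0;

    pool.next_tile = 0;
    pool.n_tiles = n_tiles;
    pool.done = done;
    if (n_threads < 1 || pthread_mutex_init(&pool.allocate_lock, NULL) != 0)
        return -1;

    threads = malloc(sizeof(pthread_t) * (size_t) n_threads);
    if (!threads) {
        pthread_mutex_destroy(&pool.allocate_lock);
        return -1;
    }

    for (i = 0; i < n_threads; i++) {
        if (pthread_create(&threads[i], NULL, worker, &pool) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);

    free(threads);
    pthread_mutex_destroy(&pool.allocate_lock);

    return started == n_threads ? 0 : -1;
}
```

There is no manager thread and no semaphore: an idle worker never has to wake anyone up, it just takes the lock and helps itself to the next task.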
How great is the performance improvement? None at all in normal operation, sadly. On my two-core laptop I get:
 $ time vips im_rot90 wtc.v wtc2.v
 real 0m5.020s
 user 0m2.040s
 sys 0m2.860s
 $ time vips --vips-wbuffer2 im_rot90 wtc.v wtc2.v
 real 0m4.978s
 user 0m1.920s
 sys 0m2.570s
(wtc.v is a 10,000 x 10,000 pixel RGB image; the --vips-wbuffer2 flag turns on the new system; the improvement in systime is just noise.)
However, if you switch to tiny tiles (the default is 64x64 pixels) and huge numbers of threads, you can see an improvement. I get these times:
 $ time vips --vips-concurrency=1024 --vips-tile-width=16 --vips-tile-height=16 im_rot90 wtc.v wtc2.v
 real 0m10.155s
 user 0m7.940s
 sys 0m2.400s
 $ time vips --vips-wbuffer2 --vips-concurrency=1024 --vips-tile-width=16 --vips-tile-height=16 im_rot90 wtc.v wtc2.v
 real 0m6.067s
 user 0m1.450s
 sys 0m0.890s
So we've probably doubled the efficiency of the threading system, though unfortunately the threading stuff is not a bottleneck at the moment for most users.
On a 64-processor computer we did see a loss of linearity above 32 processors so perhaps this change will fix that. We've been offered some time on this monster machine in the next few months --- we'll be testing.
If you'd like to try the new code out, there's a tarball here:
or you can build from SVN trunk.
17 March 2010
I've just finished a series of changes to libvips and nip2 which should really help image repaints. The whole system had become a bit wobbly, but it's all overhauled and should now feel much faster, look a lot prettier and be more reliable.
The big changes are:
- nip2 always repaints images in sections following tile borders. If a tile is not yet ready, it defers that section of the paint action. This means it will never paint a black tile and then a moment later repaint with pixels.
- Invalidation is handled cleanly. If you paint on an image, downstream caches, including image tile caches, are all marked invalid. When vips later tries to reuse one of these cached areas, it knows to drop cache and recalculate. Invalidation is used no more frequently than necessary.
- The system for propagating changes through an image and its views has been rewritten and tuned. It should only need the minimum number of paint actions to update a view.
The new changes really help the nip2 paintbox. You can open a complex workspace, paint on one of the images, and it should keep up with your changes and never error or mispaint.
In another (small) improvement, libvips can now tell how many CPUs the host machine has and adjust concurrency for you automatically. You can override the detected setting with --vips-concurrency and IM_CONCURRENCY, as before.
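A rough sketch of how CPU detection with an override might look on a Unix-like system follows. It is not the libvips code: parse_concurrency() and pick_concurrency() are hypothetical names, and sysconf(_SC_NPROCESSORS_ONLN) is a common extension on Linux, OS X and the BSDs rather than something every platform provides.

```c
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

/* Parse an override such as the value of IM_CONCURRENCY.
 * Returns the value if it is a positive integer, otherwise -1.
 */
int
parse_concurrency(const char *text)
{
    char *end;
    long n;

    if (!text || *text == '\0')
        return -1;
    n = strtol(text, &end, 10);
    if (*end != '\0' || n < 1 || n > INT_MAX)
        return -1;

    return (int) n;
}

/* Pick a thread count: a valid override wins, then the detected number of
 * online CPUs, then 1 as a last resort.
 */
int
pick_concurrency(const char *override)
{
    int n = parse_concurrency(override);
    long cpus;

    if (n > 0)
        return n;
    cpus = sysconf(_SC_NPROCESSORS_ONLN); /* -1 if the query is unsupported */

    return cpus > 0 && cpus <= INT_MAX ? (int) cpus : 1;
}
```

A caller could pass the environment variable straight in, as in pick_concurrency(getenv("IM_CONCURRENCY")), since getenv() returns NULL when the variable is unset.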
4 February 2010
There's a branch in SVN for a vips that uses Orc:
Orc is a wrapper around sse/sse2/sse3/mmx/altivec/arm/ti-dsp etc. You write a bit of code in a pseudo-assembly language and at runtime it "compiles" it down to real instructions for the host CPU, using whatever capabilities it has. The idea is that many apps have to maintain multiple code paths for various annoying media instruction sets, so this thing abstracts that away and lets you write your code just once.
Here's VIPS doing im_add on a 10,000 x 10,000 pixel 8-bit RGB image:
 $ time vips im_add wtc.v wtc.v wtc2.v
 real 0m10.699s
 user 0m1.890s
 sys 0m1.520s
Pretty quick, huh? That's with -O2 on a 2.7GHz Opteron 254. Now with ORC!
 $ time vips --vips-orc im_add wtc.v wtc.v wtc2.v
 real 0m9.668s
 user 0m0.530s
 sys 0m1.410s
About a 3x to 4x speedup in user time! You get more with a core2duo; it seems to have a better vector unit.
Orc is still rather experimental, so I don't want to spend too long rewriting operators yet. The next version should add addressing modes and then we'll be able to use it for things like im_conv().
The code for im_add() is here:
The ORC inner loop is:
 p = add_programs[IM_BANDFMT_UCHAR];
 orc_program_append_ds_str( p, "convubw", "t1", "s1" );
 orc_program_append_ds_str( p, "convubw", "t2", "s2" );
 orc_program_append_str( p, "addusw", "d1", "t1", "t2" );
That is:
- cast s1 up from unsigned byte to word in t1
- cast s2 up from unsigned byte to word in t2
- unsigned word add of t1 and t2 to make d1
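For comparison, here is roughly what those three Orc instructions do, written as a plain scalar C loop. This is a sketch, not the im_add() source: add_uchar() is a hypothetical name, d1 is assumed to be a 16-bit unsigned buffer, and addusw is taken to be a saturating unsigned add, going by Orc's opcode naming.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar equivalent of the Orc program above, one element at a time.
 * add_uchar() is a hypothetical name, not a libvips function.
 */
void
add_uchar(uint16_t *d1, const uint8_t *s1, const uint8_t *s2, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint16_t t1 = s1[i];               /* convubw t1, s1 */
        uint16_t t2 = s2[i];               /* convubw t2, s2 */
        uint32_t sum = (uint32_t) t1 + t2; /* addusw d1, t1, t2 */

        /* Clamp to the 16-bit range. With 8-bit inputs the sum is at
         * most 510, so the clamp never actually triggers here.
         */
        d1[i] = sum > UINT16_MAX ? UINT16_MAX : (uint16_t) sum;
    }
}
```

The Orc version expresses the same loop body once and leaves it to the runtime to pick vector instructions that process many elements per step.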
27 January 2010
The improved nohalo interpolators from last summer's Google Summer of Code have now landed in VIPS trunk:
We'll probably tidy up the three or four nohalo interpolators and just have one sensible one.
15 January 2010
Trunk has a new command-line program, vipsthumbnail:
This is a simple program to make image thumbnails. It's fast and needs very little memory. Run it like this:
 $ time vipsthumbnail wtc.jpg
 real 0m0.452s
 user 0m0.410s
 sys 0m0.040s
That makes tn_wtc.jpg, the original 10,000 x 10,000 pixel RGB image sized down to fit inside 128 x 128. It needs about 5MB of memory. By contrast, ImageMagick is slower:
 $ time convert -define jpeg:size=256x256 wtc.jpg -thumbnail 128x128 -unsharp 0x.5 tn_wtc.jpg
 real 0m3.772s
 user 0m3.230s
 sys 0m0.510s
It also needs about 700MB of memory.
- can thumbnail any image format supported by vips
- colour management
- three-stage resample: block average by integer factor to size above final dimensions, bilinear resample to final size, sharpen
- if the decompressed image is below a certain size, vipsthumbnail will decompress to memory before thumbnailing. Above this threshold, it decompresses to a temporary disc file and then shrinks from that. You can use this to limit the maximum memory that vips needs to thumbnail an image
- command-line options to control colour management, threading, file formats, thumbnail name, location and size, maximum memory use, and so on
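The first two stages of the three-stage resample in the list above can be planned with a little arithmetic. The sketch below is an assumption about how such a plan might be computed, not the vipsthumbnail source: resample_plan_t and plan_thumbnail() are hypothetical names, and images already smaller than the target box are assumed to be left at their original size.

```c
/* Hypothetical names: a sketch of the planning step, not vipsthumbnail code. */
typedef struct {
    int shrink;      /* integer block-average factor, at least 1 */
    double residual; /* remaining scale for the bilinear stage, at most 1 */
} resample_plan_t;

/* Plan a reduction so the image fits inside a size x size box.
 * Assumes width, height and size are all positive.
 */
resample_plan_t
plan_thumbnail(int width, int height, int size)
{
    resample_plan_t plan;
    int longest = width > height ? width : height;
    double factor = (double) longest / size;

    /* Assumption: never enlarge, so small images get a factor of 1. */
    if (factor < 1.0)
        factor = 1.0;

    /* Round down so the block average stops at or above the final size. */
    plan.shrink = (int) factor;

    /* The bilinear stage makes up the difference. */
    plan.residual = plan.shrink / factor;

    return plan;
}
```

For the 10,000 x 10,000 pixel image and a 128 x 128 box, the overall factor is 78.125, so the plan is a block average by 78 followed by a bilinear resample by about 0.998, with sharpening as the final stage.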