Benchmarks
VIPS SMP benchmark
This benchmark is useful for testing the VIPS SMP system, for comparing host systems' SMP implementations, and for testing for performance regressions or improvements between versions of VIPS. We have a separate set of benchmarks comparing VIPS to other image processing systems on the Speed and Memory Use page.
VIPS (from version 7.11.12) includes a benchmark for testing SMP systems. This benchmark is adapted from the system used to generate images for The National Gallery's Print on demand service. We have a couple of presentations about the background to POD available as well. The PARSEC benchmark suite includes this benchmark as one of its tests. There's a description of the benchmark, including some detail on application and performance, in the PARSEC technical report.
Images from a 10k by 10k studio digital camera are colour processed, resized, cropped and sharpened. You can see the exact sequence of operations the benchmark performs in the source code. This thing was originally processing images off a remote server over a 100 Mbit/s network. No attempt was made to make it quick (there was no point); you could make im_benchmark a lot faster very easily if that was your aim.
There's a small shell script in vips-7.x/benchmark which runs im_benchmark on a test image with varying numbers of threads and reports the times. After building and installing VIPS you can type:
    cd vips-7.x
    cd benchmark
    ./benchmarkn.sh
And see results for your system.
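As a rough illustration of what a script like this does, here is a minimal Python sketch that times a command at each thread count and keeps the best wall-clock time. It is not the contents of benchmarkn.sh. The IM_CONCURRENCY variable and the `vips im_benchmarkn temp.v temp2.v 1` command line are taken from the logs further down this page; the function names `time_command`, `best_time` and `sweep` are made up for the example.

```python
import os
import subprocess
import time


def time_command(cmd, env=None):
    """Run cmd once and return the wall-clock seconds it took."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, env=env)
    return time.perf_counter() - start


def best_time(cmd, threads, runs=2):
    """Lowest wall-clock time over several runs with IM_CONCURRENCY set."""
    # Assumption: the program reads its thread count from IM_CONCURRENCY,
    # as the older logs on this page suggest.
    env = dict(os.environ, IM_CONCURRENCY=str(threads))
    return min(time_command(cmd, env) for _ in range(runs))


def sweep(cmd, max_threads, runs=2):
    """Map each thread count from 1 to max_threads to its best time."""
    return {n: best_time(cmd, n, runs) for n in range(1, max_threads + 1)}
```

With a test image temp.v already built in the current directory, something like `sweep(["vips", "im_benchmarkn", "temp.v", "temp2.v", "1"], 4)` would give a table of thread count against best real time, similar in spirit to the results below.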
Results summary
| Processor | Clock (GHz) | CPUs | Time (secs real) | Speedup |
|---|---|---|---|---|
| dual E5649 6core | 2.5 | 12 (24 ht) | 0.80 | 10.8 x |
| Xeon X5560 | 2.8 | 8 (16 ht) | 1.08 | 13 x |
| Itanium2 | ? | 64 | 1.1 (est.) | 39.4 x |
| Xeon E5405 (64 bit) | 2.0 | 8 | 1.88 | 7.34 x |
| Opteron 8220 (64 bit) | 3.0 | 8 | 1.96 | 7.63 x |
| Dual quad-core intel (64 bit) | 3.0 | 8 | 2.8 | 7 x |
| Core 2 Extreme Quad (32 bit) | 2.66 | 4 | 3.69 | 3.78 x |
| Opteron 850 (HP server) | 2.4 | 4 | 4.25 | 3.7 x |
| Core 2 Duo (MacBook) | 2.26 | 2 | 5.81 | 1.85 x |
| Opteron 254 (HP workstation) | 2.7 | 2 | 6.77 | 1.9 x |
| P4 Xeon (64 bit) | 3.6 | 2 (4 ht) | 7 | 2.4 x |
| Core Duo (iMac) | 2.0 | 2 | 11.5 | 1.85 x |
| P4 Xeon (32 bit) | 3.0 | 2 (4 ht) | 19.7 | 1.6 x |
| PM (HP laptop) | 1.8 | 1 | 31.8 | -- |
| P4 (Dell desktop) | 2.4 | 1 | 36.6 | -- |
| EeePC atom/ssd | 1.6 | 1 (2 ht) | 41.5 | 1.6 x |
Time is the lowest real (wall-clock) time in seconds. Speedup is the real time with one CPU divided by the real time with all CPUs (real-1-cpu-time / real-all-cpus-time).
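A small worked example of the speedup calculation, using times that appear in the detailed results on this page. The function name `speedup` is just for illustration.

```python
def speedup(one_cpu_secs, all_cpus_secs):
    """Speedup = wall-clock time on one CPU / wall-clock time on all CPUs."""
    if all_cpus_secs <= 0:
        raise ValueError("all_cpus_secs must be positive")
    return one_cpu_secs / all_cpus_secs


# 2 x Xeon X5560: 13.97 s with one thread, 1.08 s at best
print(round(speedup(13.97, 1.08), 1))

# 2 x Opteron 254: 12.96 s with one thread, 6.77 s with two
print(round(speedup(12.96, 6.77), 2))
```

A value close to the number of CPUs means the benchmark is scaling almost linearly on that machine.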
Results in detail
The results we've collected. Please paste more here.
For each one we've noted uname -a, gcc --version and vips --version.
We configured VIPS with no extra optimisation options, i.e. everything just has the default -O2.
2 x Xeon X5560 (64bit), 2.8GHz running Ubuntu 9.04 server
    gcc (Ubuntu 4.3.3-5ubuntu4) 4.3.3
    vips-7.18.1-Fri May 8 15:01:54 BST 2009
    IM_CONCURRENCY=1
    13.25user 0.36system 0:13.97elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k 368inputs+126936outputs (3major+34118minor)pagefaults 0swaps
    13.15user 0.39system 0:13.98elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+34121minor)pagefaults 0swaps
    IM_CONCURRENCY=2
    13.31user 0.31system 0:07.02elapsed 193%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126944outputs (0major+29047minor)pagefaults 0swaps
    9.50user 0.19system 0:04.99elapsed 194%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+38221minor)pagefaults 0swaps
    IM_CONCURRENCY=3
    10.38user 0.29system 0:03.69elapsed 288%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+29355minor)pagefaults 0swaps
    8.76user 0.22system 0:03.18elapsed 282%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+29056minor)pagefaults 0swaps
    IM_CONCURRENCY=4
    9.40user 0.32system 0:02.48elapsed 390%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+28875minor)pagefaults 0swaps
    13.51user 0.21system 0:03.55elapsed 385%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+28957minor)pagefaults 0swaps
    IM_CONCURRENCY=5
    8.68user 0.19system 0:01.86elapsed 475%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+28349minor)pagefaults 0swaps
    7.71user 0.20system 0:01.74elapsed 454%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+36092minor)pagefaults 0swaps
    120.151
    IM_CONCURRENCY=6
    9.10user 0.13system 0:01.75elapsed 526%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+33039minor)pagefaults 0swaps
    9.81user 0.16system 0:01.79elapsed 556%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+31385minor)pagefaults 0swaps
    IM_CONCURRENCY=7
    10.69user 0.26system 0:01.72elapsed 636%CPU (0avgtext+0avgdata 0maxresident)k 8inputs+126936outputs (0major+33137minor)pagefaults 0swaps
    8.64user 0.24system 0:01.38elapsed 643%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+34416minor)pagefaults 0swaps
    IM_CONCURRENCY=8
    10.25user 0.26system 0:01.56elapsed 671%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+30208minor)pagefaults 0swaps
    9.55user 0.30system 0:01.39elapsed 707%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126936outputs (0major+32512minor)pagefaults 0swaps
    IM_CONCURRENCY=9
    8.74user 0.26system 0:01.08elapsed 831%CPU (0avgtext+0avgdata 0maxresident)k 8inputs+126936outputs (0major+34851minor)pagefaults 0swaps
    8.55user 0.17system 0:01.14elapsed 758%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+126944outputs (0major+32756minor)pagefaults 0swaps
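The log above is in the default output format of GNU time, where the wall-clock figure is the field ending in "elapsed". If you want to pull the best time out of a log like this, a minimal Python sketch might look as follows. It assumes the elapsed field has the form [hours:]minutes:seconds, and the helper names `elapsed_seconds` and `best_elapsed` are hypothetical.

```python
import re

# Matches e.g. "0:13.97elapsed" or "1:02:03elapsed" ([hours:]minutes:seconds)
_ELAPSED = re.compile(r"(?:(\d+):)?(\d+):(\d+(?:\.\d+)?)elapsed")


def elapsed_seconds(line):
    """Return the wall-clock seconds from one line of GNU time output."""
    match = _ELAPSED.search(line)
    if match is None:
        raise ValueError("no elapsed field in: " + line)
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def best_elapsed(lines):
    """Lowest wall-clock time among the lines that carry an elapsed field."""
    return min(elapsed_seconds(l) for l in lines if "elapsed" in l)


runs = [
    "8.74user 0.26system 0:01.08elapsed 831%CPU (0avgtext+0avgdata 0maxresident)k",
    "8.55user 0.17system 0:01.14elapsed 758%CPU (0avgtext+0avgdata 0maxresident)k",
]
print(best_elapsed(runs))
```

Taking the minimum rather than the mean matches the "lowest real time" convention used in the summary table.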
2 * quad core Xeon E5405 2.0GHz
    cc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
    vips-7.18.1-Tue May 12 14:18:37 BST 2009
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m13.802s user 0m13.664s sys 0m0.287s
    real 0m13.882s user 0m13.714s sys 0m0.303s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m6.972s user 0m13.732s sys 0m0.273s
    real 0m6.995s user 0m13.722s sys 0m0.316s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m4.720s user 0m13.767s sys 0m0.284s
    real 0m4.685s user 0m13.725s sys 0m0.331s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m3.534s user 0m13.776s sys 0m0.267s
    real 0m3.564s user 0m13.862s sys 0m0.307s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=5
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m2.903s user 0m13.886s sys 0m0.350s
    real 0m2.842s user 0m13.753s sys 0m0.288s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=6
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m2.403s user 0m13.829s sys 0m0.312s
    real 0m2.391s user 0m13.808s sys 0m0.262s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=7
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m2.090s user 0m13.860s sys 0m0.322s
    real 0m2.104s user 0m13.878s sys 0m0.331s
    vips im_avg temp2.v
    120.151
    IM_CONCURRENCY=8
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 0m1.880s user 0m13.867s sys 0m0.344s
    real 0m1.880s user 0m13.833s sys 0m0.303s
    vips im_avg temp2.v
    120.151
2 x Opteron 254 (64 bit), 2.7 GHz
    Linux mm-jcupitt2 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
    gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
    vips-7.26.1-Tue Aug 9 13:39:41 BST 2011
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    max cpus = 2
    starting benchmark ...
    /usr/bin/time -f %e vips --vips-concurrency=xx --vips-tile-width=64 --vips-tile-height=64 im_benchmarkn temp.v temp2.v 1
    reported real-time is best of three runs
    cpus real-time
    1 12.96
    2 6.77
Pentium M (32 bit), 1.8 GHz
    Linux banana 2.6.17-11-386 #2 Thu Feb 1 19:50:13 UTC 2007 i686 GNU/Linux
    gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
    vips-7.11.20-Tue Feb 13 13:47:53 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 31.83 user 31.41 sys 0.41
    real 31.91 user 31.52 sys 0.37
    vips im_avg temp2.v
    120.134
Core Duo (32 bit), 2 GHz
    Darwin pineapple.Belkin 9.4.0 Darwin Kernel Version 9.4.0: Mon Jun 9 19:30:53 PDT 2008; root:xnu1228.5.20~1/RELEASE_I386 i386
    i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5465)
    vips-7.16.0-Thu Sep 4 11:43:34 BST 2008
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 21.35 user 20.09 sys 1.34
    real 21.37 user 20.09 sys 1.35
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 11.62 user 20.76 sys 1.80
    real 11.67 user 20.76 sys 1.86
    vips im_avg temp2.v
    120.134
4 x Opteron 850 (64 bit), 2.4 GHz
    Linux roundtable 2.6.15-27-amd64-generic #1 SMP PREEMPT Fri Dec 8 17:50:54 UTC 2006 x86_64 GNU/Linux
    gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
    vips-7.11.20-Mon Feb 12 18:05:51 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 16.19 user 15.48 sys 0.59
    real 15.81 user 15.36 sys 0.52
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 8.19 user 15.77 sys 0.47
    real 8.33 user 15.95 sys 0.49
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 6.18 user 15.82 sys 0.46
    real 6.04 user 15.95 sys 0.53
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.35 user 16.11 sys 0.55
    real 4.25 user 15.86 sys 0.56
    vips im_avg temp2.v
    120.134
2 x Xeon (32 bit), 3 GHz
    2.6.9-42.0.3.ELsmp
    gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
    vips-7.11.12-Fri Oct 6 13:15:22 BST 2006
    IM_CONCURRENCY=1
    time vips im_benchmark temp.v temp2.v
    real 0m35.270s user 0m34.366s sys 0m0.934s
    IM_CONCURRENCY=2
    time vips im_benchmark temp.v temp2.v
    real 0m21.914s user 0m41.269s sys 0m1.681s
    IM_CONCURRENCY=3
    time vips im_benchmark temp.v temp2.v
    real 0m20.598s user 0m57.306s sys 0m2.765s
    IM_CONCURRENCY=4
    time vips im_benchmark temp.v temp2.v
    real 0m19.781s user 1m11.393s sys 0m4.246s
2 x Xeon (64 bit), 3.6 GHz
    Linux turner 2.6.17-10-generic #2 SMP Tue Dec 5 21:16:35 UTC 2006 x86_64 GNU/Linux
    gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
    vips-7.11.18-Mon Dec 18 18:19:27 GMT 2006
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 17.60 user 16.58 sys 0.65
    real 17.12 user 16.63 sys 0.59
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 9.01 user 17.18 sys 0.78
    real 8.99 user 17.12 sys 0.76
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.78 user 22.02 sys 0.83
    real 7.79 user 21.99 sys 1.00
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.03 user 25.74 sys 1.16
    real 7.02 user 25.60 sys 1.25
    vips im_avg temp2.v
    120.134
1 x P4, 2.4 GHz
    MINGW32_NT-5.1 MM-DDAVIES1 1.0.10(0.46/3/2) 2004-03-15 07:17 i686 unknown
    gcc.exe (GCC) 3.4.2 (mingw-special)
    vips-7.11.17-Wed Nov 29 12:01:14 GMTST 2006
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 36.59 user 0.01 sys 0.01
    real 36.68 user 0.01 sys 0.01
    vips im_avg temp2.v
    120.072
Intel Core 2 Extreme Quad Core (QX6700), 2.66 GHz
A quick benchmark (11x11 unsharp mask of a 10k x 10k image) shows:

| Threads | Time (s) |
|---|---|
| 1 | 166 |
| 2 | 82 |
| 3 | 55 |
| 4 | 42 |

i.e. an essentially linear speed-up.
    Linux degas.ecs.soton.ac.uk 2.6.19-1.2911.fc6 #1 SMP Sat Feb 10 15:51:47 EST 2007 i686 i686 i386 GNU/Linux
    gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
    vips-7.11.20-Fri Mar 2 12:47:29 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 15.73 user 14.70 sys 0.30
    real 13.96 user 13.86 sys 0.27
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.15 user 14.02 sys 0.23
    real 7.12 user 13.96 sys 0.29
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.77 user 13.98 sys 0.26
    real 4.78 user 13.97 sys 0.25
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.28 user 13.65 sys 0.27
    real 3.69 user 14.06 sys 0.28
    vips im_avg temp2.v
    120.134
8 x Opteron 8220 (64 bit), 3.0 GHz
    Linux raphael 2.6.22-10-generic #1 SMP Wed Aug 22 07:42:05 GMT 2007 x86_64 GNU/Linux
    gcc (GCC) 4.1.3 20070825 (prerelease) (Ubuntu 4.1.2-15ubuntu3)
    vips-7.12.4-Fri Aug 31 12:02:06 BST 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 15.04 user 14.64 sys 0.62
    real 15.22 user 14.83 sys 0.72
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.44 user 14.29 sys 0.61
    real 7.01 user 13.36 sys 0.44
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.58 user 13.29 sys 0.44
    real 4.92 user 14.22 sys 0.44
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 3.65 user 13.90 sys 0.60
    real 3.93 user 14.59 sys 0.51
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=5
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 2.98 user 14.25 sys 0.40
    real 2.79 user 13.28 sys 0.38
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=6
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 2.45 user 13.95 sys 0.42
    real 2.32 user 13.07 sys 0.46
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=7
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 12.57 user 13.43 sys 0.50
    real 3.06 user 17.55 sys 0.58
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=8
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 1.97 user 14.00 sys 0.44
    real 2.16 user 15.31 sys 0.58
    vips im_avg temp2.v
    120.134
Asus Eee PC 1000 atom n270 1.6GHz SSD
    Linux km-bigee 2.6.27-7-eeepc #1 SMP Fri Oct 31 11:36:36 MDT 2008 i686 GNU/Linux
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 69.22 user 67.38 sys 1.03
    real 70.64 user 67.76 sys 1.04
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 42.09 user 76.66 sys 1.12
    real 41.52 user 76.45 sys 1.10
Dual quad-core Intel E5320 (64-bit), 3 GHz
Intel Xeon E5320 x 2 so 8 cores.
    vips-7.11.20-Wed Nov 28 11:39:32 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 19.89 user 19.43 sys 0.33
    real 20.41 user 19.45 sys 0.35
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 10.01 user 19.61 sys 0.43
    real 10.39 user 19.58 sys 0.41
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 6.81 user 19.78 sys 0.38
    real 6.78 user 19.80 sys 0.37
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 5.17 user 19.87 sys 0.41
    real 6.82 user 19.62 sys 0.37
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=5
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.31 user 19.96 sys 0.46
    real 5.16 user 19.79 sys 0.39
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=6
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 3.48 user 19.83 sys 0.43
    real 7.03 user 19.94 sys 0.40
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=7
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 3.76 user 19.98 sys 0.44
    real 3.68 user 19.86 sys 0.41
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=8
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 2.84 user 20.06 sys 0.43
    real 4.76 user 20.07 sys 0.48
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=9
    time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.80 user 19.97 sys 0.47
    real 2.79 user 20.04 sys 0.46
    vips im_avg temp2.v
    120.134
Intel Core2Duo P7550 (64 bit), 2.26 GHz
This is an Apple Macbook 6,1 running Ubuntu 11.04.
    ./benchmarkn.sh
    Linux banana 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux
    gcc (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2
    vips-7.26.2-Wed Aug 10 10:02:22 BST 2011
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    max cpus = 2
    starting benchmark ...
    /usr/bin/time -f %e vips --vips-concurrency=xx --vips-tile-width=64 --vips-tile-height=64 im_benchmarkn temp.v temp2.v 1
    reported real-time is best of three runs
    cpus real-time
    1 10.95
    2 5.81
SGI Origin2000 supercomputer
VIPS 7.11.20 has also been run on a 64-CPU supercomputer (an SGI Origin2000) at Princeton. The results are:
| CPUs | Run time (s) | Speed up |
|---|---|---|
| 1 | 651.85 | 1 |
| 2 | 335.9 | 1.94 |
| 4 | 170.07 | 3.83 |
| 8 | 86.06 | 7.57 |
| 16 | 44.56 | 14.63 |
| 32 | 24.06 | 27.09 |
| 64 | 16.54 | 39.41 |
So about a 40 x speedup for 64 CPUs.
Graphed, these numbers are pretty much linear up to about 30 CPUs (with a 27x speedup). The image being processed is 1.3 GB, so perhaps we are starting to see I/O bandwidth limits beyond that.
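One way to put a number on where scaling starts to fall away is the Karp-Flatt metric, which estimates the serial (non-parallelisable) fraction from a measured speedup. This is not something the original benchmark reports; the sketch below simply applies the standard formula e = (1/S - 1/p) / (1 - 1/p) to the Origin2000 run times in the table above.

```python
def karp_flatt(speedup, cpus):
    """Experimentally determined serial fraction for a measured speedup."""
    if cpus < 2:
        raise ValueError("the metric is only defined for 2 or more CPUs")
    return (1.0 / speedup - 1.0 / cpus) / (1.0 - 1.0 / cpus)


# Run times in seconds from the SGI Origin2000 table, keyed by CPU count
RUN_TIMES = {1: 651.85, 2: 335.9, 4: 170.07, 8: 86.06,
             16: 44.56, 32: 24.06, 64: 16.54}

for cpus, secs in RUN_TIMES.items():
    if cpus == 1:
        continue
    measured = RUN_TIMES[1] / secs
    print(cpus, round(measured, 2), round(karp_flatt(measured, cpus), 4))
```

On these figures the estimated serial fraction goes from roughly 0.006 at 32 CPUs to roughly 0.010 at 64. A fraction that grows with CPU count points to rising overhead rather than a fixed serial section, which would fit the suggestion that I/O bandwidth becomes the limit.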
12 core Dell
A 2011 dual 6-core server scales fairly linearly up to 12 threads, then improves more slowly over the next 12 threads, which run on hyperthreaded logical cores.

