Benchmarks
From VipsWiki
Revision as of 16:36, 2 March 2007
VIPS SMP benchmark
VIPS (from version 7.11.12) includes a benchmark for testing SMP systems. This benchmark is adapted from the system used to generate images for The National Gallery's Print on demand service. We have a couple of presentations about the background to POD available as well.
Images from a 10k by 10k studio digital camera are colour processed, resized, cropped and sharpened. You can see the exact sequence of operations the benchmark performs in the source code.
The original system processed images from a remote server over a 100 Mbit network. No attempt was made to make it quick (there was no point); you could make it a lot faster very easily if that was your aim. As it is, it is useful for testing the VIPS SMP system, for comparing host systems' SMP implementations, and for testing for performance regressions between versions of VIPS.
There's a small shell script in vips-7.x/benchmark which runs im_benchmark on a test image with varying numbers of CPUs and reports the times. After building and installing VIPS you can type:
    cd vips-7.x
    cd benchmark
    ./benchmarkn.sh
And see results for your system.
If you have a pre-compiled VIPS, you can get the benchmark script and sample image from the development download area. This version of the benchmark script needs VIPS 7.11.17 or later.
Results summary
| Processor | Clock (GHz) | CPUs | Time (s) | Speedup |
|---|---|---|---|---|
| Itanium (supercomputer) | - | 64 | - | 31 x |
| Intel Quad Core (32 bit) | 2.66 | 4 | 3.69 | 3.78 x |
| Opteron 850 (HP server) | 2.4 | 4 | 4.25 | 3.7 x |
| Opteron 254 (HP workstation) | 2.7 | 2 | 6.6 | 1.9 x |
| P4 Xeon (64 bit) | 3.6 | 2 (4 ht) | 7 | 2.4 x |
| Core Duo (iMac) | 2.0 | 2 | 11.5 | 1.85 x |
| P4 Xeon (32 bit) | 3.0 | 2 (4 ht) | 19.7 | 1.6 x |
| PM (HP laptop) | 1.8 | 1 | 31.8 | -- |
| P4 (Dell desktop) | 2.4 | 1 | 36.6 | -- |
Time is real (wall-clock) time in seconds. Speedup is real-1-cpu-time / real-max-cpu-time. The supercomputer ran a slightly different version of the benchmark, so its times can't be compared with the others.
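As a small illustration of the speedup definition above, here is a minimal Python sketch. The function name `speedup` is made up for this example and is not part of VIPS; the input times are the fastest `real` values from the detailed logs further down this page.

```python
def speedup(real_1_cpu: float, real_max_cpu: float) -> float:
    """Wall-clock speedup: single-CPU time divided by all-CPU time."""
    if real_1_cpu <= 0 or real_max_cpu <= 0:
        raise ValueError("times must be positive")
    return real_1_cpu / real_max_cpu


# Fastest wall-clock times copied from the detailed results on this page.
quad_core = speedup(13.96, 3.69)    # Intel Quad Core, 1 CPU vs 4 CPUs
opteron_850 = speedup(15.81, 4.25)  # 4 x Opteron 850, 1 CPU vs 4 CPUs

print(f"Intel Quad Core: {quad_core:.2f} x")
print(f"Opteron 850:     {opteron_850:.2f} x")
```

The summary table rounds some of these figures to one decimal place.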
Results in detail
The results we've collected. Please paste more here.
For each one we've noted uname -a, gcc --version and vips --version.
We configured VIPS with no extra optimisation options, i.e. everything just has the default -O2.
2 x Opteron 254 (64 bit), 2.7 GHz
    Linux mm-jcupitt2 2.6.15-28-amd64-k8 #1 SMP PREEMPT Thu Feb 1 16:12:58 UTC 2007 x86_64 GNU/Linux
    gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
    vips-7.11.20-Mon Feb 12 18:12:55 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 12.81 user 12.32 sys 0.48
    real 12.97 user 12.52 sys 0.44
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
    real 10.47 user 11.72 sys 0.33
    real 6.64 user 12.76 sys 0.33
    vips im_avg temp2.v
    120.134
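Logs like the one above can be reduced to one wall-clock time per CPU count. The Python sketch below is only an illustration: the helper name `best_real_times` is hypothetical, and taking the faster of the two runs at each setting is an assumption (it appears to match the figures in the summary table, but the page does not say so). It only understands the plain-seconds `time -p` format, not the `0m35.270s` style used in the older Xeon log.

```python
import re
from collections import defaultdict
from typing import Dict

# Matches a concurrency setting, e.g. "IM_CONCURRENCY=2".
CONCURRENCY_RE = re.compile(r"IM_CONCURRENCY=(\d+)")
# Matches one `time -p` wall-clock entry, e.g. "real 6.64".
REAL_RE = re.compile(r"\breal\s+(\d+(?:\.\d+)?)")


def best_real_times(log: str) -> Dict[int, float]:
    """Map each IM_CONCURRENCY value to its fastest `real` time in seconds."""
    times = defaultdict(list)
    # re.split with one capture group yields
    # [preamble, n1, text1, n2, text2, ...].
    parts = CONCURRENCY_RE.split(log)
    for n, chunk in zip(parts[1::2], parts[2::2]):
        for match in REAL_RE.finditer(chunk):
            times[int(n)].append(float(match.group(1)))
    return {n: min(values) for n, values in times.items() if values}


# Timing lines copied from the 2 x Opteron 254 log.
SAMPLE_LOG = """
IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
real 12.81 user 12.32 sys 0.48
real 12.97 user 12.52 sys 0.44
IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
real 10.47 user 11.72 sys 0.33
real 6.64 user 12.76 sys 0.33
"""

best = best_real_times(SAMPLE_LOG)
print(best)
print(f"speedup: {best[1] / best[max(best)]:.2f} x")
```

Because the parser is regex-based, it works whether the `real`, `user` and `sys` values sit on one line or on separate lines.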
Pentium M (32 bit), 1.8 GHz
    Linux banana 2.6.17-11-386 #2 Thu Feb 1 19:50:13 UTC 2007 i686 GNU/Linux
    gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
    vips-7.11.20-Tue Feb 13 13:47:53 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 31.83 user 31.41 sys 0.41
    real 31.91 user 31.52 sys 0.37
    vips im_avg temp2.v
    120.134
Core Duo (32 bit), 2 GHz
    Darwin pineapple.local 8.8.1 Darwin Kernel Version 8.8.1: Mon Sep 25 19:42:00 PDT 2006; root:xnu-792.13.8.obj~1/RELEASE_I386 i386 i386
    i686-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5250)
    vips-7.11.20-Sun Feb 11 14:29:22 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 21.33 user 20.07 sys 1.43
    real 21.39 user 20.10 sys 1.45
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
    real 11.74 user 20.62 sys 2.15
    real 11.49 user 20.59 sys 2.12
    vips im_avg temp2.v
    120.134
4 x Opteron 850 (64 bit), 2.4 GHz
    Linux roundtable 2.6.15-27-amd64-generic #1 SMP PREEMPT Fri Dec 8 17:50:54 UTC 2006 x86_64 GNU/Linux
    gcc (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
    vips-7.11.20-Mon Feb 12 18:05:51 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 16.19 user 15.48 sys 0.59
    real 15.81 user 15.36 sys 0.52
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
    real 8.19 user 15.77 sys 0.47
    real 8.33 user 15.95 sys 0.49
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3 time -p vips im_benchmarkn temp.v temp2.v 1
    real 6.18 user 15.82 sys 0.46
    real 6.04 user 15.95 sys 0.53
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4 time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.35 user 16.11 sys 0.55
    real 4.25 user 15.86 sys 0.56
    vips im_avg temp2.v
    120.134
2 x Xeon (32 bit), 3 GHz
    2.6.9-42.0.3.ELsmp
    gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
    vips-7.11.12-Fri Oct 6 13:15:22 BST 2006
    IM_CONCURRENCY=1 time vips im_benchmark temp.v temp2.v
    real 0m35.270s user 0m34.366s sys 0m0.934s
    IM_CONCURRENCY=2 time vips im_benchmark temp.v temp2.v
    real 0m21.914s user 0m41.269s sys 0m1.681s
    IM_CONCURRENCY=3 time vips im_benchmark temp.v temp2.v
    real 0m20.598s user 0m57.306s sys 0m2.765s
    IM_CONCURRENCY=4 time vips im_benchmark temp.v temp2.v
    real 0m19.781s user 1m11.393s sys 0m4.246s
2 x Xeon (64 bit), 3.6 GHz
    Linux turner 2.6.17-10-generic #2 SMP Tue Dec 5 21:16:35 UTC 2006 x86_64 GNU/Linux
    gcc (GCC) 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu5)
    vips-7.11.18-Mon Dec 18 18:19:27 GMT 2006
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 17.60 user 16.58 sys 0.65
    real 17.12 user 16.63 sys 0.59
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
    real 9.01 user 17.18 sys 0.78
    real 8.99 user 17.12 sys 0.76
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3 time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.78 user 22.02 sys 0.83
    real 7.79 user 21.99 sys 1.00
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4 time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.03 user 25.74 sys 1.16
    real 7.02 user 25.60 sys 1.25
    vips im_avg temp2.v
    120.134
1 x P4, 2.4 GHz
    MINGW32_NT-5.1 MM-DDAVIES1 1.0.10(0.46/3/2) 2004-03-15 07:17 i686 unknown
    gcc.exe (GCC) 3.4.2 (mingw-special)
    vips-7.11.17-Wed Nov 29 12:01:14 GMTST 2006
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 36.59 user 0.01 sys 0.01
    real 36.68 user 0.01 sys 0.01
    vips im_avg temp2.v
    120.072
Intel Quad-core 2.66GHz PC
A quick benchmark (an 11x11 unsharp mask on a 10k x 10k image) shows:

| Threads | Time (s) |
|---|---|
| 1 | 166 |
| 2 | 82 |
| 3 | 55 |
| 4 | 42 |

i.e. a near-linear speed-up.
    Linux degas.ecs.soton.ac.uk 2.6.19-1.2911.fc6 #1 SMP Sat Feb 10 15:51:47 EST 2007 i686 i686 i386 GNU/Linux
    gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
    vips-7.11.20-Fri Mar 2 12:47:29 GMT 2007
    building test image ...
    tile=13
    test image is 3770 by 5746 pixels
    starting benchmark ...
    chain=1
    IM_CONCURRENCY=1 time -p vips im_benchmarkn temp.v temp2.v 1
    real 15.73 user 14.70 sys 0.30
    real 13.96 user 13.86 sys 0.27
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=2 time -p vips im_benchmarkn temp.v temp2.v 1
    real 7.15 user 14.02 sys 0.23
    real 7.12 user 13.96 sys 0.29
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=3 time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.77 user 13.98 sys 0.26
    real 4.78 user 13.97 sys 0.25
    vips im_avg temp2.v
    120.134
    IM_CONCURRENCY=4 time -p vips im_benchmarkn temp.v temp2.v 1
    real 4.28 user 13.65 sys 0.27
    real 3.69 user 14.06 sys 0.28
    vips im_avg temp2.v
    120.134
SGI Origin2000 supercomputer
VIPS 7.11.18 has also been run on a 64-CPU supercomputer (an SGI Origin2000) at Princeton. The results are:
| CPUs | Run time (s) |
|---|---|
| 1 | 4065.41 |
| 2 | 2000.88 |
| 4 | 1126.52 |
| 8 | 589.35 |
| 16 | 311.39 |
| 32 | 179.54 |
| 64 | 131.09 |
So about a 31 x speedup for 64 CPUs.
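The 31 x figure can be reproduced from the table, along with the speedup at each CPU count. The Python sketch below is illustrative only: the `scaling` helper and the "efficiency" column (speedup divided by CPU count) are additions for this example and do not appear in the original results.

```python
# (CPUs, run time in seconds) copied from the Origin2000 table above.
runs = [(1, 4065.41), (2, 2000.88), (4, 1126.52), (8, 589.35),
        (16, 311.39), (32, 179.54), (64, 131.09)]


def scaling(runs):
    """Return (cpus, speedup, efficiency) rows relative to the first entry.

    Assumes the first entry is the 1-CPU run.
    """
    base = runs[0][1]
    rows = []
    for cpus, seconds in runs:
        s = base / seconds
        rows.append((cpus, s, s / cpus))
    return rows


for cpus, s, eff in scaling(runs):
    print(f"{cpus:>3} CPUs  speedup {s:5.2f} x  efficiency {eff:4.0%}")
```

The last row gives a speedup of about 31 x on 64 CPUs, which is a bit under half of ideal linear scaling.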
Graphing these numbers (the graph is not reproduced here) suggests we'll probably max out at about 128 CPUs and a 50 x speedup, on this benchmark at least.
