Speed and Memory Use

From VipsWiki

We've written programs using a number of different image processing systems to load a TIFF image, crop 100 pixels off every edge, shrink by 10% with bilinear interpolation, sharpen with a 3x3 convolution and save again. It's a trivial test, but it does give some idea of the speed and memory behaviour of these libraries (and it's also quite fun to compare the code).

See also our main Benchmarks page for a more complex benchmark and timings on a variety of machines.

Results

E5-1650 @ 3.20GHz (HP workstation), Ubuntu 15.04

Software | Run time (s, real) | Peak memory (RSS, MB) | Times slower
VIPS C/C++ 8.1 | 0.20 | 43 | 1.0
Python VIPS 8.1 | 0.30 | 52 | 1.5
VIPS command-line 8.1 | 0.55 | 40 | 2.4
VIPS C/C++ 8.1, JPEG images | 0.38 | 59 | 2.7
ymagine 0.7.0 | 1.07 | 2.7 | 2.8 (compared to vips-c JPEG)
GraphicsMagick 1.3.20 | 0.67 | 492 | 3.4
sips 10.4.4 | 0.74 (est.) | 268 | 3.7
ImageMagick 6.8.9-9 | 0.78 | 484 | 3.9
VIPS nip2 8.1 | 0.79 | 78 | 4.0
RMagick 2.15.2 (ImageMagick 6.8.9) | 0.87 | 684 | 4.4
NetPBM 10.0 | 0.93 | 76 | 4.7
Pillow 2.7.0 | 0.93 | 207 | 4.7
OpenCV 2.4.9 | 1.13 | 206 | 5.7
libgd 2.1.1 | 2.34 | 186 | 6.1 (compared to vips-c JPEG)
ExactImage 0.8.9 | 1.54 | 130 | 7.7
FreeImage 3.15.4 (incomplete) | 1.63 | 183 | 8.1
gmic 1.5.7.1 | 1.87 | 700 | 9.35
ImageScience 1.2.6 (based on FreeImage 3.15.4, incomplete) | 1.9 | 267 | 9.5
OpenImageIO 1.3.12 | 2.79 | 811 | 14
GEGL 0.2 | 16.2 | 410 | 43 (compared to vips-c JPEG)
Octave 3.8 | 30 (est.) | 8500 (est.) | 200

Notes

The benchmarks plus a simple driver program are in a GitHub repository. See the README for details.

All timings are for a 5,000 by 5,000 pixel 8-bit RGB image in uncompressed tiled TIFF format, 128 by 128 pixel tiles. Each test was run with something like:

time ./vips.sh tmp/x.tif tmp/x2.ti

The tests were run on a quiet system and the quickest real time of three runs was recorded. There's no attempt to clear the disc cache, so after the first run the input comes from memory and disc speed should not be a factor. The peak memory column was found by sampling RES with "ps" using a small script. I used the systems as packaged for Ubuntu unless otherwise indicated. I last ran these tests on 9 July 2015 and used the current stable version of every package.
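The measurement loop described above can be sketched in Python. This is a hypothetical stand-in, not the driver from the repository: `best_of` is an illustrative name, and it takes peak memory from `resource.getrusage(resource.RUSAGE_CHILDREN)` (Unix only) instead of sampling RES with ps, so it reports the peak of the largest single child process rather than a total across the task.

```python
import resource
import subprocess
import time


def best_of(cmd, runs=3):
    """Run `cmd` `runs` times and return (quickest real time in seconds,
    peak RSS reported for child processes).

    ru_maxrss for RUSAGE_CHILDREN is the peak of the largest single
    descendant waited for so far (kilobytes on Linux, bytes on macOS),
    not a total over every process in the task.
    """
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raise if the benchmark fails
        times.append(time.perf_counter() - start)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return min(times), peak


# usage, e.g.: best_of(["./vips.sh", "in.tif", "out.tif"])
```

Taking the minimum rather than the mean discards runs disturbed by other activity on the machine, which matches the "quickest of three" rule.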

The benchmark hardware has six hyperthreaded cores, so systems like ymagine, OpenCV and ImageScience, which do not thread automatically, are lower in the table than they should be. On a single-core machine the table would look quite different. OpenCV 3 should make a big difference once it is packaged for Ubuntu.

This test does a lot of file IO and relatively little processing, which flatters libvips.

VIPS runs a copy of the image processing pipeline for each thread, so for 12 threads you get 12 copies of each pixel buffer. If you turn off threading you will see a large drop in memory use.
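libvips reads its worker thread count from the `VIPS_CONCURRENCY` environment variable, so one way to see the memory drop is to rerun a vips benchmark with it set to 1. A minimal Python sketch; `run_single_threaded` is a hypothetical helper and the script and file names in the comment are illustrative.

```python
import os
import subprocess


def run_single_threaded(cmd):
    """Run `cmd` with libvips limited to a single worker thread.

    VIPS_CONCURRENCY is read by libvips at startup; programs that do not
    use libvips simply ignore it.
    """
    env = dict(os.environ, VIPS_CONCURRENCY="1")
    return subprocess.run(cmd, env=env, check=True,
                          capture_output=True, text=True)


# e.g. run_single_threaded(["./vips.sh", "in.tif", "out.tif"]) while sampling
# memory with ps, then repeat without the variable and compare peak RSS.
```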

Some systems, like ImageScience and nip2, have relatively long start-up times and this hurts their position in the table.

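The VIPS command-line version generates a huge amount of disc traffic, since each step writes its complete result to an intermediate file (t1.v and t2.v in the script) and the next step reads it back. This makes it unsuitable for certain applications, and the cost is not really reflected in this table.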
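To put a rough number on that traffic: each intermediate .v file holds the uncompressed pixels. Assuming the 5,000 x 5,000 8-bit RGB test image, a shrink to exactly 90% of the cropped size, and ignoring the small .v header and metadata, the pixel data works out as below (`intermediate_bytes` is a hypothetical helper).

```python
def intermediate_bytes(width=5000, height=5000, bands=3, margin=100):
    """Pixel bytes in t1.v (after the crop) and t2.v (after the 10% shrink),
    assuming 8-bit samples, i.e. one byte per band per pixel."""
    crop_w, crop_h = width - 2 * margin, height - 2 * margin
    # shrink by 10% with integer maths to avoid float rounding
    small_w, small_h = crop_w * 9 // 10, crop_h * 9 // 10
    return crop_w * crop_h * bands, small_w * small_h * bands


t1, t2 = intermediate_bytes()
print(t1, t2)                      # 69120000 55987200
print(2 * (t1 + t2) / 1e6, "MB")   # each file written once and read once
```

So the script moves roughly 250 MB through the filesystem for an image whose crop is only 69 MB, where the other VIPS versions stream pixels between operations in memory.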

Since version 2.7, Pillow's BILINEAR resize uses a high-quality technique based on adaptive convolution. This hurts its speed on this test, because the other systems here use a simple affine transform plus a bilinear interpolator. I've set it to use NEAREST for this benchmark, which is probably closest to what everyone else is doing.
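For reference, a minimal pure-Python sketch of the two kinds of shrink being compared: an inverse affine map with bilinear interpolation (roughly what the other systems do here) and nearest-neighbour lookup (the mode Pillow is set to). It works on a greyscale image stored as a list of rows and maps output pixel (x, y) to input (x / scale, y / scale); real libraries differ in how they align pixel centres and handle edges, so treat this as an illustration only.

```python
def shrink_bilinear(img, scale):
    """Shrink a greyscale image (a list of rows) by inverse affine mapping
    plus bilinear interpolation of the four surrounding input pixels."""
    in_h, in_w = len(img), len(img[0])
    out_h, out_w = int(in_h * scale), int(in_w * scale)
    out = []
    for y in range(out_h):
        iy = y / scale                     # output row -> input coordinate
        y0 = min(int(iy), in_h - 1)
        y1 = min(y0 + 1, in_h - 1)
        fy = iy - y0                       # vertical blend factor
        row = []
        for x in range(out_w):
            ix = x / scale
            x0 = min(int(ix), in_w - 1)
            x1 = min(x0 + 1, in_w - 1)
            fx = ix - x0                   # horizontal blend factor
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def shrink_nearest(img, scale):
    """Same mapping, but copy the input pixel the coordinate falls in."""
    in_h, in_w = len(img), len(img[0])
    out_h, out_w = int(in_h * scale), int(in_w * scale)
    return [[img[min(int(y / scale), in_h - 1)][min(int(x / scale), in_w - 1)]
             for x in range(out_w)]
            for y in range(out_h)]


ramp = [list(range(10)) for _ in range(10)]
print(shrink_nearest(ramp, 0.9)[0])                           # whole input values
print([round(v, 2) for v in shrink_bilinear(ramp, 0.9)[0]])   # blended values
```

Nearest does one lookup per output pixel, bilinear does four plus some arithmetic, and a convolution-based resize reads a wider window still, which is why the choice of mode matters on a speed test.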

The OpenImageIO test uses oiiotool, which may not be the best way to test the library.

ExactImage will not read tiled TIFF, so the benchmark uses a strip TIFF for this test.

libgd, ymagine and GEGL will not read TIFF, so I used JPEG for them, and their "times slower" figure is relative to vips with a JPEG source. A lot of the run time in these tests is therefore spent inside libjpeg, which narrows the gap and is slightly unfair to libvips.

GEGL is not really designed for batch-style processing; it targets interactive applications, like paint programs.

Octave aims to be a very high-level prototyping language and is not primarily targeting speed. I timed a 2,000 by 2,000 pixel monochrome JPEG and extrapolated from that.

FreeImage does not have a sharpening or convolution operation so I skipped that part of the benchmark.

ImageScience is based on FreeImage and therefore does not support sharpening either, so I've skipped that part of the test. Its resize() method is always bicubic, which is a little unfair since the other benchmarks here use bilinear.

sips is only available on OS X, so it was run on a different machine. On that machine vips-c took 0.28s and sips 1.03s, so I scaled the sips time by 0.20 / 0.28 (the vips-c time on the benchmark machine divided by the vips-c time on the OS X machine), giving an estimate of about 0.74s. This is not very realistic, unfortunately. sips only supports crop and resize, so I didn't time sharpen. The sips resize algorithm is unknown and is probably much fancier than the simple bilinear interpolation used in the other tests.
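The normalisation arithmetic, written out as a small Python sketch. The function names are illustrative, and the estimate rests on the assumption that every program speeds up by the same factor as vips-c when moved to the benchmark machine.

```python
def scale_to_benchmark_machine(other_time, other_vips, bench_vips):
    """Estimate a run time on the benchmark machine from one measured elsewhere,
    assuming every program speeds up by the same factor as vips-c."""
    return other_time * bench_vips / other_vips


def times_slower(run_time, baseline):
    """'Times slower' column: run time divided by the matching vips-c time
    (0.20s for TIFF input, 0.38s for the JPEG-only systems)."""
    return run_time / baseline


sips = scale_to_benchmark_machine(1.03, other_vips=0.28, bench_vips=0.20)
print(round(sips, 2))                       # 0.74
print(round(times_slower(sips, 0.20), 1))   # 3.7
print(round(times_slower(1.07, 0.38), 1))   # 2.8 (ymagine, JPEG baseline)
```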

Why is VIPS quick

We have a How it works page which goes into some detail, but briefly:

Threaded image input-output (IO) system
Most image processing libraries have threaded operations. Each operation has code, generally using a framework like OpenMP, to run the operation over all the available processors. VIPS instead puts the threading into the image IO system and gives each thread a separate copy of the whole image pipeline to work on. This style of horizontal threading makes much better use of processor caches, since each thread carries a small piece of the image through every stage of the pipeline while that piece is still in cache.
Overlapped input and output
VIPS is able to run the load, process and save parts of the program in parallel, even though libtiff is single-threaded. It uses a set of threads for input and processing (which queue up on the load library), plus an extra background write-behind thread which runs whenever a line of tiles is completed.
VIPS is (almost) tile-less
Most image processing systems split images into tiles for processing: non-overlapping areas of pixels which can be cached and reused. Keeping tiles from overlapping forces threads to synchronise continually, and tile edges need special treatment. VIPS instead uses regions: rectangular areas of images which can overlap. This removes a lot of housekeeping. It has a set of rules to try to keep overlap (and therefore recomputation) to a minimum.
VIPS is (almost) lock-less
Threads need to talk to each other to coordinate their work, and on systems with a large number of processors this can become very expensive. VIPS has only one mutex on file read and one on file write; the rest of the system does not need any locking or synchronisation.
Fast operations
The VIPS primitives are implemented carefully and some use techniques like run-time code generation. The convolution operator, for example, will examine the matrix and the image and at run-time write a short SSE3 program to implement exactly that convolution on exactly that image.
Demand-driven
VIPS is fully demand-driven: it only needs to keep a few pixel buffers in memory and never has to load the whole image. This reduces memory use.
Variety of image pixel formats
VIPS supports 10 pixel formats, from 8-bit unsigned to 128-bit complex, and almost all operations can work on any format. This means that it can process the 8-bit data in this test directly with no need to repack to another format for computation.
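Several of the ideas above (a private pipeline per thread, pixels computed only on demand, a single write-behind thread) can be shown together in a toy Python model. This is not how libvips is written: libvips is C, works on rectangular regions rather than rows, and schedules work differently; all names here are hypothetical, and because of Python's GIL the sketch shows the structure rather than the speed-up.

```python
import queue
import threading


def make_pipeline(source, width):
    """Build one private copy of a tiny two-stage pipeline for a worker.

    Stage 1 inverts a row, stage 2 applies a 3-tap horizontal box blur.
    Rows are computed only when requested (demand-driven), and each call
    allocates its own row buffers, so threads never share pixel data.
    """
    def invert(y):
        return [255 - source(x, y) for x in range(width)]

    def blur(y):
        row = invert(y)
        out = []
        for x in range(width):
            left = row[max(x - 1, 0)]
            right = row[min(x + 1, width - 1)]
            out.append((left + row[x] + right) // 3)
        return out

    return blur


def run(source, width, height, n_threads=4, strip=8):
    """Compute all output rows with n_threads workers and one writer thread."""
    finished = queue.Queue()
    output = [None] * height

    def writer():
        # write-behind: only this thread touches `output`
        while (item := finished.get()) is not None:
            y0, rows = item
            output[y0:y0 + len(rows)] = rows

    def worker(index):
        pipeline = make_pipeline(source, width)  # this thread's own copy
        # strips are dealt out round-robin, so workers need no shared counter;
        # the only object they share is the queue to the writer
        for y0 in range(index * strip, height, n_threads * strip):
            rows = [pipeline(y) for y in range(y0, min(y0 + strip, height))]
            finished.put((y0, rows))

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    workers = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    finished.put(None)  # tell the writer there is no more work
    writer_thread.join()
    return output
```

Memory here is bounded by roughly one strip per worker plus whatever is waiting in the queue, not by the image size, which is the point the demand-driven item above makes.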

Graphically

[Graph: Svt2.png]

This graph was made by running "ps" very quickly and piping the output to a simple script that calculated total RSS for all processes associated with a task.
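A minimal Python sketch of that kind of script, assuming "associated with a task" means the process and all of its descendants. It is not the original script; the function names are hypothetical, and it uses the POSIX `ps -e -o pid= -o ppid= -o rss=` form (one `-o` per field, so every header is suppressed).

```python
import subprocess
import time


def tree_rss_kb(ps_output, root_pid):
    """Sum RSS (KB, as ps reports it) over root_pid and all its descendants.

    ps_output is the text printed by: ps -e -o pid= -o ppid= -o rss=
    """
    children, rss = {}, {}
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, ppid, kb = (int(f) for f in fields)
        children.setdefault(ppid, []).append(pid)
        rss[pid] = kb
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        total += rss.get(pid, 0)
        stack.extend(children.get(pid, []))
    return total


def peak_tree_rss_kb(proc, interval=0.01):
    """Poll ps while `proc` (a subprocess.Popen) runs; return the peak total."""
    peak = 0
    while proc.poll() is None:
        out = subprocess.run(["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "rss="],
                             capture_output=True, text=True, check=True).stdout
        peak = max(peak, tree_rss_kb(out, proc.pid))
        time.sleep(interval)
    return peak


# e.g. proc = subprocess.Popen(["./vips.sh", "in.tif", "out.tif"])
#      print(peak_tree_rss_kb(proc))
```

Summing over descendants matters for the shell-script benchmarks, where the real work happens in child processes of the script.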

[Graph: Memtrace.png]

This is a fancier one generated by vipsprofile showing the memory behaviour of vips on this task. The bottom graph shows total memory, the upper traces show threads calculating useful results (green), threads blocked on synchronisation (red) and memory allocations (white ticks). There's a blog post with some more detail on how this was made.

Implementations

VIPS8 Python

#!/usr/bin/python

import sys

from gi.repository import Vips

im = Vips.Image.new_from_file(sys.argv[1])

im = im.crop(100, 100, im.width - 200, im.height - 200)
im = im.similarity(scale = 0.9)
mask = Vips.Image.new_from_array([[-1, -1,  -1],
                                  [-1,  16, -1],
                                  [-1, -1,  -1]], scale = 8)
im = im.conv(mask)

im.write_to_file(sys.argv[2])
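To make the mask and `scale = 8` concrete: the weights sum to 16 - 8 = 8, so dividing by the scale leaves flat areas unchanged and only boosts edges. The pure-Python sketch below also imitates, very loosely, the run-time code generation mentioned earlier: it writes source for a kernel specialised to one mask (zero weights produce no code) and compiles it with `exec`. It is not libvips code, which emits SSE3 instructions; clamping edge pixels, rounding, and clipping to 0..255 are assumptions of the sketch, and the names are hypothetical.

```python
def _clamped_pixel(img, x, y):
    """Read img[y][x], clamping coordinates to the nearest edge pixel."""
    y = min(max(y, 0), len(img) - 1)
    x = min(max(x, 0), len(img[0]) - 1)
    return img[y][x]


def make_conv3x3(mask, scale):
    """Generate and compile a 3x3 convolution specialised to `mask`.

    Returns conv(img, x, y) for a greyscale image stored as a list of rows.
    """
    terms = []
    for j, row in enumerate(mask):
        for i, weight in enumerate(row):
            if weight:  # zero weights generate no code at all
                terms.append(f"{weight} * px(img, x + {i - 1}, y + {j - 1})")
    src = (
        "def conv(img, x, y):\n"
        f"    total = {' + '.join(terms) or '0'}\n"
        f"    return min(255, max(0, round(total / {scale})))\n"
    )
    namespace = {"px": _clamped_pixel}
    exec(src, namespace)
    return namespace["conv"]


sharpen = make_conv3x3([[-1, -1, -1], [-1, 16, -1], [-1, -1, -1]], scale=8)
flat = [[100] * 4 for _ in range(4)]
print(sharpen(flat, 1, 1))   # 100: weights sum to the scale, so flat areas pass through
```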

ruby-vips

#!/usr/bin/ruby

require 'rubygems'
require 'vips'
include VIPS

im = Image.new(ARGV[0])

im = im.extract_area(100, 100, im.x_size - 200, im.y_size - 200)
im = im.affinei(:bilinear, 0.9, 0, 0, 0.9, 0, 0)
mask = [
    [-1, -1,  -1],
    [-1,  16, -1],
    [-1, -1,  -1]
]
m = Mask.new mask, 8, 0 
im = im.conv(m)

im.write(ARGV[1])

VIPS nip2

#!/home/john/vips/bin/nip2 -s

main
  = error "usage: infile -o outfile", argc != 2
  = (sharpen @ shrink @ crop) (Image_file argv?1)
{
  crop x = extract_area 100 100 (x.width - 200) (x.height - 200) x;
  shrink = resize Interpolate_bilinear 0.9 0.9;
  sharpen = conv (Matrix_con 8 0 [[-1, -1, -1], [-1, 16, -1], [-1, -1, -1]]);
}

VIPS command-line

#!/bin/bash

width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)

width=$((width - 200))
height=$((height - 200))

vips crop $1 t1.v 100 100 $width $height
vips similarity t1.v t2.v --scale 0.9 --interpolate bilinear

cat > mask.con <<EOF
3 3 8 0
-1 -1 -1
-1 16 -1
-1 -1 -1
EOF
vips conv t2.v $2 mask.con

rm t1.v t2.v

VIPS C++

#include <vips/vips8>

using namespace vips;

int
main( int argc, char **argv )
{
        if( VIPS_INIT( argv[0] ) )
                return( -1 );

        VImage in = VImage::new_from_file( argv[1] );

        VImage mask = VImage::new_matrixv( 3, 3,
                -1, -1, -1, -1, 16, -1, -1, -1, -1 );
        mask.set( "scale", 8 );

        in.
                crop( 100, 100, in.width() - 200, in.height() - 200 ).
                similarity( VImage::option()->set( "scale", 0.9 ) ).
                conv( mask ).
                write_to_file( argv[2] );

        return( 0 );
}

VIPS C

// compile with
// gcc -Wall vips.c `pkg-config vips --cflags --libs` -o vips-c

#include <vips/vips.h>

int 
main( int argc, char **argv )
{
        VipsImage *global;
        VipsImage **t;

        if( VIPS_INIT( argv[0] ) )
                return( -1 );

        global = vips_image_new();
        t = (VipsImage **) vips_object_local_array( VIPS_OBJECT( global ), 5 );

        if( !(t[0] = vips_image_new_from_file( argv[1],
                "access", VIPS_ACCESS_SEQUENTIAL,
                NULL )) )
                vips_error_exit( NULL );

        t[1] = vips_image_new_matrixv( 3, 3, 
                -1.0, -1.0, -1.0, 
                -1.0, 16.0, -1.0,
                -1.0, -1.0, -1.0 );
        vips_image_set_double( t[1], "scale", 8 );

        if( vips_extract_area( t[0], &t[2], 
                100, 100, t[0]->Xsize - 200, t[0]->Ysize - 200, NULL ) ||
                vips_similarity( t[2], &t[3], "scale", 0.9, NULL ) ||
                vips_conv( t[3], &t[4], t[1], NULL ) ||
                vips_image_write_to_file( t[4], argv[2], NULL ) )
                vips_error_exit( NULL ); 

        g_object_unref( global );

        return( 0 );
}

PIL (and Pillow)

#!/usr/bin/python

import sys
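from PIL import Image, ImageFilter

# just to confirm we are getting the right version; PILLOW_VERSION existed
# in Pillow 2.7 but was removed in later releases (use PIL.__version__ there)
# from PIL import PILLOW_VERSION
# print("pillow.py: PILLOW_VERSION =", PILLOW_VERSION)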

im = Image.open(sys.argv[1])

# Crop 100 pixels off all edges.
im = im.crop((100, 100, im.size[0] - 100, im.size[1] - 100))

# Shrink by 10%

# starting with 2.7, Pillow uses a high-quality convolution-based resize for 
# BILINEAR ... the other systems in this benchmark are using affine + bilinear,
# so this is rather unfair. Use NEAREST instead, it gets closest to what
# everyone else is doing
im = im.resize((int (im.size[0] * 0.9), int (im.size[1] * 0.9)), Image.NEAREST)

# sharpen
filter = ImageFilter.Kernel((3, 3),
          (-1, -1, -1,
           -1, 16, -1,
           -1, -1, -1))
im = im.filter(filter)

# write back again
im.save(sys.argv[2])

Octave

#!/usr/bin/octave -qf

pkg load image

im = imread(argv(){1});
im = im(101:end-100, 101:end-100);        % Crop
im = imresize(im, 0.9, 'linear');         % Shrink    
myFilter = [-1 -1 -1
            -1 16 -1
            -1 -1 -1]; 
im = conv2(double(im), myFilter, 'same'); % Sharpen, keeping the input size
im = max(0, im ./ (max(max(im)) / 255));  % Renormalize
imwrite(uint8(im), argv(){2});            % Write back again

ImageMagick

#!/bin/bash

# we crop on load, it's a bit quicker and saves some memory
# we can't crop 100 pixels with the crop-on-load syntax, so we have to
# find the width and height ourselves
width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)

width=$((width - 200))
height=$((height - 200))

set -x

convert "$1[${width}x${height}+100+100]" \
        -filter triangle -resize 90x90% \
        -convolve "-1, -1, -1, -1, 16, -1, -1, -1, -1" \
        $2

GraphicsMagick

#!/bin/bash

set -x

# GraphicsMagick does not have crop-on-load so we use -shave instead
gm convert $1 \
        -shave 100x100 \
        -filter triangle -resize 90x90% \
        -convolve "-1, -1, -1, -1, 16, -1, -1, -1, -1" \
        $2

ExactImage

#!/bin/bash

width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)

width=$((width - 200))
height=$((height - 200))

# set -x

econvert -i $1 \
	--crop "100,100,$width,$height" \
	--bilinear-scale 0.9 \
	--convolve "-1, -1, -1, -1, 16, -1, -1, -1, -1" \
	-o $2

GMIC

#!/bin/bash

width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)
crop_width=$((width - 200))
crop_height=$((height - 200))

gmic \
        -verbose - \
        -input $1 \
        -crop 100,100,$crop_width,$crop_height \
        -resize 90%,90%,1,3,3,1 \
        "(-1,-1,-1;-1,9,-1;-1,-1,-1)" -convolve[-2] [-1] -keep[-2] \
        -type uchar \
        -output $2

FreeImage

/* Compile with:

   gcc freeimage.c -lfreeimage

 */

#include <FreeImage.h>

int
main (int argc, char **argv)
{       
  FIBITMAP *t1;
  FIBITMAP *t2;
  int width;
  int height;

  FreeImage_Initialise (FALSE);

  t1 = FreeImage_Load (FIF_TIFF, argv[1], TIFF_DEFAULT);

  width = FreeImage_GetWidth (t1); 
  height = FreeImage_GetHeight (t1); 

  t2 = FreeImage_Copy (t1, 100, 100, width - 100, height - 100); 
  FreeImage_Unload (t1); 

  t1 = FreeImage_Rescale (t2, (width - 200) * 0.9, (height - 200) * 0.9,
                          FILTER_BILINEAR);
  FreeImage_Unload (t2); 

  /* FreeImage does not have a sharpen operation, so we skip that.
   */

  FreeImage_Save (FIF_TIFF, t1, argv[2], TIFF_DEFAULT);
  FreeImage_Unload (t1); 

  FreeImage_DeInitialise ();

  return 0;
}      

NetPBM

#!/bin/bash

cat > mask <<EOF
P2
3 3
32
14 14 14 
14 48 14
14 14 14
EOF

tifftopnm $1 | \
  pnmcut -left 100 -right -100 -top 100 -bottom -100 | \
  pnmscale 0.9 | \
  pnmconvol mask | \
  pnmtotiff -truecolor -color > $2

ImageScience

#!/usr/bin/ruby

require 'rubygems'
require 'image_science'

ImageScience.with_image(ARGV[0]) do |img|
    img.with_crop(100, 100, img.width() - 100, img.height() - 100) do |crop|
        crop.resize(crop.width() * 0.9, crop.height() * 0.9) do |small|
            small.save(ARGV[1])
        end
    end
end

OpenImageIO

#!/bin/bash

width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)

width=$((width - 200))
height=$((height - 200))

# resize with triangle is bilinear

# this will blur rather than sharpen, but the speed should be the same

oiiotool $1 \
	--crop ${width}x${height}+100+100 --origin +0+0 --fullpixels \
	--resize:filter=triangle 90% \
	--kernel gaussian 3x3 --convolve \
	-o $2

RMagick

#!/usr/bin/ruby

require 'rubygems'
require 'RMagick'
include Magick

im = ImageList.new(ARGV[0])

im = im.shave(100, 100)
im = im.resize(im.columns * 0.9, im.rows * 0.9, TriangleFilter)
kernel = [-1, -1, -1, -1, 16, -1, -1, -1, -1]
im = im.convolve(3, kernel)
                   
im.write(ARGV[1])

OpenCV

/* compile with:

   g++ -g -Wall opencv.cc `pkg-config opencv --cflags --libs`

   code from Amadan@shacknews, thank you very much!

 */

#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace cv;

int
main (int argc, char **argv)
{
  Mat img = imread (argv[1]);

  if (img.empty ())
    return 1;

  Mat crop = Mat (img, Rect (100, 100, img.cols - 200, img.rows - 200));

  Mat shrunk;
  resize (crop, shrunk, Size (0, 0), 0.9, 0.9);

  float m[3][3] = { {-1, -1, -1}, {-1, 16, -1}, {-1, -1, -1} };
  Mat kernel = Mat (3, 3, CV_32F, m) / 8.0;

  Mat sharp;
  filter2D (shrunk, sharp, -1, kernel, Point (-1, -1), 0, BORDER_REPLICATE);

  imwrite (argv[2], sharp);

  return 0;
}

sips

#!/bin/bash

width=$(vipsheader -f Xsize $1)
height=$(vipsheader -f Ysize $1)

crop_width=$((width - 200))
crop_height=$((height - 200))

resize_width=$((crop_width * 9 / 10))

# set -x

sips \
        --cropToHeightWidth $crop_height $crop_width \
        --resampleWidth $resize_width \
        $1 --out $2 &> /dev/null

gd

// compile with
// gcc -Wall gd.c `pkg-config gdlib --cflags --libs` -o gd

#include <stdio.h>
#include <stdlib.h>

#include <gd.h>

int
main( int argc, char **argv )
{
	FILE *fp;
	gdImagePtr original, cropped,resized;
	gdRect crop;

	if( argc != 3 ) {
		printf( "usage: %s in-jpeg out-jpeg\n", argv[0] );
		exit( 1 );
	}

	if( !(fp = fopen( argv[1], "r" )) ) {
		printf( "unable to open \"%s\"\n", argv[1] );
		exit( 1 );
	}
	if( !(original = gdImageCreateFromJpeg( fp )) ) {
		printf( "unable to load \"%s\"\n", argv[1] );
		exit( 1 );
	}
	fclose( fp );

	crop.x = 100;
	crop.y = 100;
	crop.width = original->sx - 200;
	crop.height = original->sy - 200;
	cropped = gdImageCrop( original, &crop );
	gdImageDestroy( original );
	original = 0;
	if( !cropped ) {
		printf( "unable to crop image\n" ); 
		exit( 1 );
	}

	resized = gdImageScale( cropped, crop.width * 0.9, crop.height * 0.9 );
	gdImageDestroy( cropped );
	cropped = 0;
	if( !resized ) {
		printf( "unable to resize image\n" ); 
		exit( 1 );
	}

	//gdImageSharpen is extremely slow
	gdImageSharpen( resized, 75 );

	if( !(fp = fopen( argv[2], "w" )) ) {
		printf( "unable to open \"%s\"\n", argv[2] );
		exit( 1 );
	}
	gdImageJpeg( resized, fp, -1 );
	fclose( fp );

	gdImageDestroy( resized );

	return( 0 ); 
}

GEGL

/* compile with
 
   gcc -g -Wall gegl.c `pkg-config gegl-0.2 --cflags --libs`

 */

#include <stdio.h>
#include <stdlib.h>

#include <gegl.h>

static void 
null_log_handler (const gchar *log_domain, 
		  GLogLevelFlags log_level, 
		  const gchar *message, 
		  gpointer user_data)
{
}

int
main (int argc, char **argv)
{
  GeglNode *gegl, *load, *crop, *scale, *sharp, *save;

  gegl_init (&argc, &argv);

  if (argc != 3) 
    {           
      fprintf (stderr, "usage: %s file-in file-out\n", argv[0]);
      exit (1);
    }

  g_log_set_handler ("GEGL-load.c", 
    G_LOG_LEVEL_WARNING | G_LOG_FLAG_FATAL | G_LOG_FLAG_RECURSION, 
    null_log_handler, NULL);
  g_log_set_handler ("GEGL-gegl-tile-handler-cache.c", 
    G_LOG_LEVEL_WARNING | G_LOG_FLAG_FATAL | G_LOG_FLAG_RECURSION, 
    null_log_handler, NULL);

  gegl = gegl_node_new ();
        
  load = gegl_node_new_child (gegl,
                              "operation", "gegl:load",
                              "path", argv[1], 
                              NULL);

  /* the 4800 x 4800 crop assumes the 5,000 x 5,000 pixel test image */
  crop = gegl_node_new_child (gegl, 
                              "operation", "gegl:crop",
                              "x", 100.0,
                              "y", 100.0,
                              "width", 4800.0, 
                              "height", 4800.0, 
                              NULL);
                
  scale = gegl_node_new_child (gegl,
                               "operation", "gegl:scale",
                               "x", 0.9,
                               "y", 0.9,
                               "filter", "linear", 
                               "hard-edges", FALSE, 
                               NULL);
                
  sharp = gegl_node_new_child (gegl,
                               "operation", "gegl:unsharp-mask",
                               "std-dev", 1.0, // diameter 7 mask in gegl
                               NULL);

  save = gegl_node_new_child (gegl,
                              "operation", "gegl:save",
                              //"operation", "gegl:png-save",
                              //"bitdepth", 8,
                              "path", argv[2], 
                              NULL);

  gegl_node_link_many (load, crop, scale, sharp, save, NULL);
 
  //gegl_node_dump( gegl, 0 );

  gegl_node_process (save);

  //gegl_node_dump( gegl, 0 );
                
  g_object_unref (gegl);

  gegl_exit ();

  return (0);
}