Image Basics

An image can be characterized by a two-dimensional spatial function f(x,y), where x and y are the spatial coordinates and f is the intensity value, proportional to the radiated energy.

The function ‘f’ can be decomposed into i(x,y) and r(x,y), where:
‘i’ is a measure of the amount of ‘illumination’.
‘r’ is a measure of the ‘reflectance’.

Therefore f(x,y) = i(x,y)*r(x,y) such that:
0 < i(x,y) < ∞ and
0 < r(x,y) < 1, where r = 0 means total absorption and r = 1 total reflectance.

The intensity or gray level of a monochromatic image is given by l = f(x,y).
From the given conditions on ‘i’ and ‘r’ it follows that ‘l’ lies within a range:
Lmin ≤ l ≤ Lmax


The interval [Lmin, Lmax] is called the ‘gray scale’. Generally the scale is shifted numerically to [0, L-1], where l = 0 is black, l = L-1 is white, and the intermediate values are various shades of gray.
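As a toy numerical sketch (the illumination and reflectance values below are made up for illustration), the product f = i·r can be shifted onto a discrete [0, L-1] gray scale like this:

```python
# Toy example: form f(x,y) = i(x,y) * r(x,y), then rescale to [0, L-1]
L = 256  # number of gray levels

# Hypothetical 2x2 samples of illumination and reflectance
i_vals = [[100.0, 200.0], [150.0, 50.0]]
r_vals = [[0.2, 0.9], [0.5, 0.1]]

f = [[i_vals[y][x] * r_vals[y][x] for x in range(2)] for y in range(2)]

f_min = min(min(row) for row in f)  # Lmin
f_max = max(max(row) for row in f)  # Lmax

# Shift the interval [Lmin, Lmax] onto the discrete gray scale [0, L-1]
gray = [[round((v - f_min) / (f_max - f_min) * (L - 1)) for v in row]
        for row in f]
```

The brightest sample maps to 255 (white) and the darkest to 0 (black), with everything else becoming a shade of gray in between.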

Otsu’s Binarization

With plain global thresholding we pick the threshold value by trial and error. Suppose, however, that the image is bimodal (its histogram has two peaks): a good threshold is then a value in the valley between the peaks, and that is what Otsu’s binarization computes automatically. To use it, pass the extra flag cv2.THRESH_OTSU to the cv2.threshold function and pass 0 as the threshold value; the optimal threshold found by the algorithm is returned as the first return value.

import cv2
import numpy as np
from matplotlib import pyplot as plt

img = cv2.imread('image.jpg',0)

# global thresholding
ret1,th1 = cv2.threshold(img,127,255,cv2.THRESH_BINARY)

# Otsu's thresholding
ret2,th2 = cv2.threshold(img,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

# Otsu's thresholding after Gaussian filtering
blur = cv2.GaussianBlur(img,(5,5),0)
ret3,th3 = cv2.threshold(blur,0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

# plot all the images and their histograms
images = [img, 0, th1,
          img, 0, th2,
          blur, 0, th3]
titles = ['Original Noisy Image','Histogram','Global Thresholding (v=127)',
          'Original Noisy Image','Histogram',"Otsu's Thresholding",
          'Gaussian filtered Image','Histogram',"Otsu's Thresholding"]

for i in range(3):
    plt.subplot(3,3,i*3+1), plt.imshow(images[i*3],'gray')
    plt.title(titles[i*3]), plt.xticks([]), plt.yticks([])
    plt.subplot(3,3,i*3+2), plt.hist(images[i*3].ravel(),256)
    plt.title(titles[i*3+1]), plt.xticks([]), plt.yticks([])
    plt.subplot(3,3,i*3+3), plt.imshow(images[i*3+2],'gray')
    plt.title(titles[i*3+2]), plt.xticks([]), plt.yticks([])
plt.show()

The result:
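To make the “valley between the peaks” idea concrete, here is a minimal pure-Python sketch (not OpenCV’s implementation) of what Otsu’s method optimizes: it exhaustively picks the threshold that maximizes the between-class variance of the histogram. The toy histogram values below are made up for illustration.

```python
def otsu_threshold(hist):
    """Return the threshold t that maximizes the between-class variance.

    Pixels with value <= t form class 0; the rest form class 1.
    """
    total = sum(hist)
    grand_sum = sum(v * h for v, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # pixel count in class 0 so far
    sum0 = 0.0  # sum of gray values in class 0 so far
    for t in range(len(hist) - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0                 # mean of class 0
        mu1 = (grand_sum - sum0) / w1   # mean of class 1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# A made-up bimodal histogram on a 10-level gray scale: peaks near 2 and 8
hist = [0, 8, 20, 8, 1, 0, 2, 12, 25, 10]
t = otsu_threshold(hist)  # lands in the valley between the two peaks
```

OpenCV does the same maximization over a 256-bin histogram and returns the chosen threshold as `ret2` in the snippet above.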

OpenCV+Python: Part 3 – Geometric Transformations

In this post I will explain how to go about scaling, translating, rotating, and warping images.


Scaling can be done using the cv2.resize() function. The output size can be provided manually, or a scaling factor can be given via the fx and fy parameters.

import cv2
import numpy as np
img = cv2.imread('image.jpg')
height, width = img.shape[:2]
res = cv2.resize(img,(2*width, 2*height), interpolation = cv2.INTER_CUBIC)

The interpolation method used here is cv2.INTER_CUBIC; the default is cv2.INTER_LINEAR.


Shifting an object’s location can be done using cv2.warpAffine.
To shift an image by (Tx, Ty), build the transformation matrix M = [(1,0,Tx),(0,1,Ty)] as a NumPy array of type np.float32. The following example code shifts the image by (100, 50).

import cv2
import numpy as np
img = cv2.imread('image.jpg',0)
rows,cols = img.shape
M = np.float32([[1,0,100],[0,1,50]])
dst = cv2.warpAffine(img,M,(cols,rows))

cv2.imshow('img',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

This results in:


The cv2.warpAffine function takes three main arguments: the image, the transformation matrix for shifting, and the size of the output image as (width, height).
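To see why this matrix shifts the image, you can apply M to a single pixel coordinate by hand with NumPy alone (the pixel position here is an arbitrary example):

```python
import numpy as np

# Translation by Tx = 100, Ty = 50, as in the snippet above
M = np.float32([[1, 0, 100],
                [0, 1, 50]])

# A pixel at (x, y) = (30, 40), written as a homogeneous vector [x, y, 1]
p = np.array([30, 40, 1], dtype=np.float32)

# warpAffine maps each pixel via M . [x, y, 1]^T
shifted = M @ p  # x' = 130, y' = 90
```

Every pixel gets the same (Tx, Ty) offset, which is exactly a translation of the whole image.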


OpenCV provides rotation with an adjustable center of rotation and a scaling factor. The transformation matrix M returned by cv2.getRotationMatrix2D is:

M = [[ α, β, (1−α)·center_x − β·center_y],
     [−β, α, β·center_x + (1−α)·center_y]]

where α = scale·cos(θ) and β = scale·sin(θ).

import numpy as np
import cv2
img = cv2.imread('image.jpg',0)
rows,cols = img.shape

M = cv2.getRotationMatrix2D((cols//2,rows//2),90,1)
dst = cv2.warpAffine(img,M,(cols,rows))

cv2.imshow('img',dst)
cv2.waitKey(0)
cv2.destroyAllWindows()

To build this transformation matrix we used the OpenCV function cv2.getRotationMatrix2D: here the image is rotated by 90 degrees anticlockwise about its center, without scaling (scale factor 1).
The result is:
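The matrix cv2.getRotationMatrix2D returns can also be built by hand from the formula above; the key property is that the chosen center is a fixed point of the transform. A NumPy sketch (the center and test point are arbitrary example values):

```python
import math
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Build the same 2x3 matrix that cv2.getRotationMatrix2D returns."""
    a = scale * math.cos(math.radians(angle_deg))  # alpha
    b = scale * math.sin(math.radians(angle_deg))  # beta
    cx, cy = center
    return np.array([[ a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix_2d((100, 100), 90, 1)

# The center of rotation is a fixed point: M . [cx, cy, 1]^T = [cx, cy]
center_out = M @ np.array([100, 100, 1])

# A point to the right of the center moves above it: (200,100) -> ~(100, 0)
corner = M @ np.array([200, 100, 1])
```

Applying this M with cv2.warpAffine rotates the whole image about (100, 100).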

Affine Transformation

In this transformation, all lines that are parallel in the original image remain parallel in the output image.

import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('image.jpg')
rows,cols,ch = img.shape

pts1 = np.float32([[50,50],[200,50],[50,200]])
pts2 = np.float32([[10,100],[200,50],[100,250]])

M = cv2.getAffineTransform(pts1,pts2)

dst = cv2.warpAffine(img,M,(cols,rows))

plt.subplot(121), plt.imshow(img), plt.title('Input')
plt.subplot(122), plt.imshow(dst), plt.title('Output')
plt.show()

To obtain the transformation matrix we need three points in the source image and their three corresponding points in the destination image. The function cv2.getAffineTransform then returns a 2×3 matrix, which we pass to the cv2.warpAffine function.

The result looks like this:
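cv2.getAffineTransform amounts to solving six linear equations. As a sketch, the same 2×3 matrix can be recovered with plain NumPy from the three point pairs used above:

```python
import numpy as np

pts1 = np.float32([[50, 50], [200, 50], [50, 200]])    # source triangle
pts2 = np.float32([[10, 100], [200, 50], [100, 250]])  # destination triangle

# Each pair contributes [x, y, 1] . m = x' (and y'), so stack and solve
A = np.hstack([pts1, np.ones((3, 1), dtype=np.float32)])  # 3x3 system matrix
M = np.linalg.solve(A, pts2).T  # 2x3, same role as cv2.getAffineTransform's result

# Sanity check: M maps each source point onto its destination
mapped = (M @ np.hstack([pts1, np.ones((3, 1))]).T).T
```

Three non-collinear point pairs pin down the six unknowns of the affine map exactly, which is why exactly three pairs are required.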

Perspective Transform

This transformation changes the point of view of the image. Straight lines remain straight, but parallel lines need not stay parallel. For this transformation we need 4 points in the source image and their 4 counterparts in the output image, and no three of them may be collinear. From these points cv2.getPerspectiveTransform computes a 3×3 matrix, which we pass to cv2.warpPerspective.

import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('image.jpg')
rows,cols,ch = img.shape

pts1 = np.float32([[56,65],[368,52],[28,387],[389,390]])
pts2 = np.float32([[0,0],[300,0],[0,300],[300,300]])

M = cv2.getPerspectiveTransform(pts1,pts2)

dst = cv2.warpPerspective(img,M,(300,300))

plt.subplot(121), plt.imshow(img), plt.title('Input')
plt.subplot(122), plt.imshow(dst), plt.title('Output')
plt.show()

The result looks like this :
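Under the hood, cv2.getPerspectiveTransform solves an 8×8 linear system for the 3×3 homography (its bottom-right entry is fixed to 1). A NumPy sketch using the same point pairs:

```python
import numpy as np

pts1 = np.float64([[56, 65], [368, 52], [28, 387], [389, 390]])
pts2 = np.float64([[0, 0], [300, 0], [0, 300], [300, 300]])

# Each correspondence (x, y) -> (u, v) contributes two rows of the system,
# derived from u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1) and likewise for v
A, b = [], []
for (x, y), (u, v) in zip(pts1, pts2):
    A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
    b.append(u)
    A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    b.append(v)

h = np.linalg.solve(np.array(A), np.array(b))
H = np.append(h, 1).reshape(3, 3)  # 3x3 homography, H[2,2] = 1

# Applying H requires the perspective divide
p = H @ np.array([56, 65, 1.0])
u, v = p[0] / p[2], p[1] / p[2]  # first source corner maps near (0, 0)
```

The perspective divide is what lets this transform tilt planes, unlike the affine case where the last row is fixed to [0, 0, 1].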

That’s all!

OpenCV+Python: Part 2 – Image Arithmetic


You can add two images either with OpenCV, cv2.add(img1, img2), or with NumPy, result = img1 + img2.
(Both images should have the same depth and type.) There is a major difference between the two:

>>> x = np.uint8([250])
>>> y = np.uint8([10])

>>> print(cv2.add(x,y)) # 250+10 = 260 => saturated to 255
[[255]]

>>> print(x+y)          # 250+10 = 260 % 256 = 4
[4]

*For image data, OpenCV’s saturating addition usually gives the better result.
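You can reproduce both behaviours in plain NumPy, without OpenCV installed; np.clip emulates cv2.add’s saturation:

```python
import numpy as np

x = np.uint8([250])
y = np.uint8([10])

# NumPy's uint8 addition wraps around (modulo 256)
wrapped = x + y  # 260 % 256 = 4

# cv2.add saturates instead; the same effect with plain NumPy:
saturated = np.clip(x.astype(np.int32) + y.astype(np.int32),
                    0, 255).astype(np.uint8)  # 260 clipped to 255
```

Wrap-around turns a bright pixel into a dark one, which is why saturation is the sensible choice for images.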


Adding images with the previous method is quite blunt. With blending you can get a smooth transition between two images.
Blending is done using the OpenCV function cv2.addWeighted(), which computes:
dst = a*img1 + (1-a)*img2 + z
where a is the weight of the first image and z is a scalar added to each sum.
What we basically do is give the two images weights so that they mix with different intensities.

The following code adds two images with weights 0.7 and 0.3.
(Both images should have the same size, depth, and type.)

import cv2

img1 = cv2.imread('img1.png')
img2 = cv2.imread('img2.jpg')

result = cv2.addWeighted(img1,0.7,img2,0.3,0) # z is taken as 0


The final result looks somewhat like this:
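The weighted-sum formula can be checked on a toy single-pixel “image” in plain NumPy (cv2.addWeighted additionally rounds and saturates the same way):

```python
import numpy as np

img1 = np.uint8([[200]])  # toy single-pixel images
img2 = np.uint8([[100]])

a, z = 0.7, 0  # weight of the first image, scalar offset

# dst = a*img1 + (1-a)*img2 + z, clipped to the uint8 range and rounded
blended = np.clip(a * img1 + (1 - a) * img2 + z, 0, 255)
blended = np.rint(blended).astype(np.uint8)  # 0.7*200 + 0.3*100 = 170
```

Sweeping `a` from 1 down to 0 is exactly the cross-fade transition described above.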

OpenCV+Python: Part 2 – Working with Images

–Access and Edit Pixel Values

All of the following steps can be performed using the Python terminal.
First of all load the image:

>>>import cv2
>>>import numpy as np
>>>img = cv2.imread('image.jpg')

To get a pixel value of a particular position:

>>>pix = img[y,x] # y is the row, x is the column
>>>print(pix)

To modify the pixel value at a particular point (x,y):

>>>img[y,x] = [B,G,R] #where B,G,R are integers in 0-255

A much faster method is to use the NumPy functions array.item() and array.itemset() to access and edit pixel values. However, item() only returns a scalar, so to read the B, G, and R values you need to call it once per channel.
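A small self-contained example of this per-channel access pattern, using a synthetic 2×2 BGR array in place of a loaded image:

```python
import numpy as np

# Synthetic 2x2 BGR image standing in for cv2.imread's result
img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 1] = [255, 128, 64]  # set B, G, R at row 0, column 1

# array.item() returns a scalar, so read one channel at a time
b = img.item(0, 1, 0)
g = img.item(0, 1, 1)
r = img.item(0, 1, 2)
```

The last index selects the channel: 0 is blue, 1 is green, 2 is red in OpenCV’s BGR ordering.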

–Image Properties
1.)>>>print(img.shape)
Returns a tuple of (rows, columns, channels); a grayscale image returns only (rows, columns).
2.)>>>print(img.size)
Returns the total number of elements in the image array (pixels × channels).
3.)>>>print(img.dtype)
Returns the image datatype.

To select a particular region (ROI) of an image, slice rows first, then columns:
>>>part = img[r1:r2, c1:c2]

To paste the selected ROI at some other location of the same size:
>>>img[p1:p2, q1:q2] = part
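A synthetic array makes the slicing order (rows first, then columns) explicit; the same pattern works on a real image:

```python
import numpy as np

img = np.arange(36, dtype=np.uint8).reshape(6, 6)  # stand-in 6x6 image

# ROI covering rows 1..2 and columns 3..4
part = img[1:3, 3:5].copy()

# Paste the ROI at rows 4..5, columns 0..1 (same 2x2 size)
img[4:6, 0:2] = part
```

Slicing returns a view of the underlying array, so the explicit `.copy()` keeps the ROI intact even if the source region is later modified.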

–Splitting and Merging Channels

If you want to split B,G,R channels or merge them back use:

>>>b,g,r = cv2.split(img)
>>>img = cv2.merge((b,g,r))

However, if you only want to edit one particular channel, NumPy slicing is faster.
E.g. to set all red pixels to zero:

>>> img[:,:,2] = 0
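On a synthetic image you can check that the slicing route matches a split–zero–merge round trip (plain NumPy stand-ins for cv2.split and cv2.merge):

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, (4, 4, 3), dtype=np.uint8)  # synthetic BGR image

# Route 1: slice assignment (fast)
a = img.copy()
a[:, :, 2] = 0  # zero the red channel

# Route 2: split the channels, zero red, stack back together
b_ch, g_ch, r_ch = img[:, :, 0], img[:, :, 1], img[:, :, 2]
b = np.dstack([b_ch, g_ch, np.zeros_like(r_ch)])
```

Both give identical results; slicing just avoids the intermediate per-channel copies.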

That’s all in this post.

OpenCV+Python: Part 1 – Working with Videos

Learn how to load, display, and save videos. I’ll explain this using code snippets. The following program captures a video from the camera (I am using the built-in webcam of my laptop) and displays it frame by frame.

import numpy as np
import cv2

x = cv2.VideoCapture(0)

while(True):
    # Capture frame-by-frame
    ret, frame = x.read()

    # Our operations on the frame come here
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Display the resulting frame
    cv2.imshow('frame', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything is done, release the capture
x.release()
cv2.destroyAllWindows()
The first thing we need to do is create a VideoCapture object ‘x’. The argument passed to it is either the device index (a number specifying which camera) or the name of a video file. Since normally only one camera is connected to the system, we pass 0; to select a second camera you can pass 1, and so on. x.read() returns the frame along with a boolean (ret) that is True if the frame was read correctly.

Next lets play a video from a file.

First of all–
Go to : OpenCV\3rdparty\ffmpeg\
Copy the dll files opencv_ffmpeg.dll or opencv_ffmpeg_64.dll (depending on your system architecture) and paste them into C:\Python27\

Now rename both of these to opencv_ffmpeg24x.dll and opencv_ffmpeg24x_64.dll, where x is the patch version of the OpenCV release you are using. For example, I am using OpenCV 2.4.6, so I renamed them to
opencv_ffmpeg246.dll and opencv_ffmpeg246_64.dll.

Then using the following code snippet you can play any video from the current directory.

import numpy as np
import cv2

cap = cv2.VideoCapture('video.mp4')

while(cap.isOpened()):
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame', gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()


Now the final step is to save a video from the camera.
The following code captures from the camera, flips every frame vertically, and saves the video.

import numpy as np
import cv2

cap = cv2.VideoCapture(0)

# Define the codec and create VideoWriter object
# Define the codec and create VideoWriter object
fourcc = cv2.cv.CV_FOURCC(*'XVID')  # cv2.VideoWriter_fourcc(*'XVID') in OpenCV 3+
out = cv2.VideoWriter('output.avi', fourcc, 20.0, (640,480))

while(cap.isOpened()):
    ret, frame = cap.read()
    if ret==True:
        frame = cv2.flip(frame,0)

        # write the flipped frame
        out.write(frame)

        cv2.imshow('frame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    else:
        break

# Release everything once the job is finished
cap.release()
out.release()
cv2.destroyAllWindows()

For images we can simply use the function cv2.imwrite(); with videos it takes a bit more work.
Along with the VideoCapture object we create a VideoWriter object, whose arguments are the following:
1.) The name of the output file.
2.) The FourCC code. FourCC is a 4-byte code used to specify the video codec.
Download the codec file for Windows from — FourCC Codec.
3.) The number of frames per second (fps).
4.) The frame size.

That is all in this post..! All the best.

Installing OpenCV on Ubuntu 14.04 (Trusty) / 13.10 (Saucy)

First, download the latest version of OpenCV from here:

Now install the dependencies:


 sudo apt-get install build-essential 


 sudo apt-get install cmake 

Use the Synaptic Package Manager to install these packages:
Python, Python-dev, Numpy, libavcodec-dev, libavformat-dev, libswscale-dev

Optional packages: libjpeg-dev, libpng-dev, libtiff-dev, libjasper-dev, libdc1394 2.x

Now extract the opencv zip file and go to the extracted directory.

Next create a build directory and go into that:

 mkdir build
 cd build 

Next execute :

 cmake ../ 
 make
 sudo make install 

If everything completed correctly, OpenCV is installed and ready to work with.

SimpleCV with Raspberry Pi

This post is about installing SimpleCV on your Raspberry Pi.
First power up the Pi and connect it to the internet.
Next run the following command to install the necessary dependencies:

sudo apt-get install ipython python-opencv python-scipy python-numpy python-setuptools python-pip

If you haven’t installed git you can do so by typing:

sudo apt-get install git

You can install SimpleCV from source.

mkdir ~/simplecv
cd ~/simplecv
git clone git://
cd SimpleCV
sudo pip install -r requirements.txt
sudo python setup.py develop

This will take a bit of time.
Next connect a compatible camera to the board input and open up the terminal.

raspberry@pi:~$ simplecv
SimpleCV:1> c = Camera()
VIDIOC_QUERYMENU: Invalid argument
VIDIOC_QUERYMENU: Invalid argument
(this warning repeats a few more times; it is harmless)

SimpleCV:2> c.getImage()

SimpleCV:3> exit()

Congratulations, your RaspberryPi is now running SimpleCV!