Starting with the cameraman image, we explore the capabilities of the finite difference operator. We define two filters, D_x = [1, -1] and D_y = [1, -1]^T, and convolve each with the image to obtain its partial derivatives in x and y.
Next, we compute the gradient magnitude image as sqrt(cameraman_x^2 + cameraman_y^2). To binarize this image, we choose a threshold and set all values above it to 1 and all values below it to 0.
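As a concrete sketch of this pipeline (assuming the cameraman image is loaded as a 2D float array `im` in [0, 1]; the threshold below is an illustrative value, not necessarily the one used for the results shown):

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference filters
D_x = np.array([[1, -1]])    # horizontal difference
D_y = np.array([[1], [-1]])  # vertical difference

# Partial derivatives of the image
im_x = convolve2d(im, D_x, mode="same")
im_y = convolve2d(im, D_y, mode="same")

# Gradient magnitude: sqrt(im_x^2 + im_y^2)
grad_mag = np.sqrt(im_x ** 2 + im_y ** 2)

# Binarize with a hand-picked threshold
edges = (grad_mag > 0.25).astype(float)
```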
As the previous results are noisy, we apply a smoothing operator to reduce this noise. In particular, we convolve the cameraman image with a Gaussian filter G, a low-pass filter, to create a blurred version of it. We then repeat the process described above, starting from this blurred image.
Notice that here we perform four convolutions on the cameraman image (two for the blurred convolution with D_x and two for the blurred convolution with D_y). However, recall that convolution is associative. Thus, we should be able to halve the number of convolutions we perform: first convolve the D_x/D_y filter with G to get a Derivative of Gaussian (DoG) filter, then convolve the image with that. This is preferable because convolving two small filters is far less expensive than performing multiple convolutions on the much larger cameraman image.
Now we should see (and we do) that the gradient magnitude image is identical whether we use the separated convolutions or the DoG filters. Note that any small differences come from padding effects at the boundary (convolution does not normally preserve the size of an image), but these errors are negligible.
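A sketch of the two equivalent routes (the Gaussian kernel is built as an outer product of a 1D Gaussian; the kernel size and sigma are illustrative choices, and `im` and `D_x` are as in the sketch above):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(ksize=11, sigma=2.0):
    # 2D Gaussian as the outer product of a normalized 1D Gaussian
    x = np.arange(ksize) - ksize // 2
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()
    return np.outer(g, g)

G = gaussian_kernel()

# Route 1: blur, then differentiate (two convolutions on the image)
blurred_x = convolve2d(convolve2d(im, G, mode="same"), D_x, mode="same")

# Route 2: fold the blur into the filter first (one convolution on the image)
DoG_x = convolve2d(G, D_x)  # cheap: both operands are small filters
dog_x = convolve2d(im, DoG_x, mode="same")

# Identical up to boundary effects from padding
print(np.abs(blurred_x - dog_x).max())
```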
Recall that the Gaussian filter G is a low-pass filter. To create a high-pass filter, we subtract the blurred version of the image (which contains only the low frequencies) from the original image (which contains all frequencies), leaving an image with only the high frequencies.
We can utilize the linearity of convolution to make this process into a single convolution filter,
called the unsharp mask filter. We get this filter F by selecting some parameter alpha and then computing
(1 + alpha) * I - alpha * G, where I denotes the unit impulse filter and '*' denotes scalar-filter
multiplication, not convolution. Using this filter F yields the following results:
Furthermore, we can attempt to 'regain' clarity from a blurred image via this process: if we blur an image, we can attempt to 'unblur' it by sharpening it. Note that this does not recover the high frequencies lost during the blurring process, but it does emphasize the ones that remain.
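A sketch of both the two-step version and the single-filter version, assuming the Gaussian kernel G and the image `im` from the sketches above; alpha is an illustrative choice:

```python
import numpy as np
from scipy.signal import convolve2d

alpha = 1.0

# Two-step version: subtract the low frequencies, add them back scaled
blurred = convolve2d(im, G, mode="same")
high_freq = im - blurred
sharpened = im + alpha * high_freq

# Single-filter version: F = (1 + alpha) * I - alpha * G,
# where I is the unit impulse filter of the same size as G
impulse = np.zeros_like(G)
impulse[G.shape[0] // 2, G.shape[1] // 2] = 1.0
F = (1 + alpha) * impulse - alpha * G
sharpened_f = convolve2d(im, F, mode="same")  # matches `sharpened` up to floating point
```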
Next, we seek to create hybrid images: images which appear to be one thing up close and another from further away. This exploits the fact that the human eye perceives higher frequencies up close and lower frequencies from further away. Thus, to create a hybrid image, one can apply a low-pass filter to one image, a high-pass filter to another, and average them. Some examples of carrying out this process are given below:
Notice that the color works better for the low-frequency component, though it does work for both. This makes intuitive sense, as the broad strokes of the hybrid image are determined by the low-frequency component, and thus its colors end up mattering more.
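As a concrete reference, a minimal sketch of the construction (assuming two aligned grayscale images `im1`, kept at low frequencies, and `im2`, kept at high frequencies; the sigmas are illustrative cutoff choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

low = gaussian_filter(im1, sigma=6.0)         # low-pass im1
high = im2 - gaussian_filter(im2, sigma=3.0)  # high-pass im2
hybrid = (low + high) / 2                     # average the two components
```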
For the above result (my favorite of the hybrid images), we can visualize the process in the frequency domain via the log magnitude of the Fourier transform of the two input images, the filtered images, and the hybrid image.
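The visualization itself follows a standard recipe (the small epsilon is added to avoid taking the log of zero):

```python
import numpy as np

def log_magnitude(img):
    # Center the zero frequency with fftshift, then take the log magnitude
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(img))) + 1e-8)
```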
Unfortunately, this process of making hybrid images does not always work. Sometimes, the images cannot be properly aligned, one of the images dominates too much, or both mess with each other so much that everything becomes unintelligible. In the following example, the hybrid image fails because all of the words become illegible.
Next, we investigate multiresolution blending. To accomplish this task, we first need some machinery for handling Gaussian and Laplacian stacks.
Gaussian stacks, G, are computed by repeatedly applying a Gaussian filter to an image, with the unblurred image placed at the start of the stack as G_0. Laplacian stacks, L, are computed by taking a Gaussian stack, [G_0, G_1, ..., G_n], calculating L_i = G_i - G_(i+1) for i = 0, 1, ..., n-1, and setting L_n = G_n. Thus, the sum L_0 + L_1 + ... + L_n telescopes to G_0, the original image.
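A sketch of the stack construction, assuming repeated filtering with a fixed Gaussian (the depth and sigma are illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, n=5, sigma=2.0):
    stack = [im]  # G_0 is the unblurred image
    for _ in range(n):
        stack.append(gaussian_filter(stack[-1], sigma=sigma))
    return stack  # [G_0, G_1, ..., G_n]

def laplacian_stack(g_stack):
    # L_i = G_i - G_(i+1), with L_n = G_n, so the sum telescopes to G_0
    l_stack = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    l_stack.append(g_stack[-1])
    return l_stack
```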
The following are examples of Gaussian and Laplacian stacks, normalized for visualization purposes
(with the first row being a Gaussian stack and the second row being a Laplacian stack):
Now that we can create Gaussian and Laplacian stacks, we are ready to create multiresolution blended images. We define blend_i = mG_i * AL_i + (1 - mG_i) * BL_i, where mG_i is the i-th element of the Gaussian stack of the mask, AL_i is the i-th element of the Laplacian stack of image A, and BL_i is the i-th element of the Laplacian stack of image B. To get the final blended image, we sum up the blend_i. Note that all of the following images are once again normalized for visualization purposes.
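A sketch of the blending loop, assuming the stack helpers above, two aligned images `A` and `B`, and a mask `mask` of the same size (the stack depth is an illustrative choice):

```python
n = 5
mG = gaussian_stack(mask, n=n)                # smoothed mask per level
AL = laplacian_stack(gaussian_stack(A, n=n))  # Laplacian stack of A
BL = laplacian_stack(gaussian_stack(B, n=n))  # Laplacian stack of B

# blend_i = mG_i * AL_i + (1 - mG_i) * BL_i, summed over all levels
blended = sum(mG[i] * AL[i] + (1 - mG[i]) * BL[i] for i in range(n + 1))
```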