Starting with the cameraman image, we explore the capabilities of the finite difference operator. We define two filters, D_x = [1, -1] and D_y = [1, -1]^T, and convolve each with the image to obtain its partial derivatives in x and y.
Next, we compute the gradient magnitude image as sqrt(cameraman_x^2 + cameraman_y^2). To binarize this image, we choose a threshold and set all values above it to 1 and all values below it to 0.
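As a concrete sketch of this pipeline (assuming the cameraman image is loaded as a 2D float array `im` in [0, 1]; the threshold below is an illustrative value, not necessarily the one used for the results shown):

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference filters
D_x = np.array([[1, -1]])    # horizontal difference
D_y = np.array([[1], [-1]])  # vertical difference

# Partial derivatives of the image
im_x = convolve2d(im, D_x, mode="same")
im_y = convolve2d(im, D_y, mode="same")

# Gradient magnitude: sqrt(im_x^2 + im_y^2)
grad_mag = np.sqrt(im_x ** 2 + im_y ** 2)

# Binarize with a hand-picked threshold
edges = (grad_mag > 0.25).astype(float)
```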
As the previous results are noisy, we apply a smoothing operator to reduce this noise. In particular, we convolve the cameraman image with a Gaussian filter G, a low-pass filter, to create a blurred version of it. We then repeat the process described above, starting from this blurred image.
Notice that here we perform four convolutions on the cameraman image (two for the blurred convolution with D_x and two for the blurred convolution with D_y). However, recall that convolution is associative. Thus, we should be able to halve the number of convolutions we perform: first convolve the D_x/D_y filter with G to get a Derivative of Gaussian (DoG) filter, then convolve the image with that. This is preferable because convolving two small filters is far less expensive than performing multiple convolutions on the much larger cameraman image.
Now we should see (and we do) that the gradient magnitude image is identical whether we use the separated convolutions or the DoG filters. Note that any small differences come from padding effects at the boundary (convolution does not normally preserve the size of an image), but these errors are negligible.
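A sketch of the two equivalent routes (the Gaussian kernel is built as an outer product of a 1D Gaussian; the kernel size and sigma are illustrative choices, and `im` and `D_x` are as in the sketch above):

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(ksize=11, sigma=2.0):
    # 2D Gaussian as the outer product of a normalized 1D Gaussian
    x = np.arange(ksize) - ksize // 2
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()
    return np.outer(g, g)

G = gaussian_kernel()

# Route 1: blur, then differentiate (two convolutions on the image)
blurred_x = convolve2d(convolve2d(im, G, mode="same"), D_x, mode="same")

# Route 2: fold the blur into the filter first (one convolution on the image)
DoG_x = convolve2d(G, D_x)  # cheap: both operands are small filters
dog_x = convolve2d(im, DoG_x, mode="same")

# Identical up to boundary effects from padding
print(np.abs(blurred_x - dog_x).max())
```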
Recall that the Gaussian filter G is a low-pass filter. To create a high-pass filter, we subtract the blurred version of the image (which contains only the low frequencies) from the original image (which contains all frequencies), leaving an image with only the high frequencies.
We can utilize the linearity of convolution to make this process into a single convolution filter,
called the unsharp mask filter. We get this filter F by selecting some parameter alpha and then computing
(1 + alpha) * I - alpha * G, where I denotes the unit impulse filter and '*' denotes scalar-filter
multiplication, not convolution. Using this filter F yields the following results:
Furthermore, we can attempt to 'regain' clarity from a blurred image via this process: if we blur an image, we can attempt to 'unblur' it by sharpening it. Note that this does not recover the high frequencies lost during the blurring process, but it does emphasize the ones that remain.
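A sketch of both the two-step version and the single-filter version, assuming the Gaussian kernel G and the image `im` from the sketches above; alpha is an illustrative choice:

```python
import numpy as np
from scipy.signal import convolve2d

alpha = 1.0

# Two-step version: subtract the low frequencies, add them back scaled
blurred = convolve2d(im, G, mode="same")
high_freq = im - blurred
sharpened = im + alpha * high_freq

# Single-filter version: F = (1 + alpha) * I - alpha * G,
# where I is the unit impulse filter of the same size as G
impulse = np.zeros_like(G)
impulse[G.shape[0] // 2, G.shape[1] // 2] = 1.0
F = (1 + alpha) * impulse - alpha * G
sharpened_f = convolve2d(im, F, mode="same")  # matches `sharpened` up to floating point
```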
Next, we seek to create hybrid images: images which appear to be one thing up close and another from further away. This exploits the fact that the human eye perceives higher frequencies up close and lower frequencies from further away. Thus, to create a hybrid image, one can apply a low-pass filter to one image, a high-pass filter to another, and average them. Some examples of carrying out this process are given below:
Notice that the color works better for the low-frequency component, though it does work for both. This makes intuitive sense, as the broad strokes of the hybrid image are determined by the low-frequency component, and thus its colors end up mattering more.
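As a concrete reference, a minimal sketch of the construction (assuming two aligned grayscale images `im1`, kept at low frequencies, and `im2`, kept at high frequencies; the sigmas are illustrative cutoff choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

low = gaussian_filter(im1, sigma=6.0)         # low-pass im1
high = im2 - gaussian_filter(im2, sigma=3.0)  # high-pass im2
hybrid = (low + high) / 2                     # average the two components
```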
For the above result (my favorite of the hybrid images), we can visualize the process in the frequency domain via the log magnitude of the Fourier transform of the two input images, the filtered images, and the hybrid image.
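The visualization itself follows a standard recipe (the small epsilon is added to avoid taking the log of zero):

```python
import numpy as np

def log_magnitude(img):
    # Center the zero frequency with fftshift, then take the log magnitude
    return np.log(np.abs(np.fft.fftshift(np.fft.fft2(img))) + 1e-8)
```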
Unfortunately, this process of making hybrid images does not always work. Sometimes, the images cannot be properly aligned, one of the images dominates too much, or both mess with each other so much that everything becomes unintelligible. In the following example, the hybrid image fails because all of the words become illegible.
Next, we investigate multiresolution blending. To accomplish this task, we first need some machinery for handling Gaussian and Laplacian stacks.
Gaussian stacks, G, are computed by repeatedly applying a Gaussian filter to an image, with the unblurred image placed at the start of the stack as G_0. Laplacian stacks, L, are computed by taking a Gaussian stack, [G_0, G_1, ..., G_n], calculating L_i = G_i - G_(i+1) for i = 0, 1, ..., n-1, and setting L_n = G_n. Thus, the sum L_0 + L_1 + ... + L_n telescopes to G_0, the original image.
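A sketch of the stack construction, assuming repeated filtering with a fixed Gaussian (the depth and sigma are illustrative choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_stack(im, n=5, sigma=2.0):
    stack = [im]  # G_0 is the unblurred image
    for _ in range(n):
        stack.append(gaussian_filter(stack[-1], sigma=sigma))
    return stack  # [G_0, G_1, ..., G_n]

def laplacian_stack(g_stack):
    # L_i = G_i - G_(i+1), with L_n = G_n, so the sum telescopes to G_0
    l_stack = [g_stack[i] - g_stack[i + 1] for i in range(len(g_stack) - 1)]
    l_stack.append(g_stack[-1])
    return l_stack
```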
The following are examples of Gaussian and Laplacian stacks, normalized for visualization purposes
(with the first row being a Gaussian stack and the second row being a Laplacian stack):
Now that we can create Gaussian and Laplacian stacks, we are ready to create multiresolution blended images. We define blend_i = mG_i * AL_i + (1 - mG_i) * BL_i, where mG_i is the i-th element of the Gaussian stack of the mask, AL_i is the i-th element of the Laplacian stack of image A, and BL_i is the i-th element of the Laplacian stack of image B. To get the final blended image, we sum up the blend_i. Note that all of the following images are once again normalized for visualization purposes.
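A sketch of the blending loop, assuming the stack helpers above, two aligned images `A` and `B`, and a mask `mask` of the same size (the stack depth is an illustrative choice):

```python
n = 5
mG = gaussian_stack(mask, n=n)                # smoothed mask per level
AL = laplacian_stack(gaussian_stack(A, n=n))  # Laplacian stack of A
BL = laplacian_stack(gaussian_stack(B, n=n))  # Laplacian stack of B

# blend_i = mG_i * AL_i + (1 - mG_i) * BL_i, summed over all levels
blended = sum(mG[i] * AL[i] + (1 - mG[i]) * BL[i] for i in range(n + 1))
```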