Between 1909 and 1915, Sergey Mikhaylovich Prokudin-Gorsky
travelled across the Russian Empire, photographing its people, architecture, and landscapes. Although this was
well before color photography became widespread, his records give us a colorful insight into the Russian Empire. This is
because he was ingenious enough to capture three exposures of every photo: one each through a red, a green, and a blue filter.
His photographs have since been digitized by the Library of Congress. Each digitized image is a vertical stack of the individual
exposures captured through the blue, green, and red filters.
Using image processing techniques, one can convert the stacked images shown above into colorful photographs showcasing the beauty of the Russian Empire in the early 20th century.
First, I split the starting image into thirds (blue, green, and red). The approach is to find displacements for the green
and red channels and then overlay them on the fixed blue channel. To find the "optimal" displacements of the green and
red channels, we can choose a metric to score candidate displacements and then search over as many displacements as possible.
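As a concrete illustration, here is a minimal sketch of the split, assuming the plate is loaded as a single grayscale array (the file name is taken from the results table below; the variable names are mine):

```python
import skimage.io as skio

# Load the digitized plate: the blue, green, and red exposures
# are stacked vertically, top to bottom.
plate = skio.imread("cathedral.jpg", as_gray=True)

third = plate.shape[0] // 3      # height of a single exposure
b = plate[:third]                # top third: blue-filter exposure
g = plate[third:2 * third]       # middle third: green-filter exposure
r = plate[2 * third:3 * third]   # bottom third: red-filter exposure
```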
Some possible metrics include Euclidean distance (L2 norm), Normalized Cross-Correlation (NCC), the Structural Similarity
Index Measure (SSIM), etc. For this project, I use SSIM as implemented in `skimage.metrics`. Although this metric takes
slightly longer to compute than the others, I found it gave better results on the larger images. Furthermore, I excluded
the outer 5% border from the calculation because the channels correlate less near the borders.
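A minimal sketch of the scoring step, assuming float channels scaled to [0, 1]; the `score` helper and its exact slicing are my illustration rather than the project's verbatim code:

```python
from skimage.metrics import structural_similarity

def score(channel, reference, border=0.05):
    """SSIM between two channels, ignoring the outer 5% border."""
    h, w = reference.shape
    dy, dx = int(h * border), int(w * border)
    return structural_similarity(
        channel[dy:h - dy, dx:w - dx],
        reference[dy:h - dy, dx:w - dx],
        data_range=1.0,  # assumes float images in [0, 1]
    )
```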
Even on smaller images, it can take a while to check every possible displacement. Thus, we must intelligently restrict our
search space. Notice that some displacements can be ruled out without checking them: the optimal displacement is likely
close to 0, because we assume the blue, green, and red images are already roughly aligned within a reasonable error.
Thus, my algorithm checks a displacement range of [-15, 15] for each of green Δx, green
Δy, red Δx, and red Δy.
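Using the `score` helper sketched above, the exhaustive single-scale search might look like the following sketch:

```python
import numpy as np

def align_single_scale(channel, reference, radius=15):
    """Search every displacement in [-radius, radius]^2 and return
    the (dx, dy) shift of `channel` that maximizes the SSIM score."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            s = score(shifted, reference)
            if s > best_score:
                best_score, best_shift = s, (dx, dy)
    return best_shift

g_shift = align_single_scale(g, b)  # green aligned against the fixed blue
r_shift = align_single_scale(r, b)
```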
Running this alignment procedure on the three smaller images yields the results shown below.
For larger images, the approach described above is infeasible: (1) at a larger scale, the optimal displacement can span
many more pixels, so far more displacements need to be checked, and (2) calculating the metric takes longer for each check.
Luckily, there is a way to vastly reduce the search space while still maintaining a high-quality output.
The idea is as follows. Recursively scale the image down by a factor of two until a 'small-enough' size is reached. Use
the single-scale procedure from before to quickly find the 'optimal' displacement of this smaller image. Scaling the image
back up along with the calculated displacement then yields a close approximation of the 'optimal' displacement for the
higher-resolution image, which only needs a small local refinement. Repeat this process until the image returns to its original size.
My algorithm calculates the 'small-enough' size by finding the depth at which the smallest side of the image is roughly 100 pixels.
It then checks a range of [-3, 3] for each displacement. After finding the 'optimal' displacement, I scale the current
image and the current optimal displacement back up by a factor of 2. I repeat this process until I return to the original image,
where very few displacements need to be checked since the approximation should already be very close.
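A recursive sketch of this multiscale procedure, reusing the single-scale search above; applying the same [-3, 3] radius at the coarsest level is my reading of the description:

```python
from skimage.transform import rescale

def align_multiscale(channel, reference):
    """Coarse-to-fine alignment: recurse on half-size images until the
    shorter side is roughly 100 px, then refine the doubled estimate."""
    if min(reference.shape) <= 100:
        return align_single_scale(channel, reference, radius=3)
    coarse = align_multiscale(
        rescale(channel, 0.5, anti_aliasing=True),
        rescale(reference, 0.5, anti_aliasing=True),
    )
    dx0, dy0 = 2 * coarse[0], 2 * coarse[1]   # scale the estimate back up
    shifted = np.roll(channel, (dy0, dx0), axis=(0, 1))
    # Local refinement within [-3, 3] around the upscaled estimate.
    ddx, ddy = align_single_scale(shifted, reference, radius=3)
    return (dx0 + ddx, dy0 + ddy)
```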
Running this multiscale alignment procedure on the larger images yields the results shown below, in far less time than
the single-scale alignment procedure would take.
filename | method | green Δx | green Δy | red Δx | red Δy | time (s) |
---|---|---|---|---|---|---|
cathedral.jpg | single-scale | 5 | 2 | 12 | 3 | 11.06 |
monastery.jpg | single-scale | -3 | 2 | 3 | 2 | 11.19 |
tobolsk.jpg | single-scale | 3 | 3 | 6 | 3 | 11.09 |
church.tif | multiscale | 25 | 4 | 59 | -4 | 107.29 |
emir.tif | multiscale | 50 | 22 | 105 | 40 | 110.21 |
harvesters.tif | multiscale | 59 | 15 | 122 | 12 | 118.93 |
icon.tif | multiscale | 39 | 16 | 89 | 23 | 124.10 |
lady.tif | multiscale | 57 | 8 | 120 | 12 | 133.33 |
melons.tif | multiscale | 80 | 10 | 177 | 13 | 112.90 |
onion_church.tif | multiscale | 52 | 25 | 108 | 35 | 123.56 |
sculpture.tif | multiscale | 33 | -11 | 140 | -27 | 140.27 |
self_portrait.tif | multiscale | 78 | 28 | 175 | 37 | 159.43 |
three_generations.tif | multiscale | 56 | 17 | 113 | 10 | 111.33 |
train.tif | multiscale | 40 | 7 | 85 | 29 | 134.13 |
Notice that after the alignment process is complete, each image is left with some borders. As these
borders differ from image to image, a custom border-cropping procedure is needed to clean
up the photos (rather than a fixed default crop). Thus, I devised the following procedure to automatically
crop the aligned photos:
First, use the calculated 'optimal' displacements to make the obvious cuts. For example, if the red channel is shifted to
the right by 20 pixels, then the leftmost 20 pixels of the overall image can safely be cut: the missing red data there
will at the very least create an artificial border (if there isn't already a border). Since the blue channel stays fixed,
any displacement of the green and red channels gives us some safe cuts to make, as no part of the picture should be
missing one of the three color channels.
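A sketch of these obvious cuts, assuming shifts are expressed as (dx, dy) with positive values meaning right and down; the `safe_crop` helper is hypothetical:

```python
def safe_crop(img, g_shift, r_shift):
    """Drop rows/columns that cannot contain all three channels once the
    green and red channels are shifted onto the fixed blue channel."""
    h, w = img.shape[:2]
    (gdx, gdy), (rdx, rdy) = g_shift, r_shift
    left   = max(0,  gdx,  rdx)   # right-shifted channels leave a gap on the left
    right  = max(0, -gdx, -rdx)   # left-shifted channels leave a gap on the right
    top    = max(0,  gdy,  rdy)
    bottom = max(0, -gdy, -rdy)
    return img[top:h - bottom, left:w - right]
```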
Second, convert the initial crop to grayscale in four ways: (1) using all color channels with the per-channel weights
specified by `skimage.color.rgb2gray`, (2) dropping the red channel, (3) dropping the green channel, and (4)
dropping the blue channel. The reason for dropping channels is that we want to detect color edges beyond just black-and-white
ones, and this is harder to do with only a single gray image.
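One way to build the four grayscale variants; the write-up doesn't say how the two remaining channels are combined after one is dropped, so the simple average below is an assumption:

```python
from skimage.color import rgb2gray

def gray_variants(rgb):
    """Weighted grayscale plus one variant per dropped color channel."""
    variants = [rgb2gray(rgb)]
    for dropped in range(3):  # channel 0 = red, 1 = green, 2 = blue
        keep = [c for c in range(3) if c != dropped]
        # Assumption: average the two remaining channels.
        variants.append(rgb[..., keep].mean(axis=2))
    return variants
```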
Third, run Canny edge detection on each grayscale image to get an approximation of the edges present in the image. I used the
`skimage.feature.canny` implementation, with σ=3 and default values for everything else. Without running Canny edge
detection first, the results of the next step were too unpredictable and uncontrolled.
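Applied to the four variants, this step is short; here `aligned` stands for the initially cropped color image from the first step:

```python
from skimage.feature import canny

# One binary edge map per grayscale variant, with sigma = 3 as described.
edge_maps = [canny(gray, sigma=3) for gray in gray_variants(aligned)]
```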
Fourth, run a probabilistic Hough transform on the edges from the previous step, checking for
vertical and horizontal lines. I used the `skimage.transform.probabilistic_hough_line`
implementation, with threshold=5, line_length=200, line_gap=50 for the horizontal
line check and threshold=50, line_length=300, line_gap=25 for the vertical line check. I made the constraints stricter on the vertical lines because
I found my algorithm struggling more with cropping horizontally than vertically.
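A sketch of the two line checks with the parameters above. Restricting the Hough angles to narrow windows around 0 and π/2 is one way to select near-vertical and near-horizontal lines respectively; the 0.05-radian tolerance is my assumption:

```python
import numpy as np
from skimage.transform import probabilistic_hough_line

# theta is the angle of the line's *normal*: theta near 0 detects
# vertical lines, theta near pi/2 detects horizontal lines.
h_thetas = np.linspace(np.pi / 2 - 0.05, np.pi / 2, 25)
v_thetas = np.linspace(-0.05, 0.05, 50)

h_lines, v_lines = [], []
for edges in edge_maps:
    h_lines += probabilistic_hough_line(
        edges, threshold=5, line_length=200, line_gap=50, theta=h_thetas)
    v_lines += probabilistic_hough_line(
        edges, threshold=50, line_length=300, line_gap=25, theta=v_thetas)
```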
Finally, use the detected lines (all of them, from each of the four grayscale images) that fall within the outer 10% of the image to make the final crop.
Following this process yielded the following results:
All black-and-white borders were successfully cropped. Some remnants of partial color borders remained, so my algorithm has room for improvement there. Originally, when I used only the full grayscale image, my algorithm did even worse with the partial color borders; including the grayscale images with each of the channels removed improved my results significantly.