ROB501: Assignment #1: Image Transforms and Billboard Hacking

Overview

In this project, you will gain experience with the perspective transformation operation (discussed in detail in the lectures), bilinear interpolation, and histogram equalization. You will use the perspective transform to replace a portion of an existing image with an alternate image (the ‘hack’). The goals are to:

    • aid in understanding perspective transformations (or 2D homographies) and to help to visualize their application to images;
    • experiment with inverse image warping and bilinear interpolation, to insert one image into another (respecting the appropriate geometry); and
    • apply histogram equalization to improve the overall appearance (contrast) of an image.

The due date for project submission is Wednesday, October 5, 2022, by 11:59 p.m. EDT. All submissions will be in Python 3.8 via Autolab (more details will be provided in class and on Quercus); you may submit as many times as you wish until the deadline. To complete the project, you will need to review some material that goes beyond that discussed in the lectures—more details are provided below.

[Figure: (a) Yonge & Dundas Square    (b) Soldiers' Tower]

Your main project task is to perform some billboard hacking (a basic demonstration of the use of computer vision, showing that it can be fairly easy to change ‘reality’). There are two images above: image (a) is of Yonge and Dundas Square, an area that contains several large billboards, while image (b) is of Soldiers’ Tower on the University of Toronto campus. Conveniently, the image of Yonge and Dundas Square has very limited radial distortion, which makes it suitable for our purposes. Your assignment (should you choose to accept it) is to replace the billboard advertisement for the “CN Tower Edge Walk” with the photo of Soldiers’ Tower, such that the result looks natural (i.e., as if the image of Soldiers’ Tower is meant to be there). The project has four parts, worth a total of 50 points.

Please clearly comment your code and ensure that you only make use of the Python modules and functions listed at the top of the code templates. We will view and run your code.

Part 1: Perspective Transformations via the DLT

To carry out this exercise (Part 1), you will need to determine the perspective homography that transforms or maps pixels from the (rectangular) Soldiers’ Tower image to the appropriate coordinates in the Y&D Square image, and vice versa. The homography can be computed using the Direct Linear Transform (DLT) algorithm, given four point correspondences between the two images.

We did not review the DLT algorithm in the lectures; however, it is straightforward to implement in Python using NumPy. Details can be found in Section 2.1 of the (very useful) M.A.Sc. thesis written by Elan Dubrofsky of UBC, which is available on Quercus. For the moment, we will consider the four point correspondences to be exact; in later lectures, we will show how an overdetermined system of correspondences can be solved to produce an optimal estimate. For this part of the project, you should submit:

    • a single function, dlt_homography.py, that computes the perspective homography between two images, given four point correspondences (n.b., the ordering of the points is important).

Note that we are using four matching points in the DLT algorithm, and each point provides two constraints on the homography, for eight constraints in total. However, there are nine entries in the 3 × 3 homography matrix. Recall that a homography is defined only up to scale (any nonzero multiple of the matrix represents the same homography), so you should normalize your matrix by scaling all entries such that the lower-right entry is 1.
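As a rough illustration only, here is a minimal sketch of the DLT computation, assuming the point correspondences arrive as 2 × 4 arrays of (x, y) columns and using the SVD to recover the null space; the interface and allowed imports in the dlt_homography.py template take precedence.

```python
import numpy as np

def dlt_homography_sketch(I1pts, I2pts):
    # Assumed interface: I1pts, I2pts are 2 x 4 arrays of (x, y) columns,
    # with H mapping points in image 1 to points in image 2.
    A = []
    for i in range(4):
        x, y = I1pts[:, i]
        u, v = I2pts[:, i]
        # Each correspondence contributes two rows of the constraint system A h = 0.
        A.append([-x, -y, -1,  0,  0,  0, u * x, u * y, u])
        A.append([ 0,  0,  0, -x, -y, -1, v * x, v * y, v])
    A = np.array(A)

    # With four exact correspondences, h spans the one-dimensional null space
    # of A; the right singular vector of the smallest singular value gives it.
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)

    # Normalize so that the lower-right entry is 1, as required above.
    return H / H[2, 2]
```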

Part 2: Bilinear Interpolation

With the perspective homography in hand, you can make use of the inverse warping and bilinear interpolation operations (discussed in the lectures and in the Szeliski text) to determine the best pixel value from the Soldiers’ Tower image to replace a pixel value in the Y&D Square image. Note that the Y&D Square image is in colour (it has three bands: R, G, and B), and so the same transform must be applied to each band (the Soldiers’ Tower image is a greyscale image). For this part of the project, you should submit:

    • a single function, bilinear_interp.py, that performs bilinear interpolation to produce a pixel intensity value, given an image and a subpixel location (point).
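As a sketch of the interpolation step (not the required implementation), the weighted four-neighbour average for a greyscale image might look as follows; the point layout, rounding behaviour, and lack of boundary handling are assumptions to be checked against the bilinear_interp.py template.

```python
import numpy as np

def bilinear_interp_sketch(I, pt):
    # Assumed interface: I is a 2-D greyscale array and pt = (x, y) is a
    # subpixel location, with x indexing columns and y indexing rows.
    x, y = float(pt[0]), float(pt[1])

    # Integer corners of the unit square surrounding the query point
    # (boundary checks are omitted in this sketch).
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = x0 + 1, y0 + 1

    # Fractional offsets within that square.
    a, b = x - x0, y - y0

    # Weighted average of the four neighbouring pixel intensities.
    val = ((1 - a) * (1 - b) * I[y0, x0] + a * (1 - b) * I[y0, x1] +
           (1 - a) * b * I[y1, x0] + a * b * I[y1, x1])

    return int(round(val))
```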

Part 3: Histogram Equalization

You will notice that the image file provided, uoft_soldiers_tower_light.png, is quite bright (over-exposed) and has relatively low contrast. To fix this, you should implement the simple (discrete) histogram equalization algorithm discussed on page 115 of the Szeliski text (and in the course lectures). For this part of the project, you should submit:

    • a single function in histogram_eq.py, which performs discrete histogram equalization on the input image (which will be 8-bit and greyscale only).
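One common way to realize the discrete (CDF-remapping) equalization is sketched below, assuming the input is an 8-bit greyscale NumPy array; the exact interface of histogram_eq.py is set by the provided template.

```python
import numpy as np

def histogram_eq_sketch(I):
    # Histogram over the 256 intensity levels and the normalized
    # cumulative distribution function (values in [0, 1]).
    hist, _ = np.histogram(I, bins=256, range=(0, 256))
    cdf = hist.cumsum() / I.size

    # Look-up table: map each input intensity through the scaled CDF.
    lut = np.round(255 * cdf).astype(np.uint8)
    return lut[I]
```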

Part 4: Billboard Hacking

You’re now ready to perform the billboard hack! Using the components you’ve built, you should: enhance the contrast of the Soldiers’ Tower image, compute the perspective homography (once) that defines the warp between the Y&D Square image and the Soldiers’ Tower image, and then perform bilinear interpolation over all of the corresponding pixels to place Soldiers’ Tower in the billboard position. Some portions of the code have already been filled in for you—in particular, the bounding box for the Edge Walk billboard, and the four pixel-to-pixel correspondences between the images, are available. For this (final) part of the project, you should submit:

    • a single function in billboard_hack.py that uses the other functions above to produce the composite, ‘hacked’ image.

The composite image must be stored in colour and must be exactly the same size as the original Y&D image (in terms of rows and columns, i.e., do not change the image size!).
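To give a sense of how the pieces fit together, here is a heavily hedged sketch of the compositing loop. The variable names (Iyd, Ist, Iyd_pts, Ist_pts, bbox), the use of matplotlib.path.Path for the inside-the-billboard test, and the reuse of the sketch functions from the earlier parts are all assumptions; the structure, variable names, and allowed modules in the billboard_hack.py template take precedence.

```python
import numpy as np
from matplotlib.path import Path

def billboard_hack_sketch(Iyd, Ist, Iyd_pts, Ist_pts, bbox):
    # Assumed inputs: Iyd is the colour Y&D image, Ist the greyscale
    # Soldiers' Tower image, Iyd_pts/Ist_pts are 2 x 4 arrays of matching
    # corner points, and bbox is [[x_min, x_max], [y_min, y_max]].
    Ist_eq = histogram_eq_sketch(Ist)              # Part 3: fix the contrast
    H = dlt_homography_sketch(Iyd_pts, Ist_pts)    # Part 1: Y&D -> tower warp

    billboard = Path(Iyd_pts.T)                    # polygon test for the billboard
    Ihack = Iyd.copy()

    for y in range(bbox[1][0], bbox[1][1]):        # rows within the bounding box
        for x in range(bbox[0][0], bbox[0][1]):    # columns within the bounding box
            if not billboard.contains_point((x, y)):
                continue
            # Inverse warping: map the Y&D pixel back into the tower image...
            p = H @ np.array([x, y, 1.0])
            p = p[:2] / p[2]
            # ...and interpolate there (Part 2), writing the same value to R, G, B.
            Ihack[y, x, :] = bilinear_interp_sketch(Ist_eq, p)

    return Ihack
```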

Grading

Points for each portion of the project will be assigned as follows:

    • Perspective homography DLT function – 15 points (3 tests × 5 points per test)

Each test uses a different set of point correspondences. The square root of the sum of squared projection errors (relative to the reference homography) must be below 0.1 pixels to pass.

    • Bilinear interpolation function – 10 points (5 tests × 2 points per test)

Each test uses a different reference image and a different subpixel location in that image. The absolute difference between your interpolated brightness and the reference interpolated brightness must be less than or equal to 1 to pass (e.g., if the reference value is 212, your function must report 211, 212, or 213).

    • Histogram equalization function – 10 points (2 tests; 2 points and 8 points)

There are two tests, one using the over-exposed version of the Soldiers’ Tower image and one using a hidden reference image (see the points allocated above). To pass either test, no more than 10% of the equalized pixel intensity values may be more than 2 units of intensity away from the reference intensity values (this is a fairly generous bound).

    • Image composition script – 15 points (3 tests × 5 points per test)

There are three tests, each applying a progressively more stringent criterion for matching between your hacked image and the reference solution (in terms of the mean and standard deviation of the absolute intensity differences between pixels in the warped region only). For now, the exact threshold parameters are being kept under wraps; if your support functions are working correctly, you should be able to pass the hardest test!

Total: 50 points

Grading criteria include: correctness and succinctness of the implementation of support functions, proper overall program operation and code commenting, and a correct composite image output (subject to some variation). Please note that we will test your code and it must run successfully. Code that is not properly commented or that looks like ‘spaghetti’ may result in an overall deduction of up to 10%.
