A Wavelet Wading Pool

Copyright © 1996 Alex Nicolaou. All rights reserved.

CS 788 Term Paper

( This document was written using Maple.)

Abstract

The paper attempts to give a beginner an understanding of the wavelet transform, by practical application of the wavelet techniques to image compression. The Haar Wavelets are considered and applied to this problem, and the difficulties analyzed. In an attempt to change the artifacts seen in the Haar decomposition, a basis proposed by Richard Bartels is applied and experimented with as well. Both bases turn out to introduce too many artifacts into the image to challenge the JPEG encoding, but the COE basis produces better results than the Haar basis as expected.

Motivation

In the interests of understanding wavelet mathematics, it is much easier to follow some hands-on examples as opposed to reading proofs of the mathematics or abstract statements. By directly applying the wavelet transform, the paper hopes to show the reader how some simple wavelets work, by way of introduction to the heavier mathematics that are justified by these initial attempts.

The Haar Transform

The Haar transform uses square pulses to approximate the original function. The basis functions for Haar wavelets at some level all look like a unit pulse, shifted along the x-axis. The basis functions are called scales in wavelet terminology, and are usually denoted as functions [Maple Math] , where [Maple Math] denotes time. The Haar scales are all of the unit pulses

> [Maple Math] .

The functions [Maple Math] are the shifted pulses, shifted by [Maple Math] units to the right.

Figure 1: Plot of the unshifted Haar scale function.

To move to a higher level of detail, we consider the functions [Maple Math] , which are the same bars halved in width. Suppose we have a step function [Maple Math] whose value is 5 on the interval [0, 0.5) and whose value is 3 on the interval [0.5, 1), and 0 otherwise. Then we can represent [Maple Math] as the function [Maple Math] . The closest representation in the lower resolution would be [Maple Math] , which is determined by averaging the previous coefficients (clearly, the average of 5 and 3 is 4). The question is how to get from the lower resolution space back to the higher resolution space? The answer is to know the wavelets, or the functions that span the space that contains the information we have discarded in moving from one resolution to another. The wavelet can be expressed as a linear combination of the basis vectors for the higher detail space, since it lies inside the higher resolution space which is spanned by those basis functions. For the Haar case, there is one wavelet [Maple Math] , which looks like a zig-zag square wave:

Figure 2: The Haar wavelet

To reconstruct the original function, we can easily see that [Maple Math] . In general, we can take the two coefficients of the high resolution basis and re-express them as coefficients of the lower resolution basis plus the wavelets - all we are really doing is a change of basis, which we might usually write as a matrix multiplication!

To write this out with matrices, we would construct a matrix

[Maple Math]

The observant reader will notice that we seem to have neglected some scale factors in the non-matrix version. It is convenient to do so when we are looking at things loosely, but of course matrices won't permit such sloppy constructions. The precision of matrices is strong encouragement for getting to matrix form as soon as possible. Now we can consider our little vector of data [Maple Math] and compute the product [Maple Math] . Doing so yields the result [Maple Math] which corresponds to the equation for [Maple Math] , expressed in terms of [Maple Math] and [Maple Math] instead of in terms of the original basis [Maple Math] and [Maple Math] . Asking how to get back to the original basis couldn't be clearer now: we simply need multiply [Maple Math] by the inverse of [Maple Math] .

The inverse is

[Maple Math]

and the result of multiplying is [5, 3] as expected.

Compressing Images with the Haar Transform

The Haar basis is known to be poor for image compression because it introduces rectangular artifacts into the image. To understand why this is, we invest a bit of time developing and applying the Haar transform to images to see exactly what the results are.

In order to apply the transform to two dimensional data, we can apply the Haar transform first in one dimension, then in the other. Doing so and working out the numbers provides us with a matrix for changing from the high resolution space to the low resolution space:

[Maple Math]

The inverse transform is:

[Maple Math]

We can apply this transform once to the image, and create four smaller "images", one which is the averages of the pixels, and is the coefficients of the low resolution space, and the other three which are the wavelet coefficients that allow us to reconstruct the image. Looking at an image that has been decomposed in this way gives us an immediate intuitive idea of what we know is happening mathematically.

[Maple Bitmap]

Figure 3: A small picture of Lena after one pass of the Haar transform.

We can see clearly from the image that one quarter of it is a smaller version of the image - a lower resolution version. The other three are the details - they looked like the edges of the image have been embossed into grey sheets of no detail at all. What have we gained? We can go back to the original at any time, but we have extracted the structure of the underlying image . A simple compressor, such as the standard UNIX compress utility, can now compress this version of the image to one half of its original size - whereas virtually no compression was possible on the original.

However, we would like much more than 50% compression, with minor loss in image quality. To attempt to get even better results out of the wavelet transform for compression, we would usually quantize the coefficients of the wavelets.

Figure 4a: A plot of the histogram of the wavelet coefficients for the Lena image.

The histogram for the coefficients suggests that the biggest bang for our quantization effort exists around the centre of the coefficient values - that is, most of the coefficients are very close to zero and it is these that we'd like to collapse down to zero to create good compression. Applying this step is when the Haar basis suddenly introduces aliasing, as seen in the image below.

[Maple OLE 2.0 Object]

Figure 4b: A portion of Lena, before and after Haar compression and reconstruction.

What is happening in the figure is that areas of high detail - the feathers, the eyes, the shades of the nose - are preserved. This is because the quantization step is only throwing out the coefficients that are near 0 - in other words, the compression is acting as a type of smoothing function in areas of the image that are smooth. Unfortunately, the quantization's smoothing is also introducing aliasing of the worst kind - horizontal and vertical discontinuities in the reconstructed image. This problem has been tackled quite well by using smoother, overlapping bases and wavelet functions that don't appear to create the harsh aliases [STR96]. An alternative approach, suggested by Richard Bartels [BAR94], might be to use different basis functions that would introduce different aliasing, perhaps less noticable while still as efficient and as simple as the Haar transform.

The COE Transform

The transform proposed by Richard Bartels was dubbed the circle of eight transform by Rob Kroeger [KRO96], for reasons that will become clear as we investigate its form. My work on this transform has relied heavily on the prior work by Richard Bartels and Rob Kroeger, directly continuing their line of thought by producing an actual implementation to see the outcome of the transform's application to real images.

Rather than representing four pixels as four vertical bars of varying heights, which is how the Haar basis sees the image, the COE basis decomposes each set of four pixels into eight triangles arranged around the centre of the four pixels. One of these triangles is shown below.

Figure 6a: A single COE basis function.

The set of all eight basis functions make up four pixels of the original source. A view from the Z axis is shown, with the basis functions numbered from 1 to eight.

[Maple OLE 2.0 Object]

Figure 6b: The COE basis functions.

So the upper left pixel determines the coefficients for the first and last basis functions, and the other pixels map to the other basis functions in a similar way. Four groups of the original basis functions combine to make one new group of basis functions at a lower level of resolution, as shown below.

[Maple OLE 2.0 Object]

Figure 7: The transition from one resolution to another using COE bases.

The matrix that represents the downsampling step shown in figure 7 is reasonably straightforward, once one decides on a numbering scheme for the 32 elements of the high resolution level. We take the 32 elements and place them in a vector by putting the first eight from the upper left in first, then proceeding to the others in clockwise fashion. This choice of ordering is arbitrary, but there is no benefit to choosing a different one. Looking at the mapping carefully, we can see that this problem decomposes into eight subproblems. The first four values of the vector only contribute to one low resolution value and to three wavelet values; they do not contribute to anything else. So in fact, thinking about just one of the eight pieces of the overall resolution step allows us to construct the full matrix. For the top corner, we would like the resulting low resolution value to be the average of the pixels. The wavelets are chosen to minimize computation, leading to a sparse matrix:

[Maple Math]

This matrix shows only a portion of what will be the larger 32x32 matrix. The full matrix is constructed out of submatrices like this one which are moved into appropriate positions in the large matrix (this sometimes involves separating both the rows and columns of the small prototype shown). Inverting the matrix is a simple matter using any available mathematics tool.

Once the matrix is constructed, it needs to be applied to the image. However the image contains only half the number of inputs that are required, and ideally a visually clear method of representation is needed. Rob Kroeger devised a simple mapping for this purpose: simply stretch the image in one direction or another to produce twice as many input values. Correctly numbered, the stretched image corresponds perfectly in a mathematical sense to the COE triangles, and gives a reasonable visual representation while still allowing us to draw the intermediate steps quickly.

[Maple OLE 2.0 Object]

Figure 8: Transforming the COE to a rectangular grid.

After applying the stretching transform, and one step of the COE transform, we get a very similar result to what we saw with the Haar transform: the operation has extracted the structure of the image into three detail portions (the wavelet coefficients) and one low resolution representation of the image.

[Maple Bitmap]

Figure 9: After one pass of the COE transform.

The acid test is applying the same quantization step to the new wavelet coefficients that we used in the Haar case, to see if the aliasing problems are reduced. The theory is that since more of the aliasing will be on diagonals rather than on horizontals and verticals, the aliasing effects will be less bothersome. Although the results are better than with the Haar transform, they do not approach the clean look of the JPEG coder on the same image.

[Maple OLE 2.0 Object]

Figure 10: After reconstruction, triangular artifacts are clearly apparant.

Several other images were also tested using the COE transform, and similar triangular artifacts appear. Although the triangular artifacts are harder to see than boxy artifacts, especially for small image sizes, the end result cannot compete with other wavelet basis.

Conclusions and Future Work

Although the COE did produce less distortion in the images than the Haar method, the results do not compare well to other wavelet bases or to JPEG coding. Perhaps there would be some value to introducing random noise during reconstruction to hide the artifacts, or to researching the anti-aliasing literature for techniques that might be successfully applied as a post-process. Some additional measurement could be done to support the intuition that the COE basis produces "better" images than the Haar transform. A standard error metric could be applied to the image, or a subjective experiment carried out.

An experiment with the Haar scales using wavelet vectors that are not orthogonal to one another showed worse artifacts than in the Haar basis used in this paper's examples. This seems justifiable since two wavelet vectors that are not orthogonal that are both quantized introduce more error in the direction that they are not orthogonal, possibly exceeding the threshold for error that we are trying to control during the quantization step. Possibly choosing new COE wavelet vectors that are orthogonal to each other as well as to the scales would improve the COE transform's quality as well.

The existing literature suggests that bases that overlap more and are smoother produce better results with image processing, and some examples of these functions are discussed in [STR96]. In particular the wavelets that are used for the FBI fingerprint standard could be examined to see if they can be improved upon.

References

[BAR94] Private communication from Richard Bartels to Rob Kroeger.

[KRO96] Private communication from Rob Kroeger to the author.

[STR96] Gilbert Strang and Truong Nguyen, Wavelets and Filter Banks , Wellesley-Cambridge Press, 1996.


[888888] hits since Dec 17, 1996. Author: Alex Nicolaou. Last Modified in August, 1996.