OVERVIEW
Watermarking is a method of protecting digital material. The digital watermarking system is based on that a code is embedded in the image. The code can include various types of information for example ownership of the image, right to copy and terminal identification (a specific number for each user the image has been distributed to). The code is digitized, encoded as additional noise and incorporated in the picture. The size of the encoded information has to be kept to a minimum of bits to minimize noise .Every member has an unique code that works as an index to the register. This code is encoded into the image.
The two main requirements on the mark is that it should be irremovable and unalterable, and the change it introduces to the picture should be negligible. These requirements are in direct conflict, since pulse energy concentrated at low frequencies makes the embedded code more robust but noise at the low frequencies (the watermark is coded as noise) is also more noticeable than noise at the high frequency components. However, high spatial frequencies is often removed when an image is compressed, and there by the code is lost. The watermark must be present throughout the whole image in case parts get cut off or cropped. A watermark must be extractable even from degraded documents, that is it must be robust to withstand any processing, such as file format conversions, color conversions, slight blurring, sharpening, color adjustment and compression, but also printing, copying and scanning.
The algorithm transforms the digital image to frequency domain by implementing the Wavelet Transform .The watermark is added to the image and the Inverse Wavelet Transform is applied to recover the original image. This watermarked image is distributed .In case of extracting the watermark ,the original unmarked image is subtracted from the watermarked test image (which might have been attacked)to get the watermark. The original and extracted watermarks are correlated and if the value is greater than a predefined threshold then the image has been tampered else it is safe and authentic.
Wavelet analysis is fairly new to
mainstream signal processing. It is a projective technique similar to the FFT,
however, rather than simply decomposing the signal into sinusoids of varying
frequency, the data is represented as projections onto the affine group
(translations and dilations). This means the data set can be represented as time
translations of the mother wavelet (a basis function) and/or time dilations
(i.e. shrinking or expanding the time scale of observation).A wavelet transform
typically consists of a set of spatially located orthonormal linear filters
that split an image into oriented spatial-frequency bands. Because of the
multiresolution nature of the transform, various resolution images can be
reconstructed from the same coded image file or bit stream. The wavelet
function extracts information about differences between adjacent positions
within the data which will be designated the 'detail' of the data.. A companion function which retains
information about the averages between adjacent points is the scale function
and is applied to generate a second series of coefficients which will be
designated the 'smoothed' data. These two series of coefficients are each half
as long as the original data set. Each subimage at each scale of the wavelet
decomposition is a different band.


INTRODUCTION
In the middle ages, kings, dukes, barons, and anyone else who styled themselves important would carve a design onto a stamp or a ring. This was used to impress a wax or lead closure sealing the wrappings of items sent by courier, ostensibly proving that the document or package did indeed come from them and hence could be considered authentic.
In time, the stamp took the name of its function and became known as a "seal." When literacy became more widespread and it became common for people to have more than one name, the written signature replaced the seal on most legal documents and other important works. High rate transmission facilities are becoming less expensive and more generally available. It is now feasible and very economical to transmit images and video sequences using data networks rather than to send hard copies by post. In addition, images may be stored in databases in digital form. The process has been given new impetus by the advent of the Internet. However, until now applications have been confined to the distribution of free works or art or to commercial advertising. A major impediment to the use of electronic distribution and storage is the ease of intercepting, copying and redistributing electronic images and documents in their exact original form. As a result, publishers and art houses are understandably reluctant to use this means of disseminating material.
The technique of watermarking paper (rather than heads) is almost as old as paper manufacturing itself, dating back to the late middle ages. Their earliest use seems to have been to record the manufacturer’s trademark on the product so that the authenticity could be clearly established without degrading the aesthetics and utility of the stock . Experience with watermarks was so successful, that governments began to watermark their currencies, postage stamps, revenue stamps, etc. to thwart counterfeiting.
Since the use of the Internet has become widespread, the protection of intellectual property has become a major problem in the digital age. On the Internet today it is possible to duplicate digital information a million-fold and distribute it over the entire world in seconds. The ease of saving images off of the web has caused a very real problem for artists and content providers alike. If you have placed your intellectual property on the web chances are that sooner or later someone is going to ‘borrow’ a little bit of it ... without your permission.
These issues worry creators of intellectual property to the point that they even consider not publishing on the Internet at all. A software technology called digital watermarking can do the work of finding pirates on the Web. And down the road, the technology could be used to prevent copying of movies and music on digital videodisks, or DVDs. Some television networks currently use a visible video watermark, a small network logo that appears in the lower corner of the television screen.
To solve the problem of publishing digital images, researchers have come up with digital image watermarking. This method allows the owner of an original image to add an invisible watermark to the digital image before publishing it. The watermark serves to claim copyright on the image. The owner protects the watermark with a cryptographic secret key, inhibiting anybody that does not possess the secret key from reading or even detecting the watermark.The idea is to ensure the intellectual property rights of people who create digital media. Adding a digital watermark to an image identifies the owner and protects copyrights.
WHAT IS WATERMARKING
?
Watermarking is the process of embedding data called a watermark (also known as Digital Signature, Tag or Label) into a multimedia object such that watermark can be detected or extracted later to make an assertion about the object. It's simply adding invisible information into a picture so as to "sign" the picture for a more secure copyright protection. The object may be an audio, image or video. Ideally it is impossible to remove the embedded information (owner identification, customer information or other information about the multi-media data) without damaging the data.
A digital watermark is a digital signal or pattern inserted into a
digital image. Since this signal or
pattern is present in each unaltered copy of the original image, the digital
water-mark may also serve as a digital
signature for the copies. A given watermark may be unique to each copy (e.g.,
to identify the intended recipient), or be common to multiple copies (e.g., to
identify the document source). In either case, the water-marking of the
document involves the transformation of the original into another form.
With an automatic Web crawler you can surf
the web and try to search for your mark in order to find unauthorized users of
your pictures . These marking systems are supposed (by their sellers) to be
robust and stand up to any
"attacks" (attempt to remove the signature, by blurring or
cropping or any digital processing ).
This marking system is invisible to human eye .
Unlike
encryption, however, digital watermarking leaves the original image or (or
file) basically intact and recognizable. In addition, digital watermarks, as
signatures, may not be validated
without special software. Further, decrypted documents are free of any
residual effects of encryption, whereas
digital watermarks are designed to be persistent in viewing, printing, or
subsequent re-transmission .
You hold a sheet of expensive stationery up to the light to see it's watermark. But how would you hold up a digital photo ? And will the watermark spoil or distort your image. While the concept of the embedded watermark is the same, the similarity stops there. A digital watermark can not be seen on the image itself. The copyright information is embedded in the image, hiding in the naturally occurring variations throughout the image.
Summarizing, digital watermarking intends to
enable the proof of ownership on copyrighted material, detect the originator of
illegally made copies, monitor the usage of the copyrighted multimedia
data and analyze
the spread spectrum of
the data over
networks and servers.
A visible watermark is a visible translucent
image which is overlaid on the primary image. Perhaps consisting of the logo or
seal of the organization which holds the rights to the primary image, it allows
the primary image to be viewed, but still marks it clearly as the property of
the owning organization.
It is important to overlay the watermark in a
way which makes it difficult to remove, if the goal of indicating property
rights is to be achieved.
An example shows both a watermark and an image with the
watermark overlaid.
Original Image

Image with watermark

An invisible watermark is an overlaid image
which cannot be seen, but which can be detected
algorithmically. Different applications of this technology call for two very
different types of invisible watermarks:
A watermark which is destroyed when the image is
manipulated digitally in any way may be useful in proving authenticity of an
image. If the watermark is still intact, then the image has not been
"doctored." If the watermark has been destroyed, then the image has
been tampered with. Such a technology might be important, for example, in
admitting digital images as evidence in court.
An invisible watermark which is very resistant
to destruction under any image manipulation might be useful in verifying
ownership of an image suspected of misappropriation. Digital detection of the
watermark would indicate the source of the image.
Of the classification schemes which apply to
watermarks, the distinction between perceptible and imperceptible seems to be
the most fundamental. This distinction cuts across media types while at the
same time constraining the ultimate use of the watermark technology. For example,
visible watermarks, as perceptible watermarks, are used in much the same way as
their paper ancestors - to identify the source or ownership of a document.
Invisible watermarks, as imperceptible watermarks, on the other hand are kind
of like magic ink - they hide information in documents.
The type of watermark influences the
effectiveness of the watermark in various applications. For example, both
perceptible and imperceptible watermarks can deter theft, but they do so in
very different ways. Perceptible watermarks are especially useful for conveying
an immediate claim of ownership. The main advantage of perceptible watermarks,
in principle at least, is that they virtually eliminate the commercial value of
a document or media object without significantly lessening the document’s
utility for legitimate, authorized purposes. That is to say, the watermark in
the Figure makes it clear that the
document belongs to someone, but it does this without preventing the appreciation
of the artifact. A familiar example of a visible watermark is in the video
domain where CNN and other television networks place their translucent logo at
the bottom right of the screen image.
Imperceptible watermarks on the other hand are
only effective as a theft deterrent if the potential thief has a reason to
believe that watermarks might be present, and further that they may be used to
prosecute unauthorized possessions. If we assume that the majority of potential
thieves are at best computationally challenged, an imperceptible watermark would
be very ineffective. However, though weak in terms of discouraging theft,
imperceptible watermarks really shine as a means of identifying the source,
version or serial number, author, creator, owner, distributor or authorized
consumer of a document or image. For this purpose, the objective is to
permanently and unalterably mark the image so that the credit or assignment is
beyond dispute. In the event of illicit usage, the watermark would facilitate
the claim of ownership, the receipt of copyright revenues, or the success of
prosecution.
In some cases, both watermarking schemes are
equally effective. For example, both may be used to determine the location, and
trace image migration, of documents over the networks. Several computing
companies are developing the software to deploy watermark agents in network
patrols to detect infringement. It should be remembered that watermarking
software can assign a unique watermark to each document or object for each
authorized user or consumer.
In retrospect, perceptible and imperceptible watermarks act
as a deterrence to theft in different ways. Perceptible watermarks diminish the
commercial value of the document or image. Imperceptible watermarks increase
the likelihood of successful prosecution and may also act as a deterrent if the
criminal is sufficiently computer literate.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
What kind of Watermark do we need?
In order to be effective, a watermark should be:
Unobtrusive :
The watermark should be perceptually invisible, or its presence should not
interfere with the work being protected.
Robust :
The watermark must be difficult (hopefully impossible) to remove. If only
partial knowledge is available (for example, the exact location of the
watermark in an image is unknown) then attempts to remove or destroy a
watermark, should result in severe degradation in fidelity before the watermark
is lost. In particular, the watermark should be robust to :
1.Common
signal processing : The watermark should still be retrievable even if common
signal processing operations are applied to the data. These include,
digitaltoanalog and analogtodigital conversion, resampling, requantization
(including dithering and recompression), and common signal enhancements to
image contrast and color, or audio bass and treble, for example.
2.Common
geometric distortions (image and video data) : Watermarks in image and video
data should also be immune from geometric image operations such as rotation,
translation, cropping and scaling.
3.Subterfuge
Attacks: Collusion and Forgery In addition, the watermark should be robust to
collusion by multiple individuals who each possess a watermarked copy of the
data. That is, the watermark should be robust to combining copies of the same
data set to destroy the watermarks.
Further, if a digital watermark is to be used in
litigation, it must be impossible for colluders to combine their images to
generate a different valid watermark with the intention of framing a third
party.
Universal :
The same digital watermarking algorithm should apply to all three media under
consideration. This is potentially helpful in the watermarking of multimedia
products. Also, this feature is conducive to implementation of audio and
image/video watermarking algorithms on common hardware.
Unambiguous :
Retrieval of the watermark should unambiguously identify the owner.
Furthermore, the accuracy of owner identification should degrade gracefully in
the face of attack.
THE MECHANIZATION OF
IMAGERY
In the simplest and most common format, digital
images are represented by rectangular arrays of picture elements (pixels), each
of which may or may not be physically square. The common VGA display has 480
rows of 640 pixels each, for instance. The numbers representing a single pixel,
used to reconstruct its color and intensity, come in assorted flavors. The
simplest model for direct image representation is 8-bit monochrome. Each pixel
is one byte deep and can thus display 256 levels of brightness. A properly
scaled image will use this fairly narrow dynamic range to represent levels from
black to white. A poorly scaled image will be illegibly washed out in shades of
gray.
Normally, images only exist in space-wasting
direct forms while they are being edited and all their information must be
quickly at hand. When stored or transmitted, images are usually reduced in size
by conversion to a palletized form, by some sort of compression scheme, or
both. Some compression algorithms are so effective that they can lower storage
requirements to less than one bit per pixel. Unless these compression schemes
are "lossless," they will destroy any authentication information you
might embed into an image. Different forms of the sealing algorithm will have
to be developed for these cases.
Something is needed to verify that the received
image is exactly the same as the one sent, and as a side benefit to prove who
sent it. No effort needs to be made to hide or encrypt the image, as you are
simply trying to assure that the viewer see what he or she is meant to. (There
is a subtle philosophy operating here. When something has been encrypted, it
begs to be exposed. But a simple signature can, and most probably will, go
completely unnoticed.)
How can you ensure that an image remains unchanged?
The easiest way is to use a checksum scheme. Regardless of
the given pixel bit depth, you just add all of them up using an unsigned
integer-summing variable which equals or exceeds the bit depth of the image
pixels. The overflow bit is ignored. The result is a single integer that
changes if any single pixel is changed. The probability that any two images
will have the same checksum is related to the bit width of that sum. If you
only use an 8-bit integer, there is a 1:256 possibility that any two images
share the same one. Clearly, you don't want such a high probability that your
test will come out positive even if an image has been changed.
If you increase the checksum width to 24 bits, the
possibility recedes to a more secure 1:16,777,216; and for that really warm
feeling, you can use checksums of bit width nxpixel
depth simply by concatenating n
adjacent pixels to make larger integers.
Once you have extracted a checksum, it would be nice to
embed it into the image itself, so that no separate piece of information
exists. Historically, 8-bits-per-pixel for monochrome images was driven
primarily by the desire to map system memory to display hardware one byte at a
time, speeding memory access. Even 24-bit images are often stored as 32-bit
quantities to allow the use of longword transfers when editing or painting.
Many of the less-significant bits are, however, purely noise caused by the
imaging device. From a visual standpoint, in live "natural" images,
the noise at this bit level is entirely masked by the complexity of the scene.
Most visual information is carried in the top nibble, with the bottom nibble
going along for the ride. Consequently, you can disguise your checksum as
noise.
You can view the checksum as an array of bits N in length.
A uniformly distributed pseudorandom number generator is used to map these bits
onto a path of randomly selected pixel locations within the limits of the
image.
At each location, the least-significant bit (LSB) of the pixel
value is forced to match the value of the corresponding nth checksum bit as n
goes from 0 to N--1. The human eye cannot distinguish an LSB shift in intensity
on most commercial display systems that have eight or more bits of dynamic
range. Note that, on the average, 50 percent of the pixels will not be changed.
Since the LSBs at random-walk locations clearly cannot be allowed to contribute
to the checksum calculation, you will have to use a three-stage process.
Obtain an acceptable sealing string from the user and build
a random walk from it.
Go over the entire image to construct a checksum out of the
seven most-significant bits of each pixel.
Embed the checksum bits, one by one, into the pixels at the
random-walk addresses.
To check the authenticity of an image, you essentially
reverse the process on the receiving end: Generate the random walk based on a
user-entered seal, and extract the embedded checksum based on the bits hidden
in the LSBs of pixels along the walk. Then measure the checksum using the upper
seven bits of each pixel. If the two numbers match, the image has not been
tampered with.
How Secure is it Really?
In order to modify an image, yet maintain the seals intact,
an interloper would have to determine the location and order of all the hidden
checksum bits, the sequence with which the checksum was constructed, and
exactly how many seals are in the image. He would then be able to add the
incriminating evidence with his software program, and reseal the image.
Let's assume for a moment that somehow an intruder has
figured out which pixels contain the checksum. He may know the overall length
of the checksum because he has access to the source code used to seal the
image. Now, in what order are they? For n bits, there are n! possible combinations.
For a 64-bit checksum, this is a very large number. And each and every one of
these is just one possible seal, so there's no way of determining how many
seals have been applied. Even with magic knowledge of both the (coded) checksum
value and the exact random sequence of a seal, a very large number of seals
could still be embedded into the image. An appropriate scenario might have more
than one person in authority independently contribute a seal to a sensitive
image. Which ones were used?
The number of seals that can be embedded at one time in an
image is infinitesimally small compared to the set of all possible seals.
Remember that the pixel-address space used for embedding checksum bits must not
intersect itself.
You are therefore limited by the relationship where n is the maximum number of image seals; m is the pseudorandom modulus and
maximum bit width of a seal; Nr
and Nc are the height and
width of the image, respectively; and r is the percentage of the image you use
up for your chunks of seal space.
If m=128 bits, Nr=Nc=128, and you require at least 50 percent of the image
to be unmodified, then a maximum of only 64 seals can be embedded.
If the industrious interloper has managed to locate the 50
percent of the seal space that was visible to him (25 percent of the image),
perhaps by using his stolen copy of the unsealed image and subtracting to find
changed pixels, there are still 4096 bits floating around in 8192 possible
locations. Pure permutation tells you that you have 8192!/4096! potential
random sequences--an impossibly huge number.
An interloper cannot tell if an image has been sealed and
has no way of finding the unchanged LSBs, and cannot blanket your image with
all possible seals. For larger images, the job is just harder.
TECHNIQUES FOR
WATERMARKING
Generally, the watermarking scheme is distinguished in
three particular algorithms:
a) The Watermark Production algorithm (WPA)
b) The Watermark Embedding algorithm (WEA)
c) The Watermark Detection algorithm (WDA)
The security of the watermark hiding is based exclusively
on the privacy of the watermark key.
WPA
WPA is applied having as input the private key and the data
of the digital product. Each pair (key,
product) is transformed to a watermark digital signal. By passing the data
of the product in the algorithm, we manage to achieve the statistical
invisibility of the watermark. WPAs are based on pseudorandom number generators
The determination of a key that produces a predefined watermark signal is
impossible, i.e. watermark production is a non-invertible procedure preventing
an attacker to define counterfeit watermarks. WPA can be considered as a part
of the embedding and detection algorithms.
WEA
WEA requires as arguments the digital product and the
produced watermark. It embeds the watermark in a digital still image or in a
video frame by producing alterations in the luminance of the pixels. Since
video and audio signals require large amounts of memory, data are input in WEAs
and watermarked sequentially.
WDA
Detection is the most crucial part in the watermarking
framework. WDAs should be trustworthy producing insignificant false alarm
errors. Therefore, the result of the detector should be an indisputable
indication about the ownership of a digital product. WDA’s based on statistical
hypothesis testing. They provide the certainty,
which characterizes a possible decision that accepts the existence of a
particular watermark. Numerical experiments indicate that watermark detection
is enough reliable and suitable to prove copyright ownership. WDA does not
require the original digital product in order to proceed to the watermark
detection. This characteristic is a great advantage since it provides fast and
automatic implementation and ``web-crawling’’
For watermark to be robust, the watermark information must
be embedded in the target medium in such a way that removing this information
irreparably damages the medium. Robust image watermarking commonly takes on two
forms: spatial domain and frequency domain.
Spatial domain watermarking technique slightly modifies the
pixels in one of two randomly selected subsets of an image. Modification might
include, e.g., flipping the low-order bit of each pixel representation. Spatial
watermarking can also be applied using color separation such that the watermark
appears in only one of the color bands. This renders the watermark sufficiently
subtle that it can be for all intents and purposes imperceptible. However, the
watermark appears immediately when the colors are separated for printing. This
renders the document useless to the printer unless the watermark can be removed
from the color band. This approach is used commercially for journalists to
inspect digital pictures from a photo-stockhouse before buying un-watermarked
versions.
After receipt, an image may endure many common
transformations that are broadly categorized as geometric distortions or signal
distortions. Geometric distortions are specific to images and video, and
include such operations as rotation, translation, scaling and cropping. By
manually determining a minimum of four or nine corresponding points between the
original and the distorted watermark, it is possible to remove any two or
three-dimensional affined transformation. However, an affined scaling
(shrinking) of the image leads to a loss of data in the high frequency spectral
regions of the image. Cropping, or the cutting out and removal of portions of
an image, leads to irretrievable loss of image data, which may seriously
degrade any spatially based watermark.
However, a frequency-based scheme spreads the
watermark over the whole spatial extent of the image, and is therefore less
likely to be affected by cropping. Common signal distortions include
digitaltoanalog and analogtodigital conversion, resampling, requantization,
including dithering and recompression. However, the fact that the original
image is known allows many signal transformations to be undone, at least
approximately.
Spread Spectrum
Coding Of A Watermark
The above discussion illustrates that the
watermark should not be placed in perceptually insignificant regions of the
image (or its spectrum) since many common signal and geometric processes affect
these components. For example, a watermark placed in the high frequency
spectrum of an image can be easily eliminated with little degradation to the
image by any process that directly or indirectly performs low pass filtering.
The problem then becomes how to insert a watermark into the most perceptually
significant regions of the spectrum in a fidelity preserving fashion. Clearly,
any spectral coefficient may be altered, provided such modification is small.
However, very small changes are very susceptible to noise. To solve this
problem, the frequency domain of the image or sound at hand is viewed as a
communication channel, and correspondingly, the watermark is viewed as a signal
that is transmitted through it. Attacks and unintentional signal distortions
are thus treated as noise that the immersed signal must be immune to.
The watermark is spread over very many frequency bins so
that the energy in any one bin is very small and certainly undetectable.
Spreading the watermark throughout the spectrum of an image ensures a large
measure of security against unintentional or intentional attack: First, the
location of the watermark is not obvious. Furthermore, frequency regions should
be selected in a fashion that ensures severe degradation of the original data
following any attack on the watermark.
Upon applying a frequency transformation to the data, a
perceptual mask is computed that highlights perceptually significant regions in
the spectrum that can support the watermark without affecting perceptual
fidelity. The watermark signal is then inserted into these regions in a manner
described below. The owner only knows the precise magnitude of each
modification. By contrast, an attacker may only have knowledge of the possible
range of modification. To be confident of eliminating a watermark, an attacker
must assume that each modification was at the limit of this range, despite the
fact that few such modifications are typically this large. As a result, an
attack creates visible (or audible) defects in the data. Similarly,
unintentional signal distortions due to compression or image manipulation must
leave the perceptually significant spectral components intact, otherwise the
resulting image will be severely degraded. This is why the watermark must be
robust.
We now give a high level overview of a basic watermarking
scheme; many variations are possible. In its most basic implementation, a
watermark consists of a sequence of real numbers X = x 1 ; : : : ; xn . In
practice, we create a watermark where each value x i is chosen independently
according to N(0; 1) . We extract from each document D a sequence of values V =
v 1 ; : : : ; vn , into which we insert a watermark X = x 1 ; : : : ; xn to
obtain an adjusted sequence of values V 0 = v 0 1 ; : : : ; v 0 n . V 0 is then
inserted back into the document in place of V to obtain a watermarked document
D 0 . One or more attackers may then alter D 0 , producing a new document D’ .
Given D and D’, a possibly corrupted watermark X’ is extracted and is compared
to X for statistical significance. We extract X’ by first extracting a set of
values
V’ = v’1 ; : : : ;
v’n from D’ (using information about D) and then generating X’ from V’ and V .
INSERTING AND
EXTRACTING THE WATERMARK
When we insert X into V to obtain V 0 we specify a scaling
parameter ff which determines the extent to which X alters V . Three natural
formulae for computing V 0 are:
v 0 i = v i + ffx i
------(1)
v 0 i = v i (1 + ffx i ) ------(2)
v 0 i = v i (e ffx i )
------(3)
Equation 1 is always invertible, and Equations 2 and 3 are
invertible if
v i 6= 0, which
holds in all of our experiments. Given V’ we can therefore compute the inverse
function to derive X’ from V’ and V.
Equation 1 may not be appropriate when the v i values vary
widely.
If v i = 10 6 then adding 100 may be insufficient for
establishing a mark, but if v i = 10 adding 100 will distort this value
unacceptably. Insertion based on Equations 2 or 3 are more robust against such
differences in scale. We note that Equations 2 and 3 give similar results when
ffx i is small. Also, when
v i is positive
then Equation 3 is equivalent to lg (v 0 i ) = lg(v i ) + ffx i , and may be
viewed as an application of Equation 1 to the case where the logarithms of the
original values are used.
Fragile invisible watermark -
manipulating least significant bits (LSBs)
To hide information in
the least-significant bits of a source image:
- goal: hide image-B in image-A
- replace one LSB of image-A with the corresponding one most-significant-bit
(MSB) of image-B
- replace two LSBs of image-A with the corresponding two MSBs of image-B
- compare the results of the two manipulations with the original image-A
- how does the embedded data look?
- in general, replace k LSBs of image-A with the corresponding k MSBs of
image-B, and observe the results .
ATTACKS
The possible attacks against watermarks are wide and
varied.
Image modification attacks: These use image transformations such as those listed
above.
Bit-level attacks: If the attacker has access to a watermark presence
detector, the contents and location of the watermark can be derived. This also
makes it much easier to remove a watermark.
Watermark-insertion attacks: If the attacker has access to a watermark insertion
device, and the watermarking process is not a one-way function, it is possible
to recover the original, unwatermarked image, by pre-distorting the copy, and
rewatermarking it.
Statistical averaging attacks: The attacker uses multiple watermarked images to estimate the watermark, and then subtracts this from the image. This is especially a problem with video.
Scrambling attacks: By inserting a scrambler before the watermark detector, and a de-scrambler after it, detection of the watermarking can be avoided.
Resilience To
Multiple Document (Collusion) Attacks
The most general attack consists of using ’n’ multiple watermarked copies D01 ; : : : ; D0 n of document D to produce an unwatermarked document D’.
A coding scheme for use in situations in which one can insert many relatively weak 0/1 watermarks into a document. They assume that if the ith watermark is the same for all n copies of the document then it cannot be detected, changed or removed. Using their coding scheme the number of weak watermarks to be inserted scales according to n^4, which may limit its usefulness in practice.
To illustrate the power of multiple document attacks, consider watermarking schemes in which V’i is generated by either adding 1 or -1 at random to Vi. Then as soon as one finds two documents with unequal values for Vi, one can determine Vi, hence completely eliminates this component of watermark. With n documents, one can on average eliminate all but a 2^(n-1) fraction of the components of the watermark. Note that this attack does not assume anything about the distribution on Vi. While a more intelligent allocation of a +-1 values to the watermarks will better resist this simple attack, the discrete nature of the watermark components makes them much easier to completely eliminate. Our use of continuous valued watermarks appears to give greater resilience to such attacks. Interestingly, we have experimentally determined that if one chooses the Xi uniformly over some range, then one can remove the watermark using only 5 documents. Use of the normal distribution seems to give better performance than the distributions considered above. We note that the crucial performance measure to consider is the value of max i (X’ * X i), where X’ is the Watermark extracted from an document D’ generated by attacking documents D 1 ; : : : ; D n with respective watermarks X 1 ; : : : ; X n .
The denominator sqrt (X’ * X’) of our similarity measure can always be made larger by, for example, adding noise. This causes the similarity measure to shrink, at the expense of distorting the image. Hence, we can view max i (X’ * Xi) as determining a fidelity/undetectability tradeoff curve and the value of sqrt(X’ * X’) as picking a point on this curve. When X i is inserted into D by a linear update rule, then an averaging attack, which sets D’ = (D 1 + …… +D n) / n will result in
X’= (X 1 + …… +X n) / n.
In this case, max i (X’ * X i) ~ 1/ n [max i (X i * X i)] (assuming i X j ~0).
That is, there is a 1/n behavior in the detector output.
As we can see from this list, even with a perfect watermarking method, there are various system-level attacks that can frustrate the secure use of watermarks in a copyright scheme.
Prioritizing Attack Resistance
As making a watermark resistant to a large number of image transformation attacks is a difficult task, it is important to prioritize these transforms. A watermark should aim to be most resistant to the most common attacks; as we have pointed out, complete watermark security is an unachievable goal; it is better to target (say) 90% watermark recovery. It is tempting to concentrate on the various complex attacks that malicious attackers might employ. However, this is pointless if the method is not properly resistant to image edits made by either the original owner or valid users of the image. Thus a watermarking method must handle:
Scaling, especially (filtered) down sampling , Simple brightness, contrast, or gamma adjustment & Border cropping.

SPECIFICATIONS/LIMITATIONS
Image dimensions:
X and Y dimensions have to be in
powers of 2.
Image depth:
It should currently handle 24 bit
grayscale and color images.
File formats:
It should read PPM(refer to appendix)
24 bit image files.
Features: The code has been designed for experimentation. It's very modular and
should allow for simple replacements of individual components. One can easily
replace the the wavelet filters, watermark inserter and extractor.
The target of the project is security of digital images, which can be achieved by addition of watermark to the digital image. The watermarked image is then distributed .The watermark is extracted from those images which have been intentionally or unintentionally attacked, to prove ownership and for the purpose of authenticity.
The project has following modules :
![]()

![]()
![]()
![]()
![]()



The first module reads the pixel values of the image.
The second module adds the invisible watermark to the image.
The third module extracts the watermark from the test image.
Finally, the last module correlates the original watermark with the extracted one to check for attacks.
SOFTWARE AND
HARDWARE REQUIREMENTS
Hardware Environment :
PC –386/486/Pentium & above
Linux/Unix Operating System
Software Environment :
C++ Compiler for Unix
Other Tools :
Office 2000
Paint Shop Pro 5.0

FUNDAMENTAL
CONCEPTS
AND
AN
OVERVIEW OF THE WAVELET THEORY

First of all, why do we need a transform, or what is a transform anyway?
Mathematical transformations are applied to signals to obtain a further information from that signal that is not readily available in the raw signal. In the following tutorial I will assume a time-domain signal as a raw signal, and a signal that has been "transformed" by any of the available mathematical transformations as a processed signal.
There are number of transformations that can be applied, among which the Fourier transforms are probably by far the most popular.
Most of the signals in practice, are TIME-DOMAIN signals in their raw format. That is, whatever that signal is measuring, is a function of time. In other words, when we plot the signal one of the axes is time (independent variable), and the other (dependent variable) is usually the amplitude. When we plot time-domain signals, we obtain a time-amplitude representation of the signal. This representation is not always the best representation of the signal for most signal processing related applications. In many cases, the most distinguished information is hidden in the frequency content of the signal. The frequency SPECTRUM of a signal is basically the frequency components (spectral components) of that signal. The frequency spectrum of a signal shows what frequencies exist in the signal.
Intuitively, we all know that the frequency is something to do with the change in rate of something. If something ( a mathematical or physical variable, would be the technically correct term) changes rapidly, we say that it is of high frequency, where as if this variable does not change rapidly, i.e., it changes smoothly, we say that it is of low frequency. If this variable does not change at all, then we say it has zero frequency, or no frequency. For example the publication frequency of a daily newspaper is higher than that of a monthly magazine (it is published more frequently).
The frequency is measured in cycles/second, or with a more common name, in "Hertz". For example the electric power we use in our daily life in the US is 60 Hz (50 Hz elsewhere in the world). This means that if you try to plot the electric current, it will be a sine wave passing through the same point 50 times in 1 second. Now, look at the following figures. The first one is a sine wave at 3 Hz, the second one at 10 Hz, and the third one at 50 Hz. Compare them.

So how do we measure frequency, or how do we find the frequency content of a signal? The answer is FOURIER TRANSFORM (FT). If the FT of a signal in time domain is taken, the frequency-amplitude representation of that signal is obtained. In other words, we now have a plot with one axis being the frequency and the other being the amplitude. This plot tells us how much of each frequency exists in our signal.
The frequency axis starts from zero, and goes up to infinity. For every frequency, we have an amplitude value. For example, if we take the FT of the electric current that we use in our houses, we will have one spike at 50 Hz, and nothing elsewhere, since that signal has only 50 Hz frequency component. No other signal, however, has a FT which is this simple. For most practical purposes, signals contain more than one frequency component. The following shows the FT of the 50 Hz signal:

Figure 1.4 The FT of the 50 Hz signal given in Figure 1.3
One word of caution is in order at this point. Note that two plots are given in Figure 1.4. The bottom one plots only the first half of the top one. Due to reasons that are not crucial to know at this time, the frequency spectrum of a real valued signal is always symmetric. The top plot illustrates this point. However, since the symmetric part is exactly a mirror image of the first part, it provides no additional information, and therefore, this symmetric second part is usually not shown. In most of the following figures corresponding to FT, I will only show the first half of this symmetric spectrum.
Why do we need the frequency information?
Often times, the information that cannot be readily seen in the time-domain can be seen in the frequency domain.
Let's give an example from biological signals. Suppose we are looking at an ECG signal (ElectroCardioGraphy, graphical recording of heart's electrical activity). The typical shape of a healthy ECG signal is well known to cardiologists. Any significant deviation from that shape is usually considered to be a symptom of a pathological condition.
This pathological condition, however, may not always be quite obvious in the original time-domain signal. Cardiologists usually use the time-domain ECG signals which are recorded on strip-charts to analyze ECG signals. Recently, the new computerized ECG recorders/analyzers also utilize the frequency information to decide whether a pathological condition exists. A pathological condition can sometimes be diagnosed more easily when the frequency content of the signal is analyzed.
This, of course, is only one simple example why frequency content might be useful. Today Fourier transforms are used in many different areas including all branches of engineering.
Although FT is probably the most popular transform being used (especially in electrical engineering), it is not the only one. There are many other transforms that are used quite often by engineers and mathematicians. Hilbert transform, short-time Fourier transform (more about this later), Wigner distributions, the Radon Transform, and of course our featured transformation , the wavelet transform, constitute only a small portion of a huge list of transforms that are available at engineer's and mathematician's disposal. Every transformation technique has its own area of application, with advantages and disadvantages, and the wavelet transform (WT) is no exception.
For a better understanding of the need for the WT let's look at the FT more closely. FT (as well as WT) is a reversible transform, that is, it allows to go back and forward between the raw and processed (transformed) signals. However, only either of them is available at any given time. That is, no frequency information is available in the time-domain signal, and no time information is available in the Fourier transformed signal. The natural question that comes to mind is that is it necessary to have both the time and the frequency information at the same time?
As we will see soon, the answer depends on the particular application, and the nature of the signal in hand. Recall that the FT gives the frequency information of the signal, which means that it tells us how much of each frequency exists in the signal, but it does not tell us when in time these frequency components exist. This information is not required when the signal is so-called stationary .
Let's take a closer look at this stationarity concept more closely, since it is of paramount importance in signal analysis. Signals whose frequency content do not change in time are called stationary signals . In other words, the frequency content of stationary signals do not change in time. In this case, one does not need to know at what times frequency components exist , since all frequency components exist at all times !!! .
For example the following signal
x(t)=cos(2*pi*10*t)+cos(2*pi*25*t)+cos(2*pi*50*t)+cos(2*pi*100*t)
is a stationary signal, because it has frequencies of 10, 25, 50, and 100 Hz at any given time instant. This signal is plotted below:

Figure 1.5
And the following is its FT:

Figure 1.6
The top plot in Figure 1.6 is the (half of the symmetric) frequency spectrum of the signal in Figure 1.5. The bottom plot is the zoomed version of the top plot, showing only the range of frequencies that are of interest to us. Note the four spectral components corresponding to the frequencies 10, 25, 50 and 100 Hz.
Contrary to the signal in Figure 1.5, the following signal is not stationary. Figure 1.7 plots a signal whose frequency constantly changes in time. This signal is known as the "chirp" signal. This is a non-stationary signal.

Figure 1.7
Let's look at another example. Figure 1.8 plots a signal with four different frequency components at four different time intervals, hence a non-stationary signal. The interval 0 to 300 ms has a 100 Hz sinusoid, the interval 300 to 600 ms has a 50 Hz sinusoid, the interval 600 to 800 ms has a 25 Hz sinusoid, and finally the interval 800 to 1000 ms has a 10 Hz sinusoid.

Figure 1.8
And the following is its FT:

Figure 1.9
Do not worry about the little ripples at this time; they are due to sudden changes from one frequency component to another, which have no significance in this text. Note that the amplitudes of higher frequency components are higher than those of the lower frequency ones. This is due to fact that higher frequencies last longer (300 ms each) than the lower frequency components (200 ms each). (The exact value of the amplitudes are not important).
Other than those ripples, everything seems to be right. The FT has four peaks, corresponding to four frequencies with reasonable amplitudes... Right
WRONG (!)
Well, not
exactly wrong, but not exactly right either...
Here is why:
For the first signal, plotted in Figure 1.5, consider the following question:
At what times (or time intervals), do these frequency components occur?
Answer:
At all times! Remember that in stationary signals, all frequency components that exist in the signal, exist throughout the entire duration of the signal. There is 10 Hz at all times, there is 50 Hz at all times, and there is 100 Hz at all times.
Now, consider the same question for the non-stationary signal in Figure 1.7 or in Figure 1.8.
At what times these frequency components occur?
For the signal in Figure 1.8, we know that in the first interval we have the highest frequency component, and in the last interval we have the lowest frequency component. For the signal in Figure 1.7, the frequency components change continuously. Therefore, for these signals the frequency components do not appear at all times!
Now, compare the Figures 1.6 and 1.9. The similarity between these two spectrum should be apparent. Both of them show four spectral components at exactly the same frequencies, i.e., at 10, 25, 50, and 100 Hz. Other than the ripples, and the difference in amplitude (which can always be normalized), the two spectrums are almost identical, although the corresponding time-domain signals are not even close to each other. Both of the signals involves the same frequency components, but the first one has these frequencies at all times, the second one has these frequencies at different intervals. So, how come the spectrums of two entirely different signals look very much alike? Recall that the FT gives the spectral content of the signal, but it gives no information regarding where in time those spectral components appear . Therefore, FT is not a suitable technique for non-stationary signal, with one exception:
FT can be used for non-stationary signals, if we are only interested in what spectral components exist in the signal, but not interested where these occur. However, if this information is needed, i.e., if we want to know, what spectral component occur at what time (interval) , then Fourier transform is not the right transform to use.
For practical purposes it is difficult to make the separation, since there are a lot of practical stationary signals, as well as non-stationary ones. Almost all biological signals, for example, are non-stationary. Some of the most famous ones are ECG (electrical activity of the heart , electrocardiograph), EEG (electrical activity of the brain, electroencephalograph), and EMG (electrical activity of the muscles, electromyogram).
Once again please note that, the FT gives what frequency components (spectral components) exist in the signal. Nothing more, nothing less.
When the time localization of the spectral components are needed, a transform giving the TIME-FREQUENCY REPRESENTATION of the signal is needed.
THE ULTIMATE SOLUTION:
THE WAVELET TRANSFORM
The Wavelet transform is a transform of this type. It provides the time-frequency representation. (There are other transforms which give this information too, such as short time Fourier transform, Wigner distributions, etc.)
Often times a particular spectral component occurring at any instant can be of particular interest. In these cases it may be very beneficial to know the time intervals these particular spectral components occur. For example, in EEGs, the latency of an event-related potential is of particular interest (Event-related potential is the response of the brain to a specific stimulus like flash-light, the latency of this response is the amount of time elapsed between the onset of the stimulus and the response).
Wavelet transform is capable of providing the time and frequency information simultaneously, hence giving a time-frequency representation of the signal.
How wavelet transform works is completely a different fun story, and should be explained after short time Fourier Transform (STFT) . The WT was developed as an alternative to the STFT. The STFT will be explained in great detail in the second part of this tutorial. It suffices at this time to say that the WT was developed to overcome some resolution related problems of the STFT, as explained later.
To make a real long story short, we pass the time-domain signal from various highpass and low pass filters, which filters out either high frequency or low frequency portions of the signal. This procedure is repeated, every time some portion of the signal corresponding to some frequencies being removed from the signal.
Here is how this works: Suppose we have a signal which has frequencies up to 1000 Hz. In the first stage we split up the signal in to two parts by passing the signal from a highpass and a lowpass filter (filters should satisfy some certain conditions, so-called admissibility condition) which results in two different versions of the same signal: portion of the signal corresponding to 0-500 Hz (low pass portion), and 500-1000 Hz (high pass portion).
Then, we take either portion (usually low pass portion) or both, and do the same thing again. This operation is called decomposition .
Assuming that we have taken the lowpass portion, we now have 3 sets of data, each corresponding to the same signal at frequencies 0-250 Hz, 250-500 Hz, 500-1000 Hz.
Then we take the lowpass portion again and pass it through low and high pass filters; we now have 4 sets of signals corresponding to 0-125 Hz, 125-250 Hz,250-500 Hz, and 500-1000 Hz. We continue like this until we have decomposed the signal to a pre-defined certain level. Then we have a bunch of signals, which actually represent the same signal, but all corresponding to different frequency bands. We know which signal corresponds to which frequency band, and if we put all of them together and plot them on a 3-D graph, we will have time in one axis, frequency in the second and amplitude in the third axis. This will show us which frequencies exist at which time ( there is an issue, called "uncertainty principle", which states that, we cannot exactly know what frequency exists at what time instance , but we can only know what frequency bands exist at what time intervals , more about this in the subsequent parts of this tutorial).
However, I still would like to explain it briefly:
The uncertainty principle, originally found and formulated by Heisenberg, states that, the momentum and the position of a moving particle cannot be known simultaneously. This applies to our subject as follows:
The frequency and time information of a signal at some certain point in the time-frequency plane cannot be known. In other words: We cannot know what spectral component exists at any given time instant. The best we can do is to investigate what spectral components exist at any given interval of time. This is a problem of resolution, and it is the main reason why researchers have switched to WT from STFT. STFT gives a fixed resolution at all times, whereas WT gives a variable resolution as follows:
Higher
frequencies are better resolved in time, and lower frequencies are better
resolved in frequency. This means that, a certain high frequency component can
be located better in time (with less relative error) than a low frequency
component. On the contrary, a low frequency component can be located better in
frequency compared to high frequency component.
Take a look at the following grid:
f ^
|******************************************* continuous
|* * * * * * * * * * * * * * * wavelet transform
|* * * * * * *
|* * * *
|* *
--------------------------------------------> time
Interpret the above grid as
follows: The top row shows that at
higher frequencies we have more
samples corresponding to smaller
intervals of time. In other
words, higher frequencies can be resolved
better in time. The bottom row
however, corresponds to low
frequencies, and there are less
number of points to characterize the
signal, therefore, low
frequencies are not resolved well in time.
^frequency
|
|
|
| *******************************************************
|
|
|
| * * * * * * * * * * * * * * * * * * * discrete time
| wavelet tansform
| * * * * * * * * * *
|
| * * * * *
| * * *
|----------------------------------------------------------> time
In discrete time case, the time
resolution of the signal works the same
as above, but now, the frequency
information has different resolutions
at every stage too. Note that,
lower frequencies are better resolved in
frequency, where as higher
frequencies are not. Note how the spacing
between subsequent frequency
components increase as frequency increases.
Below , are some examples of
continuous wavelet transform:
Let's take a sinusoidal signal,
which has two different frequency components at
two different times:
Note the low frequency portion
first, and then the high frequency.

Figure 1.10
The continuous wavelet transform
of the above signal:

Figure 1.11
Note however, the frequency axis
in these plots are labeled as
scale . The concept of the scale
will be made more clear in the subsequent
sections, but it should be noted
at this time that the scale is inverse
of frequency. That is, high
scales correspond to low frequencies, and
low scales correspond to high
frequencies. Consequently, the little
peak in the plot corresponds to
the high frequency components in the
signal, and the large peak
corresponds to low frequency components
(which appear before the high
frequency components in time) in the
signal.
You might be puzzled from the
frequency resolution shown in the plot,
since it shows good frequency
resolution at high frequencies. Note
however that, it is the good scale resolution that looks good
at high frequencies (low scales),
and good scale resolution means poor
frequency resolution and vice
versa.

Let's have a
short review of the fundamentals.
We basically need Wavelet Transform (WT) to analyze non-stationary signals,
i.e., whose frequency response varies in time. I have written that Fourier
Transform (FT) is not suitable for non-stationary signals, and I have shown
examples of it to make it more clear. For a quick recall, let me give the
following example.
Suppose we have two different signals. Also suppose that they both have the same spectral components, with one major difference. Say one of the signals have four frequency components at all times, and the other have the same four frequency components at different times. The FT of both of the signals would be the same, as shown in the example in part 1 of this tutorial. Although the two signals are completely different, their (magnitude of) FT are the SAME !. This, obviously tells us that we can not use the FT for non-stationary signals.
But why does this happen? In other words, how come both of the signals have the same FT? HOW DOES FOURIER TRANSFORM WORK ANYWAY?
An Important Milestone in Signal Processing:
THE FOURIER TRANSFORM
I will not go
into the details of FT for two reasons:
1. It is too wide of a subject to discuss in this tutorial.
2. It is not our main concern anyway.
However, I would like to mention a couple important points again for two
reasons:
1. It is a necessary background to understand how WT works.
2. It has been by far the most important signal processing tool for many (and I
mean many many) years.
In 19th century, the French mathematician J. Fourier, showed that any periodic function can be expressed as an infinite sum of periodic complex exponential functions. Many years after he had discovered this remarkable property of (periodic) functions, his ideas were generalized to first non-periodic functions, and then periodic or non-periodic discrete time signals. It is after this generalization that it became a very suitable tool for computer calculations. In 1965, a new algorithm called fast Fourier Transform (FFT) was developed and FT became even more popular.
Now let us take
a look at how Fourier transform works:
FT decomposes a signal to complex exponential functions of different
frequencies. The way it does this, is defined by the following two equations:

Figure 2.1
In the above equation, t stands for time, f stands for frequency, and x denotes the signal at hand. Note that x denotes the signal in time domain and the X denotes the signal in frequency domain. This convention is used to distinguish the two representations of the signal. Equation (1) is called the Fourier transform of x(t), and equation (2) is called the inverse Fourier transform of X(f), which is x(t).
For those of you who have been using the Fourier transform are already familiar with this. Unfortunately many people use these equations without knowing the underlying principle.
Please take a closer look at equation (1):
The signal x(t), is multiplied with an exponential term, at some certain frequency "f" , and then integrated over ALL TIMES !!! (The key words here are "all times" , as will explained below).
Note that the exponential term in Eqn. (1) can also be written as:
Cos(2.pi.f.t)+j.Sin(2.pi.f.t).......(3)
The above expression has a real part of cosine of frequency f, and an imaginary part of sine of frequency f. So what we are actually doing is, multiplying the original signal with a complex expression which has sines and cosines of frequency f. Then we integrate this product. In other words, we add all the points in this product. If the result of this integration (which is nothing but some sort of infinite summation) is a large value, then we say that : the signal x(t), has a dominant spectral component at frequency "f". This means that, a major portion of this signal is composed of frequency f. If the integration result is a small value, than this means that the signal does not have a major frequency component of f in it. If this integration result is zero, then the signal does not contain the frequency "f" at all.
It is of particular interest here to see how this integration works: The signal is multiplied with the sinusoidal term of frequency "f". If the signal has a high amplitude component of frequency "f", then that component and the sinusoidal term will coincide, and the product of them will give a (relatively) large value. This shows that, the signal "x", has a major frequency component of "f".
However, if the signal does not have a frequency component of "f", the product will yield zero, which shows that, the signal does not have a frequency component of "f". If the frequency "f", is not a major component of the signal "x(t)", then the product will give a (relatively) small value. This shows that, the frequency component "f" in the signal "x", has a small amplitude, in other words, it is not a major component of "x".
Now, note that the integration in the transformation equation (Eqn. 1) is over time. The left hand side of (1), however, is a function of frequency. Therefore, the integral in (1), is calculated for every value of f.
IMPORTANT(!) The information provided by the integral, corresponds to all time instances, since the integration is from minus infinity to plus infinity over time. It follows that no matter where in time the component with frequency "f" appears, it will affect the result of the integration equally as well. In other words, whether the frequency component "f" appears at time t1 or t2 , it will have the same effect on the integration. This is why Fourier transform is not suitable if the signal has time varying frequency, i.e., the signal is non-stationary. If only, the signal has the frequency component "f" at all times (for all "f" values), then the result obtained by the Fourier transform makes sense.
Note that the Fourier transform tells whether a certain frequency component exists or not. This information is independent of where in time this component appears. It is therefore very important to know whether a signal is stationary or not, prior to processing it with the FT.
The example given in part one should now be clear. I would like to give it here again:
Look at the following figure, which shows the signal:
x(t)=cos(2*pi*5*t)+cos(2*pi*10*t)+cos(2*pi*20*t)+cos(2*pi*50*t)
that is , it has four frequency components of 5, 10, 20, and 50 Hz., all occurring at all times.

Figure 2.2
And here is the
FT of it. The frequency axis has been cut here, but theoretically it extends to
infinity (for continuous Fourier transform (CFT). Actually, here we calculate
the discrete Fourier transform (DFT), in which case the frequency axis goes up
to (at least) twice the sampling frequency of the signal, and the transformed
signal is symmetrical. However, this is not that important at this time.) 
Figure 2.3
Note the four peaks in the above figure, which correspond to four different frequencies.
Now, look at the following figure: Here the signal is again the cosine signal, and it has the same four frequencies. However, these components occur at different times.

Figure 2.4
And here is the Fourier transform of this signal:

Figure 2.5
What you are supposed to see in the above figure, is it is (almost) same with the previous FT figure. Please look carefully and note the major four peaks corresponding to 5, 10, 20, and 50 Hz. I could have made this figure look very similar to the previous one, but I did not do that on purpose. The reason of the noise like thing in between peaks show that, those frequencies also exist in the signal. But the reason they have a small amplitude , is because, they are not major spectral components of the given signal, and the reason we see those, is because of the sudden change between the frequencies. Especially note how time domain signal changes at around time 250 (ms) (With some suitable filtering techniques, the noise like part of the frequency domain signal can be cleaned, but this has not nothing to do with our subject now. If you need further information please send me an e-mail).
By this time you should have understood the basic concepts of Fourier transform, when we can use it and we can not. As you can see from the above example, FT cannot distinguish the two signals very well. To FT, both signals are the same, because they constitute of the same frequency components. Therefore, FT is not a suitable tool for analyzing non-stationary signals, i.e., signals with time varying spectra.
Please keep this very important property in mind. Unfortunately, many people using the FT do not think of this. They assume that the signal they have is stationary where it is not in many practical cases. Of course if you are not interested in at what times these frequency components occur, but only interested in what frequency components exist, then FT can be a suitable tool to use.
So, now that we know that we can not use (well, we can, but we shouldn't) FT for non-stationary signals, what are we going to do?
Remember that, I have mentioned that wavelet transform is only (about) a decade old. You may wonder if researchers noticed this non-stationarity business only ten years ago or not.
Obviously not.
Apparently they
must have done something about it before they figured out the wavelet
transform....?
Well..., they sure did...
They have come up with ...
THE SHORT TERM FOURIER TRANSFORM
So, how are we going to insert this time business into our frequency plots? Let's look at the problem in hand little more closer.
What was wrong with FT? It did not work for non-stationary signals. Let's think this: Can we assume that , some portion of a non-stationary signal is stationary?
The answer is yes.
Just look at the third figure above. The signal is stationary every 250 time unit intervals.
You may ask the following question?
What if the part that we can consider to be stationary is very small?
Well, if it is too small, it is too small. There is nothing we can do about that, and actually, there is nothing wrong with that either. We have to play this game with the physicists' rules.
If this region where the signal can be assumed to be stationary is too small, then we look at that signal from narrow windows, narrow enough that the portion of the signal seen from these windows are indeed stationary.
This approach of researchers ended up with a revised version of the Fourier transform, so-called : The Short Time Fourier Transform (STFT) .
There is only a minor difference between STFT and FT. In STFT, the signal is divided into small enough segments, where these segments (portions) of the signal can be assumed to be stationary. For this purpose, a window function "w" is chosen. The width of this window must be equal to the segment of the signal where its stationarity is valid.
This window function is first located to the very beginning of the signal. That is, the window function is located at t=0. Let's suppose that the width of the window is "T" s. At this time instant (t=0), the window function will overlap with the first T/2 seconds (I will assume that all time units are in seconds). The window function and the signal are then multiplied. By doing this, only the first T/2 seconds of the signal is being chosen, with the appropriate weighting of the window (if the window is a rectangle, with amplitude "1", then the product will be equal to the signal). Then this product is assumed to be just another signal, whose FT is to be taken. In other words, FT of this product is taken, just as taking the FT of any signal.
The result of this transformation is the FT of the first T/2 seconds of the signal. If this portion of the signal is stationary, as it is assumed, then there will be no problem and the obtained result will be a true frequency representation of the first T/2 seconds of the signal.
The next step, would be shifting this window (for some t1 seconds) to a new location, multiplying with the signal, and taking the FT of the product. This procedure is followed, until the end of the signal is reached by shifting the window with "t1" seconds intervals.
The following definition of the STFT summarizes all the above explanations in one line:

Figure 2.6
Please look at the above equation carefully. x(t) is the signal itself, w(t) is the window function, and * is the complex conjugate. As you can see from the equation, the STFT of the signal is nothing but the FT of the signal multiplied by a window function.
For every t' and f a new STFT coefficient is computed (Correction: The "t" in the parenthesis of STFT should be "t'". I will correct this soon. I have just noticed that I have mistyped it).
The following figure may help you to understand this a little better:

Figure 2.7
The Gaussian-like functions in color are the windowing functions. The red one shows the window located at t=t1', the blue shows t=t2', and the green one shows the window located at t=t3'. These will correspond to three different FTs at three different times. Therefore, we will obtain a true time-frequency representation (TFR) of the signal.
Probably the best way of understanding this would be looking at an example. First of all, since our transform is a function of both time and frequency (unlike FT, which is a function of frequency only), the transform would be two dimensional (three, if you count the amplitude too). Let's take a non-stationary signal, such as the following one:

Figure 2.8
In this signal, there are four frequency components at different times. The interval 0 to 250 ms is a simple sinusoid of 300 Hz, and the other 250 ms intervals are sinusoids of 200 Hz, 100 Hz, and 50 Hz, respectively. Apparently, this is a non-stationary signal. Now, let's look at its STFT:

Figure 2.9
As expected, this is two dimensional plot (3 dimensional, if you count the amplitude too). The "x" and "y" axes are time and frequency, respectively. Please, ignore the numbers on the axes, since they are normalized in some respect, which is not of any interest to us at this time. Just examine the shape of the time-frequency representation.
First of all, note that the graph is symmetric with respect to midline of the frequency axis. Remember that, although it was not shown, FT of a real signal is always symmetric, since STFT is nothing but a windowed version of the FT, it should come as no surprise that STFT is also symmetric in frequency. The symmetric part is said to be associated with negative frequencies, an odd concept which is difficult to comprehend, fortunately, it is not important; it suffices to know that STFT and FT are symmetric.
What is important, are the four peaks; note that there are four peaks corresponding to four different frequency components. Also note that, unlike FT, these four peaks are located at different time intervals along the time axis . Remember that the original signal had four spectral components located at different times.
Now we have a true time-frequency representation of the signal. We not only know what frequency components are present in the signal, but we also know where they are located in time.
It is grrrreeeaaatttttt!!!! Right?
Well, not really!
You may wonder, since STFT gives the TFR of the signal, why do we need the wavelet transform. The implicit problem of the STFT is not obvious in the above example. Of course, an example that would work nicely was chosen on purpose to demonstrate the concept.
The problem with STFT is the fact whose roots go back to what is known as the Heisenberg Uncertainty Principle . This principle originally applied to the momentum and location of moving particles, can be applied to time-frequency information of a signal. Simply, this principle states that one cannot know the exact time-frequency representation of a signal, i.e., one cannot know what spectral components exist at what instances of times. What one can know are the time intervals in which certain band of frequencies exist, which is a resolution problem.
The problem with the STFT has something to do with the width of the window function that is used. To be technically correct, this width of the window function is known as the support of the window. If the window function is narrow, than it is known as compactly supported . This terminology is more often used in the wavelet world, as we will see later.
Here is what happens:
Recall that in the FT there is no resolution problem in the frequency domain, i.e., we know exactly what frequencies exist; similarly we there is no time resolution problem in the time domain, since we know the value of the signal at every instant of time. Conversely, the time resolution in the FT, and the frequency resolution in the time domain are zero, since we have no information about them. What gives the perfect frequency resolution in the FT is the fact that the window used in the FT is its kernel, the exp{jwt} function, which lasts at all times from minus infinity to plus infinity. Now, in STFT, our window is of finite length, thus it covers only a portion of the signal, which causes the frequency resolution to get poorer. What I mean by getting poorer is that, we no longer know the exact frequency components that exist in the signal, but we only know a band of frequencies that exist:
In FT, the kernel function, allows us to obtain perfect frequency resolution, because the kernel itself is a window of infinite length. In STFT is window is of finite length, and we no longer have perfect frequency resolution. You may ask, why don't we make the length of the window in the STFT infinite, just like as it is in the FT, to get perfect frequency resolution? Well, than you loose all the time information, you basically end up with the FT instead of STFT. To make a long story real short, we are faced with the following dilemma:
If we use a window of infinite length, we get the FT, which gives perfect frequency resolution, but no time information. Furthermore, in order to obtain the stationarity, we have to have a short enough window, in which the signal is stationary. The narrower we make the window, the better the time resolution, and better the assumption of stationarity, but poorer the frequency resolution:
Narrow window ===>good
time resolution, poor frequency resolution.
Wide window ===>good frequency resolution, poor time resolution.
In order to see these effects, let's look at a couple examples: I will show four windows of different length, and we will use these to compute the STFT, and see what happens:
The window function we use is simply a Gaussian function in the form:
w(t)=exp(-a*(t^2)/2);
where a determines the length of the window, and t is the time. The following figure shows four window functions of varying regions of support, determined by the value of a . Please disregard the numeric values of a since the time interval where this function is computed also determines the function. Just note the length of each window. The above example given was computed with the second value, a=0.001 . I will now show the STFT of the same signal given above computed with the other windows.

Figure 2.10
First let's look at the first most narrow window. We expect the STFT to have a very good time resolution, but relatively poor frequency resolution:

Figure 2.11
The above figure shows this STFT. The figure is shown from a top bird-eye view with an angle for better interpretation. Note that the four peaks are well separated from each other in time. Also note that, in frequency domain, every peak covers a range of frequencies, instead of a single frequency value. Now let's make the window wider, and look at the third window (the second one was already shown in the first example).

Figure 2.12
Note that the peaks are not well separated from each other in time,
unlike the previous case, however, in frequency domain the resolution is much
better. Now let's further increase the width of the window, and see what
happens:

Figure 2.13
Well, this should be of no surprise to anyone now, since we would expect a terrible (and I mean absolutely terrible) time resolution.
These examples should have illustrated the implicit problem of resolution of the STFT. Anyone who would like to use STFT is faced with this problem of resolution. What kind of a window to use? Narrow windows give good time resolution, but poor frequency resolution. Wide windows give good frequency resolution, but poor time resolution; furthermore, wide windows may violate the condition of stationarity. The problem, of course, is a result of choosing a window function, once and for all, and use that window in the entire analysis. The answer, of course, is application dependent: If the frequency components are well separated from each other in the original signal, than we may sacrifice some frequency resolution and go for good time resolution, since the spectral components are already well separated from each other. However, if this is not the case, then a good window function, could be more difficult than finding a good stock to invest in.
By now, you should have realized how wavelet transform comes into play. The Wavelet transform (WT) solves the dilemma of resolution to a certain extent, as we will see in the next part.
Although the time and frequency resolution problems are results of a physical phenomenon (the Heisenberg uncertainty principle) and exist regardless of the transform used, it is possible to analyze any signal by using an alternative approach called the multiresolution analysis (MRA) . MRA, as implied by its name, analyzes the signal at different frequencies with different resolutions. Every spectral component is not resolved equally as was the case in the STFT.
MRA is designed to give good time resolution and poor frequency resolution at high frequencies and good frequency resolution and poor time resolution at low frequencies. This approach makes sense especially when the signal at hand has high frequency components for short durations and low frequency components for long durations. Fortunately, the signals that are encountered in practical applications are often of this type. For example, the following shows a signal of this type. It has a relatively low frequency component throughout the entire signal and relatively high frequency components for a short duration somewhere around the middle.

The continuous wavelet transform was developed as an alternative approach to the short time Fourier transform to overcome the resolution problem. The wavelet analysis is done in a similar way to the STFT analysis, in the sense that the signal is multiplied with a function, {\it the wavelet}, similar to the window function in the STFT, and the transform is computed separately for different segments of the time-domain signal. However, there are two main differences between the STFT and the CWT:
1. The Fourier transforms of the windowed signals are not taken, and therefore single peak will be seen corresponding to a sinusoid, i.e., negative frequencies are not computed.
2. The width of the window is changed as the transform is computed for every single spectral component, which is probably the most significant characteristic of the wavelet transform.
The continuous wavelet transform is defined as follows

Equation 3.1
As seen in the above equation , the transformed signal is a function of two variables, tau and s , the translation and scale parameters, respectively. psi(t) is the transforming function, and it is called the mother wavelet . The term mother wavelet gets its name due to two important properties of the wavelet analysis as explained below:
The term wavelet means a small wave . The smallness refers to the condition that this (window) function is of finite length ( compactly supported). The wave refers to the condition that this function is oscillatory . The term mother implies that the functions with different region of support that are used in the transformation process are derived from one main function, or the mother wavelet. In other words, the mother wavelet is a prototype for generating the other window functions.
The term translation is used in the same sense as it was used in the STFT; it is related to the location of the window, as the window is shifted through the signal. This term, obviously, corresponds to time information in the transform domain. However, we do not have a frequency parameter, as we had before for the STFT. Instead, we have scale parameter which is defined as $1/frequency$. The term frequency is reserved for the STFT. Scale is described in more detail in the next section.
The parameter scale in the wavelet analysis is similar to the scale used in maps. As in the case of maps, high scales correspond to a non-detailed global view (of the signal), and low scales correspond to a detailed view. Similarly, in terms of frequency, low frequencies (high scales) correspond to a global information of a signal (that usually spans the entire signal), whereas high frequencies (low scales) correspond to a detailed information of a hidden pattern in the signal (that usually lasts a relatively short time). Cosine signals corresponding to various scales are given as examples in the following figure .
Figure
3.2
Fortunately in practical applications, low scales (high frequencies) do not last for the entire duration of the signal, unlike those shown in the figure, but they usually appear from time to time as short bursts, or spikes. High scales (low frequencies) usually last for the entire duration of the signal.
Scaling, as a mathematical operation, either dilates or compresses a signal. Larger scales correspond to dilated (or stretched out) signals and small scales correspond to compressed signals. All of the signals given in the figure are derived from the same cosine signal, i.e., they are dilated or compressed versions of the same function. In the above figure, s=0.05 is the smallest scale, and s=1 is the largest scale.
In terms of mathematical functions, if f(t) is a given function f(st) corresponds to a contracted (compressed) version of f(t) if s > 1 and to an expanded (dilated) version of f(t) if s < 1 .
However, in the definition of the wavelet transform, the scaling term is used in the denominator, and therefore, the opposite of the above statements holds, i.e., scales s > 1 dilates the signals whereas scales s < 1 , compresses the signal. This interpretation of scale will be used throughout this text.
Interpretation of the above equation will be explained in this section. Let x(t) is the signal to be analyzed. The mother wavelet is chosen to serve as a prototype for all windows in the process. All the windows that are used are the dilated (or compressed) and shifted versions of the mother wavelet. There are a number of functions that are used for this purpose. The Morlet wavelet and the Mexican hat function are two candidates, and they are used for the wavelet analysis of the examples which are presented later in this chapter.
Once the mother wavelet is chosen the computation starts with s=1 and the continuous wavelet transform is computed for all values of s , smaller and larger than ``1''. However, depending on the signal, a complete transform is usually not necessary. For all practical purposes, the signals are bandlimited, and therefore, computation of the transform for a limited interval of scales is usually adequate. In this study, some finite interval of values for s were used, as will be described later in this chapter.
For convenience, the procedure will be started from scale s=1 and will continue for the increasing values of s , i.e., the analysis will start from high frequencies and proceed towards low frequencies. This first value of s will correspond to the most compressed wavelet. As the value of s is increased, the wavelet will dilate.
The wavelet is placed at the beginning of the signal at the point which corresponds to time=0. The wavelet function at scale ``1'' is multiplied by the signal and then integrated over all times. The result of the integration is then multiplied by the constant number 1/sqrt{s} . This multiplication is for energy normalization purposes so that the transformed signal will have the same energy at every scale. The final result is the value of the transformation, i.e., the value of the continuous wavelet transform at time zero and scale s=1 . In other words, it is the value that corresponds to the point tau =0 , s=1 in the time-scale plane.
The wavelet at scale s=1 is then shifted towards the right by tau amount to the location t=tau , and the above equation is computed to get the transform value at t=tau , s=1 in the time-frequency plane.
This procedure is repeated until the wavelet reaches the end of the signal. One row of points on the time-scale plane for the scale s=1 is now completed.
Then, s is increased by a small value. Note that, this is a continuous transform, and therefore, both tau and s must be incremented continuously . However, if this transform needs to be computed by a computer, then both parameters are increased by a sufficiently small step size. This corresponds to sampling the time-scale plane.
The above procedure is repeated for every value of s. Every computation for a given value of s fills the corresponding single row of the time-scale plane. When the process is completed for all desired values of s, the CWT of the signal has been calculated.
The figures below illustrate the entire process step by step.
Figure
3.3
In Figure 3.3, the signal and the wavelet function are shown for four different values of tau . The signal is a truncated version of the signal shown in Figure 3.1. The scale value is 1 , corresponding to the lowest scale, or highest frequency. Note how compact it is (the blue window). It should be as narrow as the highest frequency component that exists in the signal. Four distinct locations of the wavelet function are shown in the figure at to=2 , to=40, to=90, and to=140 . At every location, it is multiplied by the signal. Obviously, the product is nonzero only where the signal falls in the region of support of the wavelet, and it is zero elsewhere. By shifting the wavelet in time, the signal is localized in time, and by changing the value of s , the signal is localized in scale (frequency).
If the signal has a spectral component that corresponds to the current value of s (which is 1 in this case), the product of the wavelet with the signal at the location where this spectral component exists gives a relatively large value. If the spectral component that corresponds to the current value of s is not present in the signal, the product value will be relatively small, or zero. The signal in Figure 3.3 has spectral components comparable to the window's width at s=1 around t=100 ms.
The continuous wavelet transform of the signal in Figure 3.3 will yield large values for low scales around time 100 ms, and small values elsewhere. For high scales, on the other hand, the continuous wavelet transform will give large values for almost the entire duration of the signal, since low frequencies exist at all times.
Figure
3.4
Figure
3.5
Figures 3.4 and 3.5 illustrate the same process for the scales s=5 and s=20, respectively. Note how the window width changes with increasing scale (decreasing frequency). As the window width increases, the transform starts picking up the lower frequency components.
As a result, for every scale and for every time (interval), one point of the time-scale plane is computed. The computations at one scale construct the rows of the time-scale plane, and the computations at different scales construct the columns of the time-scale plane.
Now, let's take a look at an example, and see how the wavelet transform really looks like. Consider the non-stationary signal in Figure 3.6. This is similar to the example given for the STFT, except at different frequencies. As stated on the figure, the signal is composed of four frequency components at 30 Hz, 20 Hz, 10 Hz and 5 Hz.

Figure 3.6
Figure 3.7 is the continuous wavelet transform (CWT) of this signal. Note that the axes are translation and scale, not time and frequency. However, translation is strictly related to time, since it indicates where the mother wavelet is located. The translation of the mother wavelet can be thought of as the time elapsed since t=0 . The scale, however, has a whole different story. Remember that the scale parameter s in equation 3.1 is actually inverse of frequency. In other words, whatever we said about the properties of the wavelet transform regarding the frequency resolution, inverse of it will appear on the figures showing the WT of the time-domain signal.

Figure 3.7
Note that in Figure 3.7 that smaller scales correspond to higher frequencies, i.e., frequency decreases as scale increases, therefore, that portion of the graph with scales around zero, actually correspond to highest frequencies in the analysis, and that with high scales correspond to lowest frequencies. Remember that the signal had 30 Hz (highest frequency) components first, and this appears at the lowest scale at a translations of 0 to 30. Then comes the 20 Hz component, second highest frequency, and so on. The 5 Hz component appears at the end of the translation axis (as expected), and at higher scales (lower frequencies) again as expected.

Figure 3.8
Now, recall these resolution properties: Unlike the STFT which has a constant
resolution at all times and frequencies, the WT has a good time and poor
frequency resolution at high frequencies, and good frequency and poor time
resolution at low frequencies. Figure 3.8 shows the same WT in Figure 3.7 from
another angle to better illustrate the resolution properties: In Figure 3.8,
lower scales (higher frequencies) have better scale resolution (narrower in
scale, which means that it is less ambiguous what the exact value of the scale)
which correspond to poorer frequency resolution . Similarly, higher scales have
scale frequency resolution (wider support in scale, which means it is more
ambitious what the exact value of the scale is) , which correspond to better
frequency resolution of lower frequencies.
The axes in Figure 3.7 and 3.8 are normalized and should be evaluated accordingly. Roughly speaking the 100 points in the translation axis correspond to 1000 ms, and the 150 points on the scale axis correspond to a frequency band of 40 Hz (the numbers on the translation and scale axis do not correspond to seconds and Hz, respectively , they are just the number of samples in the computation).
In this section we will take a closer look at the resolution properties of the wavelet transform. Remember that the resolution problem was the main reason why we switched from STFT to WT.
The illustration in Figure 3.9 is commonly used to explain how time and frequency resolutions should be interpreted. Every box in Figure 3.9 corresponds to a value of the wavelet transform in the time-frequency plane. Note that boxes have a certain non-zero area, which implies that the value of a particular point in the time-frequency plane cannot be known. All the points in the time-frequency plane that falls into a box is represented by one value of the WT.

Figure 3.9
Let's take a closer look at Figure 3.9: First thing to notice is that although the widths and heights of the boxes change, the area is constant. That is each box represents an equal portion of the time-frequency plane, but giving different proportions to time and frequency. Note that at low frequencies, the height of the boxes are shorter (which corresponds to better frequency resolutions, since there is less ambiguity regarding the value of the exact frequency), but their widths are longer (which correspond to poor time resolution, since there is more ambiguity regarding the value of the exact time). At higher frequencies the width of the boxes decreases, i.e., the time resolution gets better, and the heights of the boxes increase, i.e., the frequency resolution gets poorer.
Before concluding this section, it is worthwhile to mention how the partition looks like in the case of STFT. Recall that in STFT the time and frequency resolutions are determined by the width of the analysis window, which is selected once for the entire analysis, i.e., both time and frequency resolutions are constant. Therefore the time-frequency plane consists of squares in the STFT case.
Regardless of the dimensions of the boxes, the areas of all boxes, both in STFT and WT, are the same and determined by Heisenberg's inequality . As a summary, the area of a box is fixed for each window function (STFT) or mother wavelet (CWT), whereas different windows or mother wavelets can result in different areas. However, all areas are lower bounded by 1/4 \pi . That is, we cannot reduce the areas of the boxes as much as we want due to the Heisenberg's uncertainty principle. On the other hand, for a given mother wavelet the dimensions of the boxes can be changed, while keeping the area the same. This is exactly what wavelet transform does.
This section describes the main idea of wavelet analysis theory, which can also be considered to be the underlying concept of most of the signal analysis techniques. The FT defined by Fourier use basis functions to analyze and reconstruct a function. Every vector in a vector space can be written as a linear combination of the basis vectors in that vector space , i.e., by multiplying the vectors by some constant numbers, and then by taking the summation of the products. The analysis of the signal involves the estimation of these constant numbers (transform coefficients, or Fourier coefficients, wavelet coefficients, etc). The synthesis, or the reconstruction, corresponds to computing the linear combination equation.
All the definitions and theorems related to this subject can be found in Keiser's book, A Friendly Guide to Wavelets but an introductory level knowledge of how basis functions work is necessary to understand the underlying principles of the wavelet theory. Therefore, this information will be presented in this section.
Basis Vectors
Note: Most of the equations include letters of the Greek alphabet. These letters are written out explicitly in the text with their names, such as tau, psi, phi etc. For capital letters, the first letter of the name has been capitalized, such as, Tau, Psi, Phi etc. Also, subscripts are shown by the underscore character _ , and superscripts are shown by the ^ character. Also note that all letters or letter names written in bold type face represent vectors, Some important points are also written in bold face, but the meaning should be clear from the context.
A basis of a vector space V is a set of linearly independent vectors, such that any vector v in V can be written as a linear combination of these basis vectors. There may be more than one basis for a vector space. However, all of them have the same number of vectors, and this number is known as the dimension of the vector space. For example in two-dimensional space, the basis will have two vectors.

Equation 3.2
Equation 3.2 shows how any vector v can be written as a linear combination of the basis vectors b_k and the corresponding coefficients nu^k .
This concept, given in terms of vectors, can easily be generalized to functions, by replacing the basis vectors b_k with basis functions phi_k(t), and the vector v with a function f(t). Equation 3.2 then becomes

Equation 3.2a
The complex exponential (sines and cosines) functions are the basis functions for the FT. Furthermore, they are orthogonal functions, which provide some desirable properties for reconstruction.
Let f(t) and g(t) be two functions in L^2 [a,b]. ( L^2 [a,b] denotes the set of square integrable functions in the interval [a,b]). The inner product of two functions is defined by Equation 3.3:

Equation 3.3
According to the above definition of the inner product, the CWT can be thought of as the inner product of the test signal with the basis functions psi_(tau ,s)(t):

Equation 3.4
where,

Equation 3.5
This definition of the CWT shows that the wavelet analysis is a measure of similarity between the basis functions (wavelets) and the signal itself. Here the similarity is in the sense of similar frequency content. The calculated CWT coefficients refer to the closeness of the signal to the wavelet at the current scale .
This further clarifies the previous discussion on the correlation of the signal with the wavelet at a certain scale. If the signal has a major component of the frequency corresponding to the current scale, then the wavelet (the basis function) at the current scale will be similar or close to the signal at the particular location where this frequency component occurs. Therefore, the CWT coefficient computed at this point in the time-scale plane will be a relatively large number.
Two vectors v , w are said to be orthogonal if their inner product equals zero:

Equation 3.6
Similarly, two functions f and g are said to be orthogonal to each other if their inner product is zero:
![]()
A set of vectors {v_1, v_2, ....,v_n} is said to be orthonormal , if they are pairwise orthogonal to each other, and all have length ``1''. This can be expressed as:

Equation 3.8
Similarly, a set of functions {phi_k(t)}, k=1,2,3,..., is said to be orthonormal if

Equation 3.9
and

Equation 3.10
or equivalently

Equation 3.11
where, delta_{kl} is the Kronecker delta function, defined as:

Equation 3.12
As stated above, there may be more than one set of basis functions (or vectors). Among them, the orthonormal basis functions (or vectors) are of particular importance because of the nice properties they provide in finding these analysis coefficients. The orthonormal bases allow computation of these coefficients in a very simple and straightforward way using the orthonormality property.
For orthonormal bases, the coefficients, mu_k , can be calculated as

Equation 3.13
and the function f(t) can then be reconstructed by Equation 3.2_a by substituting the mu_k coefficients. This yields

Equation 3.14
Orthonormal bases may not be available for every type of application where a generalized version, biorthogonal bases can be used. The term ``biorthogonal'' refers to two different bases which are orthogonal to each other, but each do not form an orthogonal set.
In some applications, however, biorthogonal bases also may not be available in which case frames can be used. Frames constitute an important part of wavelet theory, and interested readers are referred to Kaiser's book mentioned earlier.
Following the same order as in chapter 2 for the STFT, some examples of continuous wavelet transform are presented next. The figures given in the examples were generated by a program written to compute the CWT.
Before we close this section, I would like to include two mother wavelets commonly used in wavelet analysis. The Mexican Hat wavelet is defined as the second derivative of the Gaussian function:

Equation 3.15
which is

Equation 3.16
The Morlet wavelet is defined as

Equation 3.16a
where a is a modulation parameter, and sigma is the scaling parameter that affects the width of the window.
All of the examples that are given below correspond to real-life non-stationary signals. These signals are drawn from a database signals that includes event related potentials of normal people, and patients with Alzheimer's disease. Since these are not test signals like simple sinusoids, it is not as easy to interpret them. They are shown here only to give an idea of how real-life CWTs look like.
The following signal shown in Figure 3.11 belongs to a normal person.

Figure 3.11
and the following is its CWT. The numbers on the axes are of no importance to us. those numbers simply show that the CWT was computed at 350 translation and 60 scale locations on the translation-scale plane. The important point to note here is the fact that the computation is not a true continuous WT, as it is apparent from the computation at finite number of locations. This is only a discretized version of the CWT, which is explained later on this page. Note, however, that this is NOT discrete wavelet transform (DWT) which is the topic of Part IV of this tutorial.

Figure 3.12
and the Figure 3.13 plots the same transform from a different angle for better
visualization.

Figure 3.13
Figure 3.14 plots an event related potential of a patient diagnosed with Alzheimer's disease

Figure 3.14
and Figure 3.15 illustrates its CWT:

Figure 3.15
and here is another view from a different angle

Figure 3.16
The continuous wavelet transform is a reversible transform, provided that Equation 3.18 is satisfied. Fortunately, this is a very non-restrictive requirement. The continuous wavelet transform is reversible if Equation 3.18 is satisfied, even though the basis functions are in general may not be orthonormal. The reconstruction is possible by using the following reconstruction formula:

Equation 3.17 Inverse Wavelet Transform
where C_psi is a constant that depends on the wavelet used. The success of the
reconstruction depends on this constant called, the admissibility constant , to
satisfy the following admissibility condition :

Equation 3.18 Admissibility Condition
where psi^hat(xi) is the FT of psi(t). Equation 3.18 implies that psi^hat(0) =
0, which is

Equation 3.19
As stated above, Equation 3.19 is not a very restrictive requirement since many wavelet functions can be found whose integral is zero. For Equation 3.19 to be satisfied, the wavelet must be oscillatory.
In today's world, computers are used to do most computations (well,...ok... almost all computations). It is apparent that neither the FT, nor the STFT, nor the CWT can be practically computed by using analytical equations, integrals, etc. It is therefore necessary to discretize the transforms. As in the FT and STFT, the most intuitive way of doing this is simply sampling the time-frequency (scale) plane. Again intuitively, sampling the plane with a uniform sampling rate sounds like the most natural choice. However, in the case of WT, the scale change can be used to reduce the sampling rate.
At higher scales (lower frequencies), the sampling rate can be decreased, according to Nyquist's rule. In other words, if the time-scale plane needs to be sampled with a sampling rate of N_1 at scale s_1 , the same plane can be sampled with a sampling rate of N_2 , at scale s_2 , where, s_1 < s_2 (corresponding to frequencies f1>f2 ) and N_2 < N_1 . The actual relationship between N_1 and N_2 is

Equation 3.20
or

Equation 3.21
In other words, at lower frequencies the sampling rate can be decreased which will save a considerable amount of computation time.
It should be noted at this time, however, that the discretization can be done in any way without any restriction as far as the analysis of the signal is concerned. If synthesis is not required, even the Nyquist criteria does not need to be satisfied. The restrictions on the discretization and the sampling rate become important if, and only if, the signal reconstruction is desired. Nyquist's sampling rate is the minimum sampling rate that allows the original continuous time signal to be reconstructed from its discrete samples. The basis vectors that are mentioned earlier are of particular importance for this reason.
As mentioned earlier, the wavelet psi(tau,s) satisfying Equation 3.18, allows reconstruction of the signal by Equation 3.17. However, this is true for the continuous transform. The question is: can we still reconstruct the signal if we discretize the time and scale parameters? The answer is ``yes'', under certain conditions (as they always say in commercials: certain restrictions apply !!!).
The scale parameter s is discretized first on a logarithmic grid. The time parameter is then discretized with respect to the scale parameter , i.e., a different sampling rate is used for every scale. In other words, the sampling is done on the dyadic sampling grid shown in Figure 3.17 :

Figure 3.17
Think of the area covered by the axes as the entire time-scale plane. The CWT assigns a value to the continuum of points on this plane. Therefore, there are an infinite number of CWT coefficients. First consider the discretization of the scale axis. Among that infinite number of points, only a finite number are taken, using a logarithmic rule. The base of the logarithm depends on the user. The most common value is 2 because of its convenience. If 2 is chosen, only the scales 2, 4, 8, 16, 32, 64,...etc. are computed. If the value was 3, the scales 3, 9, 27, 81, 243,...etc. would have been computed. The time axis is then discretized according to the discretization of the scale axis. Since the discrete scale changes by factors of 2 , the sampling rate is reduced for the time axis by a factor of 2 at every scale.
Note that at the lowest scale (s=2), only 32 points of the time axis are
sampled (for the particular case given in Figure 3.17). At the next scale
value, s=4, the sampling rate of time axis is reduced by a factor of 2 since
the scale is increased by a factor of 2, and therefore, only 16 samples are
taken. At the next step, s=8 and 8 samples are taken in time, and so on.
Although it is called the time-scale plane, it is more accurate to call it the translation-scale plane, because ``time'' in the transform domain actually corresponds to the shifting of the wavelet in time. For the wavelet series, the actual time is still continuous.
Similar to the relationship between continuous Fourier transform, Fourier series and the discrete Fourier transform, there is a continuous wavelet transform, a semi-discrete wavelet transform (also known as wavelet series) and a discrete wavelet transform.
Expressing the above discretization procedure in mathematical terms, the
scale discretization is s = s_0^j , and translation discretization is tau =
k.s_0^j.tau_0 where s_0>1 and tau_0>0 . Note, how the translation
discretization is dependent on scale discretization with s_0 .
The continuous wavelet function

Equation 3.22

Equation 3.23
by inserting s = s_0^j , and tau = k.s_0^j.tau_0 .
If {psi_(j,k)} constitutes an orthonormal basis, the wavelet series transform becomes

Equation 3.24
or

Equation 3.25
A wavelet series requires that {psi_(j,k)} are either orthonormal, biorthogonal, or frame. If {psi_(j,k)} are not orthonormal, Equation 3.24 becomes

Equation 3.26
where hat{ psi_{j,k}^*(t)} , is either the dual biorthogonal basis or dual
frame (Note that * denotes the conjugate).
If {psi_(j,k) } are orthonormal or biorthogonal, the transform will be non-redundant, where as if they form a frame, the transform will be redundant. On the other hand, it is much easier to find frames than it is to find orthonormal or biorthogonal bases.
The following analogy may clear this concept. Consider the whole process as looking at a particular object. The human eyes first determine the coarse view which depends on the distance of the eyes to the object. This corresponds to adjusting the scale parameter s_0^(-j). When looking at a very close object, with great detail, j is negative and large (low scale, high frequency, analyses the detail in the signal). Moving the head (or eyes) very slowly and with very small increments (of angle, of distance, depending on the object that is being viewed), corresponds to small values of tau = k.s_0^j.tau_0 . Note that when j is negative and large, it corresponds to small changes in time, tau , (high sampling rate) and large changes in s_0^-j (low scale, high frequencies, where the sampling rate is high). The scale parameter can be thought of as magnification too.
How low can the sampling rate be and still allow reconstruction of the signal? This is the main question to be answered to optimize the procedure. The most convenient value (in terms of programming) is found to be ``2'' for s_0 and "1" for tau. Obviously, when the sampling rate is forced to be as low as possible, the number of available orthonormal wavelets is also reduced.
The continuous wavelet transform examples that were given in this chapter were actually the wavelet series of the given signals. The parameters were chosen depending on the signal. Since the reconstruction was not needed, the sampling rates were sometimes far below the critical value where s_0 varied from 2 to 10, and tau_0 varied from 2 to 8, for different examples.
Even though the discretized wavelet transform can be computed on a computer, this computation may take anywhere from a couple seconds to couple hours depending on your signal size and the resolution you want. An amazingly fast algorithm is actually available to compute the wavelet transform of a signal.
MULTIRESOLUTION ANALYSIS
THE
DISCRETE WAVELET TRANSFORM
Why is the Discrete Wavelet Transform Needed?
Although the discretized continuous wavelet transform enables the computation of the continuous wavelet transform by computers, it is not a true discrete transform. As a matter of fact, the wavelet series is simply a sampled version of the CWT, and the information it provides is highly redundant as far as the reconstruction of the signal is concerned. This redundancy, on the other hand, requires a significant amount of computation time and resources. The discrete wavelet transform (DWT), on the other hand, provides sufficient information both for analysis and synthesis of the original signal, with a significant reduction in the computation time.
The DWT is considerably easier to implement when compared to the CWT. The basic concepts of the DWT will be introduced in this section along with its properties and the algorithms used to compute it. As in the previous chapters, examples are provided to aid in the interpretation of the DWT.
THE DISCRETE WAVELET TRANSFORM (DWT)
The foundations of the DWT go back to 1976 when a technique was devised to decompose discrete time signals. The name given to the analysis scheme was subband coding. In 1983, a technique very similar to subband coding was defined and named as pyramidal coding which is also known as multiresolution analysis. Later in 1989, some improvements were made to the subband coding scheme, removing the existing redundancy in the pyramidal coding scheme. Subband coding is explained below.
The Subband Coding and The Multiresolution Analysis
The main idea is the same as it is in the CWT. A time-scale representation of a digital signal is obtained using digital filtering techniques. Recall that the CWT is a correlation between a wavelet at different scales and the signal with the scale (or the frequency) being used as a measure of similarity. The continuous wavelet transform was computed by changing the scale of the analysis window, shifting the window in time, multiplying by the signal, and integrating over all times. In the discrete case, filters of different cutoff frequencies are used to analyze the signal at different scales. The signal is passed through a series of high pass filters to analyze the high frequencies, and it is passed through a series of low pass filters to analyze the low frequencies.
The resolution of the signal, which is a measure of the amount of detail information in the signal, is changed by the filtering operations, and the scale is changed by upsampling and downsampling (subsampling) operations. Subsampling a signal corresponds to reducing the sampling rate, or removing some of the samples of the signal. For example, subsampling by two refers to dropping every other sample of the signal. Subsampling by a factor n reduces the number of samples in the signal n times.
Upsampling a signal corresponds to increasing the sampling rate of a signal by adding new samples to the signal. For example, upsampling by two refers to adding a new sample, usually a zero or an interpolated value, between every two samples of the signal. Upsampling a signal by a factor of n increases the number of samples in the signal by a factor of n.
Although it is not the only possible choice, DWT coefficients are usually sampled from the CWT on a dyadic grid, i.e., s0 = 2 and pi 0 = 1, yielding s=2j and pi =k*2j, as described in Part 3. Since the signal is a discrete time function, the terms function and sequence will be used interchangeably in the following discussion. This sequence will be denoted by x[n], where n is an integer.
The procedure starts with passing this signal (sequence) through a half band digital lowpass filter with impulse response h[n]. Filtering a signal corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter. The convolution operation in discrete time is defined as follows:

A half band lowpass filter removes all frequencies that are above half of the highest frequency in the signal. For example, if a signal has a maximum of 1000 Hz component, then half band lowpass filtering removes all the frequencies above 500 Hz.
The unit of frequency is of particular importance at this time. In discrete signals, frequency is expressed in terms of radians. Accordingly, the sampling frequency of the signal is equal to 2pi radians in terms of radial frequency. Therefore, the highest frequency component that exists in a signal will be pi radians, if the signal is sampled at Nyquist’s rate (which is twice the maximum frequency that exists in the signal); that is, the Nyquist’s rate corresponds to pi rad/s in the discrete frequency domain. Therefore using Hz is not appropriate for discrete signals. However, Hz is used whenever it is needed to clarify a discussion, since it is very common to think of frequency in terms of Hz. It should always be remembered that the unit of frequency for discrete time signals is radians.
After passing the signal through a half band lowpass filter, half of the samples can be eliminated according to the Nyquist’s rule. Simply discarding every other sample will subsample the signal by two, and the signal will then have half the number of points. The scale of the signal is now doubled. Note that the lowpass filtering removes the high frequency information, but leaves the scale unchanged. Only the subsampling process changes the scale. Resolution, on the other hand, is related to the amount of information in the signal, and therefore, it is affected by the filtering operations. Half band lowpass filtering removes half of the frequencies, which can be interpreted as losing half of the information. Therefore, the resolution is halved after the filtering operation. Note, however, the subsampling operation after filtering does not affect the resolution, since removing half of the spectral components from the signal makes half the number of samples redundant anyway. Half the samples can be discarded without any loss of information. In summary, the lowpass filtering halves the resolution, but leaves the scale unchanged. The signal is then subsampled by 2 since half of the number of samples are redundant. This doubles the scale.
This procedure can mathematically be expressed as

Having said that, we now look how the DWT is actually computed: The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information. DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with low pass and highpass filters, respectively. The decomposition of the signal into different frequency bands is simply obtained by successive highpass and lowpass filtering of the time domain signal. The original signal x[n] is first passed through a halfband highpass filter g[n] and a lowpass filter h[n]. After the filtering, half of the samples can be eliminated according to the Nyquist’s rule . The signal can therefore be subsampled by 2, simply by discarding every other sample. This constitutes one level of decomposition and can mathematically be expressed as follows:

where yhigh[k] and ylow[k] are the outputs of the highpass and lowpass filters, respectively, after subsampling by 2.
This decomposition halves the time resolution since only half the number of samples now characterizes the entire signal. However, this operation doubles the frequency resolution, since the frequency band of the signal now spans only half the previous frequency band, effectively reducing the uncertainty in the frequency by half. The above procedure, which is also known as the subband coding, can be repeated for further decomposition. At every level, the filtering and subsampling will result in half the number of samples (and hence half the time resolution) and half the frequency band spanned (and hence double the frequency resolution). Figure 4.1 illustrates this procedure, where x[n] is the original signal to be decomposed, and h[n] and g[n] are lowpass and highpass filters, respectively. The bandwidth of the signal at every level is marked on the figure as "f".

Figure 4.1. The Subband Coding Algorithm As an example, suppose that the original signal x[n] has 512 sample points, spanning a frequency band of zero to pi rad/s. At the first decomposition level, the signal is passed through the highpass and lowpass filters, followed by subsampling by 2. The output of the highpass filter has 256 points (hence half the time resolution,hence double the frequency resolution). These 256 samples constitute the first level of DWT coefficients. The output of the lowpass filter also has 256 samples, but it spans the other half of the frequency band, frequencies from 0 to pi/2 rad/s. This signal is then passed through the same lowpass and highpass filters for further decomposition. The output of the second lowpass filter followed by subsampling has 128 samples spanning a frequency band of 0 to pi/4 rad/s, and the output of the second highpass filter followed by subsampling has 128 samples spanning a frequency band of pi/4 to pi/2 rad/s. The second highpass filtered signal constitutes the second level of DWT coefficients. This signal has half the time resolution, but twice the frequency resolution of the first level signal. In other words, time resolution has decreased by a factor of 4, and frequency resolution has increased by a factor of 4 compared to the original signal. The lowpass filter output is then filtered once again for further decomposition. This process continues until two samples are left. For this specific example there would be 8 levels of decomposition, each having half the number of samples of the previous level. The DWT of the original signal is then obtained by concatenating all coefficients starting from the last level of decomposition (remaining two samples, in this case). The DWT will then have the same number of coefficients as the original signal.
The frequencies that are most prominent in the original signal will appear as high amplitudes in that region of the DWT signal that includes those particular frequencies. The difference of this transform from the Fourier transform is that the time localization of these frequencies will not be lost. However, the time localization will have a resolution that depends on which level they appear. If the main information of the signal lies in the high frequencies, as happens most often, the time localization of these frequencies will be more precise, since they are characterized by more number of samples. If the main information lies only at very low frequencies, the time localization will not be very precise, since few samples are used to express signal at these frequencies. This procedure in effect offers a good time resolution at high frequencies, and good frequency resolution at low frequencies. Most practical signals encountered are of this type.
The frequency bands that are not very prominent in the original signal will have very low amplitudes, and that part of the DWT signal can be discarded without any major loss of information, allowing data reduction. Figure 4.2 illustrates an example of how DWT signals look like and how data reduction is provided. Figure 4.2a shows a typical 512-sample signal that is normalized to unit amplitude. The horizontal axis is the number of samples, whereas the vertical axis is the normalized amplitude. Figure 4.2b shows the 8 level DWT of the signal in Figure 4.2a. The last 256 samples in this signal correspond to the highest frequency band in the signal, the previous 128 samples correspond to the second highest frequency band and so on. It should be noted that only the first 64 samples, which correspond to lower frequencies of the analysis, carry relevant information and the rest of this signal has virtually no information. Therefore, all but the first 64 samples can be discarded without any loss of information. This is how DWT provides a very effective data reduction scheme.
Figure 4.2 Example of a DWT
We will revisit this example, since it provides important insight to how DWT should be interpreted. Before that, however, we need to conclude our mathematical analysis of the DWT.
One important property of the discrete wavelet transform is the relationship between the impulse responses of the highpass and lowpass filters. The highpass and lowpass filters are not independent of each other, and they are related by
![]()
where g[n] is the highpass, h[n] is the lowpass filter, and L is the filter length (in number of points). Note that the two filters are odd index alternated reversed versions of each other. Lowpass to highpass conversion is provided by the (-1)n term. Filters satisfying this condition are commonly used in signal processing, and they are known as the Quadrature Mirror Filters (QMF). The two filtering and subsampling operations can be expressed by

The reconstruction in this case is very easy since halfband filters form orthonormal bases. The above procedure is followed in reverse order for the reconstruction. The signals at every level are upsampled by two, passed through the synthesis filters g’[n], and h’[n] (highpass and lowpass, respectively), and then added. The interesting point here is that the analysis and synthesis filters are identical to each other, except for a time reversal. Therefore, the reconstruction formula becomes (for each layer)

However, if the filters are not ideal halfband, then perfect reconstruction cannot be achieved. Although it is not possible to realize ideal filters, under certain conditions it is possible to find filters that provide perfect reconstruction. The most famous ones are the ones developed by Ingrid Daubechies, and they are known as Daubechies’ wavelets.
Note that due to successive subsampling by 2, the signal length must be a power of 2, or at least a multiple of power of 2, in order this scheme to be efficient. The length of the signal determines the number of levels that the signal can be decomposed to. For example, if the signal length is 1024, ten levels of decomposition are possible.
Interpreting the DWT coefficients can sometimes be rather difficult because the way DWT coefficients are presented is rather peculiar. To make a real long story real short, DWT coefficients of each level are concatenated, starting with the last level. An example is in order to make this concept clear:
Suppose we have a 256-sample long signal sampled at 10 MHZ and we wish to obtain its DWT coefficients. Since the signal is sampled at 10 MHz, the highest frequency component that exists in the signal is 5 MHz. At the first level, the signal is passed through the lowpass filter h[n], and the highpass filter g[n], the outputs of which are subsampled by two. The highpass filter output is the first level DWT coefficients. There are 128 of them, and they represent the signal in the [2.5 5] MHz range. These 128 samples are the last 128 samples plotted. The lowpass filter output, which also has 128 samples, but spanning the frequency band of [0 2.5] MHz, are further decomposed by passing them through the same h[n] and g[n]. The output of the second highpass filter is the level 2 DWT coefficients and these 64 samples precede the 128 level 1 coefficients in the plot. The output of the second lowpass filter is further decomposed, once again by passing it through the filters h[n] and g[n]. The output of the third highpass filter is the level 3 DWT coefficiets. These 32 samples precede the level 2 DWT coefficients in the plot.
The procedure continues until only 1 DWT coefficient can be computed at level 9. This one coefficient is the first to be plotted in the DWT plot. This is followed by 2 level 8 coefficients, 4 level 7 coefficients, 8 level 6 coefficients, 16 level 5 coefficients, 32 level 4 coefficients, 64 level 3 coefficients, 128 level 2 coefficients and finally 256 level 1 coefficients. Note that less and less number of samples is used at lower frequencies, therefore, the time resolution decreases as frequency decreases, but since the frequency interval also decreases at low frequencies, the frequency resolution increases. Obviously, the first few coefficients would not carry a whole lot of information, simply due to greatly reduced time resolution. To illustrate this richly bizarre DWT representation let us take a look at a real world signal. Our original signal is a 256-sample long ultrasonic signal, which was sampled at 25 MHz. This signal was originally generated by using a 2.25 MHz transducer, therefore the main spectral component of the signal is at 2.25 MHz. The last 128 samples correspond to [6.25 12.5] MHz range. As seen from the plot, no information is available here, hence these samples can be discarded without any loss of information. The preceding 64 samples represent the signal in the [3.12 6.25] MHz range, which also does not carry any significant information. The little glitches probably correspond to the high frequency noise in the signal. The preceding 32 samples represent the signal in the [1.5 3.1] MHz range. As you can see, the majority of the signal’s energy is focused in these 32 samples, as we expected to see. The previous 16 samples correspond to [0.75 1.5] MHz and the peaks that are seen at this level probably represent the lower frequency envelope of the signal. The previous samples probably do not carry any other significant information. It is safe to say that we can get by with the 3rd and 4th level coefficients, that is we can represent this 256 sample long signal with 16+32=48 samples, a significant data reduction which would make your computer quite happy.
One area that has benefited the most from this particular property of the wavelet transforms is image processing. As you may well know, images, particularly high-resolution images, claim a lot of disk space. As a matter of fact, if this tutorial is taking a long time to download, that is mostly because of the images. DWT can be used to reduce the image size without losing much of the resolution. Here is how:
For a given image, you can compute the DWT of, say each row, and discard all values in the DWT that are less then a certain threshold. We then save only those DWT coefficients that are above the threshold for each row, and when we need to reconstruct the original image, we simply pad each row with as many zeros as the number of discarded coefficients, and use the inverse DWT to reconstruct each row of the original image. We can also analyze the image at different frequency bands, and reconstruct the original image by using only the coefficients that are of a particular band. I will try to put sample images hopefully soon, to illustrate this point.
Another issue that is receiving more and more attention is carrying out the decomposition (subband coding) not only on the lowpass side but on both sides. In other words, zooming into both low and high frequency bands of the signal separately. This can be visualized as having both sides of the tree structure of Figure 4.1. What result is what is known as the wavelet packages.
WATERMARKING
ALGORITHM
At the present day it has become virtually impossible to give the definition of a ``wavelet''. The research field is growing so fast and novel contributions are made at such a rate that even if one manages to give a definition today, it might be obsolete tomorrow. One, very vague, way of thinking about wavelets could be:
''Wavelet are building blocks that can
quickly decorrelate data.''
This sentence at least incorporates
three of the main features of wavelets. First of all, they are building blocks
for general data sets or functions. Mathematically we say that they form a
basis or, more general a frame. This means that each element of a general class
can be written in a stable way as a linear combination of the wavelets.
If we denote the wavelets by `w i` and the coefficients by `c i` , we
can write a general function f as
f =
Summation over i( w i * c i )
Secondly, wavelets have the power
to decorrelate. This means that the representation of the data in terms of the
wavelet coefficients `c i` is somehow more compact then the original
representation. In informationtheoretic jargon, we say that the entropy in the
wavelet representation is smaller then in the original representation.
In approximationtheoretic jargon,
we want to get an accurate approximation of f by only using a small fraction of
the wavelet coefficients.
The way to get this
decorrelation power is to construct wavelets which already in some way resemble
the data we want to represent. More specifically, we would like the wavelets to
have the same correlation structure as the data. For example, most signals we
encounter in daily life have both correlation in space and frequency. Samples
which are spatically close are much more correlated then ones that are far
apart, and frequencies often occur in bands. To analyze and represent such
signals we need wavelets that are local in space and frequency. Typically this
is achieved by building wavelets which have compact support (localization in
space), which are smooth (decay towards high frequencies), and which have
vanishing moments (decay towards low frequencies). Finally, we want to quickly
find the wavelet representation of the data. More precisely, we want to switch
between the original representation of the data and its wavelet representation
in a time proportional to the size of the data. The fast decorrelation power of
wavelets is the key to applications such as data compression, fast data
transmission, noise cancellation, signal recovering, and fast numerical
algorithms.
The main difference with
classic constructions is that it does not employ the Fourier transform. Until
recently, the Fourier transform has been instrumental in wavelet constructions.
The underlying reason is that wavelets are traditionally defined as translates
and dilates of one function, and translation and dilation become algebraic
operations after Fourier transform. The wavelet construction then relies on
certain polynomial factorizations. We refer to wavelets which are translates
and dilates of one function as first generation wavelets.
The lifting scheme is a new flexible
tool for constructing wavelets and wavelet transforms. The main difference with
classical constructions is that it does not rely on the Fourier transform. This
way lifting can be used to construct second generation wavelets, wavelets which
are not necessarily translates and dilates of one function. The latter we refer
to as first generation wavelets. In the case of first generation wavelets, the
lifting scheme will never come up with wavelets which somehow could not be
found by the Cohen-Daubechies-Feauveau machinery. Nevertheless, it has the
following advantages:
1. It allows a faster
implementation of the wavelet transform. Traditionally, the fast wavelet
transform is calculated with a twoband subband transform scheme. In each step
the signal is split into a high pass and low pass band and then subsampled.
Recursion occurs on the low pass band. The lifting scheme makes optimal use of
similarities between the high and low pass filters to speed up the calculation.
In some cases the number of operations can be reduced by a factor of two.
2. The lifting scheme allows a
fully inplace calculation of the wavelet transform. In other words, no auxiliary
memory is needed and the original signal (image) can be replaced with its
wavelet transform.
3. In the classical case, it is not
immediately clear that the inverse wavelet transform actually is the inverse of
the forward transform. Only with the Fourier transform one can convince oneself
of the perfect reconstruction property. With the lifting scheme, the inverse
wavelet transform can immediately be found by undoing the operations of the
forward transform. In practise, this comes down to simply reversing the order
of the operations and changing each + into – and vice-versa.
4.The lifting scheme is a natural
way to introduce wavelets in classroom .Indeed,it does not rely on the Fourier
Transform,the properties of wavelts do not appear as somehow ``magical'' to
students who do not have a strong background in Fourier analysis.
Since lifting does not
rely on the Fourier transform, it can be used to construction wavelets in
settings where translation and dilation, and thus the Fourier transform, cannot
be used. We refer to such wavelet as second generation wavelets. Typical
examples are:
1. Wavelets on bounded domains: The
construction of wavelets on domains in a Euclidean space is needed in
applications such as image segmentation and the numerical solution of partial
differential equations. A special case is the construction of wavelets on an
interval, which is needed to transform finite length signals
without introducing artifacts at
the boundaries.
2. Wavelets on curves and surfaces:
To analyze data that live on curves or surfaces or to solve equations on curves
or surfaces, one needs wavelets intrinsically defined on these manifolds,
independent of parametrization.
3. Weighted wavelets:
Diagonalization of differential operators and weighted approximation require a
basis adapted to weighted measures. Wavelets biorthogonal with respect to a
weighted inner product are needed.
4. Wavelets and irregular sampling:
Many real life problems require basis functions and transforms adapted to
irregularly sampled data.
It is obvious that
wavelets adapted to these setting cannot be formed by translation and dilation.
The Fourier transform can thus no longer be used as a construction tool. The
lifting scheme provides an alternative.
How are images represented in the computer?
An image as defined on, say a
photographic film, is a continuous function of brightness values. Each point on
the developed film can be associated with a gray level value representing how
bright that particular point is. To store images in a computer we have to
sample and quantize (digitize) the image function. Sampling refers to
considering the image only at a finite number of points. And quantization
refers to the representation of the gray level value at the sampling point
using finite number of bits. Each image sample is called a pixel. One of the
simpler scheme is sampling on a regular grid of squares. We can visualize the
process as overlaying an uniform grid on the image and sampling the image
function at the center of each grid square. Finer the grid, better the
resolution of the image; coarser the grid, the more is the observed
``pixelization'' . At each pixel (or at each grid square) we usually represent
the gray level value using an integer ranging from 0 for black to 255 for fully
white for a 8 bit image.
The wavelet function extracts
information about differences between adjacent positions within the data. A companion function which retains
information about the averages between adjacent points is the scale function
and it is applied in the afore mentioned way to generate a second series of
coefficients. These two series of
coefficients are each half as long as the original data set. The one series corresponds to the
coefficients generated from inner products with the scale function which will
be designated the 'smoothed' data. The
second series corresponds to the coefficients generated from the inner products
with the wavelet function which will be designated the 'detail' of the data. Therefore, an original data set of 1024
values will be decomposed into two series of 512 values. When the results are saved to a file, the
smoothed data is saved first followed by the detail of the data. This transform will be called a first level
transform and it only applies the wavelet function at the smallest scale. If the above analysis is done along with an
analysis using the function expanded by a factor of two, the results will be
called a second level transform.
A greyscale image is composed of
rectangular pixels, each having a particular intensity. By allowing a number
between 0 and 255 to represent each intensity and associating each pixel with
the appropriate number, we can represent the image as a matrix of numbers
between 0 and 255. By convention, the higher the number, the brighter the pixel
(eg black = 0 and white = 255). When we decompose an image onto a wavelet
basis, we also divide it into a relatively constant portion and a varying
portion. However, images tend to be composed mainly of relatively constant
regions. The low variation in these regions translates into very small
coefficients for many elements of the wavelet basis. If we set all coefficients
below a certain threshold to be zero, we end up with many zeros. Since the
number of coefficients that must be stored is the same as the number of pixels
in the image, and many of these coefficients are the same, we can store these
coefficients much more efficiently than simply storing the values for each
individual pixel.
This is the Shannon sampling theorem that has a
simple physical interpretation in image analysis: The sampling interval should
be chosen in size such that it is less than or equal to half of the smallest
interesting detail in the image.
THE BASIC IDEA BEHIND LIFTING
The basic idea behind the lifting scheme is very simple. It starts with a trivial wavelet, the “Lazy wavelet"; a function which essentially doesn't do a thing, but which has the formal properties of a wavelet. The lifting scheme then gradually builds a new wavelet, with improved properties, by adding in new basis functions. Our construction relies on the lifting scheme and inherits all of its advantages: fast transform, in-place calculation, and integer-to-integer transforms.
A canonical case of
lifting consists of three stages, which we refer to as: predict, update and
split. Assume we start with an abstract data set, which we refer to as d0 . We
know this data set has some correlation structure and we would like to
exploit it to obtain a more compact
representation.
In the first stage, we
calculate the prediction error on the odd samples by calculating the average of the adjacent even samples.
d(
i , j) = d ( i , j) - [d ( i-1 , j) + d ( i+1 , j)] / 2
In the second stage , we update the
even samples by the prediction error.
s( i , j) = s ( i , j) + [s
( i-1 , j) + s ( i+1 , j)] / 4
In the last stage ,we split the we
split the data into two smaller subsets –
odd and even samples into groups.
At this moment we can replace the
original data with the smaller set.
ALGORTHM :
The detailed algorithm is as follows:
RESEARCH WORK INVOLVED :
The novel approach used in the above stated watermarking algorithm is that it has been designed taking into account the attacker’s point of view.
1. First of all the attacker does not know what type of frequency transform am I using???
He will guess DCT or Wavelets.
If wavelets is the wild guess………the next question is which Wavelets have I used ………HAAR, GABOR, DAUBECHEIS, CDF etc…
If the wild guess is CDF …….then the next question is which type of wavelet -pyramidal wavelet structure or wavelet packets ???
2. Secondly, where and what values have I inserted as the watermark?
Suppose he too generates a random number array of length N using Donald Knuth’s random number generator. Then at that very instant that random values generated by the function are different from the values that I have used. The next problem is that in which subbands is the watermark inserted.I have chosen that too randomly i.e from M subbands I randomly select P subbands.The next question will be where in the subband is the watermark ? The attacker will guess that the zero wavelet coefficientas have been modified ……….. but the next step of randomness introduced in my algorithm is skipping of a random number of coefficients.Thus the attacker will fail here.
RELATED WORK
At present, most of the algorithms are based on DCT and the few which implement wavelets search for significant coefficients to store the watermark and store the positions of such coefficients , thus are not memory efficient algorithms. Some embed the watermark in a zigzag fashion i.e. the diagonal elements of the image pixels, and thus are more susceptible to attacks since the attacker can easily predict the location of the watermark if the positions of insertion are fixed. Others use a coefficient selection key, which is an array with values set to 0 or 1 for each pixel position – where 1 is for embedding the watermark in that pixel. Some techniques use blind watermark retrieval techniques, i.e retrieval of the watermark without the host image, the obvious disadvantage being lack of accuracy in determining attacks. Most of the algorithms implement Haar wavelet.
How many level transforms can be applied to an
image?
For an image of dimensions N by N where N is 2^n (, maximum “log N=n”
number of level transforms can be applied.(This can be easily proved by
mathematical induction).Its not necessary that n level tranforms should be
always be applied,however the key space is increased.
Is the algorithm symmetric?
It is not tested whether the
algorithm is symmetric, i.e., the time to insert watermark is nearly
equal to the time to extract watermark. (Complex watermarking algorithms tend
to have insertion times much larger than the extraction times.)
Why use Digital Watermark?
Ownership protection
Copy control
Authentication
Convey other information
Digital watermarking intends to
enable the proof of ownership on copyrighted material, detect the originator of
illegally made copies, monitor the usage of the copyrighted multimedia data and
analyze the spread spectrum of the data over networks and severs. It should be noted that the embedded
signaling could be used for a variety of purposes, other than copyright
control. In a distributed video production environment, for example,
watermarking can be applied to integrate different kinds of data into the video
material, such as customer information, or meta data as a format independent
part of the video, e.g. information about cuts or scene descriptions.Watermarking can be used to authenticate data. For example,
if video cameras were used to monitor traffic signal violations, an
unscrupulous person could alter the video, changing the condition of the
traffic light from green to red. But if the recording had been watermarked, it
would be impossible to make any alterations. Such a scheme would also prevent
unintentional corruption of data. Likewise, watermarking would prevent time
stamps, commonly used on photographs and video today, from being altered.
As the computers are more and more integrated via the network, the distribution of digital media is becoming faster, easier, and requiring less effort to make exact copies. One of the major impediment is the lack of effective intellectual property protection of digital media to discourage un-authorized copying and distribution.
Conventionally, in analog world, a painting is signed by the artist to attest the copyright, an identity card is stamped by the steel seal to avoid forgery, and the paper money are identified by the embossed portrait. Such kind of hand-written signatures, seals and watermarks have been used from ancient times as a way to identify the source, creator of a document or a picture.
However, in the digital world, digital technology for manipulating images has made it difficult to distinguish the visual truth. Besides, the characteristics of digitization bring significant changes in copyright issue, which creates an urgent need to intellectual property protection on the digitally recorded information.
Public key encryption systems do not completely solve the problem of unauthorized copying because of the ease with which images may be reproduced from previously published documents and enhanced using software such as "CorelDraw". All encrypted documents and images need to be decrypted before they can be inspected or used. The idea of using an indelible watermark to identify uniquely both the source of an image and an intended recipient has therefore stimulated much interest in the electronic publishing.
Digitally watermarked documents on computer networks help to locate stolen or misused documents.
The watermark is supposed to be robust against image tampering. Therefore anybody who wants to distribute the image further will also distribute the watermark with it, violating the copyright on the image. If the copyright holder can detect the fraud, he can prove ownership by showing that the image contains his proper private watermark.
APPLICATIONS
High security cameras – for use in surveillance or by crime scene and insurance photograph for images which may later come under the scrutiny of a court proceeding.
Scanners – as a means of storing time, date, model, serial number and additional information into original scanned images.
Faxes & Printers – as a means of restricting the display and printing of documents to those images which are confirmed by the printer as authentic.
Medical imagery – associating patient information with an image and verifying the authenticity of an image. The patient information becomes inseparable from the image.