This tutorial introduces the vl_covdet
VLFeat command
implementing a number of co-variant feature detectors and
corresponding descriptors. This family of detectors
includes SIFT as well as multi-scale corner
(Harris-Laplace) and blob (Hessian-Laplace and Hessian-Hessian)
detectors. For example applications, see also
the SIFT tutorial.
The first example shows how to
use vl_covdet
to compute
and visualize co-variant features. First, let us load an example image
and visualize it:
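A minimal sketch, assuming one of the test images bundled with VLFeat (the pattern name is an assumption; any RGB image works equally well):

    im = vl_impattern('roofs1') ;    % load a bundled test image (name is an assumption)
    figure(1) ; clf ;
    imagesc(im) ; axis image off ;   % display it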
The image must be converted to gray-scale and single precision. Then
vl_covdet
can be called in order to extract features (by
default this uses the DoG cornerness measure, similarly to SIFT).
The verbose
option is not necessary, but it produces
some useful information:
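For example, along these lines (im2single and rgb2gray are standard MATLAB functions; rgb2gray requires the Image Processing Toolbox):

    im = im2single(rgb2gray(im)) ;       % gray-scale, single precision
    frames = vl_covdet(im, 'verbose') ;  % detect features (default: DoG cornerness)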
The vl_plotframe
command can then be used to plot
these features:
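For instance, overlaying the frames on the image displayed earlier:

    hold on ;
    vl_plotframe(frames) ;   % overlay the detected frames on the image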
which results in an image of the features detected by vl_covdet using the DoG cornerness measure (like SIFT).
In addition to the DoG detector, vl_covdet
supports a number of other ones, including the Harris-Laplace,
Hessian-Laplace, and Hessian-Hessian detectors mentioned above.
For example, to use the Hessian-Laplace operator instead of DoG, use the code:
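A sketch, assuming the detector is selected through a Method option taking the string 'HessianLaplace':

    frames = vl_covdet(im, 'Method', 'HessianLaplace') ;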
The following figure shows examples of the output of these detectors:
To understand the rest of the tutorial, it is important to
understand the geometric meaning of a feature frame. Features
computed by vl_covdet
are oriented ellipses, defined by a translation $T$ and
a linear map $A$ (a $2\times 2$ matrix),
which can be extracted as follows:
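For instance, assuming each column of frames stores an oriented ellipse as the six numbers $[T; A(:)]$ (this column layout is an assumption):

    T = frames(1:2, :) ;                     % translations, one column per feature
    A = reshape(frames(3:6, :), 2, 2, []) ;  % 2x2 linear maps, one slice per feature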
The map $(A,T)$ moves pixels from the feature frame (also called normalised patch domain) to the image frame. The feature is represented as a circle of unit radius centered at the origin in the feature reference frame, and this is transformed into an image ellipse by $(A,T)$.
In terms of extent, the normalised patch domain is a square box centered at the origin, whereas the image domain uses the standard MATLAB convention and starts at (1,1). The Y axis points downward and the X axis to the right. These notions are important in the computation of normalised patches and descriptors (see later).
Affine adaptation is the process of estimating the “affine shape” of an image region in order to construct an affinely co-variant feature frame. This is useful to compensate for deformations of the image such as slant, arising for example from a small perspective distortion.
To switch on affine adaptation, use
the EstimateAffineShape
option:
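For example, assuming the option takes a boolean value:

    frames = vl_covdet(im, 'EstimateAffineShape', true) ;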
which detects the following features:
The detection methods discussed so far are rotationally invariant. This means that they detect the same circular or elliptical regions regardless of a rotation of the image, but they do not fix and normalise the rotation of the feature frame. Instead, features are taken to be upright by default (formally, this means that the affine transformation $(A,T)$ maps the vertical axis $(0,1)$ to itself).
Estimating and removing the effect of rotation from a feature frame
is needed in order to compute rotationally invariant descriptors. This
can be obtained by specifying the EstimateOrientation
option:
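For example, again assuming a boolean option value:

    frames = vl_covdet(im, 'EstimateOrientation', true) ;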
which results in the following features being detected:
The method used is the same as the one proposed by D. Lowe: the orientation is given by the dominant gradient direction. Intuitively, this means that, in the normalized frame, brighter stuff should appear on the right, or that there should be a left-to-right dark-to-bright pattern.
In practice, this method may result in an ambiguous detection of the orientation; in this case, up to four different orientations may be assigned to the same frame, duplicating the frame for each one.
vl_covdet
can also compute descriptors. Three are
supported so far: SIFT, LIOP and raw patches (from which any other
descriptor can be computed). To use this functionality simply add an
output argument:
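For example:

    [frames, descrs] = vl_covdet(im) ;   % descrs holds one descriptor per column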
This will compute SIFT descriptors for all the features. Each
column of descrs
is a 128-dimensional descriptor vector
in single precision. Alternatively, to compute LIOP descriptors use:
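A sketch, assuming descriptors are selected through a Descriptor option taking the string 'liop':

    [frames, descrs] = vl_covdet(im, 'Descriptor', 'liop') ;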
Using the default settings, each column will be a 144-dimensional descriptor vector in single precision. If you wish to change the settings, use the arguments described in the LIOP tutorial. Similarly, to compute raw patches use:
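Continuing the assumption above about the Descriptor option:

    [frames, descrs] = vl_covdet(im, 'Descriptor', 'patch') ;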
In this case each column of descrs
is a stacked patch.
To visualize the first 100 patches, one can use for example:
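A sketch using vl_imarraysc to tile the patches; the reshape assumes square patches stacked column-wise, as described above:

    w = sqrt(size(descrs, 1)) ;                           % patch side in pixels
    vl_imarraysc(reshape(descrs(:, 1:100), w, w, 100)) ;  % tile the first 100 patches
    axis image off ;
    colormap gray ;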
There are several parameters affecting the patches associated with the
features. First, PatchRelativeExtent
can be used to
control how large a patch is relative to the feature scale. The extent
is half of the side of the patch domain, a square in
the feature reference
frame. Since most detectors latch onto image structures (e.g. blobs)
that, in the normalised frame of reference, have a size comparable to a
circle of radius one, setting PatchRelativeExtent
to 6
makes the patch about six times larger than the size of the corner
structure. This is approximately the default extent of SIFT feature
descriptors.
A second important parameter is PatchRelativeSigma,
which expresses the amount of smoothing applied to the image in the
normalised patch frame. By default this is set to 1.0, but it can be
reduced to obtain “sharper” patches. Of course, the amount of
smoothing is bounded below by the resolution of the input image: a
smoothing of, say, less than half a pixel cannot be recovered due to
the limited sampling rate of the latter. Moreover, the patch must be
sampled finely enough to avoid aliasing (see below).
The last parameter is PatchResolution. If this is
equal to $w$, then the patch has a side of $2w+1$ pixels (hence the
sampling step in the normalised frame is given by
PatchRelativeExtent/PatchResolution).
Extracting higher-resolution patches may be needed for a larger extent
and a smaller smoothing. A good setting for this parameter may be
PatchRelativeExtent/PatchRelativeSigma.
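Putting the three parameters together (the values follow the discussion above, with PatchResolution set to PatchRelativeExtent/PatchRelativeSigma; the Descriptor option string is an assumption):

    [frames, descrs] = vl_covdet(im, 'Descriptor', 'patch', ...
                                 'PatchRelativeExtent', 6, ...
                                 'PatchRelativeSigma', 0.5, ...
                                 'PatchResolution', 12) ;  % side: 2*12+1 = 25 pixels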
Finally, it is possible to use vl_covdet
to compute
descriptors on custom feature frames, or to apply affine adaptation
and/or orientation estimation to these.
For example, the following code
computes affinely adapted and oriented features on a grid:
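A sketch; the Frames option name and the disc frame format [x; y; scale] are assumptions, and the grid spacing and scale are arbitrary:

    delta = 30 ;   % grid spacing in pixels (arbitrary)
    [x, y] = meshgrid(delta:delta:size(im,2)-delta, ...
                      delta:delta:size(im,1)-delta) ;
    initFrames = [x(:)' ; y(:)' ; 10 * ones(1, numel(x))] ;  % discs of scale 10
    frames = vl_covdet(im, 'Frames', initFrames, ...
                       'EstimateAffineShape', true, ...
                       'EstimateOrientation', true) ;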
vl_covdet
can return additional information about the
features, including the scale spaces and scores for each detected
feature. To do so use the syntax:
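For example:

    [frames, descrs, info] = vl_covdet(im) ;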
This will return a structure info with six fields.
The last four fields are the peak, edge, orientation, and Laplacian scale scores of the detected features. The first two were discussed before, and the last two are the scores associated to a specific orientation during orientation assignment and to a specific scale during Laplacian scale estimation.
The first two fields are the Gaussian scale space and the
cornerness measure scale space, which can be plotted by means
of vl_plotss.
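For our example image, the Gaussian scale space and the corresponding cornerness measure can be visualized along these lines (the field names info.gss and info.css are assumptions matching the description above):

    figure(2) ; clf ;
    vl_plotss(info.gss) ;   % Gaussian scale space
    figure(3) ; clf ;
    vl_plotss(info.css) ;   % cornerness measure scale space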