Mod-02 Lec-22 Fisher’s LDA

In the last class we were discussing the concept of a supervised method of pattern classification based on Fisher's linear discriminant analysis, or LDA; it is sometimes called FLD as well in the pattern-recognition literature. We had just introduced two different matrices: one is the within-class scatter matrix and the other is the between-class scatter matrix. We will look at their expressions one more time, then look at some important properties and expressions of LDA, and we will wind up this class with a few examples which can be worked out by hand. So let us look back at what we are saying.
LDA is a method of supervised learning; it is one of the classifiers which needs a set of training samples for learning a set of parameters. In this case the parameters are SW and SB, the within-class scatter and the between-class scatter respectively. The learning set is labeled, unlike PCA, where we had unlabeled data and which was also called unsupervised learning. So it is a class-specific method, in the sense that it tries to shape the scatter in order to make it more reliable for classification.

Remember, in the case of PCA we are interested in finding the directions in which the maximum scatter of the entire data exists; that is what principal components analysis, or PCA, does. In the case of LDA we want to maximize class separability; that means, in some sense, the between-class scatter. If you take a two-class problem, we want to find a certain direction within the data, a certain dimension, in which the distance between the two class means, or between the two clusters of the two classes, becomes large, and in the same direction the within-class scatter becomes less. If you recollect the animation slide which we had long ago, in an earlier class, when we were trying to distinguish between classification versus clustering, we also said that it is better and easier for a classifier to perform well if the between-class distance, that is, the distance between the two cluster centers or the two classes themselves, is very large and the clusters are very compact.

So if you can find such a dimension, or a set of dimensions, which can also be considered as a subspace with respect to the original higher dimension of the data, a subspace whose directions satisfy the constraint that the inter-class distance is very large and the within-class scatter is very small, that is what LDA is trying to achieve. So that is the meaning of the sentence which you see now: it is a class-specific method, and it tries to shape the scatter in order to make it more reliable, or better, for classification.
This is accomplished by trying to find a weight matrix W which maximizes the ratio of the between-class scatter SB to the within-class scatter SW, where the terms are defined again here for your ease of understanding. This is the between-class scatter SB: if you look at the expression here, Ni is the number of samples for a particular class Xi (let us say the class label is Xi), µi is the mean of that particular class Xi, and µ is the overall mean of the data. So if you look at µi − µ, it means the individual class centers are normalized with respect to the mean of the entire data set, and the factor Ni gives the number of samples of a particular class as a weightage. C, or c, here is the overall number of classes, so you need to sum this over all classes. That is the between-class scatter.
Let us look at the expression once again for SW, which is the within-class scatter matrix. You need to sum it over all classes as well, but now the expression is similar to that of PCA: it is an outer product of the samples with respect to µi. In the case of PCA, the overall data mean, the same µ which you see here on the left-hand side, would have occurred here; but in the case of LDA you have the individual class means subtracted from the data. µi is the mean of the class Xi, which you have to take and then take the outer product; you sum it over all the samples of a particular class. In fact xk belongs to the sample set Xi, which means the summation will basically go over Ni terms for each class, and then over the c classes, so in total the summation runs over all N samples. SW and SB have the same dimension as the scatter matrix we had for PCA; the dimension is the same, but the matrices themselves are a little bit different. If you look at the expressions, both are in some sense outer products, but one of them is computed with respect to the means only, and the other is computed with respect to the samples with the class-specific means subtracted. In PCA we subtracted the overall data mean; I am repeating again, here you are subtracting the class means. So SW and SB are what you have.
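In the same standard notation, the within-class scatter matrix described above is

S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^T

and the weight matrix W is chosen to maximize the ratio of between-class to within-class scatter, commonly written as J(W) = \frac{|W^T S_B W|}{|W^T S_W W|}.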
Next, what we are trying to do is find an optimal W. We will talk a little bit later about what happens if SW is singular; but assuming that the within-class scatter matrix is nonsingular, you try to find an optimal value of W which maximizes this expression. In the process of doing so you get an optimal W which can be written in terms of a set of eigenvectors, as given here, m such eigenvectors. They are the eigenvectors of this particular matrix, which is SW^-1 SB. In fact, what you are doing here is finding the eigenvectors corresponding to the m largest eigenvalues of this characteristic equation: if you think of SW^-1 SB as an overall scatter matrix S, then you are actually finding the eigenvectors and eigenvalues of this particular matrix, λ being the corresponding eigenvalues. And this is the reason why it is essential that the within-class scatter matrix be nonsingular: you need to obtain the inverse of that matrix, then multiply it with SB, and then find the corresponding eigenvalues and eigenvectors. This is the basic approach for FLD or LDA.
To do that, you need to find the inverse of the matrix SW. The question that arises is whether SW can be singular; we will have a look at that very soon. I will just give some key points with respect to the properties of the within-class scatter matrix SW. There are actually at most c − 1 nonzero eigenvalues λ (SB is a sum of c rank-one terms which are tied together through the overall mean µ, so its rank is at most c − 1); so if you look at what is called the eigen-spectrum of this particular matrix, the upper bound on m here is basically the number of classes minus one. The restriction on the number of nonzero eigenvalues, and on the corresponding eigenvectors for W, will also depend on the characteristics or properties of SW, whether it is singular or not; these are some of the main criteria which we need to follow. SW is singular if the total number of samples N
