## Mod-02 Lec-22 Fisher’s LDA

In the last section we were discussing the concept of a supervised method of pattern classification based on Fisher's linear discriminant analysis, or LDA. It is sometimes called FLD as well in the pattern recognition literature. We had just introduced two different matrices: one is the within-class scatter matrix and the other is the between-class scatter matrix. We will look at their expressions one more time, then see some important properties and expressions of LDA, and we will wind up this class with a few examples that can be worked out by hand. So let us look back. What we are saying is this:

LDA is a method of supervised learning. It is one of the classifiers that needs a set of training samples for learning a set of parameters. In this case the parameters are $S_W$ and $S_B$, the within-class scatter and the between-class scatter respectively. The learning set is labeled, unlike in PCA, where we had unlabeled data, which is why PCA was called unsupervised learning.

So it is a class-specific method, in the sense that it tries to shape the scatter in order to make it more reliable for classification. Remember, in the case of PCA we were interested in finding the directions in which the maximum scatter of the entire data exists.

That is what principal component analysis, or PCA, does. In the case of LDA we want to maximize class separability. That means, in some sense, the between-class scatter: if you take a two-class problem, we want to find a certain direction within the data, a certain dimension, along which the distance between the two class means, or between the two clusters of the two classes, becomes large. And in the same direction the within-class scatter becomes small. If you recollect the animation slide we had long ago in an earlier class, when we were trying to distinguish between classification and clustering, we also said that it is easier for a classifier to perform better if the between-class distance, that is, the distance between the two cluster centers or the two classes themselves, is very large and the clusters are very compact. So if you can find such a dimension, or set of dimensions, which can also be considered a subspace of the original higher-dimensional data, a subspace whose directions satisfy the constraint that the inter-class distance is very large and the intra-cluster distance is very small, that is what LDA is trying to achieve.
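To make the idea of a "good direction" concrete, here is a minimal sketch in Python with NumPy. The function name `separation_ratio` and the toy data are made up for illustration and are not from the lecture. It projects two classes onto a direction and divides the squared distance between the projected class means by the within-class scatter of the projections.

```python
import numpy as np

def separation_ratio(X1, X2, w):
    """Squared distance between the projected class means divided by the
    summed within-class scatter of the projections along direction w."""
    w = np.asarray(w, dtype=float)
    w = w / np.linalg.norm(w)          # use a unit-length direction
    p1, p2 = X1 @ w, X2 @ w            # 1-D projections of each class
    between = (p1.mean() - p2.mean()) ** 2
    within = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()
    return between / within

# Toy data (made up): both clusters are stretched along x, separated along y
X1 = np.array([[0.0, 0.0], [2.0, 0.2], [4.0, 0.0], [6.0, 0.2]])
X2 = np.array([[0.0, 1.0], [2.0, 1.2], [4.0, 1.0], [6.0, 1.2]])

ratio_x = separation_ratio(X1, X2, [1.0, 0.0])   # direction of largest spread
ratio_y = separation_ratio(X1, X2, [0.0, 1.0])   # direction separating the classes
```

In this toy set most of the overall scatter lies along x, yet the ratio along x is zero because both class means project to the same point. Along y the clusters are compact and far apart, so the ratio is large.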

So that is the meaning of the sentence which you see now: it is a class-specific method that tries to shape the scatter in order to make it more reliable, or better, for classification. This is accomplished by trying to find a weight matrix $W$ which maximizes the ratio of the between-class scatter $S_B$ to the within-class scatter $S_W$. We will define these terms again for your ease of understanding.
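The slide the lecture refers to is not reproduced in this transcript. Assuming the standard form of Fisher's criterion, which agrees with the verbal description given here, the three expressions would read as follows. The sample symbol $x_k$ and the use of the determinant $|\cdot|$ in the ratio are assumptions of this reconstruction.

$$
W_{opt} = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|}
$$

$$
S_B = \sum_{i=1}^{c} N_i \, (\mu_i - \mu)(\mu_i - \mu)^{T},
\qquad
S_W = \sum_{i=1}^{c} \; \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T}
$$

Here $c$ is the number of classes, $N_i$ the number of samples in class $X_i$, $\mu_i$ the mean of class $X_i$, and $\mu$ the mean of the whole data set.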

This is the between-class scatter $S_B$. If you look at the expression here, $N_i$ is the number of samples for a particular class $X_i$ (let us say the class label is $X_i$), $\mu_i$ is the mean of that class $X_i$, and $\mu$ is the overall mean of the data. So if you look at $\mu_i - \mu$, it basically means the individual class centers are normalized with respect to the mean of the entire data set. $N_i$, the number of samples in a particular class, acts as a weightage, and $c$, written as either C or c here, is the overall number of classes, so you need to sum this over all classes.

That is the between-class scatter. Let us look at the expression once again for $S_W$, which is the within-class scatter matrix. You need to sum it over all classes as well, but now this expression is similar to the one in PCA: it is an outer product of the samples taken with respect to $\mu_i$. In the case of PCA, the overall data mean $\mu$, the same $\mu$ which you see here on the left-hand side, would have occurred here. But in the case of LDA, the mean subtracted from the data is the individual class mean.

$\mu_i$ is the mean of the class $X_i$; you subtract it, take the outer product, and sum over all the samples of that particular class. In fact $x_k$ belongs to the sample set $X_i$, which means the inner summation runs over $N_i$ terms and the outer one over the $c$ classes. So the total number of summation terms is the sum of the $N_i$, which is $N$ multiplied by $c$ if every class has the same number $N$ of samples. $S_W$ and $S_B$ have the same dimension as the scatter matrix we had for PCA.

But the matrices themselves are a little different. If you look at the expressions, both are in some sense outer products, but one of them is computed with respect to the means only, while the other is computed from the samples after subtracting the class-specific means. In PCA we subtracted the overall data mean; I am repeating again, here you are subtracting the class means. So $S_W$ and $S_B$ are what you have.
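The two scatter matrices described above can be computed directly from labeled samples. The following is a minimal sketch in Python with NumPy, not code from the lecture: the function name `scatter_matrices` and the toy data are hypothetical, and samples are assumed to be stored one per row.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_W, S_B) for samples X of shape (N, d) and labels y of shape (N,)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    d = X.shape[1]
    mu = X.mean(axis=0)                      # overall mean of the data
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for label in np.unique(y):               # sum over the c classes
        X_i = X[y == label]                  # samples of class i
        N_i = X_i.shape[0]
        mu_i = X_i.mean(axis=0)              # class-specific mean
        centered = X_i - mu_i
        S_W += centered.T @ centered         # sum of (x_k - mu_i)(x_k - mu_i)^T
        diff = (mu_i - mu).reshape(-1, 1)
        S_B += N_i * (diff @ diff.T)         # N_i (mu_i - mu)(mu_i - mu)^T
    return S_W, S_B

# Toy data (made up): two classes with four 2-D samples each
X = np.array([[0.0, 0.0], [2.0, 0.2], [4.0, 0.0], [6.0, 0.2],
              [0.0, 1.0], [2.0, 1.2], [4.0, 1.0], [6.0, 1.2]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

S_W, S_B = scatter_matrices(X, y)
```

Both results are d by d matrices, the same size as the PCA scatter matrix. As a sanity check, their sum equals the total scatter of the data about the overall mean, which is the matrix PCA works with.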

Next, what we are trying to do is find an optimal $W$. We will talk a little later about what happens if $S_W$ is singular, but assuming that the within-class scatter matrix is nonsingular, you try to find the value of $W$ which maximizes this expression. In the process of doing so you get an optimal $W$ which can be written in terms of a set of eigenvectors, as given here: a set of $m$ eigenvectors. They are the eigenvectors of the matrix $S_W^{-1} S_B$. In fact, what you are doing here is finding the eigenvectors that belong to the $m$ largest eigenvalues of this characteristic equation. If you think of $S_W^{-1} S_B$ as an overall scatter matrix $S$, then you are actually finding its eigenvectors and eigenvalues, where $\lambda$ denotes the corresponding eigenvalues of this matrix.
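As a rough illustration of this eigenvector step, the sketch below solves $S_W^{-1} S_B \, w_i = \lambda_i w_i$ with NumPy and keeps the eigenvectors belonging to the $m$ largest eigenvalues. The function name `lda_directions` and the two small matrices are hypothetical, and the sketch assumes $S_W$ is nonsingular, as the lecture does at this point.

```python
import numpy as np

def lda_directions(S_W, S_B, m):
    """Eigenvalues and eigenvectors of inv(S_W) @ S_B for the m largest
    eigenvalues. Assumes S_W is nonsingular."""
    # solve(S_W, S_B) gives inv(S_W) @ S_B without forming the inverse explicitly
    M = np.linalg.solve(S_W, S_B)
    eigvals, eigvecs = np.linalg.eig(M)      # M is generally not symmetric
    # Drop any tiny imaginary parts left by round-off
    eigvals, eigvecs = eigvals.real, eigvecs.real
    order = np.argsort(eigvals)[::-1][:m]    # indices of the m largest eigenvalues
    return eigvals[order], eigvecs[:, order]

# Made-up scatter matrices for a two-class, two-dimensional toy problem
S_W = np.array([[40.0, 0.8], [0.8, 0.08]])
S_B = np.array([[0.0, 0.0], [0.0, 2.0]])

lam, W = lda_directions(S_W, S_B, m=1)       # columns of W are the directions
```

Using `np.linalg.solve` instead of an explicit inverse is a design choice for numerical stability; the result is the same matrix $S_W^{-1} S_B$ the lecture describes.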

And this is the reason why it is essential that the within-class scatter matrix is nonsingular: you need to obtain the inverse of that matrix, multiply it with $S_B$, and then find the corresponding eigenvalues and eigenvectors. This is the basic approach for FLD or LDA.
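Since the approach relies on inverting the within-class scatter matrix, one might check numerically that it has full rank before going further. The following is only a sketch: the helper name `has_full_rank` and both example matrices are made up, and it uses NumPy's default rank tolerance.

```python
import numpy as np

def has_full_rank(S):
    """True if the square matrix S is numerically nonsingular."""
    S = np.asarray(S, dtype=float)
    return np.linalg.matrix_rank(S) == S.shape[0]

# Two made-up 2x2 matrices standing in for S_W
S_ok = np.array([[40.0, 0.8], [0.8, 0.08]])   # determinant is nonzero
S_bad = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row is twice the first

ok = has_full_rank(S_ok)     # inverse exists
bad = has_full_rank(S_bad)   # inverse does not exist
```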

To do that, you need to find the inverse of the matrix $S_W$. The question that comes up is: is $S_W$ always nonsingular? We will have a look at that very soon; I will just give the key points with respect to some properties of the within-class scatter matrix $S_W$. There are actually at most $c - 1$ nonzero eigenvalues $\lambda$. So if you look at what is called the eigenspectrum of this particular matrix, the upper bound on $m$ here is basically the number of classes minus 1. The restriction on the number of nonzero eigenvalues, and the corresponding eigenvectors for $W$, will actually depend on the characteristics or properties of $S_W$, whether it is singular or not. These are some of the main criteria which we need to follow. $S_W$ is singular if the total number of samples $N$
