Full Version: Principal component analysis (PCA)
Unknown
[dohtml]<P> <H1><A NAME="SECTION00610000000000000000">Principal component analysis</A></H1> <P> Principal component analysis (PCA) is a classical statistical method. This linear transform has been widely used in data analysis and compression. Principal component analysis is based on the statistical representation of a random variable. Suppose we have a random vector population <B>x</B>, where <P> <IMG WIDTH=315 HEIGHT=22 ALIGN=BOTTOM ALT="displaymath2085" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img37.gif" > <P> and the mean of that population is denoted by <P> <IMG WIDTH=295 HEIGHT=20 ALIGN=BOTTOM ALT="displaymath2087" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img38.gif" > <P> and the covariance matrix of the same data set is <P> <IMG WIDTH=365 HEIGHT=22 ALIGN=BOTTOM ALT="displaymath2089" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img39.gif" > <P> The components of <IMG WIDTH=22 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2091" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img40.gif" > , denoted by <IMG WIDTH=17 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline2093" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img41.gif" > , represent the covariances between the random variable components <IMG WIDTH=15 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline2095" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img42.gif" > and <IMG WIDTH=16 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline2097" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img43.gif" > . The component <IMG WIDTH=16 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline2099" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img44.gif" > is the variance of the component <IMG WIDTH=15 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline2095" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img42.gif" > . The variance of a component indicates the spread of the component values around its mean value. 
If two components <IMG WIDTH=15 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline2095" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img42.gif" > and <IMG WIDTH=16 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline2097" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img43.gif" > of the data are uncorrelated, their covariance is zero <IMG WIDTH=110 HEIGHT=31 ALIGN=MIDDLE ALT="tex2html_wrap_inline2107" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img45.gif" > . The covariance matrix is, by definition, always symmetric. <P> From a sample of vectors <IMG WIDTH=87 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline2109" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img46.gif" > , we can calculate the sample mean and the sample covariance matrix as the estimates of the mean and the covariance matrix. <P> From a symmetric matrix such as the covariance matrix, we can calculate an orthogonal basis by finding its eigenvalues and eigenvectors. The eigenvectors <IMG WIDTH=14 HEIGHT=18 ALIGN=MIDDLE ALT="tex2html_wrap_inline2111" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img47.gif" > and the corresponding eigenvalues <IMG WIDTH=14 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2113" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img48.gif" > are the solutions of the equation <P> <IMG WIDTH=344 HEIGHT=17 ALIGN=BOTTOM ALT="displaymath2115" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img49.gif" > <P> For simplicity we assume that the <IMG WIDTH=14 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2113" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img48.gif" > are distinct. 
These values can be found, for example, by finding the solutions of the characteristic equation <P> <IMG WIDTH=304 HEIGHT=20 ALIGN=BOTTOM ALT="displaymath2119" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img50.gif" > <P> where <IMG WIDTH=8 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline2121" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img51.gif" > is the identity matrix of the same order as <IMG WIDTH=22 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2091" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img40.gif" > and |.| denotes the determinant of the matrix. If the data vector has <I>n</I> components, the characteristic equation is of order <I>n</I>. This is easy to solve only if <I>n</I> is small. Solving for the eigenvalues and the corresponding eigenvectors is a non-trivial task, and many methods exist. One way to solve the eigenvalue problem is to use a neural solution [<A HREF="node47.html#Oja83">30</A>]: the data is fed as the input, and the network converges to the desired solution. <P> By ordering the eigenvectors in the order of descending eigenvalues (largest first), one can create an ordered orthogonal basis with the first eigenvector having the direction of largest variance of the data. In this way, we can find directions in which the data set has the most significant amounts of energy. <P> Suppose one has a data set for which the sample mean and the covariance matrix have been calculated. Let <IMG WIDTH=15 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline2133" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img52.gif" > be a matrix consisting of eigenvectors of the covariance matrix as the row vectors. <P> By transforming a data vector <B>x</B>, we get <P> <IMG WIDTH=310 HEIGHT=20 ALIGN=BOTTOM ALT="displaymath2135" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img53.gif" > <P> which is a point in the orthogonal coordinate system defined by the eigenvectors. Components of <B>y</B> can be seen as the coordinates in the orthogonal basis. 
We can reconstruct the original data vector <IMG WIDTH=11 HEIGHT=9 ALIGN=BOTTOM ALT="tex2html_wrap_inline1951" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img14.gif" > from <IMG WIDTH=11 HEIGHT=19 ALIGN=MIDDLE ALT="tex2html_wrap_inline2139" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img54.gif" > by <P> <IMG WIDTH=308 HEIGHT=21 ALIGN=BOTTOM ALT="displaymath2141" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img55.gif" > <P> using the property of an orthogonal matrix <IMG WIDTH=85 HEIGHT=16 ALIGN=BOTTOM ALT="tex2html_wrap_inline2143" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img56.gif" > . Here <IMG WIDTH=26 HEIGHT=16 ALIGN=BOTTOM ALT="tex2html_wrap_inline2145" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img57.gif" > is the transpose of the matrix <IMG WIDTH=15 HEIGHT=13 ALIGN=BOTTOM ALT="tex2html_wrap_inline2133" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img52.gif" > . The original vector <IMG WIDTH=11 HEIGHT=9 ALIGN=BOTTOM ALT="tex2html_wrap_inline1951" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img14.gif" > was projected onto the coordinate axes defined by the orthogonal basis. The original vector was then reconstructed by a linear combination of the orthogonal basis vectors. <P> Instead of using all the eigenvectors of the covariance matrix, we may represent the data in terms of only a few basis vectors of the orthogonal basis. 
If we denote the matrix having the first <I>K</I> eigenvectors as rows by <IMG WIDTH=28 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2151" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img58.gif" > , we can create a transformation similar to the one above <P> <IMG WIDTH=316 HEIGHT=20 ALIGN=BOTTOM ALT="displaymath2153" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img59.gif" > <P> and <P> <IMG WIDTH=315 HEIGHT=21 ALIGN=BOTTOM ALT="displaymath2155" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img60.gif" > <P> <P> This means that we project the original data vector onto a coordinate system of dimension <I>K</I> and transform the vector back by a linear combination of the basis vectors. This minimizes the mean-square error between the data and this representation for a given number of eigenvectors. <P> If the data is concentrated in a linear subspace, this provides a way to compress the data and simplify the representation without losing much information. By picking the eigenvectors having the largest eigenvalues we lose as little information as possible in the mean-square sense. One can, for example, choose a fixed number of eigenvectors and their respective eigenvalues and get a consistent representation, or abstraction, of the data. This preserves a varying amount of energy of the original data. Alternatively, we can choose approximately the same amount of energy and a varying number of eigenvectors and their respective eigenvalues. This would in turn give an approximately consistent amount of information at the expense of varying representations with regard to the dimension of the subspace. <P> We are here faced with contradictory goals: on one hand, we should simplify the problem by reducing the dimension of the representation. On the other hand, we want to preserve as much as possible of the original information content. PCA offers a convenient way to control the trade-off between losing information and simplifying the problem at hand. 
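<P> The equations in this post are embedded as images and do not show up in the text. As a hedged reconstruction only, assuming they follow the standard PCA formulation that the surrounding prose describes, they would read roughly as below. The symbols x, y, A and the subscript K are taken from the text; the names for the mean, covariance matrix, eigenvectors and eigenvalues, the index letters, and the mean-centering in the transform are assumptions.

```latex
\begin{align*}
\mathbf{x} &= (x_1, \ldots, x_n)^T \\
\boldsymbol{\mu}_x &= E\{\mathbf{x}\} \\
\mathbf{C}_x &= E\{(\mathbf{x}-\boldsymbol{\mu}_x)(\mathbf{x}-\boldsymbol{\mu}_x)^T\} \\
\mathbf{C}_x \mathbf{e}_i &= \lambda_i \mathbf{e}_i, \quad i = 1, \ldots, n \\
|\mathbf{C}_x - \lambda \mathbf{I}| &= 0 \\
\mathbf{y} &= \mathbf{A}(\mathbf{x}-\boldsymbol{\mu}_x) \\
\mathbf{x} &= \mathbf{A}^T \mathbf{y} + \boldsymbol{\mu}_x, \qquad \mathbf{A}^{-1} = \mathbf{A}^T \\
\mathbf{y} &= \mathbf{A}_K(\mathbf{x}-\boldsymbol{\mu}_x), \qquad
\mathbf{x} \approx \mathbf{A}_K^T \mathbf{y} + \boldsymbol{\mu}_x
\end{align*}
```

The last line is an approximation rather than an equality because only the first K eigenvectors are kept.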
<P> As will be noted later, it may be possible to create piecewise linear models by dividing the input data into smaller regions and fitting linear models locally to the data. <P> Now, consider a small example showing the characteristics of the eigenvectors. Some artificial data has been generated, which is illustrated in Figure <A HREF="node30.html#figpcaexample">3.1</A>. The small dots are the points in the data set. <P> <P><A NAME="814"> </A><A NAME="figpcaexample"> </A> <IMG WIDTH=282 HEIGHT=220 ALIGN=BOTTOM ALT="figure810" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img61.gif" > <BR> <STRONG>Figure 3.1:</STRONG> Eigenvectors of the artificially created data<BR> <P> <P> The sample mean and the sample covariance matrix can easily be calculated from the data. Eigenvectors and eigenvalues can be calculated from the covariance matrix. The directions of the eigenvectors are drawn in the figure as lines. The first eigenvector, having the largest eigenvalue, points in the direction of largest variance (right and upwards), whereas the second eigenvector is orthogonal to the first one (pointing left and upwards). In this example the first eigenvalue, corresponding to the first eigenvector, is <IMG WIDTH=95 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2159" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img62.gif" > while the other eigenvalue is <IMG WIDTH=94 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2161" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img63.gif" > . By comparing each eigenvalue to the total sum of the eigenvalues, we can get an idea of how much of the energy is concentrated along the particular eigenvector. In this case, the first eigenvector contains almost all the energy. The data could be well approximated with a one-dimensional representation. <P> Sometimes it is desirable to investigate the behavior of the system under small changes. 
Assume that this system, or phenomenon, is constrained to an <I>n</I>-dimensional manifold and can be approximated with a linear manifold. Suppose one has a small change along one of the coordinate axes in the original coordinate system. If the data from the phenomenon is concentrated in a subspace, we can project this small change <IMG WIDTH=15 HEIGHT=28 ALIGN=MIDDLE ALT="tex2html_wrap_inline2165" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img64.gif" > to the approximate subspace built with PCA by projecting <IMG WIDTH=15 HEIGHT=28 ALIGN=MIDDLE ALT="tex2html_wrap_inline2165" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img64.gif" > on all the basis vectors in the linear subspace by <P> <IMG WIDTH=292 HEIGHT=20 ALIGN=BOTTOM ALT="displaymath2169" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img65.gif" > <P> where the matrix <IMG WIDTH=28 HEIGHT=27 ALIGN=MIDDLE ALT="tex2html_wrap_inline2151" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img58.gif" > has the first <I>K</I> eigenvectors as rows. The subspace then has dimension <I>K</I>. <IMG WIDTH=15 HEIGHT=29 ALIGN=MIDDLE ALT="tex2html_wrap_inline2173" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img66.gif" > represents the change caused by the original small change. This can be transformed back with a change of basis by taking a linear combination of the basis vectors by <P> <IMG WIDTH=298 HEIGHT=23 ALIGN=BOTTOM ALT="displaymath2175" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img67.gif" > <P> <P> Then, we get the typical change in the real-world coordinate system caused by a small change <IMG WIDTH=15 HEIGHT=28 ALIGN=MIDDLE ALT="tex2html_wrap_inline2165" SRC="http://www.cis.hut.fi/~jhollmen/dippa/img64.gif" > by assuming that the phenomenon constrains the system to have values in the limited subspace only.
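<P> The procedure described in this post can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the original author's implementation: the helper name pca_basis, the synthetic data, the choice K = 1, and the use of numpy.linalg.eigh in place of the neural solver mentioned in the text are all hypothetical choices made for the example. It also assumes the mean-centered transform y = A(x - mean).

```python
import numpy as np


def pca_basis(X):
    """Return the sample mean, the eigenvalues in descending order, and a
    matrix A whose rows are the matching eigenvectors of the sample covariance."""
    mu = X.mean(axis=0)
    C = np.cov(X, rowvar=False)            # rows of X are observations
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder so the largest comes first
    return mu, eigvals[order], eigvecs[:, order].T


# Hypothetical artificial data: points scattered closely around the diagonal.
rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, t]) + 0.01 * rng.normal(size=(500, 2))

mu, lam, A = pca_basis(X)

K = 1                      # keep only the first K eigenvectors
A_K = A[:K]                # shape (K, n), eigenvectors as rows

Y = (X - mu) @ A_K.T       # project every data vector: y = A_K (x - mu)
X_hat = Y @ A_K + mu       # transform back: x ~ A_K^T y + mu

# Share of the total variance ("energy") kept by the first K eigenvectors.
energy = lam[:K].sum() / lam.sum()

# Small change along the first original coordinate axis, projected into the
# K-dimensional subspace and then mapped back to the original coordinates.
dx = np.array([0.1, 0.0])
dy = A_K @ dx
dx_back = A_K.T @ dy

print("energy kept:", energy)
print("change mapped back:", dx_back)
```

For data like this, almost all of the energy lies along the first eigenvector, and a small change along one axis is mapped back as a change along the diagonal, which is the direction the data actually occupies.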

[dohtml]
HiddenVariable
Haha, your html doesn't work here!

Copyright © BrainMeta. All rights reserved.
Terms of Use  |  Last Modified Tue Jan 17 2006 12:39 am