PCA for Visualization and Dimension Reduction….

source: https://i.stack.imgur.com/lNHqt.gif

Table of Contents:

  1. PCA intuition (geometric)
  2. Column standardization
  3. Optimization problem
  4. Calculation of covariance matrix
  5. Eigen values and Eigen vectors
  6. Interpretation of Eigen values
  7. Summary
  8. Implementation of PCA from scratch
  9. Implementation of PCA using scikit learn
  10. Limitations of PCA
  11. Conclusion
  12. References

PCA intuition:

Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest. Let us consider a 2-D dataset with feature1(f1) in the x-axis and feature2(f2) in the y-axis.

Data points on f1 and f2 axis

Column Standardization:

Now, let us standardize the data points.

converting data points
Column standardization
Column-centered graph
maximum variance position

Optimization problem:

We now want to find the direction u1 such that the variance of Xi’s projected onto f1' is maximized. Let us understand this statement by looking at the picture below.

data point projected on the plane

Calculation of co-variance matrix:

As the data column is standardized, hence the second term is zero. So, our final equation becomes,

Eigen values and Eigen vectors:

Using the power of Lagrange multipliers, we can rewrite the above equation as

Interpretation of eigenvalues(ƛ):

Let us understand the geometric interpretation of eigenvalues by looking at different cases.

100% data representation on f1'
75% data representation on f1'
60% data representation on f1'
50% data representation on f1'


  1. Perform column standardization.
  2. Calculate the covariance matrix.
  3. Compute the eigenvalues and eigenvectors from the covariance matrix.
  4. Sort these eigenvalue pairs in descending order of magnitude.
  5. Select the top eigenvalues which retain maximum variance.
  6. Perform data transformation original data by using the eigenvectors corresponding to top eigenvalues.

Implementation of PCA from scratch:

Lets work with digit recognizer dataset. It has 42000 rows and 784 columns. Let me break this dataset. Say, you have dataset of digit images and classes ranging from 0–9. Each image has dimension of 28 x 28 (784). I hope you can correlate now. Each row contains a label from 0–9 and 784 pixel values as a column.

#importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# reading the data
d0 = pd.read_csv('train.csv')
# save the labels into a variable l.
labels = d0['label']
# Drop the label feature and store the pixel data in d.
data = d0.drop("label",axis=1)
# Finding the size of data
# display or plot a number.
idx = 1
# reshape from 1d to 2d pixel array
grid_data = data.iloc[idx].to_numpy().reshape(28,28)
plt.imshow(grid_data, interpolation = "none", cmap = "gray")
The value stored in the first index
# Data-preprocessing: Standardizing the datafrom sklearn.preprocessing import StandardScaler
standardized_data = StandardScaler().fit_transform(data)
(42000, 784)
#find the co-variance matrix which is : A^T * A
sample_data = standardized_data
# matrix multiplication using numpy
covar_matrix = np.matmul(sample_data.T , sample_data)
print ( "The shape of variance matrix = ", covar_matrix.shape)The shape of variance matrix = (784, 784)
from scipy.linalg import eigh# the parameter 'eigvals' is defined (low value to heigh value) 
# eigh function will return the eigen values in asending order
# this code generates only the top 2 (782 and 783) eigenvalues.
values, vectors = eigh(covar_matrix, eigvals=(782,783))
print("Shape of eigen vectors = ",vectors.shape)
# converting the eigen vectors into (2,d) shape for easyness of further computations
vectors = vectors.T
print("Updated shape of eigen vectors = ",vectors.shape)Shape of eigen vectors = (784, 2)
Updated shape of eigen vectors = (2, 784)
import matplotlib.pyplot as plt
new_coordinates = np.matmul(vectors, sample_data.T)
print (" resultant new data points' shape ", vectors.shape, "X", sample_data.T.shape," = ", new_coordinates.shape)resultant new data points' shape (2, 784) X (784, 42000) = (2, 42000)
import pandas as pd# appending label to the 2d projected data
new_coordinates = np.vstack((new_coordinates, labels)).T
# creating a new data frame for ploting the labeled points.
dataframe = pd.DataFrame(data=new_coordinates, columns=("1st_principal", "2nd_principal", "label"))
1st_principal 2nd_principal label
0 -5.226445 -5.140478 1.0
1 6.032996 19.292332 0.0
2 -1.705813 -7.644503 1.0
3 5.836139 -0.474207 4.0
4 6.024818 26.559574 0.0
import seaborn as sn
sn.FacetGrid(dataframe, hue="label", size=6).map(plt.scatter, '1st_principal', '2nd_principal').add_legend()

Implementation of PCA using Scikit-learn:

Lets perform the same operation with Scikit-learn

# initializing the pca
from sklearn import decomposition
pca = decomposition.PCA()
# configuring the parameteres
# the number of components = 2
pca.n_components = 2
pca_data = pca.fit_transform(sample_data)
# pca_reduced will contain the 2-d projects of simple data
print("shape of pca_reduced.shape = ", pca_data.shape)
# attaching the label for each 2-d data point
pca_data = np.vstack((pca_data.T, labels)).T
# creating a new data fram which help us in ploting the result data
pca_df = pd.DataFrame(data=pca_data, columns=("1st_principal", "2nd_principal", "label"))
sn.FacetGrid(pca_df, hue="label", size=6).map(plt.scatter, '1st_principal', '2nd_principal').add_legend()
# PCA for dimensionality redcution (non-visualization)pca.n_components = 784
pca_data = pca.fit_transform(sample_data)
percentage_var_explained = pca.explained_variance_ / np.sum(pca.explained_variance_);cum_var_explained = np.cumsum(percentage_var_explained)# Plot the PCA spectrum
plt.figure(1, figsize=(6, 4))
plt.plot(cum_var_explained, linewidth=2)

Limitations of PCA:



PCA is simple algorithm which assumes linearly correlated dataset and convert high dimension data into low dimension. It is observed that PCA performs good during conversion of data but not in visualization. However, there are better algorithm like T-SNE, which assumes non- linear dataset and preserves local structure of the dataset.




Get the Medium app