PCA Implementation with Step-by-Step Example

Let me walk you through each step of Principal Component Analysis (PCA) with this example:

  1. Data Preparation
    • We start with a 10×2 dataset containing two features
    • Each row represents one observation with two measurements
  2. Standardization (Step 1)
    • Center the data by subtracting each feature's mean
    • Scale each feature to unit variance by dividing by its standard deviation
    • This ensures all features contribute equally to the analysis
  3. Covariance Matrix Calculation (Step 2)
    • Compute the covariance matrix to understand relationships between variables
    • For our 2D data, this results in a 2×2 matrix
    • The diagonal elements represent variances
    • Off-diagonal elements represent covariances between variables
  4. Eigendecomposition (Step 3)
    • Calculate eigenvalues and eigenvectors of the covariance matrix
    • Eigenvalues tell us the amount of variance explained by each principal component
    • Eigenvectors give us the direction of the principal components
  5. Principal Components (Step 4)
    • Project the standardized data onto the principal components (see the matrix form below)
    • The first principal component (PC1) captures the maximum variance
    • The second principal component (PC2) is orthogonal to PC1 and captures the remaining variance
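
For reference, the whole pipeline is compact in matrix form. With X the standardized n×p data matrix (here n = 10 observations, p = 2 features):

C = (1 / (n − 1)) · XᵀX      (covariance matrix, p×p)
C · vᵢ = λᵢ · vᵢ             (eigenpairs: λᵢ is the variance along direction vᵢ)
Z = X · W                    (projected data, with the eigenvectors as the columns of W)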

Key Results Interpretation:

  1. Explained Variance Ratio
    • Shows how much of the total variance each principal component explains
    • Helps determine how many components to keep (see the cumulative-variance check in the code below)
    • In this example, if PC1 explains >80% of the variance, we might keep only one component
  2. Transformed Data
    • The final coordinates in the new principal component space
    • Can be used for dimensionality reduction by keeping fewer components

The complete implementation in Python:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Create sample dataset (hard-coded values, so no random seed is needed)
X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
    [2.3, 2.7],
    [2.0, 1.6],
    [1.0, 1.1],
    [1.5, 1.6],
    [1.1, 0.9]
])

def plot_data(X, title):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c='blue', alpha=0.5)
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.grid(True)
    plt.axis('equal')
    return plt

# Step 1: Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Step 2: Calculate covariance matrix
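# (np.cov expects variables as rows, hence the transpose)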
covariance_matrix = np.cov(X_standardized.T)

# Step 3: Calculate eigenvalues and eigenvectors
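# (for a symmetric matrix, np.linalg.eigh is a more robust alternative)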
eigenvalues, eigenvectors = np.linalg.eig(covariance_matrix)

# Sort eigenvalues and eigenvectors in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Step 4: Project data onto principal components
PC = X_standardized.dot(eigenvectors)
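
# Visualize the standardized data and its principal-component projection
# using the plot_data helper defined above (otherwise unused); close the
# figure windows to let the script continue
plot_data(X_standardized, 'Standardized Data')
plot_data(PC, 'Data in Principal Component Space')
plt.show()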

# Calculate explained variance ratio
explained_variance_ratio = eigenvalues / np.sum(eigenvalues)
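
# A common heuristic for choosing how many components to keep: the smallest
# number whose cumulative explained variance reaches a threshold (e.g. 95%)
cumulative_variance = np.cumsum(explained_variance_ratio)
n_components = np.argmax(cumulative_variance >= 0.95) + 1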

# Print results
print("Original Data:")
print(X)
print("\nStandardized Data:")
print(X_standardized)
print("\nCovariance Matrix:")
print(covariance_matrix)
print("\nEigenvalues:")
print(eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)
print("\nExplained Variance Ratio:")
print(explained_variance_ratio)
print("\nTransformed Data (Principal Components):")
print(PC)
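
As a sanity check, the manual result should match scikit-learn's PCA (columns may differ by a sign flip, since the orientation of an eigenvector is arbitrary). A minimal cross-check, appended to the script:

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
PC_sklearn = pca.fit_transform(X_standardized)

print("\nsklearn Explained Variance Ratio:")
print(pca.explained_variance_ratio_)
print("\nsklearn Transformed Data:")
print(PC_sklearn)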
