PCA Implementation with Step-by-Step Example

Let me walk you through each step of PCA with this example:

  1. Data Preparation
    • We start with a 10×2 dataset containing two features
    • Each row represents one observation with two measurements
  2. Standardization (Step 1)
    • Center the data by subtracting each feature's mean
    • Scale each feature to unit variance by dividing by its standard deviation
    • This ensures all features contribute equally to the analysis, regardless of their original units (the sketch after this list shows this step by hand)
  3. Covariance Matrix Calculation (Step 2)
    • Compute the covariance matrix to understand relationships between variables; for standardized data Z with n samples this is ZᵀZ / (n − 1)
    • For our 2D data, this results in a 2×2 matrix
    • The diagonal elements represent variances
    • Off-diagonal elements represent covariances between variables
    • Note: StandardScaler divides by the population standard deviation while np.cov uses n − 1, so the diagonal prints as n/(n − 1) ≈ 1.11 rather than exactly 1; the explained variance ratios are unaffected
  4. Eigendecomposition (Step 3)
    • Calculate the eigenvalues and eigenvectors of the covariance matrix C, i.e. the pairs satisfying C·v = λ·v (verified in the sketch after this list)
    • Eigenvalues tell us the amount of variance explained by each principal component
    • Eigenvectors give us the directions of the principal components
  5. Principal Components (Step 4)
    • Project the standardized data onto the principal components (a matrix product of the standardized data with the eigenvector matrix)
    • The first principal component (PC1) captures the maximum variance
    • The second principal component (PC2) is orthogonal to PC1 and captures the remaining variance
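
To make the standardization and eigendecomposition claims above concrete, here is a minimal, self-contained sketch. The 3×2 dataset and the 2×2 covariance matrix in it are hypothetical values chosen for easy hand-checking, not the data used in the script below:

import numpy as np

# Standardization by hand: subtract each column's mean, divide by its standard deviation
X_toy = np.array([[2.0, 8.0],
                  [4.0, 6.0],
                  [6.0, 10.0]])  # hypothetical 3x2 data
Z = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)
print(Z.mean(axis=0))  # ~[0, 0]: each column now has zero mean
print(Z.std(axis=0))   # [1, 1]: and unit variance

# Eigendecomposition by hand: the eigenpairs of C satisfy C @ v = lam * v
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])  # hypothetical covariance of two highly correlated standardized features
evals, evecs = np.linalg.eigh(C)  # eigh is designed for symmetric matrices
for lam, v in zip(evals, evecs.T):  # the eigenvectors are the columns
    assert np.allclose(C @ v, lam * v)
print(evals)            # [0.1, 1.9]: the variance along each principal direction
print(evecs.T @ evecs)  # identity: the directions are orthonormal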

Key Results Interpretation:

  1. Explained Variance Ratio
    • Shows how much of the total variance each principal component explains
    • Helps determine how many components to keep
    • In this example, PC1 explains about 96% of the variance, comfortably above a typical 80% cutoff, so we could keep just one component (the sketch after the script automates this choice)
  2. Transformed Data
    • The final coordinates of each observation in the new principal component space
    • Can be used for dimensionality reduction by keeping only the first few columns
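
The complete implementation below ties these steps together on the 10×2 example dataset: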
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Sample dataset: 10 observations of 2 positively correlated features
X = np.array([
    [2.5, 2.4],
    [0.5, 0.7],
    [2.2, 2.9],
    [1.9, 2.2],
    [3.1, 3.0],
    [2.3, 2.7],
    [2.0, 1.6],
    [1.0, 1.1],
    [1.5, 1.6],
    [1.1, 0.9]
])

# Helper: scatter-plot a 2D dataset (called at the end of the script)
def plot_data(X, title):
    plt.figure(figsize=(8, 6))
    plt.scatter(X[:, 0], X[:, 1], c='blue', alpha=0.5)
    plt.title(title)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.grid(True)
    plt.axis('equal')
    return plt

# Step 1: Standardize the data
scaler = StandardScaler()
X_standardized = scaler.fit_transform(X)

# Step 2: Calculate covariance matrix
covariance_matrix = np.cov(X_standardized.T)

# Step 3: Calculate eigenvalues and eigenvectors
# (np.linalg.eigh is the appropriate routine for a symmetric matrix such as a covariance matrix)
eigenvalues, eigenvectors = np.linalg.eigh(covariance_matrix)

# Sort eigenvalues and eigenvectors in descending order
idx = eigenvalues.argsort()[::-1]
eigenvalues = eigenvalues[idx]
eigenvectors = eigenvectors[:, idx]

# Step 4: Project data onto principal components
PC = X_standardized.dot(eigenvectors)

# Calculate explained variance ratio
explained_variance_ratio = eigenvalues / np.sum(eigenvalues)

# Print results
print("Original Data:")
print(X)
print("\nStandardized Data:")
print(X_standardized)
print("\nCovariance Matrix:")
print(covariance_matrix)
print("\nEigenvalues:")
print(eigenvalues)
print("\nEigenvectors:")
print(eigenvectors)
print("\nExplained Variance Ratio:")
print(explained_variance_ratio)
print("\nTransformed Data (Principal Components):")
print(PC)
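
Two follow-ups on the interpretation above, sketched as a direct continuation of the script (np, X_standardized, PC, and explained_variance_ratio are assumed to still be in scope). The first picks the number of components from the cumulative explained variance, using an assumed 80% threshold; the second cross-checks the manual computation against scikit-learn's PCA:

from sklearn.decomposition import PCA

# Keep the smallest number of components whose cumulative explained variance
# reaches the threshold (0.80 is an assumed cutoff for illustration)
threshold = 0.80
n_components = int(np.searchsorted(np.cumsum(explained_variance_ratio), threshold)) + 1
X_reduced = PC[:, :n_components]
print("Components kept:", n_components)   # 1 here: PC1 alone explains ~96%
print("Reduced shape:", X_reduced.shape)  # (10, 1)

# Cross-check against scikit-learn on the same standardized data
pca = PCA(n_components=2)
PC_sklearn = pca.fit_transform(X_standardized)
print("sklearn explained variance ratio:", pca.explained_variance_ratio_)
# The scores agree up to the sign of each column (eigenvector signs are arbitrary)
print("Scores match:", np.allclose(np.abs(PC_sklearn), np.abs(PC)))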

