5th MATLAB Seminar, July 6, 2021

Introduction to Principal Component Analysis (PCA). This article shares the MATLAB code and the notes used in the seminar.

Only basic examples are given here. For more detail, see the documentation on the MathWorks website.

NOTE


You can download the notes used in the seminar from this link.

Example

Dataset

We use the fisheriris dataset (Fisher's iris data), which is loaded with "load fisheriris".

MATLAB
    load fisheriris.mat % load dataset

Check that meas and species have been loaded into your Workspace, then type meas and species to display them.

  • meas is a 150×4 numeric matrix: four measurements (sepal length, sepal width, petal length, petal width) for each of 150 iris flowers.
  • species is a 150×1 cell array holding the species name of each of the 150 flowers.
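
The sizes described above can be checked directly instead of reading the printed output. The following is a minimal sketch, assuming the Statistics and Machine Learning Toolbox (which provides fisheriris) is installed; the variable names sz_meas, sz_species and names are introduced here for illustration only.

```matlab
load fisheriris               % provides meas and species

sz_meas    = size(meas);      % number of flowers x number of measurements
sz_species = size(species);   % one species label per flower
names      = unique(species); % distinct species names (cell array of char)

disp(sz_meas)
disp(sz_species)
disp(names)
```

If the dataset loaded correctly, sz_meas should be [150 4] and sz_species should be [150 1].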

MATLAB
    meas % type it

    % MATLAB OUTPUT
    % meas =
    %
    %     5.1000 3.5000 1.4000 0.2000
    %     4.9000 3.0000 1.4000 0.2000
    %     4.7000 3.2000 1.3000 0.2000
    %     4.6000 3.1000 1.5000 0.2000
    %     :

    species % type it

    % MATLAB OUTPUT
    % species =
    %
    %   150×1 cell array
    %
    %     {'setosa' }
    %     {'setosa' }
    %     {'setosa' }
    %     :

Normalize the data

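Normalize the data before PCA. By default, normalize converts each column to z-scores (mean 0, standard deviation 1), so the four measurements are put on a common scale.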

MATLAB
    d = normalize(meas) % type it

    % OUTPUT (normalized data)
    % d =
    %
    %     -0.8977 1.0156 -1.3358 -1.3111
    %     -1.1392 -0.1315 -1.3358 -1.3111
    %     -1.3807 0.3273 -1.3924 -1.3111
    %     -1.5015 0.0979 -1.2791 -1.3111
    %     -1.0184 1.2450 -1.3358 -1.3111
    %     -0.5354 1.9333 -1.1658 -1.0487
    %     :
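
To see what the normalization does, it can be compared with a hand-written z-score. This is a minimal sketch, assuming the default behavior of normalize (z-score along each column) and a MATLAB release that supports implicit expansion; d_manual, max_diff, col_means and col_stds are names introduced here for illustration.

```matlab
load fisheriris

d = normalize(meas);                        % default method: z-score per column

% Manual z-score: subtract the column mean, divide by the column standard deviation
d_manual = (meas - mean(meas)) ./ std(meas);

max_diff  = max(abs(d(:) - d_manual(:)));   % should be close to 0
col_means = mean(d);                        % approximately 0 for each column
col_stds  = std(d);                         % approximately 1 for each column

disp(max_diff)
disp(col_means)
disp(col_stds)
```

Because every column ends up with unit variance, no single measurement dominates the principal components simply because of its scale.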

Carry out PCA

This takes a single call to pca.

MATLAB
    [COEFF, SCORE, LATENT] = pca(d);

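COEFF is the matrix of principal component coefficients that converts the data to the new axes; each column holds the coefficients of one principal component. You can visualize it, for example, as a heatmap: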

MATLAB
    heatmap(COEFF)
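
The heatmap is easier to read when its rows and columns are labelled. The sketch below assumes the columns of meas are, in order, sepal length, sepal width, petal length and petal width; the label variables pcNames and varNames and the title text are introduced here for illustration.

```matlab
load fisheriris
d = normalize(meas);
COEFF = pca(d);

% Columns of COEFF are principal components; rows are the original variables
pcNames  = {'PC1', 'PC2', 'PC3', 'PC4'};
varNames = {'Sepal length', 'Sepal width', 'Petal length', 'Petal width'};

h = heatmap(pcNames, varNames, COEFF);
h.Title = 'PCA coefficients';
```

Each cell then shows how strongly one original measurement contributes to one principal component.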

LATENT holds the variances of the principal components. From it you can compute the contribution ratio (the percentage of the total variance explained by each component), for example:

MATLAB
    cr = 100*LATENT / sum(LATENT);
    disp(['Contribution ratio of the PC1 is ' num2str(cr(1)) ' %'])
    disp(['Contribution ratio of the PC2 is ' num2str(cr(2)) ' %'])
    disp(['Contribution ratio of the PC3 is ' num2str(cr(3)) ' %'])
    disp(['Contribution ratio of the PC4 is ' num2str(cr(4)) ' %'])

    % OUTPUT
    % Contribution ratio of the PC1 is 72.9624 %
    % Contribution ratio of the PC2 is 22.8508 %
    % Contribution ratio of the PC3 is 3.6689 %
    % Contribution ratio of the PC4 is 0.51787 %
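
A cumulative contribution ratio is often used to decide how many components to keep. The sketch below computes it with cumsum and, as a cross-check, compares cr with the fifth output of pca, which returns the percentage of variance explained; the names EXPLAINED and cumcr are introduced here for illustration.

```matlab
load fisheriris
d = normalize(meas);

% Third output: component variances; fifth output: percent variance explained
[~, ~, LATENT, ~, EXPLAINED] = pca(d);

cr    = 100 * LATENT / sum(LATENT);   % contribution ratio of each component
cumcr = cumsum(cr);                   % cumulative contribution ratio

% One line per component: index, ratio, cumulative ratio
fprintf('PC%d: %6.2f %% (cumulative %6.2f %%)\n', [1:4; cr'; cumcr']);
```

With the ratios listed above, PC1 and PC2 together account for roughly 95.8 % of the variance, which is why a two-dimensional plot of PC1 against PC2 already captures most of the structure.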

SCORE is the data expressed on the new axes (the principal components). Verify that SCORE is also a 150×4 matrix.

MATLAB
    SCORE

    % OUTPUT
    % SCORE =
    %
    %     -2.2571 0.4784 0.1273 -0.0241
    %     -2.0740 -0.6719 0.2338 -0.1027
    %     -2.3563 -0.3408 -0.0441 -0.0283
    %     -2.2917 -0.5954 -0.0910 0.0657
    %     -2.3819 0.6447 -0.0157 0.0358
    %     :
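
The relationship between the three outputs can be checked numerically. This is a minimal sketch, assuming the data were normalized as above so that every column of d already has zero mean (pca centers the data itself before projecting); proj_err, orth_err and var_err are names introduced here for illustration.

```matlab
load fisheriris
d = normalize(meas);
[COEFF, SCORE, LATENT] = pca(d);

% Projecting the (already centered) data onto the columns of COEFF gives SCORE
proj_err = max(max(abs(d * COEFF - SCORE)));

% The columns of COEFF are orthonormal, so COEFF' * COEFF is the identity
orth_err = max(max(abs(COEFF' * COEFF - eye(4))));

% The variance of each column of SCORE equals the matching entry of LATENT
var_err = max(abs(var(SCORE)' - LATENT));

disp(size(SCORE))
disp([proj_err, orth_err, var_err])   % all three should be close to 0
```

In other words, PCA here is a rotation of the normalized data: the number of rows and columns is unchanged, and only the axes differ.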

Visualization

To plot the data on PC1 and PC2, grouped by species:

MATLAB
    gscatter(SCORE(:, 1), SCORE(:, 2), species)
    xlabel('PC1')
    ylabel('PC2')

To plot the data on PC3 and PC4 instead:

MATLAB
    gscatter(SCORE(:, 3), SCORE(:, 4), species)
    xlabel('PC3')
    ylabel('PC4')
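
The scores and the coefficients can also be shown in one figure with biplot, which was not part of the seminar code. The sketch below is one possible way to do it, assuming the columns of meas are sepal length, sepal width, petal length and petal width; varNames is a label variable introduced here for illustration.

```matlab
load fisheriris
d = normalize(meas);
[COEFF, SCORE] = pca(d);

varNames = {'Sepal length', 'Sepal width', 'Petal length', 'Petal width'};

% Points: observations on PC1 and PC2; vectors: the original variables
figure
hb = biplot(COEFF(:, 1:2), 'Scores', SCORE(:, 1:2), 'VarLabels', varNames);
xlabel('PC1')
ylabel('PC2')
```

Unlike gscatter, this plot does not color the points by species, but the direction of each vector indicates how an original measurement relates to PC1 and PC2.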
