The most popular “factors” for analyzing equity returns are the three Fama-French factors (RMRF, HML and SMB). The RMRF factor is the market return minus the risk-free rate, and the HML and SMB factors are created by sorting stocks into several “value” and “size” buckets and forming long-short portfolios.<\/p>\n
The three factors can be used to explain, though not predict, the returns for a variety of diversified portfolios. Many posts on this blog use the Fama-French 3 Factor (FF3F) model, including a tutorial on running the 3-factor regression using R<\/a>.<\/p>\n An alternative way to construct factors is to use linear algebra to create “optimal” factors using a technique such as principal component analysis (PCA). This post will show how to construct the statistically optimal factors for the Fama-French 25 portfolios (sorted by size and value).<\/p>\n In my next post, I will compare these PCA factors to the Fama-French factors.<\/p>\n The data used for this analysis comes from the Kenneth French website<\/a>. I’m using the Fama-French 25 (FF25) portfolio returns, which are available in the file titled “25 Portfolios Formed on Size and Book-to-Market”. I’m using the returns from 1962 through 2012 since the pre-Compustat era portfolios have relatively few stocks.<\/p>\n The Fama-French factors are also available on the Kenneth French website in the file titled “Fama\/French Factors”. In this post, I will not use the Fama-French factors themselves, but I do use the factor data file to get the monthly risk-free rate.<\/p>\n For reference, the arithmetic average monthly returns of the FF25 portfolios are plotted for the date range used in this analysis. The Octave script to create this plot<\/a> was provided in an earlier post.<\/p>\n <\/a><\/span><\/p>\n PCA is a method for constructing factors which are uncorrelated with each other and which, for a given number of factors, explain as much of the variance of the target portfolios as possible (that is, they maximize the overall R^2 when running regressions on the target portfolios). 
We can find the PCA factors with just a few lines of Octave code!<\/p>\n Step 1: Construct a matrix of excess returns<\/b><\/p>\n If we have a set of portfolios, we can show the returns in a matrix where each column is a portfolio and each row holds the returns for one period.<\/p>\n For example, if I have 25 portfolios with 612 months of return data, I can structure these returns in a 612 by 25 matrix, where the first row is the earliest return for each portfolio. Generally, when working with equity return factor models, the monthly risk-free rate is deducted from each monthly return, so the returns used are excess<\/i> returns.<\/p>\n For this example, I have “preassembled” the excess return matrix and I read it in from a text file with the following Octave command:<\/p>\n rx = load('excessreturns.txt')<\/i><\/strong><\/p>\n Step 2: Calculate the Covariance Matrix<\/b><\/p>\n The Octave function “cov” can be used to calculate the covariance matrix for our return data.<\/p>\n Sigma_rx=cov(rx)<\/i><\/strong><\/p>\n Step 3: Calculate the Eigenvalue Decomposition<\/b><\/p>\n Once we have the covariance matrix, we can use Octave to find the eigenvalue decomposition using the “eig” function:<\/p>\n [V,Lambda]=eig(Sigma_rx)<\/i><\/strong><\/p>\n The V<\/strong> columns are the eigenvectors and the diagonal elements of Lambda<\/strong> are the eigenvalues.<\/p>\n Step 4: Sort by Largest Standard Deviation and Extract Factor Loads<\/b><\/p>\n We need to sort the eigenvalues so that we can determine which factors capture the most variance in the data, so we first pull out the diagonal elements of Lambda<\/strong> and sort from largest to smallest. This sort order is then applied to the eigenvectors.<\/p>\n [stdevs,order]=sort(diag(Lambda)'.^0.5,'descend')<\/strong><\/em><\/p>\n V = V(:,order)<\/strong><\/p>\n Note that I’m taking the square root, so the variances become standard deviations. This step is not necessary, but I like to look 
at the standard deviations rather than the variances when plotting. The standard deviations for each vector can be plotted using the “bar” command.<\/p>\n bar(stdevs)<\/strong><\/p>\n <\/a><\/p>\n You can see that the standard deviation captured by each additional factor trails off rapidly. So, for this analysis, I’m just going to use the first three factors.<\/p>\n Step 5: Calculate the PCA Factors<\/b><\/p>\n The columns of V<\/strong> are the eigenvectors, and we sorted V<\/strong> so the eigenvectors accounting for the greatest variance in the data are ordered first. Since each eigenvector column can be interpreted as the set of factor loads for our original portfolios, we can create the factors themselves with a matrix multiplication. Note that I don’t “center” the matrix rx<\/strong>, which is usually done for PCA. In this case, I want to retain the mean because these new factors are essentially portfolio returns, and I want to be able to get meaningful alphas when I use them in regressions.<\/p>\n factors = rx*V<\/strong><\/em><\/p>\n The first three factors that we are interested in will be the first three columns of the new factors<\/strong> matrix.<\/p>\n f1 = factors(:,1)<\/strong><\/em> We now have our three PCA factors!<\/p>\n Example Code (Simple Version):<\/b><\/p>\n The first three PCA factors generated by the script above can be used to analyze portfolios in a manner similar to the Fama-French factors. However, unlike the Fama-French factors, the PCA factors do not have a simple interpretation such as “size” or “value”. Nevertheless, Arbitrage Pricing Theory suggests that we should see small alphas if the R^2 for the regression is high.<\/p>\n For reference, I’ve posted the three PCA equity factors in a Google Docs Spreadsheet<\/a>. It is an interesting exercise to run some regressions using these factors and to compare the results with the FF3F model. 
The PCA factors should be expected to give a higher R^2 for the FF25 portfolios since they are optimized to fit that data, but the results could be better or worse when tested on other types of portfolios.<\/p>\n For the FF25 portfolios, we don’t need to run any regressions to get the factor loads. The factor loads for each PCA factor are simply the corresponding columns of V<\/strong>.<\/p>\n I’ve plotted the FF25 factor loads for the first three PCA equity factors here. The code to generate these plots is shown at the end of this post.<\/p>\n <\/a><\/p>\n <\/a><\/p>\n At first glance, these PCA factor loads for the FF25 don’t bear much resemblance to what we would see with the FF3F model. For example, we would expect HML to have a loading which was relatively flat along the size axis, but which increased as we moved towards higher “value”. We would expect SMB loadings to be relatively flat across the value dimension, but to step up as we moved towards smaller size.<\/p>\n However, the PCA factors have certain properties that real world risk factors won’t necessarily have. The PCA factors are uncorrelated, their loading vectors are normalized to unit length, and each in turn captures as much of the remaining covariance as possible. So, even if the PCA is successfully picking up covariance caused by real risk, there may be equally effective factors (effective in terms of getting low alpha and high R^2) which are linear combinations of these PCA factors. In fact, some linear combination of these three factors may give us something very similar to the FF3F factors. In the next post, I’ll dig into this idea further.<\/p>\n Example Code (Extended Version):<\/strong><\/p>\n Main Script:<\/p>\n Function used by main script. 
This script should be saved in the working directory as “fivebyfive.m”.<\/p>\n The most popular \u201cfactors\u201d for analyzing equity returns are the three Fama-French factors (RMRF, HML and SMB).\u00a0 The RMRF factor is the market return minus the risk free rate, and the HML and SMB factors are\u00a0created by sorting portfolios into several \u201cvalue\u201d and \u201csize\u201d buckets and forming long-short portfolios. The three factors can be used […]<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/posts\/4410"}],"collection":[{"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/comments?post=4410"}],"version-history":[{"count":118,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/posts\/4410\/revisions"}],"predecessor-version":[{"id":4639,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/posts\/4410\/revisions\/4639"}],"wp:attachment":[{"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/media?parent=4410"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/categories?post=4410"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.calculatinginvestor.com\/wp-json\/wp\/v2\/tags?post=4410"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}Description of Data<\/strong><\/h6>\n
Principal Component Analysis Example<\/strong><\/h6>\n
\nf2 = factors(:,2)<\/strong><\/em>
\nf3 = factors(:,3)<\/strong><\/em><\/p>\n\r\nrx = load('excessreturns.txt'); % Load Excess Returns\r\nSigma_rx = cov(rx); % Calculate covariance matrix of excess returns<\/em>\r\n[V,Lambda]=eig(Sigma_rx); % Eigenvalue decomposition\r\n[stdevs,order] = sort(diag(Lambda)'.^0.5,'descend'); % Sort standard deviations in descending order\r\nbar(stdevs) % Plot standard deviations to see importance of each factor\r\nV = V(:,order); % Apply the same ordering to the eigenvectors\r\nfactors = rx*V; % Calculate Factors\r\nf1 = factors(:,1); % Extract Factor 1\r\nf2 = factors(:,2); % Extract Factor 2\r\nf3 = factors(:,3); % Extract Factor 3\r\n<\/pre>\n
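As a rough check on the steps in the simple version, the sketch below runs the same decomposition and then looks at two things discussed in this post: how much of the total variance the first three factors capture, and whether the factors really are uncorrelated. Since excessreturns.txt is not reproduced here, the return matrix is a synthetic stand-in (three simulated common shocks plus noise), so the printed numbers say nothing about the actual FF25 data. The names shocks, B, share, Sigma_f and offdiag are my own and are only for illustration.

```octave
% Synthetic stand-in for excessreturns.txt (an assumption for illustration):
% 612 months x 25 portfolios driven by three common shocks plus noise.
randn('state', 42);
T = 612; N = 25;
shocks = randn(T, 3) * diag([5 2 1]);   % three common shocks with different volatilities
B = randn(3, N);                        % hypothetical exposure of each portfolio to each shock
rx = shocks * B + 0.5 * randn(T, N);

% Same steps as the simple version (variances kept instead of standard deviations)
Sigma_rx = cov(rx);
[V, Lambda] = eig(Sigma_rx);
[variances, order] = sort(diag(Lambda), 'descend');
V = V(:, order);
factors = rx * V;

% Cumulative share of total variance captured by the leading factors
share = cumsum(variances) / sum(variances);
printf('First three factors capture %.1f%% of total variance\n', 100 * share(3));

% The covariance matrix of the factors should be diagonal (factors uncorrelated),
% with the sorted eigenvalues on the diagonal
Sigma_f = cov(factors);
offdiag = Sigma_f - diag(diag(Sigma_f));
printf('Largest off-diagonal covariance: %g\n', max(abs(offdiag(:))));
```

With the real data, replacing the synthetic rx with the matrix loaded from excessreturns.txt should let the same two checks run unchanged. Note that cov subtracts the column means, so the factors come out uncorrelated even though rx itself is not centered before the multiplication.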
The First Three PCA Factors and Factor Loads for the FF25<\/strong><\/h6>\n
<\/a>Conclusion<\/strong><\/h6>\n
\r\nclear all;\r\nclose all;\r\n\r\n% Load Factors and Portfolio Returns\r\n% The target files are the files from the Kenneth French website with comment text and data other than monthly returns removed.\r\nff_facts = load('F-F_Research_Data_Factors_monthly.txt');\r\nff_ports = load('25_Portfolios_5x5_monthly.txt');\r\n\r\n% Determine Start and Stop points for FF factors\r\n% 1962 chosen so that only Compustat era data is used\r\nstart_year = 1962;\r\nstart_month = 1;\r\nstop_year = 2012;\r\nstop_month = 12;\r\nstart = (start_year-1932)*12 + (start_month-1) + 67;\r\nstop = (stop_year-1932)*12 + (stop_month-1) + 67;\r\n\r\n% Pull out risk-free rate for the specified date range\r\nrf = ff_facts(start:stop,5);\r\n\r\n% Pull out FF25 Portfolio Returns for Specified Date Range\r\nx = ff_ports(start:stop,:); % start after NAs end, i.e. line 67\r\nr = x(:,2:end);  % remove first column which is dates\r\nrx = r - repmat(rf,1,25);  % calculate excess returns for portfolios\r\n\r\nSigma_rx = cov(rx);  % Calculate covariance matrix of excess returns\r\n[V,Lambda]=eig(Sigma_rx);  % Eigenvalue decomposition\r\n[stdevs,order] = sort(diag(Lambda)'.^0.5,'descend');  % Sort standard deviations in descending order\r\nbar(stdevs)  % Plot standard deviations to see importance of each 
factor\r\nprint -dpng scree.png\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Print to file\r\nV = V(:,order);\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Sort Eigenvectors in similar way\r\nfactors = rx*V;\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Calculate Factors\r\nf1 = factors(:,1);\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Extract Factor 1\r\nf2 = factors(:,2);\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Extract Factor 2\r\nf3 = factors(:,3);\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 % Extract Factor 3\r\n\r\n% Create 3-D Plots of Factor Loadings (Columns of V)\r\n\r\nfigure;\r\nfivebyfive(V(:,1));\r\nxlabel('Size','fontsize',20)\r\nylabel('Value','fontsize',20)\r\nzlabel('Factor Load','rotation',90,'fontsize',20)\r\ntitle('PCA Factor 1','fontsize',36)\r\nview(50,25)\r\nprint -dpng pca1.png\r\n\r\nfigure;\r\nfivebyfive(V(:,2));\r\nxlabel('Size','fontsize',20)\r\nylabel('Value','fontsize',20)\r\nzlabel('Factor 
Load','rotation',90,'fontsize',20)\r\ntitle('PCA Factor 2','fontsize',36)\r\nview(50,25)\r\nprint -dpng pca2.png\r\n\r\nfigure;\r\nfivebyfive(V(:,3));\r\nxlabel('Size','fontsize',20)\r\nylabel('Value','fontsize',20)\r\nzlabel('Factor Load','rotation',90,'fontsize',20)\r\ntitle('PCA Factor 3','fontsize',36)\r\nview(50,25)\r\nprint -dpng pca3.png\r\n\r\n% Write Factors to file\r\ndates = x(:,1);\r\npcafactors = [dates f1 f2 f3 rf];\r\nsave -ascii pcafactors.txt pcafactors\r\n<\/pre>\n
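The start and stop rows in the main script come from row arithmetic with an offset of 67, which only works if the data files begin in July 1926 and have no missing months. A minimal sketch of an alternative, assuming the first column of each file holds the date as a YYYYMM integer (the main script itself treats x(:,1) as dates), is to look the rows up by date. The dates vector below is a synthetic stand-in for that column, and start_formula and stop_formula are names I made up to compare against the script's arithmetic.

```octave
% Synthetic stand-in for the date column (an assumption for illustration):
% YYYYMM integers for every month from July 1926 through December 2012.
[m, y] = ndgrid(1:12, 1926:2012);
dates = y(:) * 100 + m(:);
dates = dates(dates >= 192607);

% Look up the first and last rows by date instead of by row arithmetic
start = find(dates == 196201);
stop  = find(dates == 201212);

% Compare with the offsets used in the main script
start_formula = (1962 - 1932) * 12 + (1 - 1) + 67;
stop_formula  = (2012 - 1932) * 12 + (12 - 1) + 67;
printf('start: %d (formula %d), stop: %d (formula %d), months: %d\n', ...
       start, start_formula, stop, stop_formula, stop - start + 1);
```

With the real files, dates would presumably be taken as ff_ports(:,1), and the same lookup would keep working if the files were trimmed or extended, as long as both files cover the same months in the same rows.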
\r\nfunction[] = fivebyfive(zvalues);\r\n\r\n% Expand 5x5 data to 10x10 for use in surface plot function\r\nzvals = [zvalues' ; zvalues'];\r\nzvals = reshape(zvals,10,5);\r\nzvals = [zvals;zvals];\r\nzvals = reshape(zvals,10,10);\r\n\r\n% Define x and y values\r\nx = [0 0.999 1 1.999 2 2.999 3 3.999 4 5];\r\ny = [0 0.999 1 1.999 2 2.999 3 3.999 4 5];\r\n\r\n% Create x-y mesh for surface plot\r\n[xx,yy] = meshgrid(x,y);\r\n\r\n% Generate Plot\r\nsurf(xx,yy,zvals)\r\nxlabel('Size','fontsize',20)\r\nylabel('Value','fontsize',20)\r\nzlabel('Factor Load','rotation',90,'fontsize',20)\r\naxis([0 5 0 5 min(0,min(zvalues)-0.1*abs(min(zvalues))) max(zvalues)+0.1*abs(max(zvalues))])\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"