
Finding Correlation Between Stocks

May 12, 2023

In this tutorial, we will look at how stocks move in relation to one another by building several correlation matrices, using Python for the data analysis and Polygon’s client-python library to fetch market data. The underlying idea is that diversifying across weakly correlated assets can reduce portfolio risk and mitigate the impact of market fluctuations. Identifying which stocks move together, and which do not, is therefore central to solving this problem.

Polygon.io is a financial data platform that provides both real-time and historical market data for Stocks, Options, Indices, Forex, and Crypto. With access to this information, developers, investors, and financial institutions can gain valuable insights and make informed decisions.

What is Stock Correlation

Correlation is a statistical measure of the extent to which two variables move in relation to each other. For stocks, it tells us how closely the prices (or, as in this tutorial, the daily returns) of two stocks move together. The correlation coefficient ranges from -1 to 1: a positive value indicates that the stocks tend to move in the same direction, a negative value that they tend to move in opposite directions, and a value near 0 that there is little linear relationship between them. By identifying weakly correlated stocks, traders can reduce risk and build a more balanced portfolio.
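To make the definition concrete, here is a small sketch of the Pearson correlation coefficient, which is the measure used throughout this tutorial. It is the covariance of two return series divided by the product of their standard deviations. The return values are made-up numbers for illustration, and `pearson` is a hypothetical helper rather than part of any library. In practice, pandas computes this for you.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Deviations from each series' mean
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance divided by the product of standard deviations;
    # the 1/n factors cancel, so plain sums are enough
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))


# Made-up daily returns, for illustration only
a = [0.010, -0.020, 0.015, 0.005, -0.010]
b = [0.012, -0.015, 0.010, 0.002, -0.008]  # tends to move with a
c = [-r for r in a]                          # always moves opposite to a

print(pearson(a, b))  # positive and close to 1
print(pearson(a, c))  # -1, up to floating-point rounding
```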

Here's a high-level workflow of how to calculate the correlation between stocks:

  • Gather historical price data from Polygon.io for the stocks you want to analyze.
  • Calculate the daily returns for each stock in Python as the day-over-day percentage change.
  • Compute the correlation matrix with the pandas built-in corr() method, which by default uses the Pearson correlation coefficient to measure the linear relationship between two variables.
  • Visualize the correlation matrix as a heatmap, which makes the relationships between stocks easier to see.

By following these steps, you can quickly calculate and visualize the correlation between stocks, which can help you build a well-diversified portfolio with lower risk.

Computing a Correlation Matrix

Now, let's walk through this workflow step by step. The first step is to gather historical price data for the stocks of interest, which we do with Polygon.io's Aggregates (Bars) API through the client-python library. To access the API, you will need an API key; if you do not already have one, you can sign up for free on the website. Once you have your API key, you can proceed with the analysis.

The stock symbols and the date range are defined as variables in our script, and you can adjust them to suit your needs. The complete code example is available in the client-python repo; here we will look at snippets from the script as we walk through it.

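# Tickers to analyze (list abbreviated here; see the full script)
symbols = ["INTC", "AMD", "NVDA", ...]
# Date range for the historical data (YYYY-MM-DD)
start_date = "2022-04-01"
end_date = "2023-05-10"
# Helper from the full script that fetches historical price data for each symbol
stock_data = fetch_stock_data(symbols, start_date, end_date)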
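The script's own fetch_stock_data() is not reproduced here. Below is a minimal sketch of what such a helper could look like. It assumes that the `polygon` package from client-python is installed and that your key is stored in a `POLYGON_API_KEY` environment variable; both are our assumptions. The sketch requests one-day bars with `RESTClient.list_aggs()` and keeps only each bar's closing price, which is also our choice. The optional `client` parameter is a hypothetical addition so the function can be exercised with a stand-in object.

```python
import os

import pandas as pd


def fetch_stock_data(symbols, start_date, end_date, client=None):
    """Sketch: daily closing prices as a DataFrame (rows: dates, columns: symbols).

    `client` is a hypothetical optional stand-in for polygon.RESTClient.
    """
    if client is None:
        # Imported lazily so a stand-in client can be used without the package
        from polygon import RESTClient

        # Assumption: the API key lives in the POLYGON_API_KEY environment variable
        client = RESTClient(api_key=os.environ["POLYGON_API_KEY"])

    closes = {}
    for symbol in symbols:
        # One bar per trading day between start_date and end_date
        aggs = client.list_aggs(
            ticker=symbol,
            multiplier=1,
            timespan="day",
            from_=start_date,
            to=end_date,
            limit=50000,
        )
        # Each bar carries a Unix timestamp in milliseconds and a close price
        closes[symbol] = pd.Series(
            {pd.to_datetime(agg.timestamp, unit="ms"): agg.close for agg in aggs}
        )

    # Align all symbols on a shared, sorted date index
    return pd.DataFrame(closes).sort_index()
```

Returning one column per symbol keeps the later steps simple, because pandas can then compute returns and correlations across all columns at once.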

Once we have the historical price data, we calculate the daily returns for each stock as the percentage change in price, which gives the rate of return from one trading day to the next.

daily_returns = calculate_daily_returns(stock_data)
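Here is a minimal sketch of calculate_daily_returns(). It assumes that `stock_data` is a DataFrame of daily prices with one column per symbol and dates as the index. The script's actual implementation may differ, and the prices in the usage example are made up.

```python
import pandas as pd


def calculate_daily_returns(stock_data: pd.DataFrame) -> pd.DataFrame:
    """Day-over-day percentage change of every column (sketch)."""
    # pct_change computes price_t / price_{t-1} - 1 for each column;
    # the first row has no previous day, so it is all NaN and is dropped
    return stock_data.pct_change().dropna(how="all")


# Made-up prices for two hypothetical symbols
prices = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [50.0, 50.0, 55.0]},
    index=pd.date_range("2023-04-03", periods=3, freq="D"),
)
print(calculate_daily_returns(prices))  # AAA: +10% then -10%; BBB: 0% then +10%
```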

Next, we compute the correlation matrix with the corr() method provided by the pandas library. By default, this method computes the Pearson correlation coefficient, which measures the linear relationship between two variables.

correlation_matrix = compute_correlation_matrix(daily_returns)
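A compute_correlation_matrix() helper can be as small as a single call to DataFrame.corr(). The sketch below assumes `daily_returns` has one column of returns per symbol. The three hypothetical series in the usage example are constructed so that the expected coefficients are obvious.

```python
import pandas as pd


def compute_correlation_matrix(daily_returns: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Pearson correlation between the columns of daily_returns (sketch)."""
    # method="pearson" is the pandas default, spelled out for clarity;
    # missing values are excluded pair by pair
    return daily_returns.corr(method="pearson")


# Hypothetical returns: BBB is AAA scaled by 2, CCC is AAA with the sign flipped
returns = pd.DataFrame(
    {
        "AAA": [0.010, -0.020, 0.015, 0.005],
        "BBB": [0.020, -0.040, 0.030, 0.010],
        "CCC": [-0.010, 0.020, -0.015, -0.005],
    }
)
print(compute_correlation_matrix(returns).round(3))  # AAA/BBB = 1, AAA/CCC = -1
```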

The output of this step is a correlation matrix: a square table of the correlation coefficients between each pair of stocks. A value of 1 means a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear correlation. The matrix is symmetric (the entry for INTC and AMD equals the entry for AMD and INTC), and the diagonal of 1s from the top left to the bottom right is each stock's correlation with itself, which is always 1.

          INTC       AMD      NVDA       TXN      QCOM        MU      AVGO       ADI      MCHP      NXPI
INTC  1.000000  0.688225  0.668148  0.728849  0.705444  0.705384  0.704072  0.707094  0.725683  0.693339
AMD   0.688225  1.000000  0.863082  0.723755  0.714276  0.712958  0.712381  0.751192  0.763786  0.752658
NVDA  0.668148  0.863082  1.000000  0.751007  0.761949  0.729233  0.767373  0.780262  0.812928  0.783573
TXN   0.728849  0.723755  0.751007  1.000000  0.778992  0.747224  0.823659  0.871440  0.877141  0.842469
QCOM  0.705444  0.714276  0.761949  0.778992  1.000000  0.748538  0.787640  0.798045  0.819367  0.812557
MU    0.705384  0.712958  0.729233  0.747224  0.748538  1.000000  0.705517  0.716119  0.753912  0.730331
AVGO  0.704072  0.712381  0.767373  0.823659  0.787640  0.705517  1.000000  0.808928  0.839874  0.801082
ADI   0.707094  0.751192  0.780262  0.871440  0.798045  0.716119  0.808928  1.000000  0.901758  0.857143
MCHP  0.725683  0.763786  0.812928  0.877141  0.819367  0.753912  0.839874  0.901758  1.000000  0.889236
NXPI  0.693339  0.752658  0.783573  0.842469  0.812557  0.730331  0.801082  0.857143  0.889236  1.000000
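Each pair appears twice in the table because the matrix is symmetric, and the diagonal carries no information. The hypothetical helper `rank_pairs` below lists every distinct pair once, sorted from most to least correlated. The usage example is a four-stock excerpt copied from the table above. Applied to the full table, the helper would put ADI and MCHP (0.90) at the top and INTC and NVDA (0.67) at the bottom.

```python
from itertools import combinations

import pandas as pd


def rank_pairs(corr: pd.DataFrame):
    """Return (symbol_a, symbol_b, coefficient) for each distinct pair, highest first."""
    # combinations() visits each unordered pair once and skips the diagonal
    pairs = [(a, b, float(corr.loc[a, b])) for a, b in combinations(corr.columns, 2)]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Four-stock excerpt of the correlation matrix above
symbols = ["INTC", "NVDA", "ADI", "MCHP"]
corr = pd.DataFrame(
    [
        [1.000000, 0.668148, 0.707094, 0.725683],
        [0.668148, 1.000000, 0.780262, 0.812928],
        [0.707094, 0.780262, 1.000000, 0.901758],
        [0.725683, 0.812928, 0.901758, 1.000000],
    ],
    index=symbols,
    columns=symbols,
)
for a, b, r in rank_pairs(corr):
    print(f"{a}-{b}: {r:.3f}")
```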

The high coefficients are not surprising: all ten stocks belong to the semiconductor sub-industry of the technology sector, and every pair in the matrix above has a correlation between about 0.67 and 0.90. Finally, to better understand the relationships between the stocks, we visualize the correlation matrix as a heatmap, which the seaborn library makes easy to create.

plot_correlation_heatmap(correlation_matrix)
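Here is a minimal sketch of plot_correlation_heatmap() using seaborn's heatmap() on a matplotlib Axes. The figure size, color map, annotation format, and title are our own choices. The `show` parameter is a hypothetical addition that lets the function draw without opening a window. Pinning `vmin`/`vmax` to -1 and 1 makes the colors span the full range of possible coefficients, so heatmaps of different stock groups are directly comparable.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(correlation_matrix: pd.DataFrame, show: bool = True):
    """Draw the correlation matrix as an annotated heatmap and return the Axes (sketch)."""
    fig, ax = plt.subplots(figsize=(10, 8))
    sns.heatmap(
        correlation_matrix,
        annot=True,      # print each coefficient inside its cell
        fmt=".2f",
        cmap="coolwarm",
        vmin=-1,         # fix the color scale to the full [-1, 1] range
        vmax=1,
        square=True,
        ax=ax,
    )
    ax.set_title("Correlation of daily returns")
    fig.tight_layout()
    if show:
        plt.show()
    return ax
```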

The resulting image provides an intuitive visual representation of the correlation matrix, so you can quickly spot both highly correlated and uncorrelated stocks. The color scale ranges from -1 (a perfect negative correlation) to 1 (a perfect positive correlation), and the shades in between represent the strength of the correlation.

Here is another example: ten stocks selected from a diverse range of industries, including automotive, healthcare, energy, consumer discretionary, financials, technology, consumer staples, and industrials. These stocks are exposed to different market forces, economic trends, and sector-specific risks, so they are likely to be far less correlated with one another than the semiconductor group.
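One way to compare a diversified basket against the semiconductor group is to reduce each correlation matrix to a single summary number: the average of its off-diagonal coefficients. The helper below is a hypothetical sketch and is not part of the original script. For the semiconductor matrix shown earlier, this average works out to roughly 0.77. A noticeably lower value for a mixed-industry basket would be consistent with the expectation above.

```python
import numpy as np
import pandas as pd


def average_pairwise_correlation(corr: pd.DataFrame) -> float:
    """Mean of the off-diagonal entries of a correlation matrix (hypothetical helper)."""
    values = corr.to_numpy(dtype=float)
    # Boolean mask that is True everywhere except the diagonal of 1s
    off_diagonal = ~np.eye(len(values), dtype=bool)
    return float(values[off_diagonal].mean())


# Made-up 3x3 matrix: pairwise values 0.8, 0.2 and 0.5 average to 0.5
example = pd.DataFrame([[1.0, 0.8, 0.2], [0.8, 1.0, 0.5], [0.2, 0.5, 1.0]])
print(average_pairwise_correlation(example))
```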

A third example uses stocks divided into two distinct groups: technology and oil. The technology stocks are likely to be highly correlated because of shared sector influences. Similarly, the oil stocks are likely to move in tandem because of common drivers such as global oil prices, energy demand, and environmental regulations. However, technology and oil stocks face distinct market forces and sector-specific risks, so the two groups are expected to be less correlated with each other.
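The two-group expectation can be checked by averaging the correlation matrix block by block: once within each group and once across the groups. The sketch below uses a hypothetical `block_correlations` helper. The labels T1, T2 (standing in for technology tickers) and O1, O2 (standing in for oil tickers), along with the coefficients, are placeholders rather than real data.

```python
import numpy as np
import pandas as pd


def block_correlations(corr: pd.DataFrame, group_a, group_b):
    """Average correlation within each group and between the two groups (sketch)."""

    def mean_off_diagonal(symbols):
        sub = corr.loc[symbols, symbols].to_numpy(dtype=float)
        # Ignore each symbol's correlation with itself
        return float(sub[~np.eye(len(symbols), dtype=bool)].mean())

    return {
        "within_a": mean_off_diagonal(group_a),
        "within_b": mean_off_diagonal(group_b),
        # Cross block: every pairing of one symbol from each group
        "between": float(corr.loc[group_a, group_b].to_numpy(dtype=float).mean()),
    }


# Placeholder labels and made-up coefficients
labels = ["T1", "T2", "O1", "O2"]
corr = pd.DataFrame(
    [
        [1.0, 0.8, 0.2, 0.3],
        [0.8, 1.0, 0.1, 0.2],
        [0.2, 0.1, 1.0, 0.7],
        [0.3, 0.2, 0.7, 1.0],
    ],
    index=labels,
    columns=labels,
)
print(block_correlations(corr, ["T1", "T2"], ["O1", "O2"]))
```

If the "between" average is clearly lower than both "within" averages, the two groups behave as separate clusters.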

Through these steps, we can analyze the correlation between different stocks and use it to build a diversified portfolio, manage risk, and inform investment strategies.

Next Steps

With a correlation matrix and heatmap visualization, you can easily see how correlated two or more stocks are and diversify your portfolio to reduce risk. Keep in mind that correlations change over time, so a natural next step is to explore how these values evolve.
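To see how correlations evolve, you can compute them over a trailing window instead of the whole date range. The sketch below uses pandas' Rolling.corr() with a hypothetical helper, `rolling_correlation`. The default window of 60 rows (about three months of trading days) is an arbitrary choice. The synthetic returns in the usage example switch from moving together to moving opposite each other halfway through.

```python
import numpy as np
import pandas as pd


def rolling_correlation(
    daily_returns: pd.DataFrame, symbol_a: str, symbol_b: str, window: int = 60
) -> pd.Series:
    """Correlation of two symbols' returns over a trailing window of rows (sketch)."""
    # Each value uses only the most recent `window` observations,
    # so the series shows how the relationship drifts over time
    return daily_returns[symbol_a].rolling(window).corr(daily_returns[symbol_b])


# Synthetic returns: BBB copies AAA for four days, then mirrors it
a = np.array([1.0, -1.0, 2.0, -2.0] * 2) / 100
returns = pd.DataFrame({"AAA": a, "BBB": np.concatenate([a[:4], -a[4:]])})
print(rolling_correlation(returns, "AAA", "BBB", window=4))  # +1 for the first full window, -1 for the last
```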

In conclusion, understanding the correlation between stocks is crucial for building a well-diversified, lower-risk portfolio. This example code uses Polygon.io's historical stock Aggregates (Bars) API via the client-python package, together with other Python libraries. With it, you can efficiently analyze and visualize correlations to make more informed investment decisions.
