What is linear regression?

Linear regression is a statistical method that finds the line that best fits a set of target values. As you will see later in this story, we can sometimes find a pattern on a graph in financial analysis. If the shape of the graph is close to a line, we can apply linear regression. There are several kinds of regression, and linear regression is the most basic one.

There are many ways to apply linear regression in Python, but the most common is the “sklearn” (scikit-learn) package. In this story, we are going to see how to use the package step by step.

Linear regression (Wikipedia)

sklearn.linear_model.LinearRegression (Scikit Learn)

Why do we need regression?

Linear regression is the most basic kind of prediction. In most data science fields, what we want to do is to figure out what’s going on in the data and to predict what will happen in the future. In finance, for example, we might find that the stock prices of two companies have a close-to-linear relationship (see the example below). If the pattern has lasted for a long time, we might expect the linear pattern to continue in the future. This leads to a prediction of the stock price.

An example

The example data we’re going to analyze is the relative performance of the sector “Computer and Technology” to the sector “Business Services”. The graph below shows their relative performance, and you can see that it’s close to linear. If the graph is close to a clear line, it means the performances of the two sectors are strongly correlated. If the slope of the line is large and positive, it means “Computer and Technology” is performing better than “Business Services”.

To find the slope of the graph, we apply linear regression to it. The problem is that we can’t apply linear regression directly to time series data such as stock prices. Because the x-axis of the graph consists of dates, we must first convert them into numerical values. In addition to seeing how to apply linear regression, we’re going to see how to make that conversion as well.

Python code

1. Import packages

As in other stories, we import “numpy”, “matplotlib”, and “pandas” for basic data analysis. “datetime” is a must when dealing with time series data. Because we need to run a regression, we import “sklearn” as well, which does all the math for us.

“financialanalysis” is a package that automates almost everything we do in this story. You can install it with “pip install financialanalysis”.
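The imports described in this step might look like the following. This is a minimal sketch; “financialanalysis” is left out on purpose so that the snippet runs with only the standard packages installed.

```python
# Standard library: date and time handling for the time series
import datetime

# Basic data analysis and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# The regression model itself
from sklearn.linear_model import LinearRegression
```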

Full code is available below

2. Read dataset

The example dataset is available from the link below. Download the CSV “relative_price_change_CTtoBS_table.csv”.

https://drive.google.com/file/d/1Cd2ibwcPYFZPw-wl_Cfjr2Jmo1ziI8mc/view?usp=sharings

We read the file with the “read_csv()” function of pandas. The output is a DataFrame, a data type that stores table-like data. If you aren’t familiar with DataFrame, this story will help you:

Handling table like data in Python with DataFrame (Python Financial Analysis)

Python DataFrame slicing in the easiest way (How to find a company from 5000 companies)
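Reading the CSV could be sketched as follows. To keep the snippet runnable without the download, it reads a small made-up table from memory. The column name “performance”, the date format, and the values are assumptions for illustration only; just the “date” column is named in this story.

```python
import io

import pandas as pd

# Made-up stand-in for the downloaded CSV (the real file has more rows
# and may use different column names apart from "date").
sample_csv = io.StringIO(
    "date,performance\n"
    "2020-01-01,1.00\n"
    "2020-04-01,1.05\n"
    "2020-07-01,1.12\n"
    "2020-10-01,1.18\n"
)

# read_csv() accepts a file path or a file-like object and returns a DataFrame
data = pd.read_csv(sample_csv)

print(type(data))
print(data.head())
```

With the real file, you would pass its path instead: pd.read_csv("relative_price_change_CTtoBS_table.csv").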

Full code is available below

3. Convert dates into datetime objects

The dates written in the CSV file are all text. Because that is not the standard way to represent dates and times in Python, we need to convert these strings into Python “datetime” objects. The date strings are stored in the column “date” of “data”. We iterate over each element in the column and convert them into datetime objects one by one. The result is saved in the list “date”, and then stored as a new column of “data”. If you want to know more about date conversion, read the following article!

Python “datetime” in the easiest way (how to handle dates in data science with Python)
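The loop described in this step could look roughly like this. It is a sketch under two assumptions: the date strings are formatted as “YYYY-MM-DD”, and the new column is called “datetime” (a hypothetical name). Adjust both to match the real file.

```python
import datetime

import pandas as pd

# Made-up stand-in for the DataFrame read from the CSV
data = pd.DataFrame({
    "date": ["2020-01-01", "2020-07-01"],
    "performance": [1.00, 1.12],
})

# Convert each date string into a datetime object, one by one
date = []
for text in data["date"]:
    # "%Y-%m-%d" is an assumed format
    date.append(datetime.datetime.strptime(text, "%Y-%m-%d"))

# Store the converted values as a new column ("datetime" is a hypothetical name)
data["datetime"] = date

print(data)
```

pandas can also do the conversion in a single call with pd.to_datetime(data["date"]); the explicit loop is shown because it mirrors the description above.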

If you don’t want to do all of this yourself, the “financialanalysis” package can do it for you. Just pass the column “date” of “data” to the function “stringToDatetime()”. The result is the same.

Full code is available below

4. Prepare data

Before applying linear regression, we have to convert the input data into a form suitable for “sklearn”. In the code below, the data for the x-axis is denoted as “X”, while the data for the y-axis is denoted as “y”. “X” is made from the datetime objects we made earlier.

The next step is the most important one in this story. Because we can’t feed datetime objects directly, we must convert them into float values. The “financialanalysis” package converts each date into a float year. A float year represents a date as a fractional number of years. For example, 2020-07-01 becomes 2020.49 because it is in the middle of the year.
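If you prefer to do the float-year conversion by hand, one possible approach is sketched below. The helper “to_float_year()” is hypothetical and is not the function from “financialanalysis”; it measures how far into its year a date falls, so its rounding may differ slightly from the package.

```python
import datetime


def to_float_year(moment):
    """Hypothetical helper: express a datetime as a fractional year."""
    start = datetime.datetime(moment.year, 1, 1)    # start of this year
    end = datetime.datetime(moment.year + 1, 1, 1)  # start of next year
    elapsed = (moment - start).total_seconds()
    length = (end - start).total_seconds()          # handles leap years
    return moment.year + elapsed / length


dates = [datetime.datetime(2020, 1, 1), datetime.datetime(2020, 7, 1)]
float_years = [to_float_year(d) for d in dates]

print(float_years)
```

Here January 1 maps to exactly 2020.0, and July 1 lands just under 2020.5, because 182 of the 366 days of 2020 have passed by then.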

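The operation “[::, None]” converts a one-dimensional array into a two-dimensional column array. sklearn expects “X” to be two-dimensional, with one row per sample, so we can’t feed it a one-dimensional array.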
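The effect of the indexing trick is easiest to see from the array shapes. The values below are made up for illustration.

```python
import numpy as np

# Made-up float years, stored as a one-dimensional array
years = np.array([2020.0, 2020.25, 2020.5, 2020.75])

# Adding None as a second index inserts a new axis: (4,) becomes (4, 1).
# "[:, None]" and "[::, None]" are equivalent spellings.
X = years[:, None]

print(years.shape)
print(X.shape)
```

The same result can be obtained with years.reshape(-1, 1), which some readers find easier to read.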

Full code is available below

5. Apply linear regression

Finally, we use “LinearRegression().fit()” of sklearn to apply linear regression to “X” and “y”. The returned object “reg” contains the slope and y-intercept of the fitted line. Once we extract the slope and intercept, we generate the line with “slope*X + intercept”. Note that, because “X” is a column array, “fittedline” is also a column array. Thus, we turn it back into a one-dimensional array with the “flatten()” function.
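The fitting step could be sketched like this. The data is made up and exactly linear (slope 2), so that it is easy to check that the regression recovers the line; the real dataset will of course be noisier.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up float years as a column array, and perfectly linear y values
X = np.array([2020.0, 2020.25, 2020.5, 2020.75])[:, None]
y = 2.0 * X.flatten() - 4039.0

# Fit the model; fit() returns the fitted estimator itself
reg = LinearRegression().fit(X, y)

# coef_ holds one slope per feature; we have a single feature
slope = reg.coef_[0]
intercept = reg.intercept_

# slope*X + intercept has shape (4, 1); flatten() makes it one-dimensional
fittedline = (slope * X + intercept).flatten()

print(slope, intercept)
print(fittedline.shape)
```

reg.predict(X) gives the same fitted values without extracting the slope and intercept by hand.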

Full code is available below

6. “financialanalysis” does everything for you

If you don’t want to write code like the above, you can automate everything with the “timeseriesLinearRegression()” function of “financialanalysis”. Just pass it the datetime objects and the performance data, and it gives you all of the results.

Full code is available below

7. Make graphs

Finally, we plot the original data together with the fitted line. If you don’t know how to use Matplotlib, the following article explains the basics:

Make graphs of stock price in Python (Python Financial Analysis)
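A plot of the data and the fitted line might be produced as follows. All values, labels, and line styles here are made-up placeholders; in practice “y” and “fittedline” come from the earlier steps.

```python
import datetime

import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-ins for the dates, the data, and the regression result
dates = [datetime.datetime(2020, month, 1) for month in (1, 4, 7, 10)]
y = np.array([1.00, 1.06, 1.11, 1.18])
fittedline = np.array([1.00, 1.06, 1.12, 1.18])

fig, ax = plt.subplots()

# Plot against the datetime objects so the x-axis shows real dates
ax.plot(dates, y, label="relative performance")
ax.plot(dates, fittedline, linestyle="--", label="fitted line")

ax.set_xlabel("date")
ax.set_ylabel("relative performance")
ax.legend()
fig.autofmt_xdate()  # tilt the date labels so they don't overlap

# plt.show()  # uncomment to display the figure interactively
```

Note that the regression itself uses float years, while the plot uses the original datetime objects on the x-axis, which keeps the tick labels readable.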

Full code is available below

Full Python code

You can download the dataset from this link
https://drive.google.com/drive/folders/1Ux2u1s5mctYiywS08sv7_3_PbnWd8v0G?usp=sharing

Other Links

Python Financial Analysis | Home
Python Data Analysis | Home

New articles are notified on Twitter @sparkle_twtt
E-mail:sparkle.official.01@gmail.com

YouTube:https://www.youtube.com/channel/UC19jAflhuZEtmrYYrlhX-6w
