Today I want to show you how to integrate Excel and Python so that we can call Python functions within Excel spreadsheets. This kind of integration is powerful because it gives us the best of both worlds – Excel’s simplicity and the power of Python! We will demonstrate it by building a stock tracker that scrapes financial data from websites using Excel and Python.
By the end of this tutorial, you will learn how to:
- Extract/scrape data from any website
- Call Python functions within a spreadsheet, using user-defined formulas in Excel
Part 1 – Web Scraping with Python
There are many ways to get financial data from the Internet; the easiest is through an API, but we’ll leave that for another tutorial. Today we’ll scrape stock data from the Yahoo Finance website using BeautifulSoup and requests. Once you learn this, you’ll be able to scrape data from almost any website.
A word of caution when scraping websites: be mindful of the target site’s bandwidth limits and don’t flood it with, say, thousands of requests per second. That would amount to a denial-of-service (DoS) attack, which is a malicious act.
Now back to scraping, I’m using Chrome for this tutorial, but you can use any internet browser.
Let’s find Apple’s stock information on Yahoo Finance. Here is the URL: https://finance.yahoo.com/quote/AAPL, which looks like this:
First, we want to get its price: $262.47. Select this number on Yahoo Finance’s website, right-click, then choose “Inspect”. This brings up the Chrome developer tools, which reveal the underlying HTML code of the page we are viewing. A little HTML knowledge helps a lot here, because all the data we are trying to find is in the HTML – we just need to know where to look.
What we are interested in is a div tag with a value that is as unique as possible, since a unique value helps narrow down the choices. I’ve settled on the <div…> line, but feel free to try other tags. The key to remember is that we need a tag (an HTML code block) that contains the data we are trying to extract, and we can see that the price sits inside our selected div tag. If you want to try other tags, the <div> I highlighted in yellow should also work.
Installing Python libraries
We need to use three Python libraries: requests, BeautifulSoup, and xlwings. If you don’t have them already, install them from the command line.
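If you use pip (an assumption on my part – the article’s exact install command isn’t shown here), the usual PyPI package names make this:

```
pip install requests beautifulsoup4 xlwings
```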
Getting the underlying HTML document
We can use the requests library to get the entire HTML document of the page with two lines of code:
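A minimal sketch of those two lines (the variable name r is my choice; the article’s original code may differ slightly):

```python
import requests

# Request Apple's quote page and keep the Response object around
r = requests.get('https://finance.yahoo.com/quote/AAPL')
print(r.status_code)  # 200 means OK
```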
The requests library allows us to send HTTP requests easily to any server. The get() method returns a Response object, and a status code of 200 means OK, indicating that we made a request to the server and successfully received some data back.
The Response object has a .content attribute, which is literally the text/content of the response – in this case, the HTML code of the underlying website, Yahoo Finance. This text is huge, and we really don’t want to print it to the screen – it can hang your Python IDE. There’s no way to pick the data we want out of such a large block of text by hand, so we need some help…
Pulling data from the HTML document
Since we only care about the information we are trying to scrape – namely the stock price, volume, and so on – we can use BeautifulSoup, a Python library for pulling data out of HTML files.
The soup.find_all() method returns all the HTML code blocks that match the argument inside the parentheses. In our case there is only one of them, the <div...> block – thanks to the unique tag value we picked earlier! Note this is not the only solution, so feel free to try other div tags. The key to remember is that you want a code block that includes the price.
The above screenshot shows the entire div block with id=”quote-header-info”; the price is within this block (green box). The price object returned by find_all() is a list containing one item, so we access the actual div block with price[0], since Python indexing starts at 0. We also want to extract just the price from this block of text. Note that the <span> tag containing the price has an attribute data-reactid='14', which we’ll take advantage of.
With a little assistance from the helper method .get_text(), we have just extracted the current Apple stock price. Note that this value is a string type, not a number.
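Putting the extraction steps together, a rough sketch might look like the following, continuing from the requests snippet above. The div id and the data-reactid value come from the article’s screenshots and can change whenever Yahoo updates its page:

```python
from bs4 import BeautifulSoup

# Parse the HTML we downloaded with requests in the previous step
soup = BeautifulSoup(r.content, 'html.parser')

# Grab the header block that contains the price (id taken from the screenshot)
price = soup.find_all('div', {'id': 'quote-header-info'})

# price is a list with a single element; the <span> with data-reactid="14"
# holds the current price (this attribute value may change over time)
current_price = price[0].find('span', {'data-reactid': '14'}).get_text()

print(current_price)  # e.g. '262.47' – note this is a string, not a number
```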
Let’s try to scrape a few other pieces of information from the same website. For the stock tracker, I’m also interested in Apple’s trading volume and its next earnings announcement date. The technique is the same:
- On the webpage, select the data of interest, right-click -> Inspect.
- Look for the block of HTML code that contains the data of interest. Pick one with a unique id or class.
- Use BeautifulSoup to find the element that contains the data of interest, then extract its text value (a short sketch follows this list).
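As an illustration only – the attribute values below are my assumptions, so inspect the page yourself to confirm what Yahoo currently uses – the same pattern applied to volume and the earnings date might look like this, reusing the soup object from the earlier snippet:

```python
# These data-test attribute values are assumptions; verify them via Inspect
volume_tag = soup.find('td', {'data-test': 'TD_VOLUME-value'})
earnings_tag = soup.find('td', {'data-test': 'EARNINGS_DATE-value'})

volume = volume_tag.get_text() if volume_tag else 'not found'
earnings_date = earnings_tag.get_text() if earnings_tag else 'not found'
print(volume, earnings_date)
```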
Good job – the first part is done! Next, let’s look at how to bring the data into an Excel spreadsheet seamlessly with an Excel formula!
Putting it Together
I’m posting a full version of the code, so feel free to grab it here or from GitHub. The code is more complicated than the example we just walked through, but the core concept is the same. Note that I place all the code inside a function named get_stock(), which returns a list of the data points we’d like to scrape. Also note that the return value is built with a list comprehension, which is essentially a Pythonic way to write a for loop in one line. Check out this tutorial if you want to learn about it.
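For orientation, here is a stripped-down sketch of what such a get_stock() function could look like; the selectors and the lookup helper are my simplifications, not the article’s exact code (grab that from the GitHub link above):

```python
import requests
from bs4 import BeautifulSoup

def get_stock(ticker, fields):
    """Scrape one quote page and return one value per requested field."""
    url = f'https://finance.yahoo.com/quote/{ticker}'
    soup = BeautifulSoup(requests.get(url).content, 'html.parser')

    def lookup(field):
        # Placeholder selector – adjust to the tags you found via Inspect
        tag = soup.find('td', {'data-test': f'{field}-value'})
        return tag.get_text() if tag else 'n/a'

    # A list comprehension: the one-line "for loop" mentioned above
    return [lookup(field) for field in fields]
```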
Part 2 – Calling Python functions in Excel
Let me introduce another excellent tool – xlwings, a Python library that lets us leverage the power of Python from and with Excel. With it, you can automate Excel spreadsheets, write macros in Python, or write user-defined functions (UDFs).
Here, we focus only on how to create user-defined functions in Python and use them in Excel. Check out this tutorial if you need help with the xlwings setup, or if you are interested in learning how to automate Excel or write macros in Python.
Setting up the Excel file
- In Excel, open File -> Options -> Trust Center -> Trust Center Settings -> Macro Settings, and check the box to enable Trust access to the VBA project object model.
- Install the xlwings Excel add-in: download the xlwings.xlam file here: https://github.com/xlwings/xlwings/releases
- Double-click the downloaded xlwings.xlam file to install it. In Excel, open Developer -> Add-Ins and make sure the xlwings add-in is properly installed.
- Open the VBA editor (Developer -> Visual Basic or press Alt + F11).
- Check the xlwings box under Tools -> References. Save and close the VBA editor.
- Now you should see an xlwings tab appear in the Excel ribbon.
- Save this Excel file on your computer. I’m naming my file “Tracker.xlsb”.
Setting up the Python script
The setup in Python is a lot easier than what we just did in Excel. Since we are creating user-defined functions (UDFs), we need a Python function that returns some data – which is exactly what we wrote in Part 1 of the tutorial. Then follow the steps below:
- Inside the Python code, right above the function definition, add the decorator @xw.func. This decorator is what allows you to call the Python function from Excel (see the short example after this list).
- Place this Python script in the same directory as the Excel file that we just saved.
- Name the Python script with the same name as the Excel file. The Excel file can be either .xlsm or .xlsb format.
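As a minimal check that the wiring works (the function name double_it is just a throwaway example of mine), a UDF script can be as small as this:

```python
import xlwings as xw

@xw.func
def double_it(x):
    # In Excel, =double_it(21) should return 42 once the function is imported
    return x * 2

# The stock tracker works the same way: put @xw.func directly above
# the get_stock() function from Part 1 in this same script.
```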
Now the setup is complete. The last step is to load the Python function into Excel, which we do by clicking Import Functions in the xlwings tab. Remember, every time we make a change to the Python code, we need to re-import it here.
It’s testing (and fun) time!
Our user-defined function get_stock() can return multiple data points to Excel when entered as an array formula. If you don’t know how to enter an array formula, here’s what to do, using the screenshot below as an example.
- Select a range in Excel (B4:L4)
- Enter the formula in the formula bar: =get_stock(A4,$B$1:$L$1)
- Simultaneously press Ctrl + Shift + Enter
To recap, you now know how to:
- Scrape a website with Python using requests and BeautifulSoup
- Use xlwings to create user-defined functions (UDFs) in Python and call them within Excel
Enjoy your new stock tracker spreadsheet!