Python-Pandas vs Excel

Photo by Shahadat Rahman on Unsplash

Why learn to work with Python when we have Excel and BI tools?

Excel is most widely used tool for data analysis and manipulation in the world. Every organisation is using this; analysts, accountants, even CEOs are using for the stats and data handling. All excel users know, how excel can perform tasks form simple functions to complex formulas, macros, power query, complex stats and then VBA coding, so it is our all weather friend. But we all would have noticed that every time we open a new database or there is change in the existing database, we have to clean the data from the scratch or have to write a function or VBA code to resolve the issue. Without going into details, Excel limitation are well known.

By using Python, you can enjoy the extra liberty of playing with data than Excel and on top of that get a non-ending list of connections to your data-model i.e. links to live web apps, to SQL databases etc. There is a great tool available in shape of a python library - Pandas. Pandas was built by Wes McKinney  He built pandas to help work with datasets in Python for his work in finance at his place of employment. Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. Wes Mckinney stance behind building pandas is a great in-favor point to learn new skills for all my Accounting and Finance friends who are in turmoil of whether they should learn a programming language for something beyond Excel and BI tools or not?

Here are some couple of benefits on account of which i recommend to jump start using Pandas(a Python Library)

  1. Built on Python. It is built on Python that what makes it more powerful. Real power of Pandas is enjoyed when you use it with other available python libraries. Python libraries list is very long and you can learn any of those and can build you own models. Below are one of the major connections which i narrate to highlight its worth.
      • Scikit-Learn is Mahine Learning library in Python. You can use Pandas to feed data into Scikit-Learn Machine learning models. Reference:- https://scikit-learn.org/stable/
      • Flask & Django are python libraries which provide web-development framework. You can built Web apps using any one of these two. You can use Pandas at the back-end of your web app for the data analysis and reports creation. https://www.djangoproject.com/     https://flask.palletsprojects.com/en/1.1.x/
      • Dash, Plotly and plotly express. Dash is the library for building ML and data science web applications. These libraries provide the framework to build interactive Dashboards and visualizations of your data through web apps. https://plotly.com/dash/

                        This list of libraries is non-ending. As you would keep on exploring Python, you would keep on learning new and faster ways to mingle different libraries as per your requirements. You may also visit https://www.python.org/ for more details.


  2. Handle multiple datasets. The biggest hurdle we all face with the data analysis is the data type itself. We are provided data in many data formats i.e. SQL and HTML etc. Although we can import these into excel but it always take much time to convert these datasets into excel usable form. Even though after a successful conversion, formatting often gets disturbed and may result in errors. Pandas provide a solution to this by helping to import more than 15 types of data formats and luxury of importing all these data types is that you can play with all of them in the same fashion. It doesn't matter whether you are importing from SQL, HTML or excel/csv itself, Pandas provides uniform way to deal all of them in the same manner.
  3. Quick Exploratory Data Analysis (EDA). Just with a single line of code, Pandas can provide you a complete picture of the data. I shall endeavor to practically elaborate this in my next article.
  4. Handle large datasets. We all know that as the data size starts getting bigger, Excel speed starts getting lower and affects our efficiency of work. When the excel row numbers top 10,000, its speed gets lower. Pandas has no limitation to the number of rows and its computing capacity revolve around the memory of computing it is running on.  You can create hundred of calculation on millions of data points in Pandas instantly. 
  5. Quickly clean files and convert data files. The very first stage of data analysis is data-cleaning. We don't get data in the desired form. There are always something to add, edit or delete. Pandas provides all such function to help data experts to clean the data as per their desired fashion in just number of code lines. 
  6. Pandas is open-source. Unlike Excel and many other BI tools, pandas is free. So at Nil cost you can enjoy higher benefits.

However despite all such benefits, below questions pop up immediately whenever we work towards adding something new to our existing skill-set.

Which tool to be used? which skill-set to be acquired? how much time to be spent on learning a skill? The worth of learning a new language or a tool?

The answer to all these questions is whether learning a skill-set shall make your work leaner and faster? whether you can add value to your existing work if you learn it? what actually is being asked you to do? At what level of quality you want your reporting work to take? 

Being in finance/Accounting field, i encourage all professionals in the field to learn coding skills and try to add Python-Pandas in your skill set. As you will come along with that, you shall learn when to use conventional tools and when to use core programming tools. When dealing with simple requirement and less amount of data, you may go through conventional tools like excel and BI tools but while the data shall be bigger or reporting and analysis requirements are higher then you may have to take help of core programming tools i.e. Pandas. You may also import and export from/to Pandas anytime. for example. a simple analysis can be done in excel and then that can be brought to pandas for further statistical analysis  - or an analysis is done in Pandas and then exported to excel for analysis.

This short article sums up the key and major points to an excel user to start using Python and Pandas. In the next article i shall practically highlight how we can preform the Excel and SQL query tasks in Python-Pandas. It shall be much easier and soothing than you might be thinking, so stay tuned.




Comments