Learn Data Analysis with Python
PyGWalker Tutorial: A Tableau-Like Python Library for Interactive Data Exploration and Visualization


As data scientists and analysts, we often spend a lot of time exploring and analyzing data using tools like pandas, matplotlib, and seaborn. While these tools are powerful, they can be limited when it comes to interactive data exploration and visualization.

This is where PyGWalker (pronounced like "Pig Walker") comes in. PyGWalker is a Python library that integrates Jupyter Notebook (or other Jupyter-based notebooks) with Graphic Walker, an open-source alternative to Tableau. With PyGWalker, you can turn your pandas DataFrame into a Tableau-style user interface for visual exploration.

I will be using superstore sales data that I downloaded from Kaggle.com, but you can follow along with any data you like.

Here is a YouTube video walkthrough of this blog post:
https://youtu.be/cZ0jURSVy_w?feature=shared

If you are interested in learning Python for Data Analysis, you don't want to miss out on the FREE Masterclass I will be hosting. Make sure to register before all the spots fill up!

https://www.datamasteryacademy.com/pythontraining

 

How to Install PyGWalker:

This one is easy! Open your Jupyter notebook and type:

!pip install pygwalker

Importing Data and Libraries:

To access data located in my Google Drive, I need to establish a connection between my Google Drive and Google Colab. I am also importing the libraries I will use, including PyGWalker:
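The setup cell described above might look roughly like the sketch below. The Drive path and the sample rows are placeholders I made up for illustration, not the actual Kaggle file; the column names are an assumption based on the columns used later in this post. The try/except guards are only there so the cell also runs outside Colab or before PyGWalker is installed.

```python
import io

import pandas as pd

try:
    import pygwalker as pyg  # alias used by pyg.walk(data) in the next step
except ImportError:
    pyg = None  # PyGWalker is missing: run the install step above first

try:
    from google.colab import drive  # only importable inside Google Colab
except ImportError:
    drive = None  # not in Colab, so there is no Drive to mount

if drive is not None:
    drive.mount("/content/drive")

# Hypothetical location of the Kaggle file in Drive; change it to your own.
CSV_PATH = "/content/drive/MyDrive/superstore.csv"

# A few made-up rows in the same shape, so the snippet runs without the file.
SAMPLE_CSV = """Order Date,Ship Date,Segment,Region,Category,Sales,Quantity,Discount
11/8/2016,11/11/2016,Consumer,South,Furniture,260.0,2,0.0
6/12/2016,6/16/2016,Corporate,West,Office Supplies,15.0,2,0.0
10/11/2015,10/18/2015,Consumer,South,Furniture,950.0,5,0.45
6/9/2014,6/14/2014,Home Office,West,Technology,900.0,6,0.2
"""

# With the real file, use pd.read_csv(CSV_PATH) instead.
data = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(data.dtypes)
```

If you are not on Colab, skip the Drive part and point pd.read_csv at a local file instead.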

Exploring Data with PyGWalker: 

It is time to run PyGWalker: 

pyg.walk(data)

Once the code runs, we will see an interactive data visualization interface that looks very similar to Tableau:

The left-hand side of the interface displays the variables we are working with. The central area is reserved for our visualizations, where we can drag and drop variables to the X and Y-axis boxes. There are various customization options available, such as filters, color, opacity, size, and shape. Let's explore these tools and create some visualizations.

Take a Look at the Data:

Click on the data tab to quickly see the data and its data types. To go back to the visualization area, click the visualization tab:

Creating a Bar Plot: 

Drag and drop the "Sales" column to the x-axis and the "Region" column to the y-axis. This chart shows the superstore's sales by region.

Selecting Plot Type:

You can select the desired mark type or leave it on auto so the tool can select the most appropriate plot type for you!

Sorting the Chart:

You can sort the chart in ascending or descending order by clicking one of these:

 

Aggregation:

PyGWalker can aggregate numeric columns. The following chart is created by aggregating the sales by region. You can select the aggregation function from the available options:
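If you want to double-check the numbers behind an aggregated chart, the same per-region aggregation can be reproduced in pandas. This is only a sketch on made-up rows; the column names "Region" and "Sales" follow this tutorial, and the three functions shown are examples, not the full list PyGWalker offers.

```python
import pandas as pd

# Made-up rows; only the column names follow the tutorial.
data = pd.DataFrame({
    "Region": ["West", "West", "South", "East"],
    "Sales": [100.0, 50.0, 30.0, 70.0],
})

# One row per region, one column per aggregation function.
sales_by_region = data.groupby("Region")["Sales"].agg(["sum", "mean", "count"])
print(sales_by_region)
```

The value in the "sum" column for a region should match the bar you get when the chart aggregates Sales by sum.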

Chart Resizing:

You can resize your chart by clicking on "layout mode" and selecting "fixed", or by clicking on the gear icon right next to the "layout mode" button:

 

You can also use the resize option by hovering over the corner of the chart and dragging it:

 

Interpreting the Data:

PyGWalker can interpret the data and offer some insights. Click on the West region and select "interpret data":

For the West region, total sales are greater than expected:

You can also select the South region in the chart and click "interpret data". This time it says "less than expectation".

 

Exporting the Chart:

You can export the chart and save it as a PNG or an SVG file:

 

Create a New Chart:

To create a new chart, click on the +New tab:

 

 

Create a Line Graph with Time Series:

To use the Order Date and Ship Date columns, I first converted them to the datetime type and then re-ran my PyGWalker cell. (FYI: Re-running the cell deleted my previous work.)
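The conversion itself can be done with pd.to_datetime. The sketch below uses two made-up rows and assumes the dates are month/day/year strings such as 11/8/2016; check your own file and adjust the format string if it differs.

```python
import pandas as pd

# Made-up rows; the date columns start out as plain strings.
data = pd.DataFrame({
    "Order Date": ["11/8/2016", "6/12/2016"],
    "Ship Date": ["11/11/2016", "6/16/2016"],
    "Sales": [260.0, 15.0],
})

for column in ["Order Date", "Ship Date"]:
    # Assumes month/day/year strings; adjust format to match your file.
    data[column] = pd.to_datetime(data[column], format="%m/%d/%Y")

print(data.dtypes)
```

After this, run pyg.walk(data) again so the interface picks up the new types. To avoid losing charts on a re-run, recent PyGWalker releases document a spec argument (for example pyg.walk(data, spec="./chart_config.json"), where the file name is my own placeholder) that saves the chart configuration to a file; I have not used it in this post, so check it against your installed version.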

The data types for these dates changed to:

 

Let's create our line graph now! Drag the Ship Date column to the x-axis and the Sales column to the y-axis. PyGWalker selected the line graph type automatically for us. I resized the graph to make it look like this:

 

Adding Filters:

We can easily color the chart by dragging a column to the color box. For this, I first changed the mark type to scatter.

We can also filter our data to keep only a portion of it based on any column. Let's drag the Segment column to the filter box and select only the Consumer label to see the sales for the consumer segment. Click the green arrow after deselecting the Corporate and Home Office options.

 

Great! We were able to filter our data based on segment!
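For comparison, the same segment filter can be written in pandas with a boolean mask. This is a sketch on made-up rows that assumes the column is named "Segment" and the label is spelled "Consumer", as in the filter box above.

```python
import pandas as pd

# Made-up rows; only the column names and labels follow the tutorial.
data = pd.DataFrame({
    "Segment": ["Consumer", "Corporate", "Consumer", "Home Office"],
    "Sales": [260.0, 15.0, 950.0, 900.0],
})

# Keep only the rows whose Segment is exactly "Consumer".
consumer_only = data[data["Segment"] == "Consumer"]
print(len(consumer_only), consumer_only["Sales"].sum())
```

This is handy when you want to confirm that the filtered chart contains the rows you expect.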

Let's also adjust the size of the scatter points based on the quantity of the order:

The larger scatter points indicate a larger order quantity.

 

Adding Multiple Columns to Create Multiple Plots:

We can drag as many columns as we want to either axis. I dragged the category and subcategory columns to the x-axis to see the sales:

Let's also color the above charts by average discount:

 

Summary

PyGWalker is a powerful and user-friendly interactive data visualization tool for Python that enables users to explore and analyze datasets intuitively. By following the steps outlined in this tutorial, you should now have a solid understanding of how to install and use PyGWalker to create interactive visualizations and gain insights into your data. Whether you're a data scientist, data analyst, or simply someone who needs to visualize data in a meaningful way, PyGWalker is an excellent package to add to your toolkit. With its flexibility and ease of use, PyGWalker is sure to become an invaluable asset in your data analysis workflow. So, give it a try today!

 

 

 
