The Tech Platform

Building Data Visualization in PowerPoint Using Python

Updated: Jan 13



The role of data in decision-making has changed across sectors, and the demand for accurate decisions has made it more important to understand data well. In response, data visualization technology has progressed significantly in recent years.


This progress lets developers and designers create compelling visuals that communicate a message more effectively. However, not every platform has kept pace with these advances in data visualization. One that has fallen behind is Microsoft PowerPoint.


Why Microsoft PowerPoint Matters

Microsoft PowerPoint remains a crucial platform for presentations, maintaining a vast market base across diverse sectors. Despite the evolution of the PowerPoint user interface since its debut in 1987, certain features, particularly in Animations and Charts, have not progressed equally.

Data Visualization in PowerPoint

Visualization provides answers to questions we might not even know we have. However, in PowerPoint, visualizations have remained largely static over the years, with a traditional and somewhat limiting palette of options. While a simple bar chart can convey a message effectively, overusing such graphs, often out of context, has diminished the impact of data in visualizations.


Capturing the audience's attention and conveying a compelling message has become increasingly challenging. From the creator's perspective, PowerPoint presentations are often seen as mundane, lacking excitement and creativity. It might not be an exaggeration to say that the world's largest medium for presenting insights is also one of the least favored platforms for engagement.


While it might be too strong to use the word "hate," the common sentiment expressed with phrases like "Another Presentation?" near the water cooler suggests a certain level of dissatisfaction. Whether the platform is to blame for uninspiring presentations is a topic worthy of debate.


Bringing Python and PowerPoint Together

Many design ideas remain unrealized due to the limitations of PowerPoint features. For instance, envision a data visualization with 200 data points on a single slide, each represented by an image within bubble clusters. Now, imagine these data points rearranged as a bar chart on the next slide, creating a dynamic SandDance-style visualization.


Manually arranging such data points is practically impossible. Unfortunately, PowerPoint hasn't been designed to accommodate such imaginative visualizations.

Building Data Visualization in PowerPoint Using Python

Creating a presentation with Python introduces exciting possibilities by using code to unlock the potential of automation. To achieve this, we'll use a Python library called python-pptx. While python-pptx is best known for creating and updating presentations in a typical workflow, it can also be used for more inventive purposes.


python-pptx reads and writes the Office Open XML presentation format (.pptx), so the files it produces open in Microsoft PowerPoint and in other applications that support that format. A .pptx file stores its content as XML, which makes it straightforward to generate and modify from code. Install the python-pptx package before you dive into the code; the installation guide is part of the python-pptx documentation.


Here's a step-by-step guide to building a Racing Bar Chart presentation using Python and python-pptx:


Step 1: Import Necessary Packages

We import essential modules from the pptx library for handling PowerPoint presentations.

  • pandas is imported for data manipulation.

  • math is imported for mathematical operations.

# Import necessary objects from Python PPTX 
from pptx import Presentation 
from pptx.enum.shapes import MSO_SHAPE 
from pptx.util import Pt, Cm, Inches 
from pptx.enum.text import PP_ALIGN 

# Miscellaneous Imports 
# Pandas to read and process data 
import pandas as pd 
import math

Step 2: Create an Empty Presentation Object

We create a presentation object using the Presentation() constructor from the pptx library.

# Create an empty Presentation Object 
prs = Presentation()

Step 3: Choose a Slide Layout Type

We choose slide layouts based on their index in the presentation. In this case, a title slide layout and a blank slide layout are selected.

# Choose a slide Layout type 
title_slide_layout = prs.slide_layouts[0] 
blank_slide_layout = prs.slide_layouts[6]

Step 4: Add a Slide

We add a slide to the presentation using the add_slide() method, and we specify the layout type for the added slide.

# Add a Slide 
slide = prs.slides.add_slide(title_slide_layout)

Step 5: Access Slide Default Objects

We access default slide objects like title and subtitle using the shapes and placeholders attributes. We set the text for the title and subtitle.

# Access Slide default objects 
title = slide.shapes.title 
subtitle = slide.placeholders[1] 

title.text = "World Population Over the Years"  
subtitle.text = "A look into how human settlement has evolved."

Step 6: Read Data from a Dataset

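We use pandas to read a CSV file ("data.csv") into a DataFrame (df). The code in the following steps expects the columns "Year", "Country", and "value". Calling head(2) previews the first two rows; a notebook displays the result automatically, while in a script you would wrap the call in print().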

# Read data from a dataset 
df = pd.read_csv("data.csv") 
df.head(2)

Step 7: Get All Unique Years from the Dataset

We extract all unique years from the "Year" column of the DataFrame.

# Get all the unique years available in the dataset 
years = df["Year"].unique()
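To make the expected data shape concrete, here is a minimal sketch that mirrors Steps 6 and 7 using only the standard library, so it runs without pandas. The rows are invented placeholders (not real population figures), and the helper top_n_for_year is a hypothetical name used only for this illustration; it does the same job as sorting a yearly slice by "value" and keeping the first rows.

```python
import csv
import io

# Invented placeholder rows, only to show the assumed column layout
# ("Country", "Year", "value") in long format: one row per country per year.
SAMPLE_CSV = """Country,Year,value
Aland,1960,120
Borduria,1960,300
Cascadia,1960,210
Aland,1970,400
Borduria,1970,350
Cascadia,1970,260
"""

rows = [
    {"Country": r["Country"], "Year": int(r["Year"]), "value": float(r["value"])}
    for r in csv.DictReader(io.StringIO(SAMPLE_CSV))
]

# Unique years in order of first appearance (dicts preserve insertion order).
years = list(dict.fromkeys(r["Year"] for r in rows))


def top_n_for_year(rows, year, n):
    """Return the n rows with the largest values for one year, largest first."""
    in_year = [r for r in rows if r["Year"] == year]
    return sorted(in_year, key=lambda r: r["value"], reverse=True)[:n]


ranking_1960 = [r["Country"] for r in top_n_for_year(rows, 1960, 2)]
ranking_1970 = [r["Country"] for r in top_n_for_year(rows, 1970, 2)]
print(years, ranking_1960, ranking_1970)
```

The ranking changes between the two years, which is exactly what the racing bar chart animates from one slide to the next.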

Step 8: Define a Scale Function

We define a function called scale to map values to a specified scale.

# Define a scale function 
def scale(value, min_val, max_val, minScale, maxScale): 
	return minScale + (value - min_val) / (max_val - min_val) * (maxScale - minScale)
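As a quick check of what the scale function does, the sketch below repeats its definition and feeds it hypothetical numbers (a largest value of 200 and a maximum bar width of 700 units); these inputs are assumptions chosen only to make the arithmetic easy to follow.

```python
def scale(value, min_val, max_val, minScale, maxScale):
    """Linearly map value from [min_val, max_val] onto [minScale, maxScale]."""
    return minScale + (value - min_val) / (max_val - min_val) * (maxScale - minScale)


# Hypothetical inputs: values run from 0 to 200, bars from 0 to 700 units wide.
full = scale(200, 0, 200, 0, 700)   # the largest value fills the whole width
half = scale(100, 0, 200, 0, 700)   # half the value gives half the width
empty = scale(0, 0, 200, 0, 700)    # zero maps to the lower bound
print(full, half, empty)
```

Note that the function divides by (max_val - min_val), so it raises ZeroDivisionError when the two bounds are equal, for example if the largest value in a year were 0.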

Step 9: Define a Function to Make Numbers Readable

We define a function called millify to format numbers in a readable way using suffixes like K, M, B, etc.

# Define a function to make numbers readable 
millnames = ['',' K',' M',' B',' T'] 

def millify(n): 
	n = float(n) 
	millidx = max(0, min(len(millnames) - 1, int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3)))) 
	return '{:.0f}{}'.format(n / 10**(3 * millidx), millnames[millidx])
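One consequence of formatting with zero decimals is that nearby values collapse to the same label: 1.44 billion and 1.38 billion both render as "1 B". If that matters for your chart, a variant with a configurable number of decimals is one option. The sketch below is an assumption-labeled alternative, not part of the original script; the name millify_precise and its precision parameter are hypothetical, and the inputs are illustrative round numbers.

```python
import math

MILLNAMES = ['', ' K', ' M', ' B', ' T']


def millify_precise(n, precision=1):
    """Format n with a K/M/B/T suffix, keeping `precision` decimal places."""
    n = float(n)
    # Pick the suffix from the number of thousands groups, clamped to the list.
    idx = max(0, min(len(MILLNAMES) - 1,
                     int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))))
    return f"{n / 10 ** (3 * idx):.{precision}f}{MILLNAMES[idx]}"


# With zero decimals the two values are indistinguishable...
same_a = millify_precise(1_440_000_000, precision=0)
same_b = millify_precise(1_380_000_000, precision=0)
# ...while two decimals keep them apart.
distinct_a = millify_precise(1_440_000_000, precision=2)
distinct_b = millify_precise(1_380_000_000, precision=2)
print(same_a, same_b, distinct_a, distinct_b)
```

With precision=0 the function behaves like the original millify; higher values trade a slightly longer label for more distinguishable bars.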

Step 10: Create Data Visualization Objects

We iterate through each year and create a blank slide for each. The data for each year is sliced and sorted by the "value" column. Maximum and minimum values are determined for scaling. Textboxes and shapes (bars) are added to the slide to represent data for each country.

# Create Data Visualization objects 
for year in years: 
	# Add a slide for each year 
	slide = prs.slides.add_slide(blank_slide_layout) 

	# Slice the main dataframe by year and sort the sliced dataframe by the value 
	df_slice_by_year = df[df["Year"] == year].sort_values("value", ascending=False) 

	# Find the Maximum value in the data slice which would be the top row 
	slice_max = df_slice_by_year.iloc[0]["value"] 

	# Set Min as 0. Bar charts should start from 0 unless you have negative values 
	slice_min = 0 

	# Get the Slide width 
	slideWidth = prs.slide_width 

	# Add a text box to showcase each slide by year 
	text_box = slide.shapes.add_textbox(Cm(19), Cm(15), Cm(4), Cm(2)) 
	text_box.text = str(year) 
	text_box.text_frame.paragraphs[0].font.size = Pt(54) 
	text_box.text_frame.paragraphs[0].font.bold = True 

	# Loop to create bars in a single slide. For this visualization, we will only cover Top 15 countries for each year 
	for index, country in enumerate(df_slice_by_year.head(15).iterrows()): 
		# Value which becomes the basis for measurement 
		value = country[1]["value"] 

		# Setup the Maximum width based on the current slide width 
		scaleMaxVal = slideWidth * 0.7 

		# Get the width of each bar which has been scaled to the width of the slide 
		scaledValue = scale(value, 0, slice_max, 0, scaleMaxVal) 
		
		# Add the bar shape; the width is converted to a whole number of EMU 
		bar = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, Cm(5.5), Cm((0.7 * (index + 1) + (0.21 * (index + 1))) + 2), 
		int(scaledValue), Cm(0.7)) 
		
		# Add a reference name to each object for creating a motion graphic. 
		# Keep only ASCII letters and digits (str.replace cannot apply a regex pattern) 
		bar.name = "!!" + "".join(ch for ch in str(country[1]["Country"]) if ch.isascii() and ch.isalnum()) 

		# Add the value towards the end of the bar to help the audience understand the order of magnitude 
		text_frame = bar.text_frame 
		text_frame.clear() 
		p = text_frame.paragraphs[0] 
		p.alignment = PP_ALIGN.RIGHT 
		p.font.size = Pt(10) 
		p.text = millify(country[1]["value"]) 

		# Add the country name on the Y-Axis to help the user track the country 
		bartext = slide.shapes.add_textbox(Cm(2), Cm((0.7  (index + 1) + (0.21 * (index + 1))) + 2), Cm(1.5), Cm(1.5)) 
		bartext_run = bartext.text_frame.add_paragraph() 
		bartext.text = country[1]["Country"] 
		bartext.text_frame.paragraphs[0].font.size = Pt(12) 
		bartext_run.font.bold = True
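The layout numbers in the loop above (a 5.5 cm left offset, 0.7 cm bars, a 0.21 cm gap, a 2 cm top margin, and a maximum width of 70% of the slide) can be sanity-checked with plain arithmetic. The sketch below assumes the default python-pptx template, which to my knowledge is a 4:3 slide of 10 in by 7.5 in (9,144,000 by 6,858,000 EMU); if your template differs, adjust the two slide constants. The constant and function names are hypothetical and used only for this check.

```python
EMU_PER_CM = 360_000          # 914,400 EMU per inch / 2.54 cm per inch
SLIDE_WIDTH_EMU = 9_144_000   # assumption: default template width (10 in)
SLIDE_HEIGHT_EMU = 6_858_000  # assumption: default template height (7.5 in)

BAR_LEFT_CM = 5.5     # left edge of every bar
BAR_HEIGHT_CM = 0.7   # height of one bar
BAR_GAP_CM = 0.21     # vertical gap between bars
TOP_MARGIN_CM = 2     # offset above the first bar
MAX_BARS = 15         # top 15 countries per slide


def bar_top_cm(index):
    """Top edge in cm of the bar at a zero-based rank, as computed in the loop."""
    return (BAR_HEIGHT_CM + BAR_GAP_CM) * (index + 1) + TOP_MARGIN_CM


slide_width_cm = SLIDE_WIDTH_EMU / EMU_PER_CM
slide_height_cm = SLIDE_HEIGHT_EMU / EMU_PER_CM

# The longest bar is 70% of the slide width and starts at BAR_LEFT_CM.
widest_bar_cm = SLIDE_WIDTH_EMU * 0.7 / EMU_PER_CM
right_edge_cm = BAR_LEFT_CM + widest_bar_cm

# The last bar's bottom edge is its top plus one bar height.
bottom_edge_cm = bar_top_cm(MAX_BARS - 1) + BAR_HEIGHT_CM

print(f"right edge {right_edge_cm:.2f} of {slide_width_cm:.2f} cm")
print(f"bottom edge {bottom_edge_cm:.2f} of {slide_height_cm:.2f} cm")
```

Under these assumptions both edges stay inside the slide, so 15 bars fit without clipping; showing more countries or using a different slide size would require revisiting the constants.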

Step 11: Save the Presentation

We save the final presentation with the filename 'Racing Bar Chart.pptx'.

# Save the presentation 
prs.save('Racing Bar Chart.pptx')

Post-Generation Steps:

Select All the Slides: Click and drag to select all slides in the presentation.


Choose the "Morph" Transition: In the Transitions tab, choose "Morph" (available in Microsoft 365 and in PowerPoint 2019 and later).


Set Slide Duration: Change the duration of each slide to 0.5 seconds (Feel free to adjust based on your preference).


Configure Advance Slide Options:

  • Deselect the "On Mouse Click" option.

  • Select the "After" option.


Run the Presentation: Open the presentation in SlideShow mode.
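Morph animates a shape from one slide to the next when it can pair the two, and giving shapes the same name starting with "!!" on consecutive slides is the documented way to force that pairing. This is why the bars in Step 10 are named after their country. A minimal sketch of building such a name with a regular expression follows; the function name morph_name is hypothetical, and the country strings are only examples.

```python
import re


def morph_name(country):
    """Build a "!!"-prefixed shape name containing only ASCII letters and digits."""
    return "!!" + re.sub(r"[^0-9a-zA-Z]+", "", str(country))


first = morph_name("United States")
second = morph_name("Cote d'Ivoire")

# str.replace matches literal text only, so passing it a regex pattern
# leaves the string unchanged.
literal = "Cote d'Ivoire".replace("[^0-9a-zA-Z]+", "")
print(first, second, literal)
```

Because the same country always yields the same name, its bar on one slide is matched with its bar on the next, and Morph interpolates the position and width between them.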


Now, you have a dynamic Racing Bar Chart presentation showcasing the population changes over the years for the top 15 countries!
