As the role of data has changed, so has the way organizations across sectors make decisions. The demand for accurate decisions has made it more important to understand data well, and data visualization technology has progressed significantly in recent years as a result.
This progress empowers developers and designers to create compelling visual artifacts that enhance the communication of messages. However, not all platforms have experienced the same level of advancement in the realm of Data Visualization. One such platform that has missed out on these advancements is Microsoft PowerPoint.
Why Microsoft PowerPoint Matters
Microsoft PowerPoint remains a crucial platform for presentations, maintaining a vast market base across diverse sectors. Despite the evolution of the PowerPoint user interface since its debut in 1987, certain features, particularly in Animations and Charts, have not progressed equally.
Data Visualization in PowerPoint
Visualization provides answers to questions we might not even know we have. However, in PowerPoint, visualizations have remained largely static over the years, with a traditional and somewhat limiting palette of options. While a simple bar chart can convey a message effectively, overusing such graphs, often out of context, has diminished the impact of data in visualizations.
Capturing the audience's attention and conveying a compelling message has become increasingly challenging. From the creator's perspective, PowerPoint presentations are often seen as mundane, lacking excitement and creativity. It might not be an exaggeration to say that the world's largest medium for presenting insights is also one of the less favored platforms for engagement.
While it might be too strong to use the word "hate," the common sentiment expressed with phrases like "Another Presentation?" near the water cooler suggests a certain level of dissatisfaction. Whether the platform is to blame for uninspiring presentations is a topic worthy of debate.
Bringing Python and PowerPoint Together
Many design ideas remain unrealized due to the limitations of PowerPoint features. For instance, envision a data visualization with 200 data points in a single slide, each represented by an image within bubble clusters. Now, imagine these data points rearranged in a bar chart fashion in the next slide, creating a dynamic, SandDance-style visualization.
Manually arranging such data points is practically impossible. Unfortunately, PowerPoint hasn't been designed to accommodate such imaginative visualizations.
Building Data Visualization in PowerPoint Using Python
Creating a presentation with Python opens up new possibilities because code can automate what is impractical by hand. To achieve this, we'll use a Python library called python-pptx. While python-pptx is best known for creating and updating presentations in routine workflows, it can be used for many other interesting purposes.
python-pptx reads and writes the Open XML presentation format (.pptx), so its output works in any application that supports that format. PowerPoint stores slide content as XML, which makes it easier to move data between tools. Install the python-pptx package before you dive into the code (for example, with pip install python-pptx); see the library's installation guide for details. Here's a step-by-step guide to building a Racing Bar Chart presentation using Python and python-pptx:
Step 1: Import Necessary Packages
We import essential modules from the pptx library for handling PowerPoint presentations.
pandas is imported for data manipulation.
math is imported for mathematical operations.
# Import necessary objects from Python PPTX
from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE
from pptx.util import Pt, Cm, Inches
from pptx.enum.text import PP_ALIGN
# Miscellaneous Imports
# Pandas to read and process data
import pandas as pd
import math
Step 2: Create an Empty Presentation Object
We create a presentation object using the Presentation() constructor from the pptx library.
# Create an empty Presentation Object
prs = Presentation()
Step 3: Choose a Slide Layout Type
We choose slide layouts based on their index in the presentation. In this case, a title slide layout and a blank slide layout are selected.
# Choose a slide Layout type
title_slide_layout = prs.slide_layouts[0]
blank_slide_layout = prs.slide_layouts[6]
Step 4: Add a Slide
We add a slide to the presentation using the add_slide() method, and we specify the layout type for the added slide.
# Add a Slide
slide = prs.slides.add_slide(title_slide_layout)
Step 5: Access Slide Default Objects
We access default slide objects like title and subtitle using the shapes and placeholders attributes. We set the text for the title and subtitle.
# Access Slide default objects
title = slide.shapes.title
subtitle = slide.placeholders[1]
title.text = "World Population Over the Years"
subtitle.text = "A look into how human settlement has evolved."
Step 6: Read Data from a Dataset
We use pandas to read a CSV file ("data.csv") and store it in a DataFrame (df). The rest of the code expects the columns Year, Country, and value. The head(2) method displays the first two rows of the DataFrame when run in a notebook.
# Read data from a dataset
df = pd.read_csv("data.csv")
df.head(2)
Step 7: Get All Unique Years from the Dataset
We extract all unique years from the "Year" column of the DataFrame.
# Get all the unique years available in the dataset
years = df["Year"].unique()
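To make the expected data shape concrete, here is a small sketch that builds a DataFrame from an inline CSV instead of data.csv and ranks countries per year. The rows are made-up placeholders (countries A, B, C with arbitrary numbers), not real population figures; the column names Year, Country, and value are taken from the code in this guide, and `top_n` and `ranking` are illustrative names.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for data.csv (not real data)
sample_csv = io.StringIO(
    "Year,Country,value\n"
    "2000,A,10\n"
    "2000,B,30\n"
    "2000,C,20\n"
    "2001,A,40\n"
    "2001,B,35\n"
    "2001,C,25\n"
)
df = pd.read_csv(sample_csv)

# Unique years, in order of first appearance
years = df["Year"].unique()

# For each year, keep the top_n countries by value (largest first)
top_n = 2
ranking = {}
for year in years:
    top = df[df["Year"] == year].sort_values("value", ascending=False).head(top_n)
    ranking[int(year)] = top["Country"].tolist()

print(ranking)
```

The same slice-and-sort pattern is what produces one ranked slide per year later on; only the row limit (15 instead of 2) differs.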
Step 8: Define a Scale Function
We define a function called scale to map values to a specified scale.
# Define a scale function
def scale(value, min_val, max_val, minScale, maxScale):
    return minScale + (value - min_val) / (max_val - min_val) * (maxScale - minScale)
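A quick way to see what the scale function does is to run it on a few round numbers. The sketch below repeats the function so it runs on its own and adds `scale_safe`, a hypothetical wrapper (not part of the original code) that avoids a division by zero when the input range is empty.

```python
def scale(value, min_val, max_val, minScale, maxScale):
    # Linear interpolation from [min_val, max_val] onto [minScale, maxScale]
    return minScale + (value - min_val) / (max_val - min_val) * (maxScale - minScale)


def scale_safe(value, min_val, max_val, minScale, maxScale):
    # Hypothetical guard: an empty input range maps everything to minScale
    if max_val == min_val:
        return minScale
    return scale(value, min_val, max_val, minScale, maxScale)


# Half of the input range lands on half of the output range
half = scale(50, 0, 100, 0, 10)
# The endpoints map to the endpoints
low = scale(0, 0, 100, 0, 10)
high = scale(100, 0, 100, 0, 10)
# A year where every value is 0 would otherwise raise ZeroDivisionError
flat = scale_safe(0, 0, 0, 0, 10)

print(half, low, high, flat)
```

In the bar chart, the input range is 0 to the largest value of the year and the output range is 0 to the maximum bar width, so the longest bar always fills the available width.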
Step 9: Define a Function to Make Numbers Readable
We define a function called millify to format numbers in a readable way using suffixes like K, M, B, etc.
# Define a function to make numbers readable
millnames = ['',' K',' M',' B',' T']
def millify(n):
    n = float(n)
    millidx = max(0, min(len(millnames) - 1, int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))))
    return '{:.0f}{}'.format(n / 10**(3 * millidx), millnames[millidx])
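Because millify formats with zero decimal places, nearby values can collapse to the same label. The sketch below repeats the function so it runs standalone and adds `millify_precise`, a hypothetical variant with a `decimals` parameter that is not part of the original code.

```python
import math

millnames = ['', ' K', ' M', ' B', ' T']


def millify(n):
    n = float(n)
    # Pick the suffix from the number of thousands groups, clamped to the list
    millidx = max(0, min(len(millnames) - 1,
                         int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))))
    return '{:.0f}{}'.format(n / 10**(3 * millidx), millnames[millidx])


def millify_precise(n, decimals=1):
    # Hypothetical variant: same suffix logic, configurable decimal places
    n = float(n)
    millidx = max(0, min(len(millnames) - 1,
                         int(math.floor(0 if n == 0 else math.log10(abs(n)) / 3))))
    return '{:.{}f}{}'.format(n / 10**(3 * millidx), decimals, millnames[millidx])


print(millify(950))                    # below one thousand: no suffix
print(millify(1_234_567))              # millions, rounded to a whole number
print(millify(1_439_000_000))          # billions, decimals dropped
print(millify_precise(1_439_000_000))  # billions, one decimal kept
```

Whether the extra decimal is worth the space on a 0.7 cm bar is a design choice; for a ranking where several countries sit in the same order of magnitude, it usually makes the labels more informative.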
Step 10: Create Data Visualization Objects
We iterate through each year and create a blank slide for each. The data for each year is sliced and sorted by the "value" column. Maximum and minimum values are determined for scaling. Textboxes and shapes (bars) are added to the slide to represent data for each country.
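Before reading the full loop, the vertical layout arithmetic can be checked on its own. This is a sketch using the constants that appear in the code (0.7 cm bar height, 0.21 cm gap, 2 cm top margin, 15 bars); the 19.05 cm slide height is an assumption based on the 7.5 inch default of python-pptx's bundled template, and the constant and function names are illustrative.

```python
BAR_HEIGHT_CM = 0.7       # height of each bar, as used in the loop
GAP_CM = 0.21             # vertical gap between bars
TOP_MARGIN_CM = 2         # offset from the top of the slide
SLIDE_HEIGHT_CM = 19.05   # assumption: default 7.5 in slide height
BAR_COUNT = 15            # top 15 countries per slide


def bar_top_cm(index):
    # Same expression the loop passes to Cm() for the bar's top edge
    return (BAR_HEIGHT_CM * (index + 1) + GAP_CM * (index + 1)) + TOP_MARGIN_CM


tops = [bar_top_cm(i) for i in range(BAR_COUNT)]
last_bar_bottom = tops[-1] + BAR_HEIGHT_CM
fits_on_slide = last_bar_bottom <= SLIDE_HEIGHT_CM

print(round(tops[0], 2), round(last_bar_bottom, 2), fits_on_slide)
```

If you change the number of bars or the slide size, rerunning this check is a cheap way to confirm the last bar still lands inside the slide.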
# Create Data Visualization objects
for year in years:
    # Add a slide for each year
    slide = prs.slides.add_slide(blank_slide_layout)
    # Slice the main dataframe by year and sort the slice by value, largest first
    df_slice_by_year = df[df["Year"] == year].sort_values("value", ascending=False)
    # Find the maximum value in the data slice, which is the top row after sorting
    slice_max = df_slice_by_year.iloc[0]["value"]
    # Set Min as 0. Bar charts should start from 0 unless you have negative values
    slice_min = 0
    # Get the slide width
    slideWidth = prs.slide_width
    # Add a text box to showcase each slide by year
    text_box = slide.shapes.add_textbox(Cm(19), Cm(15), Cm(4), Cm(2))
    text_box.text = str(year)
    text_box.text_frame.paragraphs[0].font.size = Pt(54)
    text_box.text_frame.paragraphs[0].font.bold = True
    # Loop to create bars in a single slide. For this visualization, we will only cover the top 15 countries for each year
    for index, country in enumerate(df_slice_by_year.head(15).iterrows()):
        # Value which becomes the basis for measurement
        value = country[1]["value"]
        # Set the maximum bar width to 70% of the current slide width
        scaleMaxVal = slideWidth * 0.7
        # Get the width of each bar, scaled to the width of the slide
        scaledValue = scale(value, slice_min, slice_max, 0, scaleMaxVal)
        # Add the bar shape; the width is converted to a whole number of EMUs
        bar = slide.shapes.add_shape(MSO_SHAPE.RECTANGLE, Cm(5.5), Cm((0.7 * (index + 1) + (0.21 * (index + 1))) + 2),
                                     int(scaledValue), Cm(0.7))
        # Add a reference name to each object for creating a motion graphic.
        # str.replace() does not understand regular expressions, so keep only ASCII letters and digits explicitly
        bar.name = "!!" + "".join(ch for ch in str(country[1]["Country"]) if ch.isascii() and ch.isalnum())
        # Add the value towards the end of the bar to help the audience understand the order of magnitude
        text_frame = bar.text_frame
        text_frame.clear()
        p = text_frame.paragraphs[0]
        p.alignment = PP_ALIGN.RIGHT
        p.font.size = Pt(10)
        p.text = millify(country[1]["value"])
        # Add the country name on the Y-Axis to help the user track the country
        bartext = slide.shapes.add_textbox(Cm(2), Cm((0.7 * (index + 1) + (0.21 * (index + 1))) + 2), Cm(1.5), Cm(1.5))
        bartext.text = country[1]["Country"]
        # Style the paragraph after setting the text, because setting the text replaces existing paragraphs
        name_paragraph = bartext.text_frame.paragraphs[0]
        name_paragraph.font.size = Pt(12)
        name_paragraph.font.bold = True
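The bar names matter for the animation step: Morph pairs up shapes across consecutive slides, and, as far as I know, a name starting with `!!` asks PowerPoint to match shapes that share that exact name. The sketch below isolates the naming rule as a standalone helper; `morph_name` is a hypothetical function introduced for illustration, and it assumes that stripping everything except ASCII letters and digits is the intent of the original pattern `[^0-9a-zA-Z]+`.

```python
def morph_name(country):
    # Keep only ASCII letters and digits, then add the "!!" prefix
    cleaned = "".join(ch for ch in str(country) if ch.isascii() and ch.isalnum())
    return "!!" + cleaned


simple = morph_name("United States")
accented = morph_name("Côte d'Ivoire")

print(simple)
print(accented)
```

One caveat with this approach is that two different country names could reduce to the same cleaned string, in which case Morph would have two shapes competing for one name; checking the cleaned names for duplicates before building the slides avoids that.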
Step 11: Save the Presentation
We save the final presentation with the filename 'Racing Bar Chart.pptx'.
# Save the presentation
prs.save('Racing Bar Chart.pptx')
Post-Generation Steps:
Select All the Slides: Click and drag to select all slides in the presentation.
Choose the "Morph" Transition: In the Transitions tab, choose the "Morph" transition (available in Office 2016 version and above).
Set Slide Duration: Change the duration of each slide to 0.5 seconds (Feel free to adjust based on your preference).
Configure Advance Slide Options:
Deselect the "On Mouse Click" option.
Select the "After" option.
Run the Presentation: Open the presentation in SlideShow mode.
Now, you have a dynamic Racing Bar Chart presentation showcasing the population changes over the years for the top 15 countries!