In big data, managing and transforming data effectively is crucial for any organization. Microsoft Fabric’s Dataflows Gen2 provides a powerful, user-friendly platform for such tasks. In this article, we will walk you through the process of creating new Dataflows Gen2 in Microsoft Fabric. We’ll start with the basics, explaining what Dataflows Gen2 is and why it’s beneficial. Then, we’ll explore the practical steps of creating a Dataflow, getting data, and applying transformations.
Table of Contents:
Dataflows Gen2 in Microsoft Fabric
Dataflows Gen2 is the latest version of dataflows in Microsoft Fabric. It’s a modern cloud-first implementation of Power Query. It allows you to easily transform and move data between different sources and destinations through a user-friendly interface. With over 300 data and AI-based transformations, it simplifies the data transformation process.
Here are some of the benefits of using Dataflows Gen2:
Improved Authoring Experience: Dataflows Gen2 offers a shorter authoring flow with auto-save and background publishing features. This means any changes made to a dataflow are autosaved to the cloud, allowing you to exit the authoring experience at any point and continue from where you left off at a later time.
Enhanced Monitoring and Refresh History: Dataflows Gen2 provides improved monitoring and refresh history, helping you keep track of your dataflows’ performance and updates.
Integration with Data Pipelines: Dataflows Gen2 can be integrated with data pipelines, providing high-scale computing.
Data Destinations: Unlike Dataflow Gen1, Dataflow Gen2 allows you to specify a data destination for your data. This feature enables you to separate your ETL (Extract, Transform, Load) logic and destination storage.
Reusability: Dataflows promote the reusability of underlying data elements, preventing the need to create separate connections with your cloud or on-premises data sources.
Support for Multiple Data Sources: Over 158 different connectors are currently supported to bring in files of different types whether they are on-premises or on a cloud service.
Steps to Create New Dataflows Gen2 in Microsoft Fabric
Before you start with Dataflows Gen2 in Microsoft Fabric, you need to fulfill the following prerequisites:
Microsoft Fabric Tenant Account: You need to have a Microsoft Fabric tenant account with an active subscription. If you don’t have one, you can create a free account.
Microsoft Fabric Enabled Workspace: Make sure you have a Microsoft Fabric enabled workspace1. If you don’t have a workspace, you can create one.
Follow the below steps to create new Dataflows Gen2.
STEP 1: Creating a Dataflow
1. Navigate to the Power BI. In the lower-left panel, click on the "Power BI" icon. From the options available, select "Data Factory". This tool in Power BI will allow you to create and manage dataflows.
2. Now, click on "Dataflow Gen2". This will open the dataflow editor where you can create and edit your dataflows.
STEP 2: Getting Data
3. In the dataflow editor, select Get data and then select More.
5. In the "Choose data source" window, search for OData.
6. Select OData as the new data source. OData is a protocol that allows you to create and consume RESTful APIs.
7. A dialog box will open where you will enter the URL of the OData service you want to connect to.
Enter the URL https://services.odata.org/v4/northwind/northwind.svc/, and then select Next. This URL is the endpoint of the Northwind OData service, which provides access to sample data from the Northwind database.
8. In the "Choose data" window, select the Orders and Customers tables, and then click Create. This will load the data from these tables into your dataflow.
You have now loaded your data into your first dataflow, congratulations!
Now it’s time to apply a couple of transformations to bring this data into the desired shape. You’re going to be doing this task from the Power Query editor.
STEP 3: Applying Transformations and Publishing
This step involves manipulating the data you have loaded into your dataflow to bring it into a desired format. This is done using Power Query, a powerful data connection technology that enables you to discover, connect, combine, and refine data across a wide variety of sources.
Enable the Data Profiling Tools
Data profiling tools provide useful insights about the quality and the characteristics of your data. To enable these tools, navigate to Home > Options.
Under the Global Options, check the column profile options and click OK.
Enable Diagram View
The diagram view provides a visual representation of your data and the transformations applied to it.
To enable the diagram view, use the options under the View tab in the Power Query editor ribbon, or by selecting the diagram view icon on the lower right side of the Power Query window.
Examples
Example 1: Calculating the Total Number of Orders Per Customer
In this example, we will calculate the total number of orders per customer using the Orders table.
STEP 1: Open the Orders table in your dataflow. This table should contain a record of all orders, each associated with a CustomerID.
STEP 2: Select the CustomerID column.
STEP 3: Navigate to the Transform tab in the ribbon and select Group By.
The Group By transformation allows you to group rows that have the same values in specified columns into aggregated data, like sums, averages, or counts.
STEP 4: Then, select Count Rows as the operation to perform on the grouped rows. This operation will count the number of orders for each CustomerID, effectively giving you the total number of orders per customer.
STEP 5: Click OK.
Example 2: Combining Data from the Customer's Table with the Count of Orders Per Customer
In this example, we will combine data from the Customers table with the Count of Orders per customer. This process involves merging the two tables based on a common field, allowing us to view customer information alongside their respective order counts.
STEP 1: In the Diagram View, select the Customers query.
STEP 2: Click on the “⋮” ellipsis to access the Merge queries as new transformation.
This option allows you to combine data from two queries into a new query.
STEP 3: Now, you’ll be asked to select a second table to merge with the Customers table.
In this case, you’ll want to select the query that contains the Count of Orders per customer (which you obtained by grouping the Orders table by CustomerID and counting the number of orders for each customer).
STEP 4: Next, you’ll be asked to select the matching column in both tables. In this case, you’ll want to select the CustomerID column, as this is the common column between the Customers table and the Count of Orders per customer.
STEP 5: Click OK.
Conclusion
We’ve now walked through the process of creating a new Dataflows Gen2 in Microsoft Fabric, step by step. From understanding the prerequisites to creating a dataflow, getting data, and applying transformations, we’ve covered it all.
Thank you for joining us on this journey through Dataflows Gen2. We hope this guide has been helpful and has empowered you to take full advantage of this powerful feature in Microsoft Fabric.
Happy data transforming!
Comments