If you want to be a true spreadsheet power user, you need to learn about the Power Query and Power Pivot features in Excel. While you can get a lot done with Excel alone, using these built-in “Power” features will turn you into an advanced professional Excel user. In this guide, you learn how to use Power Query to import potentially hundreds of different files with millions of rows and use Power Pivot to generate complex analyses on the massive data models you import.
Want to protect your work from prying eyes? Learn how to add a password to your Excel workbook.
1. Importing Data From Multiple Files
Using Power Query, you can import data from different files in a single folder. Be mindful, though, as you should still ensure all the data follows the same format. Keep the number of columns and the column header names the same and ensure the data type for the values in each column is consistent.
For this guide, we are using five files that comprise ~100 million rows of data from Kaggle. Download them if you want to follow along or use your own data. With that in mind, here are the steps to import the data:
- Click on the “Data” tab in the ribbon.
- Select “Get Data -> From File -> From Folder,” then browse to your folder and select it.
- You should see a new window pop up. In our case, we have five test files that are named according to their size: “adult10m” has 10 million rows of data, “adult1m” has 1 million rows, “adult100k” has 100,000 rows, and so on. Click “Transform Data” to open the Power Query Editor.
2. Filtering the Imported Files and Rows
In the Power Query Editor, you can apply filters to import only the files that match the criteria you set, which is very useful for eliminating data sources you don’t want.
- We have four files with no file extension, but one file with the .DATA extension. Let’s remove the odd one by clicking the arrow next to “Extension.”
- We can see a list of all of our files’ file extensions. In this example, these are just (blank) and .data. Uncheck the extensions you don’t want.
- Each filter we applied will appear under the “Applied steps” section.
- We can also filter by file name by clicking the arrow next to “Name.”
- Let’s say we wanted to filter out any files with “k” in them, like “adult100k”, because we don’t want to work with any small files containing less than one million rows. We can apply a text filter. There are many options, but for our purposes, the “Does not contain” filter is the right one.
- Type in the text that you don’t want to see in the file name.
- Scroll a bit to the right until you can see the “Date modified” and “Date created” columns. Let’s filter to select only files created in a certain time range. Click the arrow next to “Date created.”
- Choose “Date/Time Filters -> Between.”
- Type in time values for the two conditions and confirm by pressing “OK.”
- We have completed filtering our data on a file-by-file level.
- Let’s combine our remaining files so that we can filter the data itself. Click the double arrows next to the “Content” header.
- Now that you can see all the values for individual rows, each row has an additional column, “Source.Name,” which indicates the file where it originated.
- To further improve our data: since our CSV files originally came with no column header names, you can right-click some of the column headers and click “Rename” to give them an appropriate name.
- Filters work here, too. We can apply a number range filter. For instance, click the arrow next to the “Age” column header, then choose “Number Filters -> Greater than.”
- Enter a number such as 30, thenn click “OK.” All rows with ages below the specified number will be excluded.
Tip: when working with spreadsheets in Microsoft Excel, it may be handy to know how to move a column.
3. Loading Filtered Data Into a Data Model
You can continue cleaning and filtering the data, but let’s move on and load it into a data model to start analyzing it.
- Finish up with Power Query by clicking “Close & Load -> Close & Load to.”
- In the pop-up that opens, select “Only Create Connection” and check “Add this data to the Data Model.”
- Wait for the data to be loaded. This can take a few minutes if you have a large amount of data.
Using the Power Pivot Data Model in Excel
- After the data loading is done, we can make changes (like adding new columns) to the data model by clicking “Data -> Data Tools -> Manage Data Model.”
- You should see the Power Pivot window. Let’s add a column with a Data Analysis Expressions (DAX) formula. DAX formulas are very similar to the Excel formulas that you’re probably already familiar with. Scroll to the right until you see the last column of your data, then click “Add Column.”
- Let’s write a formula for this column that applies a simple calculation to every row. Compared to Excel formulas, DAX syntax is slightly different. For example, we can add a column called “Description” and input the formula
=CONCATENATE([Column9],[Column10])
. Note the use of square brackets ([ ]) and the access via the column header names.
- If you wish, you can also perform additional modifications like renaming or removing columns by right-clicking the column header.
4. Analyzing Data With PivotTable
After you’re done making changes to the data model, it’s time to analyze the data in an Excel worksheet using PivotTable.
- Click “Home -> PivotTable” on the ribbon of the Power Pivot window to open the “Create PivotTable” window.
- Select “New Worksheet” and hit “OK.”
- The PivotTable Fields panel will open in a new sheet on the main Excel window. In the box under the search bar, you should see your data sources listed, “kaggle_adult_csv” in this example. Click on it to expand it and display all of its columns.
- Let’s do an analysis where we count the number of each “Description” in our data. We can select the columns we want to work with by clicking the checkbox next to their names. Select “Description” and notice how the focus automatically shifts to the “Rows” area below. This is Power Pivot demonstrating its intelligence – it can accurately guess the appropriate area for selections. Since we’re starting from a blank PivotTable, it makes sense that our first selection is rows.
- Drag the checked “Description” column into the empty space in “Values.” Notice that Power Pivot again correctly assumes that we want a “Count of Description” due to the data type being text. Also, our table has been updated with a column providing the counts of each occurrence of each “Description” value.
- We can add another dimension to our counting. For instance, drag the “Age” column into the empty “Columns” area to add more detail to the table by counting the number of descriptions by each age.
- You can also add a filter selection ability to the table by dragging the column “WorkClass” into the “Filters” area. Notice the new interactive cell that appears above our table.
- Click the arrow next to “All,” then select “Never-worked.” This will change the table to only include people with a WorkClass of “Never-worked” in the calculations. You can change the filter at any time to view the table through a different lens.
Dealing with many duplicates in your Excel workbook? Learn how to find and remove duplicates and clean up your spreadsheets.
5. Presenting Your Data With PowerChart
You can create more worksheets to analyze your data in different ways. This time, let’s create a more visually impressive result using PowerChart.
- Return to the “Power Pivot for Excel” window. If it’s closed, you can open it from the Excel window through “Data -> Data Tools -> Manage Data Model.”
- Click “PivotTable -> PivotChart -> New Worksheet.”
- In the panel on the right, click on the data source (“kaggle_adult_csv”) to expand it and display all of its columns.
- Drag and drop the columns from the data source into one of the four areas below to build the chart. Move the “OccupationType” column into the “Axis (Categories)” area.
- Drag “OccupationType” again, and this time drop it into the “Values” area. You’ll immediately notice that a bar graph is automatically generated. It shows the count of each type of occupation.
- You can also drag the “Ethnicity” column into the “Legend (Series)” area. You will instantly see a more granular chart that compares how many of each ethnicity make up each occupation.
- Next, we can apply filters to our Legend to simplify our chart and only look at certain values. Move your mouse over “Ethnicity” so that an arrow appears on the right side. Click the arrow.
- Click the checkboxes next to the values you want to remove, leaving only two.
- Let’s try something new. Under the “Values” section, click “Count of OccupationType,” then click “Remove Field.” Drag and drop the “Age” column to replace it.
- You’ll notice the value gets automatically defined as “Sum of Age,” but that’s not very useful in the real world. Let’s change that. Click it and select “Value Field Settings.”
- Since age is a number, there are plenty of calculations we can apply to this section. Try selecting “Average” for instance.
- The visual will get updated to show the average ages for each occupation by specific ethnicities (which we filtered for).
Want to brush up on Excel fundamentals? We have a comprehensive guide on all the keyboard shortcuts for Excel.
Frequently Asked Questions
What are the origins of Power Pivot?
Microsoft introduced Power Pivot as a separate add-on provided by Microsoft’s SQL Server 2008 R2, which was released in 2010. Back then, it was named “PowerPivot” with no space. In 2013 it was renamed “Power Pivot.” It only became a built-in Excel feature starting in 2016.
What are Data Analysis Expressions and how are they different from Excel formulas?
Data Analysis Expressions is a formula language used by Power Pivot in Excel and by Power BI. In essence, it’s how you write formulas for calculations in the aforementioned applications: the statements, syntax, and functions you use.
DAX’s main advantage over Excel formulas is that it is designed to operate on aggregate data. Standard Excel formulas can perform calculations on a row-by-row basis (such as calculating the sum of multiple values in a row), but DAX formulas can operate on a column-by-column basis (such as counting the number of a specific value in a column), or even on an entire table.
How do I fix the "This table was created by a query. To change this table, change the query instead" error?
This error in Excel Power Pivot happens when you try to modify a data model within Power Pivot when that data model was originally loaded via Power Query. You can’t do this, so you must modify it via the original method of loading: Power Query.
On the right side of Excel, click the “Queries & Connections” icon, then find and double-click on your query data source to open the window where you can modify the table.
Image credit: Pexels. All screenshots by Brandon Li.
Our latest tutorials delivered straight to your inbox