🧹Course: Cleaning, preparing and transforming data with Tableau Prep
🧹Course: Cleaning, preparing and transforming data with Tableau PrepUp to now we've created solutions where the data has already been prepared for us, in this course we look at how we can clean, prepare and transform data to use in our analysis using Tableau PrepFor information on how Tableau Academy works please refer to our Welcome To Tableau Academy Guide. This Week’s ProjectIn most of the projects we’ve covered so far the data used has already been prepared for you, in reality a large part of data analysis and visualisation involves gathering and preparing the data you’ll need to use so this week we look at using Tableau Prep Builder to clean, prepare and transform date for the project. The project involves creating a data set that can be used to analyse how much time and resource is being used across some key business change projects and will cover combining timesheet data from a number of sources, adding in HR data and then cleaning and transforming the data ready for analysis. The Brief:In your role as a Project Management Analyst you’ve been asked to start providing insight into the allocation of people’s time to five key business change projects that are currently in progress at your company. You have access to download data from your companies timesheet booking systems where every employee has to book how many hours they’ve worked on each specific project and activities within that project. They also have to book any none project related time such as training, annual leave or other types of admin not, to enable this to be done a ‘dummy’ project has been created in the timesheet system to book this time against. Unfortunately each region in your company uses a different timesheet booking system, while they are all from the same supplier there is no connectivity or interaction between the individual regions systems so you’ll need to combine multiple data sources. The Business Change leadership team have also asked if it’s possible to get data that includes the following as this will form part of the ongoing analysis and reporting of this data:
For the second two points you’ve been supplied with an Employee Data file from HR which contains a list of relevant employees and the required details as well as a list of the average FTE costs for each role in each region. For point 4 it’s not possible to get the data in this way from the timesheets data or to update the timesheet systems to provide it so you’re going to need to find a way to transform the data to provide the required info. The last point is critical to this project, as a one off this could be done manually but with the requirement to update it every month or on demand an easily repeatable solution is required. Required Data:The files you need for this can be found in the Course Data Files folder for this project, as this course is focussed around preparing data we’ll explore the data and files as we go through so for now you just need to make sure you’ve downloaded the files in the above folder. Additional Information:This project uses Tableau Prep Builder, this is included as part of your Tableau Desktop or Creator license. If you don’t currently have a license or are using Tableau Public you can get a free 14 day trial of Tableau Prep to complete this course (make sure you don’t activate it until you’re ready to start though!) More info on Tableau Prep Builder can be found here: https://www.tableau.com/en-gb/products/prep And the latest version can be downloaded here: https://www.tableau.com/en-gb/products/prep/download Tableau Prep offers an easy to use, drag and drop way to clean, prepare and transform data ready for analysis and be able to repeat these steps, in organisations you may find that this can be done directly within your databases/data warehouse using SQL however it’s still useful to know what can be done with Tableau Prep since it’s part of your Tableau license anyway. Tableau Prep also allows you to take care of preparing data yourself if you don’t have the access or expertise to write SQL code to do this, it’s also useful for quickly preparing data for one off exercises, proof of concepts or where you can’t make changes to the underlying data sources like in the scenario we’re looking at. Course Files:You can download the files required for this project from here: Tableau Academy Course Files In particular you will need to download and save the source data files from the the Course Data Files folder We’ve also included the data files that are output as part of this course and the Tableau Prep ‘flow’ files used to create them for your reference. Solution
The steps below detail the solution created for this and how it was built for you to follow through. Step 1: Combine the timesheet data exports from each region into one fileStep 1.1 - Check the timesheets data filesTo begin with we want to have a look at the source files we’ve got to see what we need to do with them. If we open the three “Timesheets_ …” files we notice:
Step 1.2 - Combine the APAC files into oneThe first thing we’re going to do is combine our APAC file which is split by country into one table (APAC is short for Asia Pacific and refers to countries in this region), this will mean we can apply our cleaning and preparing steps just once on the full table rather than having to repeat the same step for multiple files.
On the blue bar on the left you should now see a table for each of the Asia Pacific countries.
Once you drop the table onto the pane a grey Input step window should appear in the bottom half of the screen, this is where we set how to load the data in from our file.
The wildcard union is where you tell Tableau Prep how to find files with a similar name etc. so it can load them without you having to specify each individual file one by one. If you have a look at some of the options you’ll see this is pretty flexible and can be used to locate files in a particular folder, that have a similar naming or follow a particular naming convention.
Tableau Prep should have automatically found the five sheets within the file we’re loading, you don’t need to change any options but you should see that it’s going to pull in all sheets from the file we’ve specified, at the bottom you should see five sheets each with a different country name under the Include files / Include sheets list.
Once we’ve created an input step we can then use the Tableau Prep functions to do a lot of cleaning, preparing and transforming our data. For this step we’re going to look at some basic cleaning options that removes data we don’t want in our end file.
A new step should now have appeared in your data pane called Clean 1, the pane below should display the columns and data in your table, to begin with we’re just going to remove a couple of columns from this table.
This has now removed those columns that we don’t need from our data without changing the underlying data source files (if you were to open them up you should see they haven’t changed). Tableau Prep allows you to see each of the clean steps you’ve created and undo or edit them if you need to as well, this is a great way of being able to track what changes have been made.
You should see the 2 remove steps you just created in here, if you decided you wanted to keep these columns at a later date you can simply remove the steps by clicking them and then clicking the x on the right hand side (later on we’ll see how we can edit certain steps as well but with a remove step there’s nothing we can edit).
Step 1.3 - Load and clean the Americas timesheets dataIn this step we’re going to add in the Americas timesheet data and clean the hours booked data field so that it gives us the time as a number that we can use later on.
This time we don’t need to do anything with the input step so just rename the new input step that’s appeared as Load Americas Data and add a new Clean step. Notice that the Hours column is being brought in as a text string, this is because when the data has been entered it’s been entered as x hours. This means we can’t do any calculations with the numbers so let’s clean this up.
You should see that there’s a few options for being able to quickly clean our data, they’re all pretty self explanatory and we’ll use some of the others later in the course.
Notice how Tableau Prep has gone through each entry and removed any letters for us? We also had a space between the number and hours (you might not be able to see this immediately but if you double click one of the values in the column you should see the space at the end), we’ll need to remove that before we can use the numbers,
We can now convert the columns format to numbers so that it’s the same as the fields from the other data sources.
That column should now have updated to a number format.
That’s all we need to do with the Americas file so rename the Clean 2 step as Clean Americas Data and we can move onto importing our final timesheet file.
Step 1.4 - Load the EMEA timesheets dataIn this step we’ll follow the same process for loading our final timesheets file in from the EMEA region (EMEA stands for Europe, Middle East and Africa and covers countries in these regions), in this step we don’t need to do any cleaning though so this will be a simple loading of the data.
Step 1.5 - Combine the three regions timesheets data into a single tableWe’ve now loaded the timesheets data for each region and done any cleaning that is relevant for each specific file, in this step we’re going to combine them together into one table that we can then continue to clean and prepare.
The union step will try to combine two data tables where they have the same fields, we’ll look at the options for this in a minute but first we want to add the EMEA data to it.
The steps should now show a line from each of the Clean.. steps to the new Union step.
With a union step Tableau Prep will create a combined table from the specified data tables, when the individual tables have the same column names and number format Tableau Prep will automatically combine them. On the Settings tab that should be visible you should be able to see that Tableau Prep has been able to identify that the Employee, Details and Hours columns are the same in the Americas and EMEA data so has combined those columns. It hasn’t been able to determine the relevant columns from the APAC file as they have different names so we need to match them up manually
This should now disappear as the fields have been matched across all three files.
We’ve now got a table with the timesheet data from all of our input files, rename the Union 1 step to Combine Timesheet Data.
Step 1.6 - Finish cleaning and preparing the timesheet dataOur final step in this section is to finish off cleaning and preparing the timesheet data, as these steps apply to all of our input files we can do them on the combined data, if we hadn’t combined out data we’d have to repeat this on each of our files/sheets... Subscribe to Tableau Academy to read the rest.Become a paying subscriber of Tableau Academy to get access to this post and other subscriber-only content. A subscription gets you:
|
Older messages
Course: Creating a business Key Performance Indicator (KPI) dashboard
Tuesday, May 16, 2023
In the third Tableau Academy course you've been tasked with creating a KPI dashboard to align the strategic goals and aims within your organisation and reduce the number of reports and KPIs.
Course: Creating an interactive sales dashboard in Tableau
Friday, May 12, 2023
In this Tableau Academy course you take up the role of a sales operations analyst who has been tasked with creating an interactive global sales dashboard for the retail chain they work for.
Tableau Tea Break, Tableau Jobs & Tableau Academy are now one!
Wednesday, April 26, 2023
Back for 2023! We're pleased to announce Tableau Tea Break, Tableau Jobs and Tableau Academy are all back and now under one banner, read on to find out more!
Your Complete Tableau Learning Pathway
Friday, February 17, 2023
Go from complete Tableau newbie to Tableau master in no time by following our tried and tested Tableau Learning Pathway. Simply follow each of the courses below in your own time and you'll cover
Introducing Tableau Jobs!
Monday, November 21, 2022
If you're looking for a new role, or just interested in what's out there check out our Tableau Jobs project below.
You Might Also Like
A new formula for lifestyle creep?
Saturday, December 21, 2024
4% ain't gonna cut it ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
• Authors • Promote your book series on social media • all in one order
Saturday, December 21, 2024
~ Book Series Ads for Authors ~ All in One Order! SEE WHAT AUTHORS ARE SAYING ABOUT CONTENTMO ! BOOK SERIES PROMOTIONS by ContentMo We want to help you get your book series out on front of readers. Our
6 Ways to Celebrate Christmas like a Minimalist
Saturday, December 21, 2024
6 Ways to Celebrate Christmas like a Minimalist I recently read a quote about Christmas that left me thinking. In Letters from Father Christmas, JRR Tolkien says, “Here comes Christmas! That
[Electric Speed] My favorite tools of 2024
Saturday, December 21, 2024
Plus: voice synthesis | smartphone stands ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
Closes 12/22 • Book a Spot in Our "Day after Christmas" Books Newsletter Promo •
Friday, December 20, 2024
We're emailing a newsletter on the day when many people are shopping with gift cards! enable
It's Not Too Late to Help People Read
Friday, December 20, 2024
The Now I Know 2024 fundraiser continues
🎤 The SWIPES Email (Friday, December 20th, 2024)
Friday, December 20, 2024
The SWIPES Email Friday, December 20th, 2024 An educational (and fun) email by Copywriting Course. Enjoy! Swipe: Vrrroooomm this ad for a Dyson wet vacuum showcases "data" in the form of
Top 10 in 2024
Friday, December 20, 2024
Get the most popular and sometimes surprising advice from this year. ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
New Course Live : ABM
Friday, December 20, 2024
Get ROI in Six Weeks ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏ ͏
3-2-1: How to deal with stress, forging your own reality, and learning vs. achieving
Thursday, December 19, 2024
“The most wisdom per word of any newsletter on the web.” 3-2-1: How to deal with stress, forging your own reality, and learning vs. achieving read on JAMESCLEAR.COM | DECEMBER 19, 2024 Happy 3-2-1