Building a Marketing Data Warehouse in Google Cloud Platform

marketing data warehouse in google cloud platform

In January of 2019, I started my journey as a freelance marketing analytics consultant. The first two months I spent diving deep into the Google Cloud Platform to try and figure out if this technology could solve the main problems that most of my clients had:

“How to do analysis across the walled gardens of digital advertising tools and across an increasing range of customer interaction platforms.”

Over the past 1.5 years I’ve been working, together with my partner in crime Hussain Mehmood and the MarketLytics team, to solve this question and questions like it for my clients leveraging Google Cloud Platform by building marketing data warehouses in BigQuery. This blogpost describes our vision and our learning so far working on projects for clients like, and

The 3 Main Challenges in Digital Marketing

  • Measure user interaction across different apps and platforms
    As if it wasn’t hard enough to implement proper tracking for websites, most businesses now have multiple platforms (web, app, interactive campaigns) where consumers interact with them. Measuring across them is essential when making decisions about whether something is successful or not.

  • Measure advertising interaction and successfully attribute across walled gardens of advertising platforms
    As long as we do not measure across the walled gardens of Google & Facebook (and others) to figure out the entire customer journey and the intertwined interactions they take towards purchasing with us, we are stabbing in the dark when trying to optimize our campaigns for effectiveness.

  • Be able to timely activate insights from data back into platform and marketing interactions with customers
    Insights ≠ value. Without turning insights into (preferably automated) action, it’s hard to actually profit from our data infrastructure. You need to be able to trigger actions based on data points when it’s actually relevant to the consumer.

The Essential Building Blocks for Marketing Success

In order to tackle these challenge successfully, there are a couple of things your organization should try to build in one form or the other.

  • A way to extract data from all your platforms and tools
  • A place to store all this data
  • A way to merge relevant data and query across different datasets
  • A way to automatically activate marketing actions based on data signals

Obviously, the right people with the skillsets to execute (or at least manage and hire-in) these processes are essential to all of this, but not focussed on in this post.

What you will likely end up with, is a data infrastructure that looks like the image below.

a schematic overview of a marketing data warehouse

What is Google Cloud Platform and Why do we prefer it?

Google’s Cloud Platform is the collection of services and tools that Google has produced for internal usage and opened up for developers to use and build upon. In basic terms: it solves a lot of the complex problems that are required when running software and allows developers to focus on their core app more and less on the infrastructure behind it.

Popular competitors are Amazon’s AWS and Microsoft Azure. The reason Google Cloud Platform is (more) interesting for us as marketers, in our opinion, is three-fold:


Google’s BigQuery is the data warehouse of choice for most organizations dealing with marketing data. The reasons are (among others):

  • Storage is virtually limitless (user interaction data can be very large)
  • Cost of storage is really low
  • Querying large datasets is effortless and fast
  • Frequent updating of datasets is easy and fast
  • Ad-Hoc analysis is very easy

Extract, Load & Transform instead of Extract, Transform & Load.

In the past, people used to mention “ETL processes” when talking about streams of data. What they were talking about was the process of extracting data, transforming it into the desired format, and loading it into the desired tool (for instance the data warehouse). Thanks to the cost of storage data dropping and the ability to query large datasets increasing, in most cases it now pays off to opt for a ELT, instead of ETL strategy.

Extract and load the raw data into your data warehouse first (often, the raw ingestion part is dubbed data lake) and handle the transformation after that. You maintain your raw data ingestion tables (available to repurpose later on) and shift the most intensive part of the chain (transformation) to the end.

Your Data “Lake” and “Warehouse” in the same tool

Thanks to the cost structure for data storage and the many ways to easily ingest data, BigQuery is an interesting place to store both your “raw ingestion data” as well as your transformation and final data tables. This is interesting for multiple reasons:

  • You limit the dreaded “silo’s” within organizations. By having everything available in one platform, doing analysis across datasets becomes so much easier.
  • Storing raw data in your data lake allows you to re-run analysis or build other applications on top of that data in later stages. One of the hardest things is to know what analysis you want to do, before collecting and exploring the data. By storing the raw data this is now a problem of the past.
  • Having your final data tables and transformation tables available in the same tool saves a lot of headaches trying to debug issues.
    Whenever you spot a problem in the final product (like a dashboard), you need to figure out where this problem occurred in your setup. By having the entire flow available in one tool (BigQuery) this is a lot easier to do.

Extracting Data from Marketing Tools & Platforms

extracting data from marketing tools and advertising platforms into your marketing data warehouse

The first thing you’re going to run into when building your marketing data warehouse is the challenge of extracting data from all different sources.

Depending on your marketing stack, you’ll likely want to export advertising data from Google and Facebook, have your analytics data from both web and apps, have a CRM dataset, a transactional dataset and on top of that additional marketing tools like your email marketing suite.

All of these platforms have data in them, and we need to get that data out. Furthermore, we need to continuously get updates from these platforms whenever data is added or altered.

The first step in creating the architecture of your marketing data warehouse should be to map out your marketing stack and identify the “schema” of the data that can be exported out of each tool. (Here is an example of the documentation of the Google Ads data export schema.) The main things you are looking for are:

  • A “Key” to match this data to other datasets.
    This could be a user ID to match sessions from a user across devices or platforms. Or a click ID to match a users’ impressions and click to their session behavior in analytics.

  • The level of granularity available.
    Most tools do not offer a ‘raw data export’. Both because it would be costly to do, but also because it would likely violate privacy agreements. In most cases, you’ll be forced to do aggregation on a certain level (for instance, per day). Understanding this is crucial when thinking about potential use-cases for the data in the future when building applications to activate your marketing campaigns based on this data.
Learnings about extracting data
  • Some tools have HIGHLY LIMITED exporting abilities. Investigate this first before proceeding any further.
  • API’s update, some very frequently. The price of a extraction service like Stitchdata or Supermetrics is often worth it compared to maintenance.
  • If native connectors are available (Like Google Ads to BigQuery) your best future bet is to go with those.

Identifying Users across Sessions, Devices & Platforms

identifying users across sessions, devices and platforms in your marketing data warehouse

User recognition is essential for marketing. After all, we want to understand what marketing messages a person has received in order to try and optimize our marketing campaigns for return on investment. But in the current technological landscape there are many issues making proper user identification hard:

  • Consumers use multiple devices.
  • Consumers use multiple browsers.
  • Privacy-enhancing features make not-logged-in traffic tracking harder.
  • Stitching historic data to currently identified data is usually not possible in off-the-shelve tools.

Building your 1st Party User Graph

Within your data warehouse, you get the freedom to create your own user graph which allows you to make this information as complete as possible. You should be able to create a table in your dataset which links all anonymous ID’s from past usage to one central user ID that you assign to a user.

Once you have your user graph logic in place, it becomes really interesting to ‘discover additional devices’ from your customers. For instance, if you have your customers’ email address via a purchase that happened on their desktop, you could probably stitch their mobile device into the user graph by leveraging email marketing (which is often opened on mobile) in a smart way.

Companies that are able to identify their users’ devices and stitch sessions together based on owned (first party) data position themselves to continue doing effective digital marketing while privacy enhancing features on devices and browsers increase.

Learnings about creating a User Graph
  • Map out all your different user ID’s across tools before deciding how to tie them together.
  • The User ID that is present on the first hit of a session is essential when thinking about attribution.
  • When altering historic data by updating your user graph, expect your dataset to change over time.

Tying Advertising data to Interaction data

tying advertising data to interaction data

When evaluating your advertising campaigns, it is essential to connect the right “clicks” to the right “sessions”. When stitching this data together, you’ll quickly discover the value of adding ‘dynamic values’ to your links (besides the default UTM tracking).

Both Facebook and Google support these dynamic link decorations which allow you to pass along a unique value per “Campaign”, per “Adset” and per “Ad”. Adding these as dynamic parameters to your links when ads are clicked allows you to match these to the session which has these parameters in their landing page URL.

Attribution Modelling in Marketing Data Warehouse

When doing attribution modeling, you are now able to aggregate per campaign/adset/ad per specific date and tie the Impressions, Clicks and Cost together with the Sessions and Conversions that are tied to it.

The flexibility of creating your own attribution dataset allows you to solve a couple of really interesting topics, like:

  • Attribution across different advertising networks (walled gardens) and across different owned platforms (app, web, etc)
  • Ability to extend your “lookback window” for attribution beyond what normal tools allow you to do.
  • Ability to update historic data once your user graph expands, making attribution more precise over time.
  • Ability to apply machine learning algorithms instead of manual ‘hand-made’ attribution models.
Learnings about transforming Advertising Data
  • Add unique numeric ID’s to your ads as soon as possible for historic data to match.
  • Naming convention is everything, spend some time on setting it up properly in your ad hierarchy.
  • Proper Attribution is highly reliant on your ability to stitch earlier (anonymous) visits to your customer journey.

Visualizing Data and Extracting Insights

One of the first things you will probably want to do with the data in your marketing data warehouse is visualize it for further analysis. Especially marketing attribution dashboards that work across all of your advertising tools and platforms are a unique opportunity which is impossible to achieve without a marketing data warehouse.

There are a lot of visualization tools out there, and most of them have a native BigQuery connector. If you don’t have a tool yet, Google’s free tool called Google Data Studio is a great tool to start your journey with.

Avoid staring at numbers for the sake of it

As I’ve described in earlier blog posts, humans have the tendency to want a lot of data for the wrong reasons. Most of the time, to look smart and blame something when a decision turns out bad. Dashboards can be very valuable, but can also drain a lot of time from your organization if not used effectively.

The main question to keep in mind when creating a dashboard is: “What decision/action are you trying to make and how often do you need to make it?”

Ideally, each part of information on your dashboard should have an answer to that question attached to it.

Spreading the ability to create dashboards

The technical act of creating a dashboard is becoming easier every day with intuitive tools like Google Data Studio. This allows everybody, not just analysts, to create dashboards for their own usage. Although this “democratization of data” is probably a good thing in the long run, you should probably take some precautions like:

  • Only share “final tables” of data with non-analysts to prevent people from using the wrong datasets in their dashboards.
  • Create a common naming convention of data points to use in these final tables and document that somewhere where everybody can access it.
  • Document the limitations of datasets (for instance maximum granularity to use) and holes/quality of datasets.
Learnings about visualizing data
  • Start with interviewing the person that will actually use the dashboard to make a decision or take an action. Not their manager.
  • Begin as ugly as possible on purpose, making all feedback about the purpose instead of the looks.
  • Dashboards should not live forever. Take a project approach to them and iterate in versions.

Taking (Automated) Action upon Data

Only action (or purposeful in-action) can generate value. Once you have defined what actions to take upon the data you collect you now have the ability to make the data work for you. Because you’ve created your own marketing data warehouse, you now have the ability and flexibility to design the actions you want to take. Google Cloud Platform offers a variety of tools to make this possible like Google Cloud Functions.

Here are 3 interesting ideas to get you started, but obviously the more specific and tailored to your business they are the more valuable they will get.

Personalized Email Marketing with Recommended Items based on previous browsing behavior

Once you have your User Graph completed and you collect “Product Impressions”, “Product Add To Cart” and “Product Purchases” you now have all the ingredients to let Google’s Recommendations API serve you with the most likely products a user will purchase based on machine learning. You can feed this information into your email marketing tool of choice to generate personalized email marketing based on your product feed.

Create your own Cross & Up-sell module using Recommendation API

Using the same API as mentioned in the previous tip, you can create your own logic to decide which product SKU’s to show in the upsell and cross sell areas in your webshop.

Apply Machine Learning to your Attribution Models

Leverage google’s Tensorflow to apply propensity modelling to your attribution data instead of using arbitrary man-made attribution models like last- or first-click.

Final Thoughts

Building a marketing data warehouse is a challenging and interesting project to work on with a high (potential) reward if done and maintained the right way.

If you want to build your own marketing data warehouse, I recommend you to start with these 3 things:

  • Map out at least 3 use-cases with the final data that you think will be worth your while.
  • Do a thorough inventory of your current marketing stack and review their documentation when it comes to ‘exporting data’.
  • Create a schematic overview of your system (using a tool like Miro).

Once you have thought these 3 things through, you should have a decent picture of the amount of complexity involved and the potential upside you can achieve by building a data warehouse for marketing. Armed with these 3 things, you have a great starting point to talk to an internal data engineer or external consultant who can help you proceed.

Interested in working with us?

Do you think your organization needs a marketing data warehouse? Need outside help to build (parts of) it? Let’s have a chat! Use the link below to schedule a call with me.

Read something interesting? Lets talk!


Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.