Understanding The PBJ Metrics Data Warehouse

Updated 11 June 2024 23:34

This article serves as an introduction to the Powered by Jumbo Metrics Data Warehouse. It provides valuable information defining what the metrics data warehouse is, outlining its functionalities, and explaining its potential use cases. You’ll gain a clear understanding of how the data warehouse consolidates and stores data, enabling you to extract valuable insights and make data-driven decisions. Additionally, the article will provide insights into the frequency of updates, ensuring that you have up-to-date information at your disposal. Whether you’re new to the Powered by Jumbo platform or seeking a deeper understanding of the metrics data warehouse, this article will equip you with the knowledge needed to leverage this powerful tool effectively.

Prerequisites

Access to your PBJ Metrics Database
SQL proficient

What Is The Powered By Jumbo Metrics Data Warehouse?

The PBJ Metrics Data Warehouse allows PBJ users to quickly access critical raw data from disparate sources all in one place. The PBJ Metrics Data Warehouse provides a simple way for PBJ users to access the data they need to run custom data queries for analytical purposes or custom business reporting. The data warehouse provides several dimension tables that make querying key collections of data much faster. This allows for information involving cross-functional activities to be provided in a consistent way, enabling ad-hoc reporting and queries. This results in a reduced amount of time it takes for analysis and reporting, while also reducing the stress on the production system. Furthermore, the benefits also extend to the user as it reduces the need to retrieve data from multiple sources, which also leads to greater efficiencies. Perhaps the most important function of a Metrics Data Warehouse is the storage of historical data. This helps users to analyse different time periods and trends to reveal hidden patterns of data-flows and groupings.

Why Do We Need The Metrics Data Warehouse?

To manage performance, provide redundancy and allow clients to scale as their lottery grows, the PBJ transactional databases are spread over several separate instances referred to as nodes. These databases have been highly optimised to manage client-facing transactions and are not suitable for historical reporting.

The Metrics Data Warehouse, on the other hand, aggregates all the data into one place and has been optimised to allow for large queries that look back at historical data. Queries on the Metric’s data warehouse don’t have an impact on the performance of the transactional site.

The Metrics Data Warehouse benefits:

Decision-makers who rely on a mass amount of data as it is essential to systemic approaches for decision-making
Organisations who use customised, complex processes to obtain information from multiple data sources
Organisations who want efficient performance when accessing huge amounts of data, which is a necessity for generating reports, graphs or charts
Data warehouse is a first step If you want to discover ‘hidden patterns’ of data-flows and groupings

Technical Overview

The ETL-based data warehouse uses staging, data integration, and access layers to house its key functions. All of the raw data extracted from the different source data systems are stored in the staging database. The disparate data sets are integrated from staging into the integration layer and once transformed are stored in the operational data store (ODS) database. The integrated data is then moved into the data warehouse and given hierarchy after being organised into dimensions, facts and aggregate facts, the star schema. The access layer is there to enable ease of use when retrieving data.

Dimensions filter and give unordered numeric measures structured labelling. This creates data sets that contain unique non-overlapping data elements/objects which can be used with BI tools.

Screen-Shot-2020-07-20-at-1.58.45-pm-1024x368.png

Types Of Dimensions In The Data Warehouse

Screen-Shot-2020-07-07-at-11.56.51-am-775x1024.png

Example Date Hierarchies:

The date dimension contains several hierarchies:

“Days (are grouped into) Months (which are grouped into) Years”,
“Days (are grouped into) Weeks (which are grouped into) Years”
“Days (are grouped into) Months (which are grouped into) Quarters (which are grouped into) Years”

View this article here for more information on different statuses that may appear in the Metrics DB.

Example PBJ Metrics Data Warehouse Queries

Finding a customer that has signed up after the 5th of July 2020 in the Metrics Data Warehouse:

SELECT * FROM customer WHERE customer_timestamp > '2020-07-05';

Finding VIP customers in the Metrics Data Warehouse:

SELECT DISTINCT(customer_id) FROM ticket_sales JOIN customer USING(customer_id) WHERE is_vip;

Finding customers with an Average Transaction Value greater than $25:

SELECT customer_id, avg(sales) FROM ticket_sales GROUP BY customer_id HAVING avg(sales) > 25;

Finding customers that have not purchased in the current draw, and have opted-in to email communications for the prize home draw:

WITH current_draw AS (
    SELECT dim_lottery_id, max(dim_draw_id) as dim_draw_id
    FROM dim_draw group by dim_lottery_id
)
SELECT customer_id
FROM ticket_sales
JOIN customer USING(customer_id)
LEFT JOIN current_draw USING(dim_draw_id)
WHERE dim_lottery_id IS NULL
AND receive_emails;

Understanding SQL Select Statements

An SQL SELECT statement retrieves records from a database table according to clauses (for example, FROM and WHERE) that specify criteria. The syntax is:

SELECT column1, column2 FROM table1, table2 WHERE column2='value';

In the above SQL statement:

The SELECT clause specifies one or more columns to be retrieved; to specify multiple columns, use a comma and a space between column names. To retrieve all columns, use the wild card * (an asterisk).
The FROM clause specifies one or more tables to be queried. Use a comma and space between table names when specifying multiple tables.
The WHERE clause selects only the rows in which the specified column contains the specified value. The value is enclosed in single quotes (for example, WHERE first_name='Joe').
The semicolon (;) is the statement terminator. If only one statement is being sent to the back end, a statement terminator is not needed; if more than one is being sent, it is needed. It is best practice to include it.

Example:

Examples of SQL SELECT statements:

To select all columns from a table (Customers) for rows where the Last_Name column has Smith for its value, you would use this SELECT statement:

SELECT * FROM Customers WHERE Last_Name='Smith';

Example result:

+---------+-----------+------------+   
| Cust_No | Last_Name | First_Name |   
+---------+-----------+------------+   
| 2001    | Smith     | Johnny     |   
| 3094    | Smith     | Stephen    |   
| 2437    | Smith     | Roger      |   
+---------+-----------+------------+   
3 rows in set (0.05 sec)

To return only the Cust_No and First_Name columns, based on the same criteria as above, use this statement:

SELECT Cust_No, First_Name FROM Customers WHERE Last_Name='Smith';

Example result:

+---------+------------+   
| Cust_No | First_Name |   
+---------+------------+   
| 2001    | Johnny     |   
| 3094    | Stephen    |   
| 2437    | Roger      |   
+---------+------------+   
3 rows in set (0.05 sec)

To select all columns from a table (Customers) for rows where the City column has Brisbane or Gold Coast for its value, you would use this SELECT statement:

SELECT * FROM Customers WHERE City = 'Brisbane' or City = 'Gold Coast';

Example result:

+---------+-----------+------------+------------+   
| Cust_No | Last_Name | First_Name | City       |  
+---------+-----------+------------+------------+  
| 4532    | Gibraltar | Matt       | Brisbane   |
| 6098    | Schumer   | Violet     | Gold Coast | 
| 1039    | Linnett   | Sarah      | Brisbane   |
+---------+-----------+------------+------------+   
3 rows in set (0.05 sec)

To make a WHERE clause find inexact matches, add the pattern-matching operator LIKE. The LIKE operator uses the % (percent symbol) wild card to match zero or more characters, and the underscore ( _) wild card to match exactly one character. For example:

To select the First_Name and Nickname columns from the Friends table for rows in which the Nickname column contains the string “rock”, use this statement:

SELECT First_Name, Nickname FROM Friends WHERE Nickname LIKE '%rock%';

Example result:

 +------------+------------+
 | First_Name | Nickname   |
 +------------+------------+
 | Jen        | The_Rock   |
 | Rod        | RockyBalboa|  
 | Glenn      | TheRock    |
 +------------+------------+
 3 rows in set (0.03 sec)

To query the same table, retrieve all columns for rows in which the First_Name column’s value begins with any letter and ends with “en”, use this statement:

 SELECT * FROM Friends WHERE First_Name LIKE '_en';

The result set might look like:

+------------+------------+-----------+   
| First_Name | Last_Name  | Nickname  |   
+------------+------------+-----------+   
| Jen        | Huxley     | The_Rock  |   
| Glenn      | Adams      | TheRock   |     
+------------+------------+-----------+   
2 rows in set (0.03 sec)

How And When Is It Updated?

ETL occurs every 30 minutes, 24 hours a day. See below as an example:

ETL Schedule Example

9:00am

9:30am

10:00am