By 2025, the world will be deluged with 163 ZBs of data. How much is that? Well, a single zettabyte is equal to 152 million years of ultrahigh definition 8K video. It's also the amount of data that can be stored on 250 billion DVDs. Now, multiply these numbers by 163. Collectively we are producing 1.7 MBs of new information every second.
In this new data climate, companies and organizations of all kinds, including universities, are grappling with new requirements and pressures around storing, processing, and analyzing data. IT teams especially feel the impact of this onslaught of new data.
As a business intelligence architect at Bucknell University in Lewisburg, Pennsylvania, I am a member of a small team tasked with designing data and analytics solutions to help academic and administrative departments leverage information to enhance the student experience and empower employees. To make this happen, we needed a platform that could store and process a data from a variety of sources both internal and external to campus. At the same time, we were implementing a number of new cloud-based enterprise applications, which added additional complexity to our task.
To match this transformation in our overall IT strategy, we needed a platform that was as agile and scalable as the cloud and could manage new data from the multitude of new and emerging data sources.
"With a small BI team, data warehouse automation helps Bucknell University ease the strain of data overload and transition to the cloud"
From a technology standpoint, our platform would need to include four major components:
• A Data Lake for handling the variety of data sources that is available to us today and in the future, including semi-structured and completely unstructured data.
• A traditional Data Warehouse which would allow us to transform our data into a format which is conducive to reporting, data visualization, and analysis.
• An Extract, Transform and Load (ETL) Platform which would allow us to perform the transformations required to move data from sources and/or the data lake to the data warehouse.
• A Business Intelligence Platform which allows us to report, analyze and visualize our data.
The decisions on a number of these tools came relatively easily and quickly. We decided to leverage MongoDB as our data lake platform, running it on their cloud-based database-as-a-service offering MongoDB Atlas. For our data warehouse, we chose to run SQL Server on AWS, using their managed Relational Database Service. And we already had the best available BI tools in Cognos, which we use for operational reporting, and Tableau, which we use for data visualization and dashboards.
The most complicated task was deciding on an ETL platform, which would be the key to bringing all of these four components together. At first, we thought a traditional ETL solution was our only option. Having worked with traditional ETL in the past, I wasn’t looking forward to this task. ETL, like a programming language, is a one-size-fits-all platform, which allows you the flexibility to do any type of data integration you like. But when it’s used to build a data warehouse, you are required to handle all the manual and repetitive tasks, essentially building out an entire library of hand-designed data flows. This means that it can take an extremely long time to build a data warehouse using traditional ETL. And, to make things worse, it has a huge learning curve.
Fortunately, during our research, we discovered the concept of data warehouse automation and immediately realized that this was exactly what we were looking for. Data warehouse automation software allows developers to automate monotonous and repetitive tasks, masking the complexities along the way. This type of capability enables us to automate repetitive, time-consuming tasks associated with building and maintaining a data warehouse, allowing our team to focus on business-level problems.
After investigating and trialling a number of different products, we decided on an automation solution from WhereScape that helped us to significantly re-architect our data warehouse and speed up our development process. WhereScape automation takes on many of the painful, time-consuming, manual and repetitive tasks associated with the design, development, deployment and operation of our data warehouse, which we appreciate as a small development team with a wide variety of responsibilities.
This may seem obvious, but one of the lessons we learned from data warehouse automation is the immense value of automating documentation. Often, despite best intentions, documentation is often left undone or incomplete post development. By leveraging a metadata-driven system such as WhereScape, both technical and business-level documentation is automatically generated alongside development efforts. This relieves a great and time-intensive burden for a small team, and puts us in a better position to understand data lineage and impact as we contemplate future infrastructure evolution.
Also, traditional ETL tools often operate like a programming language. You can do anything you can imagine if you can manually create each instruction. Unfortunately, this approach is less agile and requires a significant learning curve.
Data warehouse automation, however, allowed our team to automate monotonous and repetitive tasks. This resulted in our ability to build, validate and begin using the student module subject area within the new data warehouse twice as fast as would have been possible reliant on an ETL approach.
Data warehouses require on-going enhancements and extensions to include new data sources and deliver additional projects downstream. Through impact analysis capabilities available in WhereScape, we can easily see how objects will be influenced in the long-term.
Conversely, this type of analysis in an ETL environment can be extremely challenging. In fact, conducting any impact analysis through a legacy ETL tool can be a nightmare.
With WhereScape, we can merely click an object to perform an impact analysis. We can also reverse the process to track backwards, seeing which objects were used to build it. This approach helps us anticipate how changes impact the data warehouse.
By implementing data warehouse automation, we can focus on the most critical aspect of a data warehouse—delivering on business requirements. We can let the automation software take care of the low-level details of developing and maintaining the platform.
The overarching theme that resonates through these lessons, and our experience with data warehouse automation, is the amount of time it saves us while better positioning us to execute on overall IT strategy and meet student and faculty needs. With a small BI team, data warehouse automation helps Bucknell University ease the strain of data overload and transition to the cloud.