General Information
The Lehigh Administrative Data Lake (“Data Lake”, “LADL”) is an ecosystem of technologies within the Amazon Web Services (AWS) environment. It draws on several AWS services to serve a twofold goal:
- To be able to blend data between different data sets in a consistent manner, and
- To be able to analyze data that would be too big, cumbersome, or system-intensive to analyze on a desktop or in a system's database directly.
With these technologies, we aim to provide a consistent, repeatable, and scalable space where data can be used for analysis and data directives across the university.
The name LADL reflects that this data lake ecosystem is built to support the administrative side of the university, and it distinguishes the effort from any large-scale data projects associated with Lehigh’s research mission.
Key Contacts
LADL is a joint effort between the Office of Institutional Data and Library and Technology Services, namely the Enterprise Systems, Systems Engineering, and Network Engineering teams.
Key Executive Stakeholders are:
- Yenny Anderson, Associate Vice President, Office of Institutional Data
- Ilena Key, Chief Technology Officer, Library and Technology Services
Additional technologists and data scientists within these areas contribute the technology, expertise, and analysis behind the data in LADL.
Key Technologies Leveraged
LADL utilizes many of AWS’s services to achieve the desired outcomes:
- S3: Simple Storage Service - We use S3 to store flat files, to stage intermediate data during ingress, and for other middleware functions.
- RDS: Relational Database Service - We use RDS to host databases, namely PostgreSQL and SQL Server, for smaller, more normalized data sets.
- Redshift: We use AWS’s Redshift environment for bigger datasets, since it is built as a data warehouse and can handle larger datasets more easily. Additionally, Redshift allows data from RDS to be plugged in, which makes it a one-stop shop for cross-domain queries across the LADL ecosystem.
- Glue: This service acts as a go-between for many of our data efforts. From data cleansing to ETL to data ingress, Glue is invaluable for moving data from place to place.
- DMS: Database Migration Service - This service migrates data from one database to another and supports near-real-time data ingress. We use it when Glue jobs would struggle with the sheer volume or real-time nature of the data.
- QuickSight: This reporting tool is gradually being rolled out across campus and can draw on LADL data to provide analysis and visualizations.
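As a rough illustration of what a cross-domain query looks like, the sketch below joins a large event table in one database against a small reference table in a second, attached database. It uses Python's built-in `sqlite3` module purely as a local stand-in; it does not use Redshift or RDS, and the schema name `rds_source`, the tables `wifi_sessions` and `people`, and all column names are hypothetical.

```python
import sqlite3


def build_demo_connection() -> sqlite3.Connection:
    """Create an in-memory database with a second database attached to it.

    The main database stands in for the warehouse (larger event data) and the
    attached one stands in for a smaller relational source. All table and
    column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    # Attach a second, separate in-memory database under its own schema name.
    conn.execute("ATTACH DATABASE ':memory:' AS rds_source")

    conn.execute("CREATE TABLE main.wifi_sessions (person_key TEXT, building TEXT)")
    conn.executemany(
        "INSERT INTO main.wifi_sessions VALUES (?, ?)",
        [("p1", "Library"), ("p1", "Library"), ("p2", "Gym"), ("p3", "Library")],
    )

    conn.execute("CREATE TABLE rds_source.people (person_key TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO rds_source.people VALUES (?, ?)",
        [("p1", "student"), ("p2", "staff"), ("p3", "student")],
    )
    return conn


def sessions_by_role(conn: sqlite3.Connection) -> list:
    """Count sessions per role with a single query that spans both databases."""
    query = """
        SELECT p.role, COUNT(*) AS session_count
        FROM main.wifi_sessions AS w
        JOIN rds_source.people AS p ON p.person_key = w.person_key
        GROUP BY p.role
        ORDER BY p.role
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_connection()
    print(sessions_by_role(connection))  # [('staff', 1), ('student', 3)]
```

The point of the sketch is only the shape of the query: one statement, two sources, joined on a shared key.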
What Data is in LADL?
As noted above, LADL holds only administrative data, used for administrative purposes. The datasets currently used for process improvements and general analysis are:
- Moodle (LMS, Coursesite) data
- Banner (ERP) data
- Ccure (Physical Access) data
- Wireless Network data
- Financial Reporting data
We are working on making more data sets available for the Office of Institutional Data to analyze and normalize for broad-scale reporting!
I Would Like Access to LADL!
Beyond the goals stated above, we aim to create normalized and anonymized data sets so that the data can be used more broadly across campus. We believe that data should be open and inclusive, but we put guardrails around it to limit the risk of breaches, misuse of data, and similar problems.
If you’re interested in contributing to the analysis efforts, feel free to send a message through the Consult Link below.
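To make the idea of an anonymized data set more concrete, here is a minimal sketch of one common building block: replacing a direct identifier with a keyed hash and dropping other identifying fields. This is an illustration under stated assumptions, not a description of how LADL data is actually processed. The function names, field names, and sample rows are hypothetical, and keyed hashing on its own is pseudonymization, which is weaker than full anonymization.

```python
import hashlib
import hmac


def pseudonymize_id(identifier: str, secret_key: bytes) -> str:
    """Return a stable, keyed SHA-256 digest of an identifier (hex string)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_and_pseudonymize(rows, id_field, drop_fields, secret_key):
    """Drop identifying fields and swap the ID for a pseudonymous key.

    rows        -- iterable of dicts (hypothetical record format)
    id_field    -- name of the direct identifier to replace
    drop_fields -- set of other field names to remove entirely
    secret_key  -- bytes key; in practice this would be stored securely
    """
    cleaned = []
    for row in rows:
        # Keep only fields that are neither dropped nor the raw identifier.
        out = {k: v for k, v in row.items() if k not in drop_fields and k != id_field}
        out["person_key"] = pseudonymize_id(str(row[id_field]), secret_key)
        cleaned.append(out)
    return cleaned


if __name__ == "__main__":
    sample = [
        {"student_id": "123", "name": "Example Person", "course": "X"},
        {"student_id": "123", "name": "Example Person", "course": "Y"},
    ]
    for record in strip_and_pseudonymize(sample, "student_id", {"name"}, b"demo-key"):
        print(record)
```

Because the same identifier always maps to the same key, records can still be joined across data sets without exposing the original ID.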
I Have Data That Should Be in LADL!
We would love for your data to be part of the LADL ecosystem, especially if it helps with the broader university goals in our strategic plan!
Note that not all university data is necessarily appropriate for this data lake effort. However, we do have many other strategies to help you create, manipulate, store, and analyze data. Several areas already help with this, such as:
- Endpoint Engineering
- Client Services
- High Performance Computing
- Research Computing
If you’re not sure, feel free to submit your ideas through the Consult Link below! We would love to discuss the best strategy for you and the university!