Overview

This is a personal research project created to explore modern data engineering, analytics, and visualization techniques using real-world public data. The project uses child care enrollment data from Maryland's Early Childhood Education portal (earlychildhood.marylandpublicschools.org/data) to build a complete end-to-end analytics solution.

The goal was to demonstrate practical skills in data extraction, ETL processing, machine learning forecasting, Power BI dashboard development, and interactive web presentation, with every stage feeding the next in a single platform.

Project Motivation

This showcase project was built to:

  • Work with real public data instead of synthetic datasets
  • Explore the full data lifecycle: ingestion → transformation → analysis → visualization → presentation
  • Experiment with Microsoft Fabric, Power BI, and modern cloud data tools
  • Apply time-series forecasting to actual enrollment patterns
  • Build an interactive web interface to present analytics findings

The Maryland child care data is a real-world dataset with patterns worth analyzing (enrollment trends, provider capacity, geographic distribution), which made it a good fit for analytics experimentation.

Technical Implementation

The workflow covers data engineering and analytics end to end, in five stages:

1. Data Acquisition

  • Source: Public data from Maryland's Early Childhood Education portal
  • Extraction: Python scripts to download and parse data files
  • Storage: Raw data loaded into Azure Data Lake for processing
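
The parsing half of the extraction step might look roughly like the sketch below. It is illustrative only: the sample text, the column names (county, month, enrolled, capacity), and the helper names are assumptions, since the portal's actual file layout is not described here. The download itself is left out for the same reason.

```python
import csv
import io
from datetime import date

# Hypothetical sample standing in for a downloaded file; the real portal's
# file names and column headers are not given in this document.
SAMPLE_CSV = """county,month,enrolled,capacity
Allegany,2023-01,412,500
Allegany,2023-02,,500
Baltimore City,2023-01,"3,120",4000
"""


def _to_int(value):
    """Convert '3,120' to 3120; blank cells become None."""
    cleaned = value.replace(",", "").strip()
    return int(cleaned) if cleaned else None


def parse_enrollment_csv(text):
    """Parse CSV text into a list of dicts with typed fields."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        year, month = record["month"].split("-")
        rows.append({
            "county": record["county"].strip(),
            "month": date(int(year), int(month), 1),  # first day of the month
            "enrolled": _to_int(record["enrolled"]),
            "capacity": _to_int(record["capacity"]),
        })
    return rows


rows = parse_enrollment_csv(SAMPLE_CSV)
```

Blank cells are kept as None rather than dropped here, so that the decision about missing values is left to the cleaning stage.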

2. Data Engineering Pipeline

  • ETL Processing: Azure Data Factory and Microsoft Fabric for data transformation
  • Data Cleaning: Handling missing values, standardizing formats, and validating data
  • Data Modeling: Dimensional modeling for efficient querying and analysis
  • Lakehouse Architecture: Organized storage for both raw and processed data
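
As a rough illustration of the cleaning step, the sketch below standardizes a name field, rejects rows with missing values, and applies simple range checks. The field names and the validation rules are assumptions made for the example, not the project's actual rules, and the real pipeline runs in Azure Data Factory and Microsoft Fabric rather than in plain Python.

```python
import string


def clean_records(records):
    """Return (clean, rejected) lists after standardizing and validating.

    Field names and rules are hypothetical, chosen only for illustration.
    """
    clean, rejected = [], []
    for rec in records:
        # Collapse repeated whitespace and normalize capitalization.
        # string.capwords keeps "George's" intact, unlike str.title().
        county = string.capwords(rec.get("county") or "")
        enrolled = rec.get("enrolled")
        capacity = rec.get("capacity")

        if not county or enrolled is None or capacity is None:
            rejected.append((rec, "missing value"))
        elif enrolled < 0 or capacity <= 0:
            rejected.append((rec, "out of range"))
        else:
            clean.append(
                {"county": county, "enrolled": enrolled, "capacity": capacity}
            )
    return clean, rejected


raw = [
    {"county": "  prince   george's ", "enrolled": 100, "capacity": 120},
    {"county": "Allegany", "enrolled": None, "capacity": 50},
    {"county": "Howard", "enrolled": 10, "capacity": 0},
]
clean, rejected = clean_records(raw)
```

Returning the rejected rows with a reason, instead of silently dropping them, makes it easier to audit how much data each rule removes.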

3. Analytics & Machine Learning

  • Time-Series Forecasting: Python models (Prophet, statsmodels) for enrollment predictions
  • Trend Analysis: Identifying seasonal patterns and long-term trends
  • Statistical Analysis: Provider capacity metrics and utilization rates
  • Model Validation: Cross-validation to ensure forecast reliability
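
For time series, cross-validation usually means a rolling origin: train on the data up to a cutoff, forecast the next few months, score, then move the cutoff forward. The sketch below shows that idea with a seasonal-naive baseline swapped in for Prophet or statsmodels so that it runs with the standard library alone. The function names, the 12-month season, and the use of mean absolute error are assumptions for the example.

```python
def seasonal_naive_forecast(history, horizon, season=12):
    """Forecast each step as the value observed one season earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def rolling_origin_mae(series, initial, horizon, season=12):
    """Mean absolute error averaged over expanding training windows."""
    if initial + horizon > len(series):
        raise ValueError("series too short for the requested windows")
    errors = []
    # Each cutoff trains on series[:cutoff] and scores the next `horizon` points.
    for cutoff in range(initial, len(series) - horizon + 1):
        train = series[:cutoff]
        actual = series[cutoff:cutoff + horizon]
        predicted = seasonal_naive_forecast(train, horizon, season)
        errors.extend(abs(a - p) for a, p in zip(actual, predicted))
    return sum(errors) / len(errors)


# Synthetic monthly pattern, for demonstration only (not real enrollment data).
pattern = [90, 92, 95, 97, 99, 80, 75, 78, 100, 104, 103, 98]
seasonal = pattern * 3
trended = [pattern[i % 12] + i for i in range(36)]

mae_seasonal = rolling_origin_mae(seasonal, initial=24, horizon=6)
mae_trended = rolling_origin_mae(trended, initial=24, horizon=6)
```

A baseline like this is also a useful yardstick: a fitted model is only worth keeping if its rolling-origin error is lower than the seasonal-naive error on the same windows.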

4. Visualization Layer

  • Power BI Dashboards: Interactive reports with filtering and drill-down capabilities
  • Custom Visualizations: Charts tailored for enrollment data patterns
  • Geographic Mapping: Regional distribution and heat maps

5. Web Presentation

  • Blazor Web App: Interactive interface to present findings and visualizations
  • Embedded Power BI: Live dashboards integrated into the web platform
  • Azure Deployment: Hosted solution accessible online

Key Features Demonstrated

Data Engineering Pipeline

Automated ETL workflow from raw public data through transformation to analytics-ready datasets

Predictive Modeling

Machine learning forecasts projecting enrollment 6 to 12 months ahead, with confidence intervals
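
To make the idea of an interval around a forecast concrete, here is a deliberately simple sketch: it widens each point forecast by 1.96 standard deviations of past forecast errors, which gives an approximate 95% band if the errors are roughly normal. This is an assumption-laden stand-in, not how the project's models compute their intervals; the forecasting libraries named in this document produce their own uncertainty estimates. The function name and sample numbers are hypothetical.

```python
from statistics import stdev


def interval_from_residuals(forecast, residuals, z=1.96):
    """Return (lower, point, upper) tuples using a constant-width band.

    Assumes roughly normal, constant-variance errors; real model intervals
    typically widen as the horizon grows.
    """
    spread = z * stdev(residuals)
    return [(f - spread, f, f + spread) for f in forecast]


# Hypothetical point forecasts and past errors (actual minus predicted).
bands = interval_from_residuals([100.0, 104.0], residuals=[-2.0, 0.0, 2.0])
```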

Interactive Analytics

Power BI dashboards with drill-down capabilities for exploring enrollment patterns

Full-Stack Solution

Complete platform spanning data ingestion through web presentation

Technical Stack Explored

This project provided hands-on experience with:

  • Cloud Data Platform: Microsoft Fabric for unified data engineering
  • Orchestration: Azure Data Factory for automated ETL pipelines
  • Analytics: Python (Pandas, NumPy) for data manipulation and analysis
  • Machine Learning: Prophet and statsmodels for time-series forecasting
  • Visualization: Power BI for business intelligence dashboards
  • Web Development: Blazor/.NET 8 for interactive presentation layer
  • Cloud Infrastructure: Azure services for hosting and deployment

Data Source

All data used in this project is publicly available from Maryland's Early Childhood Education Data Portal. This personal research project is not affiliated with or endorsed by the Maryland State Department of Education.