Overview
This is a personal research project created to explore modern data engineering, analytics, and visualization techniques using real-world public data. The project uses child care enrollment data from Maryland's Early Childhood Education portal (earlychildhood.marylandpublicschools.org/data) to build a complete end-to-end analytics solution.
The goal was to demonstrate practical skills in data extraction, ETL processing, machine learning forecasting, Power BI dashboard development, and interactive web presentation, all working together as a cohesive platform.
Project Motivation
This showcase project was built to:
- Work with real public data instead of synthetic datasets
- Explore the full data lifecycle: ingestion → transformation → analysis → visualization → presentation
- Experiment with Microsoft Fabric, Power BI, and modern cloud data tools
- Apply time-series forecasting to actual enrollment patterns
- Build an interactive web interface to present analytics findings
The Maryland child care data provided a rich, real-world dataset with interesting patterns (enrollment trends, provider capacity, geographic distribution), making it well suited to analytics experimentation.
Technical Implementation
The complete workflow demonstrates end-to-end data engineering and analytics capabilities:
1. Data Acquisition
- Source: Public data from Maryland's Early Childhood Education portal
- Extraction: Python scripts to download and parse data files
- Storage: Raw data loaded into Azure Data Lake for processing
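The acquisition step can be sketched in Python with only the standard library. This is a minimal illustration, not the project's actual script. It assumes the portal offers a CSV export, leaves the URL to the caller, and uses hypothetical column names (`county`, `year`, `month`, `enrollment`, `capacity`) and helper names that may not match the real files.

```python
import csv
import io
import urllib.request

# Hypothetical column names; the portal's real export headers may differ.
EXPECTED_COLUMNS = {"county", "year", "month", "enrollment", "capacity"}


def download_text(url: str, timeout: float = 30.0) -> str:
    """Fetch a file over HTTP(S) and decode it, tolerating a UTF-8 byte-order mark."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8-sig")


def parse_enrollment_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into row dicts, failing fast if expected columns are missing."""
    reader = csv.DictReader(io.StringIO(text))
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


if __name__ == "__main__":
    sample = (
        "county,year,month,enrollment,capacity\n"
        "Howard,2023,1,640,800\n"
    )
    print(parse_enrollment_csv(sample))
```

Checking the header up front means a changed export format surfaces at ingestion rather than as a silently empty column further down the pipeline.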
2. Data Engineering Pipeline
- ETL Processing: Azure Data Factory and Microsoft Fabric for data transformation
- Data Cleaning: Handling missing values, standardizing formats, data validation
- Data Modeling: Dimensional modeling for efficient querying and analysis
- Lakehouse Architecture: Organized storage for both raw and processed data
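In the project these steps run in Azure Data Factory and Microsoft Fabric; the plain-Python sketch below only illustrates the logic of cleaning and dimensional modeling. It assumes each raw row is a dict of strings with hypothetical `county`, `year`, `month`, `enrollment`, and `capacity` keys, and the missing-value policy (drop the row) and validation rules are illustrative choices, not the project's.

```python
import string
from dataclasses import dataclass

# Hypothetical required fields; adjust to the real export's headers.
REQUIRED = ("county", "year", "month", "enrollment", "capacity")


def _to_int(value: str) -> int:
    # Standardize numeric text such as " 1,200 " before converting.
    return int(value.replace(",", "").strip())


def clean_rows(rows: list[dict]) -> list[dict]:
    """Drop incomplete rows, standardize formats, and reject impossible values."""
    cleaned = []
    for row in rows:
        # Missing-value policy (one option among several): skip the row.
        if any(not (row.get(col) or "").strip() for col in REQUIRED):
            continue
        record = {
            # capwords collapses whitespace and fixes case: "  prince george's" -> "Prince George's"
            "county": string.capwords(row["county"]),
            "year": _to_int(row["year"]),
            "month": _to_int(row["month"]),
            "enrollment": _to_int(row["enrollment"]),
            "capacity": _to_int(row["capacity"]),
        }
        # Validation: a bad value stops the load instead of flowing into reports.
        if not 1 <= record["month"] <= 12 or record["enrollment"] < 0 or record["capacity"] < 0:
            raise ValueError(f"invalid row: {row}")
        cleaned.append(record)
    return cleaned


@dataclass(frozen=True)
class FactEnrollment:
    county_key: int  # foreign key into the county dimension
    year: int
    month: int
    enrollment: int
    capacity: int


def build_star_schema(records: list[dict]) -> tuple[dict[str, int], list[FactEnrollment]]:
    """Split cleaned records into a county dimension (name -> surrogate key) and a fact list."""
    dim_county: dict[str, int] = {}
    facts = []
    for rec in records:
        # The first time a county appears it gets the next surrogate key.
        key = dim_county.setdefault(rec["county"], len(dim_county) + 1)
        facts.append(FactEnrollment(key, rec["year"], rec["month"], rec["enrollment"], rec["capacity"]))
    return dim_county, facts
```

`string.capwords` is used instead of `str.title()` because `title()` would turn "prince george's" into "Prince George'S".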
3. Analytics & Machine Learning
- Time-Series Forecasting: Python libraries (Prophet, statsmodels) for enrollment predictions
- Trend Analysis: Identifying seasonal patterns and long-term trends
- Statistical Analysis: Provider capacity metrics and utilization rates
- Model Validation: Time-series cross-validation to assess forecast reliability
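Cross-validation for forecasts has to respect time order: each fold trains only on data before a cutoff and is scored on the months after it. The sketch below shows that rolling-origin (expanding-window) scheme, swapping in a simple seasonal-naive forecast as a stand-in for the Prophet and statsmodels models so no third-party library is needed. The function names and the 12-month season are assumptions for illustration. Prophet provides its own helper for the same idea, `cross_validation` in `prophet.diagnostics`.

```python
def seasonal_naive(history: list[float], horizon: int, season: int = 12) -> list[float]:
    """Forecast each future month as the value observed one season (12 months) earlier."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last_season = history[-season:]
    return [last_season[h % season] for h in range(horizon)]


def rolling_origin_mae(series: list[float], initial: int, horizon: int,
                       step: int = 1, season: int = 12) -> float:
    """Expanding-window cross-validation: train on series[:cutoff], score the next `horizon` points."""
    errors = []
    cutoff = initial
    while cutoff + horizon <= len(series):
        # Only data before the cutoff is visible to the model, so there is no future leakage.
        forecast = seasonal_naive(series[:cutoff], horizon, season)
        actual = series[cutoff:cutoff + horizon]
        errors.extend(abs(f - a) for f, a in zip(forecast, actual))
        cutoff += step
    if not errors:
        raise ValueError("series too short for this initial window and horizon")
    return sum(errors) / len(errors)
```

The seasonal-naive mean absolute error also works as a baseline: a Prophet or statsmodels model that does not beat it on the same folds is not adding much.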
4. Visualization Layer
- Power BI Dashboards: Interactive reports with filtering and drill-down capabilities
- Custom Visualizations: Charts tailored for enrollment data patterns
- Geographic Mapping: Regional distribution and heat maps
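Power BI renders the map, but the value each region is colored by has to be computed first, either in the ETL layer or as a measure. A minimal sketch, assuming cleaned records with hypothetical `county`, `enrollment`, and `capacity` fields and made-up bin edges, computes capacity-weighted utilization per county and assigns each rate to a color bin.

```python
from bisect import bisect_right
from collections import defaultdict


def utilization_by_county(records: list[dict]) -> dict[str, float]:
    """Capacity-weighted utilization per county: total enrollment / total capacity."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for rec in records:
        totals[rec["county"]][0] += rec["enrollment"]
        totals[rec["county"]][1] += rec["capacity"]
    # Counties with no recorded capacity are left out rather than divided by zero.
    return {county: enrolled / cap for county, (enrolled, cap) in totals.items() if cap > 0}


def heat_bin(rate: float, edges: tuple[float, ...] = (0.5, 0.75, 0.9)) -> int:
    """Map a utilization rate to a color-bin index, 0 = coolest; the edges are hypothetical."""
    return bisect_right(edges, rate)
```

Summing before dividing weights each county by its capacity. Averaging per-provider ratios instead would let a handful of very small providers swing a county's color.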
5. Web Presentation
- Blazor Web App: Interactive interface to present findings and visualizations
- Embedded Power BI: Live dashboards integrated into the web platform
- Azure Deployment: Hosted solution accessible online
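Embedding a Power BI report in a web app typically has the server request a short-lived embed token from the Power BI REST API and pass it, with the report's embed URL, to the Power BI JavaScript client in the browser. The sketch below shows only the token request. It assumes the "app owns data" embedding pattern and the per-workspace `GenerateToken` endpoint for reports; the workspace and report IDs are placeholders, and acquiring the Azure AD access token (for example with MSAL) is not shown. A Blazor app would make the same call from .NET; Python is used here only to keep one language across examples.

```python
import json
import urllib.request

POWER_BI_API = "https://api.powerbi.com/v1.0/myorg"


def build_embed_token_request(workspace_id: str, report_id: str,
                              aad_access_token: str) -> urllib.request.Request:
    """Build (but do not send) a view-only GenerateToken request for one report in a workspace."""
    url = f"{POWER_BI_API}/groups/{workspace_id}/reports/{report_id}/GenerateToken"
    return urllib.request.Request(
        url,
        data=json.dumps({"accessLevel": "View"}).encode("utf-8"),
        method="POST",
        headers={
            # Azure AD token for a service principal or master user (acquisition not shown).
            "Authorization": f"Bearer {aad_access_token}",
            "Content-Type": "application/json",
        },
    )


def fetch_embed_token(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the embed token string from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["token"]
```

Keeping the token request on the server means the Azure AD credential never reaches the browser; the client only ever sees the scoped, view-only embed token.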
Key Features Demonstrated
Data Engineering Pipeline
Automated ETL workflow from raw public data through transformation to analytics-ready datasets
Predictive Modeling
Machine learning forecasts showing 6- to 12-month enrollment projections with confidence intervals
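Prophet reports intervals directly (the `yhat_lower` and `yhat_upper` forecast columns, with the width set by `interval_width`). To show the idea without that dependency, here is a minimal sketch that turns held-out forecast errors, such as residuals from a cross-validation run, into a normal-approximation interval. The function name and the assumption of roughly normal, unbiased residuals are illustrative, not the project's method.

```python
import statistics


def prediction_interval(point_forecasts: list[float], residuals: list[float],
                        level: float = 0.95) -> list[tuple[float, float, float]]:
    """Return (lower, point, upper) per forecast, assuming roughly normal, unbiased residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals to estimate spread")
    # Two-sided z value, e.g. about 1.96 for a 95% interval.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    spread = z * statistics.stdev(residuals)
    # Same width at every step: a simplification, since real uncertainty grows with horizon.
    return [(f - spread, f, f + spread) for f in point_forecasts]
```

For a 6- to 12-month horizon, a more faithful version would estimate a separate spread for each step ahead from the cross-validation errors at that step.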
Interactive Analytics
Power BI dashboards with drill-down capabilities for exploring enrollment patterns
Full-Stack Solution
Complete platform from data ingestion to web presentation, showcasing end-to-end capabilities
Technical Stack Explored
This project provided hands-on experience with:
- Cloud Data Platform: Microsoft Fabric for unified data engineering
- Orchestration: Azure Data Factory for automated ETL pipelines
- Analytics: Python (Pandas, NumPy) for data manipulation and analysis
- Machine Learning: Prophet and statsmodels for time-series forecasting
- Visualization: Power BI for business intelligence dashboards
- Web Development: Blazor/.NET 8 for interactive presentation layer
- Cloud Infrastructure: Azure services for hosting and deployment
Data Source
All data used in this project is publicly available from Maryland's Early Childhood Education Data Portal. This personal research project is not affiliated with or endorsed by the Maryland State Department of Education.