
Data Engineer with 6+ years of experience building scalable cloud-based ETL pipelines, real-time data streams, and analytics workflows. Delivered high-impact data solutions at Fortune 500 companies such as Google and CVS using Python, SQL, Apache Spark, Power BI, and GCP.

June 2025 - Present
Developing core components for CVS Health’s enterprise data warehouse and reporting ecosystem. Built and maintained scalable ETL workflows using SQL and SAS to process millions of healthcare records. Integrated Oracle-based financial and clinical data into reporting pipelines. Collaborated with cross-functional teams to ensure HIPAA compliance, accurate reporting, and performance guarantees for clients including the City of Philadelphia and FMC.
Sep 2024 - June 2025
Built scalable real-time and batch data pipelines using Apache Spark and Databricks for ingesting high-volume structured and semi-structured data. Developed GCP-native workflows with Dataflow, Cloud Composer, and BigQuery for operational reporting and self-service analytics. Automated data quality checks and collaborated with cross-functional teams to deliver reliable data products for BI and ML use cases.

Jan 2024 - May 2024
Architected high-performance ETL pipelines using Apache Spark to process 1TB+ of data monthly, improving query execution efficiency by 40%. Automated data ingestion with Airflow & Kafka, reducing manual intervention by 60%. Developed a scalable data pipeline for aviation data management across 10+ bases, ensuring high data integrity and operational efficiency. Designed Power BI, AWS QuickSight, and Tableau dashboards, reducing manual reporting by 50% and streamlining business reporting by 40%.

Jan 2024 - May 2024
Implemented and productionized a Student Retention Model, significantly enhancing retention rates. Utilized R and Tableau to analyze and visualize organizational event data, increasing attendance by 25%. Generated semester reports to optimize event planning and identify key contributors.

Jan 2022 - Dec 2022
Created real-time data pipelines for security analytics, integrating Okta SSO, AWS IAM, and Azure AD. Automated user provisioning workflows using Terraform and Jenkins, reducing processing time by 50% while ensuring data consistency.

Nov 2019 - Jan 2022
Refined ETL pipelines with SQL & Airflow, cutting financial report generation time by 50%. Developed ML pipelines for Google’s Core ML team using Kafka, BigQuery, and GCP, improving real-time data tracking by 30%. Built an NLP chatbot using spaCy and TF-IDF, increasing internal data query response accuracy by 40%.

Nov 2018 - Apr 2019
Developed and maintained web applications using HTML, CSS, JavaScript, and Bootstrap. Improved page load speed by 30% through frontend optimization. Enhanced backend functionality with efficient MySQL database management and query optimization.

Jun 2018 - Jul 2018
Conducted research on morphological operations for digital image preprocessing using MATLAB and Python. Enhanced image datasets for defense applications with dilation and erosion techniques. Optimized algorithms to eliminate structural flaws and reduce noise, improving image quality by 35%.

Designed an end-to-end data pipeline using AWS Glue, S3, Athena, and QuickSight for scalable data analytics and visualization.

Performed end-to-end sales analysis on Walmart’s transaction data using PostgreSQL. Analyzed revenue trends, customer demographics, and product performance to optimize business strategies.

Built an end-to-end data analytics pipeline for retail order analysis, integrating SQL, Python, and Pandas for data transformation, visualization, and reporting.

Implemented big data analysis using PySpark on flight delay data. Optimized query execution, built predictive models, and enhanced data processing efficiency.

Master of Science - MS, Data Science
Jan 2023 - May 2024
Grade: 4/4

Bachelor of Technology - BTech, Electronics and Communications Engineering
2015 - 2019
Grade: 7.5/10
Issued by Databricks · May 2023
Credential ID: 73624532
Issued by Salesforce · May 2023
Credential ID: 3369154
Issued by Maryville University · April 2024
Recognized for outstanding efforts in promoting diversity and inclusion through impactful contributions and leadership initiatives.
Issued by Maryville University · April 2024
Nominated as a delegate to represent Maryville University at the International Education Day in Jefferson City, Missouri, recognizing outstanding achievements and contributions to global awareness and cultural exchange.
Issued by BeyondID · September 2022
Recognized for building strong client relationships and delivering exceptional service, earning the 'Raving Fan' title for exceeding client expectations.
Issued by GlobalLogic · August 2021
Consistently delivered exceptional performance, earning recognition for outstanding productivity and teamwork excellence.
Issued by EDURANET INTELLECTUAL OLYMPIAD FOUNDATION · January 2012
Recognized for exceptional critical thinking and creative skills at the zonal level.
Tiffany praised Sai’s exceptional data visualization and analysis skills, emphasizing his ability to produce impactful reports.
Cameron recognized Sai’s professionalism, collaborative spirit, and impactful contributions to diversity programs at Maryville University.
David applauded Sai’s dedication to learning, collaborative contributions to group projects, and active involvement in student programs.
Shree emphasized Sai’s proficiency in Python and data science concepts, commending his exceptional class participation.
Sneh highlighted Sai’s analytical mindset, technical expertise, and collaborative communication during their work together at BeyondID.
Ria commended Sai for his positive energy, quick learning, and reliability as her go-to person for new projects during his tenure.
Dallas, Texas, USA
+1 (314) 516-3740