ABOUT

My journey in Data Engineering and GenAI

Data Engineer | Databricks | Azure | Snowflake | dbt | GenAI Engineer

My journey into Data Engineering started with a deep curiosity about how large-scale systems process and transform massive amounts of data efficiently. What began as learning Python and SQL gradually evolved into building real-world data pipelines and working with enterprise-grade data platforms.

Currently working as a Data Engineer at Cognizant, I design and develop scalable ETL pipelines and data ingestion frameworks using Databricks and PySpark, enabling efficient processing of healthcare data at scale. I have hands-on experience across modern data architectures involving Snowflake, dbt, Azure Data Lake Storage Gen2 (ADLS Gen2), and AWS S3, helping build reliable and optimized data platforms. I specialize in building distributed data processing systems using Apache Spark, implementing medallion architecture (Bronze, Silver, Gold), optimizing ETL performance, and ensuring data quality through automated validation frameworks.

Alongside Data Engineering, I am actively exploring Generative AI applications, building RAG-based intelligent systems using LangChain and vector databases to automate data workflows and improve engineering productivity. My goal is to design intelligent, scalable, and future-ready data platforms that combine the power of distributed computing, cloud technologies, and AI-driven automation.
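The medallion layering mentioned above can be sketched conceptually. This is a minimal illustration in plain Python (dicts stand in for Spark DataFrames, and the claim fields are hypothetical), not the actual production pipeline:

```python
# Conceptual sketch of medallion (Bronze/Silver/Gold) layering using plain
# Python records in place of Spark DataFrames. Field names are illustrative.

# Bronze: raw ingested records, kept as-is (may contain nulls/duplicates).
bronze = [
    {"claim_id": "C1", "amount": "120.50", "state": "ca"},
    {"claim_id": "C2", "amount": None,     "state": "NY"},
    {"claim_id": "C1", "amount": "120.50", "state": "ca"},  # duplicate
]

def to_silver(records):
    """Silver: cleaned, deduplicated, typed records."""
    seen, out = set(), []
    for r in records:
        if r["claim_id"] in seen or r["amount"] is None:
            continue  # drop duplicates and rows missing a required field
        seen.add(r["claim_id"])
        out.append({"claim_id": r["claim_id"],
                    "amount": float(r["amount"]),
                    "state": r["state"].upper()})
    return out

def to_gold(records):
    """Gold: business-level aggregate (total claim amount per state)."""
    totals = {}
    for r in records:
        totals[r["state"]] = totals.get(r["state"], 0.0) + r["amount"]
    return totals

print(to_gold(to_silver(bronze)))  # {'CA': 120.5}
```

Each layer adds guarantees: Bronze preserves the source, Silver enforces quality, Gold serves analytics.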

DATA ENGINEERING SKILLS

Apache Spark & Distributed Processing 90%
PySpark & Databricks 90%
Snowflake & dbt Transformations 85%
Azure Data Lake (ADLS Gen2) & AWS S3 85%
Python for Data Engineering & Automation 95%
SQL & Data Modeling 90%
ETL Pipeline Design & Optimization 90%
GenAI, LangChain & RAG Applications 85%

TOOLS & PLATFORMS

  • Databricks & Delta Lake
  • Snowflake Data Cloud
  • dbt (Data Build Tool)
  • Azure Data Lake Storage Gen2
  • AWS S3 Storage
  • Azure Data Factory
  • Git, GitHub & Version Control
  • Docker & Containerization
  • VS Code & PyCharm
  • Power BI & Data Visualization
  • Postman & API Testing
  • Jira & Agile Development

INTERESTS & FOCUS AREAS

Building Scalable Data Platforms

Cloud Data Engineering (Azure & AWS)

Distributed Computing & Spark Optimization

Generative AI & Intelligent Data Systems

ETL Optimization & Data Automation

RESUME

Check My Resume

Download Resume

SUMMARY

PROFILE SUMMARY

Data Engineer with hands-on experience in designing scalable data pipelines and optimizing large-scale ETL workflows using Databricks and PySpark. Proficient in Azure Data Platform, Delta Lake, and distributed data processing. Experienced in developing multi-threaded ingestion frameworks, automated data validation systems, and performance optimization solutions that reduced processing time by over 80%. Currently exploring Generative AI applications including RAG-based chatbots and AI-powered SQL optimization frameworks using LangChain and LLM integration.

EDUCATION

B.TECH - ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

2020 - 2024

R.M.K College of Engineering and Technology

Completed my B.Tech in Artificial Intelligence and Data Science at R.M.K. College of Engineering and Technology with a CGPA of 9.2.

AISSCE

2019 - 2020

G. R. T. M. Vivekananda Vidyalaya

Cleared my CBSE class XII board examination with 82.2%.

AISSE

2017 - 2018

G. R. T. M. Vivekananda Vidyalaya

Cleared my CBSE class X board examination with 78.6%.

PROFESSIONAL EXPERIENCES

Programmer Analyst, Cognizant

Molina Healthcare

Jan 2025 – Present
  • Designed and implemented a high-performance multi-threaded data ingestion framework in Python, reducing ingestion time by over 80%.
  • Built scalable ETL pipelines using Databricks and PySpark to process large-scale healthcare datasets.
  • Developed automated data validation and QA frameworks ensuring schema accuracy and data integrity.
  • Contributed to building curated analytics layers using Delta Lake architecture.
  • Worked in an Agile environment, collaborating with cross-functional teams including QA, DevOps, and Data Architects.
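The multi-threaded ingestion pattern above can be sketched with Python's standard thread pool. The file names and the `load_file` stand-in are hypothetical, not the real framework's readers; the point is that I/O-bound loads overlap instead of running sequentially:

```python
# Minimal sketch of a multi-threaded ingestion pattern using a thread pool.
# load_file is a hypothetical stand-in for an I/O-bound source reader
# (e.g. fetching a file from cloud storage).
from concurrent.futures import ThreadPoolExecutor, as_completed

def load_file(path):
    # Stand-in for the real read; returns a dummy payload.
    return {"path": path, "rows": 0}

def ingest_all(paths, max_workers=8):
    """Ingest many files concurrently; completion order is not guaranteed."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(load_file, p): p for p in paths}
        for fut in as_completed(futures):
            results.append(fut.result())
    return results

files = [f"landing/file_{i}.csv" for i in range(4)]
print(len(ingest_all(files)))  # 4
```

With I/O-bound workloads, threads spend most of their time waiting on the network, so overlapping them is where the large speedup comes from.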

GainWell Technologies

Aug 2024 – Dec 2024
  • Developed automation tools in Python to streamline job configuration processes in Databricks, improving consistency and reducing manual effort.
  • Built a data validation framework using PySpark to ensure schema and record-level accuracy during one-time data loads, enhancing data integrity across environments.
  • Participated in the development of data ingestion pipelines to bring raw data from diverse sources into Delta Lake, contributing to the foundation of scalable data infrastructure.
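The schema and record-level checks described above can be illustrated with a small comparison routine. This uses plain Python structures rather than PySpark DataFrames, and the column names are illustrative:

```python
# Sketch of a schema and row-count validation check of the kind described
# above, expressed with plain Python dicts instead of PySpark schemas.
def validate_load(source_schema, target_schema, source_count, target_count):
    """Return a list of human-readable validation failures (empty = pass)."""
    issues = []
    missing = set(source_schema) - set(target_schema)
    extra = set(target_schema) - set(source_schema)
    if missing:
        issues.append(f"columns missing in target: {sorted(missing)}")
    if extra:
        issues.append(f"unexpected columns in target: {sorted(extra)}")
    for col in set(source_schema) & set(target_schema):
        if source_schema[col] != target_schema[col]:
            issues.append(f"type mismatch on {col}: "
                          f"{source_schema[col]} vs {target_schema[col]}")
    if source_count != target_count:
        issues.append(f"row count mismatch: {source_count} vs {target_count}")
    return issues

src = {"id": "int", "name": "string"}
print(validate_load(src, {"id": "int", "name": "string"}, 1000, 1000))  # []
```

In a PySpark setting the same idea applies, with `df.schema` supplying the column/type pairs and `df.count()` the record counts.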

Programmer Analyst - Intern, Cognizant

May 2024 – Jul 2024
  • Completed intensive training in Artificial Intelligence, Machine Learning, ETL, PySpark, SQL, Power BI, and Tableau under the Cognizant GenC program.
  • Gained hands-on experience with real-world data processing techniques and data pipeline development using PySpark and SQL for large-scale data handling.
  • Developed and deployed a machine learning model as the final project to predict health outcomes (e.g., diabetes prediction), demonstrating skills in data preprocessing, model training, and evaluation.

ACHIEVEMENTS

Excellence Award - GainWell

Cognizant

Doing The Right Thing

Cognizant

350+ Problems Solved

LeetCode

CERTIFICATES

MY CERTIFICATES

Projects

Enterprise Data Engineering & Generative AI Solutions

AI-Powered PRD & BRD Generation Agent

Developed an intelligent AI agent that automatically generates structured Product Requirements Documents (PRDs) and Business Requirements Documents (BRDs) from natural language inputs. Leveraging modular agents, prompt engineering, and LLM orchestration, the system produces clear, professional documentation including functional requirements, user stories, risks, and acceptance criteria, significantly reducing manual effort and improving documentation quality.

Python LLM Prompt Engineering Agent Architecture GenAI

AI-Powered SQL Data Lineage Generator

Built an intelligent system that analyzes SQL Server stored procedures to extract table-level and column-level lineage using custom SQL parsers and LLM agents. The solution generates clear, human-readable lineage documentation and provides a web interface for visualization and analysis.

Python LLM SQL Server Prompt Engineering Docker
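The table-level half of the lineage extraction can be sketched with a toy parser. The real project uses custom SQL parsers plus LLM agents; this regex-based version only shows the basic idea of mapping an INSERT target to the tables feeding it, on a made-up statement:

```python
# Toy sketch of table-level lineage extraction from SQL text using regexes.
import re

def table_lineage(sql):
    """Map each INSERT target table to the FROM/JOIN tables feeding it."""
    sql = sql.upper()
    target = re.search(r"INSERT\s+INTO\s+([\w.]+)", sql)
    sources = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql)
    return {target.group(1): sorted(set(sources))} if target else {}

stmt = """
INSERT INTO dw.claims_summary
SELECT c.state, SUM(c.amount)
FROM stg.claims c
JOIN ref.states s ON c.state = s.code
GROUP BY c.state
"""
print(table_lineage(stmt))
# {'DW.CLAIMS_SUMMARY': ['REF.STATES', 'STG.CLAIMS']}
```

Regexes break down on subqueries, CTEs, and dynamic SQL, which is why a production tool needs a real parser and, for ambiguous cases, LLM assistance.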

AI-Based Apache Spark Performance Optimizer

Developed an AI-powered performance optimization tool that analyzes PySpark jobs using static code analysis and runtime metrics. The system leverages LLMs to generate intelligent recommendations for improving Spark performance, reducing execution time, and optimizing resource utilization.

PySpark Python LLM Performance Tuning Web Dashboard
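The static-analysis half of such a tool can be illustrated with Python's `ast` module: scan PySpark source for calls that commonly hurt performance and report them with a hint. The rule set here is a small, illustrative subset, not the project's actual rules:

```python
# Minimal static-analysis sketch: walk a Python AST and flag method calls
# that are common Spark performance anti-patterns.
import ast

COSTLY_CALLS = {
    "collect": "pulls the full dataset to the driver",
    "toPandas": "materializes everything in driver memory",
}

def find_costly_calls(source):
    """Return (line, call_name, hint) tuples for known anti-pattern calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in COSTLY_CALLS):
            findings.append((node.lineno, node.func.attr,
                             COSTLY_CALLS[node.func.attr]))
    return findings

job = "rows = df.filter(df.x > 0).collect()\nsmall = df.toPandas()\n"
for line, call, hint in find_costly_calls(job):
    print(f"line {line}: .{call}() ({hint})")
```

Findings like these, combined with runtime metrics, give an LLM concrete context from which to generate tuning recommendations.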

Snowflake dbt Analytics Engineering Project

Built a production-grade analytics pipeline using dbt and Snowflake to transform raw e-commerce data into clean, analytics-ready models. Implemented staging, intermediate, and mart layers with data quality tests, snapshots, and reusable macros to enable scalable and reliable business reporting.

dbt Snowflake SQL Data Modeling Analytics Engineering

Tuition Management Portal

Developed a full-stack web application to manage student records, attendance tracking, and payment management. Includes dynamic UI, automated workflows, and centralized academic and financial tracking.

HTML CSS JavaScript Web App

Curator Chatbot (RAG-based Document Assistant)

Built a Retrieval-Augmented Generation chatbot using LangChain, FAISS, and LLMs to enable intelligent querying of documents and structured data, delivering accurate and context-aware responses.

LangChain FAISS LLM Python
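The retrieval step at the heart of such a RAG pipeline can be sketched without the framework. This toy version uses bag-of-words vectors and cosine similarity in place of the learned embeddings and FAISS index the project actually uses:

```python
# Bare-bones sketch of RAG retrieval: embed documents and a query, then
# return the closest document. A word-count vector stands in for a real
# learned embedding.
import math
from collections import Counter

def embed(text):
    """Toy embedding: word-count vector over whitespace tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, docs):
    """Return the document most similar to the query."""
    return max(docs, key=lambda d: cosine(embed(query), embed(d)))

docs = ["Delta Lake stores table history",
        "FAISS indexes dense vectors for similarity search"]
print(retrieve("how does FAISS search vectors", docs))
```

In the full chatbot, the retrieved chunk is appended to the prompt so the LLM answers from the document's content rather than from memory alone.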

Contact

Contact Me

MY ADDRESS

No. 1/126, Phase II, Manali New Town, Chennai - 600103

SOCIAL PROFILES

EMAIL ME

sridharmasthan@gmail.com

CALL ME

+91 8248233162

© 2026 SRIDHAR | Data Engineer