Prudhvi Raj


About Me

I'm a Data Engineer with a Master's in Computer Science from Pace University. I build scalable data pipelines, ETL workflows, and analytics platforms that power data-driven decisions. My expertise includes Python, SQL, Airflow, PySpark, AWS (S3, Lambda, Redshift), GCP (BigQuery), and Docker. I've processed 120K+ records, reduced query runtimes by 40%, and built end-to-end data systems from ingestion to visualization. I focus on building reliable, efficient, and production-ready data infrastructure.


Fun Fact

My superpower? Turning chaotic data into clean, production-ready pipelines. With Python in one hand and SQL in the other, I build ETL workflows that transform messy datasets into analytics-ready tables. Whether it's optimizing queries, orchestrating Airflow DAGs, or containerizing pipelines with Docker, I love making data systems that just work—reliably, efficiently, and at scale.

  • Email: prudhvir1509@gmail.com
  • University: Pace University
  • Location: New York, USA
  • Degree: MS, Computer Science
  • Age: 22
  • Work Auth: F1 OPT
  • Strength: Building reliable, scalable data pipelines with a strong focus on data quality and system optimization.
  • Weakness: Tendency to over-engineer solutions when simple approaches work—actively learning to balance robustness with pragmatism.

Interests

  • ETL/ELT Pipelines
  • Workflow Orchestration
  • Cloud Data Platforms
  • Data Warehousing
  • Data Quality & Validation
  • Dimensional Modeling
  • Batch Processing
  • Analytics Engineering
  • Data Pipeline Automation
  • ML Pipeline Engineering
  • GenAI Integration
  • Business Intelligence

Skills

Languages & Tools

  • Python
  • SQL
  • Git
  • PySpark

Data Engineering

  • Airflow
  • ETL/ELT
  • Data Warehousing
  • Dimensional Modeling
  • Data Quality
  • Batch Processing

Cloud & Infrastructure

  • AWS S3
  • AWS Lambda
  • Redshift
  • BigQuery
  • Docker
  • CI/CD

Databases

  • PostgreSQL
  • MySQL
  • Redshift
  • BigQuery

Data Analysis & Libraries

  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn

BI Tools & APIs

  • Tableau
  • Power BI
  • REST APIs
  • FastAPI

ML Engineering (Supporting)

  • Scikit-learn
  • XGBoost
  • LSTM
  • SHAP/LIME

Certifications

  • AWS Educate Introduction to Generative AI (AWS)
  • Building LLM Applications With Prompt Engineering (NVIDIA)
  • AWS Educate Machine Learning Foundations (AWS)
  • Machine Learning (COLUMBIA+)
  • Fundamentals of Data Analytics (NASSCOM)
  • Microsoft Certified: Azure Fundamentals (Microsoft)
  • Graph Data Science Certification (Neo4j)
  • AWS Academy Graduate - Cloud Architecting (AWS)
  • Oracle Cloud Infrastructure 2025 Certified Data Science Professional (Oracle)
  • Oracle Cloud Infrastructure 2025 Certified Generative AI Professional (Oracle)

Projects

Muffin - GenAI Assistant
For Churn Explainability
Tags: GenAI · NLP · LLM

Tech Stack
  • Gemini 2.0 Flash
  • SentenceTransformers
  • Streamlit
  • Python
  • NLP

AI chatbot that explains customer churn risk in plain language, built on Gemini 2.0 Flash and SentenceTransformers embeddings.

Churn & LTV Intelligence
Full-Stack Analytics Platform
Tags: Data Science · Analytics · ML

Tech Stack
  • SQL
  • Python
  • MySQL
  • Streamlit
  • ETL

Full-stack platform predicting customer churn and lifetime value with automated ETL pipelines.

Fraud Detection Platform
Real-time ML on AWS
Tags: ML · AWS · Fraud Detection

Tech Stack
  • XGBoost
  • PySpark
  • AWS SageMaker
  • MLflow
  • DynamoDB

Real-time AWS fraud detection system with 96% accuracy and <120ms response time.

Real Estate Forecasting
ML Price Prediction App
Tags: ML · Forecasting · Regression

Tech Stack
  • XGBoost
  • Random Forest
  • Streamlit
  • Python
  • Joblib

ML web app for Connecticut real estate price prediction with an interactive user interface.

EV Prediction App
Washington State Analysis
Tags: Data Science · Geospatial · Prediction

Tech Stack
  • Streamlit
  • KeplerGL
  • Random Forest
  • Python
  • Pandas

Interactive EV analytics dashboard with geospatial mapping and 94% prediction accuracy.

Job Market Dashboard
USA Hiring Trends Analysis
Tags: Data Analysis · BI · Data Visualization

Tech Stack
  • Tableau
  • Data Analysis
  • Visualization
  • Statistics
  • Excel

Tableau dashboard analyzing 10K+ job postings with salary insights across data roles.

Education

Master of Science, Computer Science

Pace University, New York, USA
September 2023 – May 2025 · GPA: 3.7/4.0
  • Champion, Pace University Intercollegiate Chess Tournament: outplayed 10 participants across elimination rounds, demonstrating strategic thinking and competitive excellence.

Bachelor of Technology, Computer Science Engineering

KL University, Hyderabad, Telangana, India
June 2019 – May 2023 · CGPA: 8.55/10.0
  • Best Vice-President, Cybersecurity Club: organized 12+ hands-on workshops, grew membership by 35%, and forged industry partnerships that gave members real-world security experience.

Experience

Data Analyst

Tech Mynds Inc
United States · Hybrid · September 2025 – Present
  • Built and optimized Python-SQL pipelines integrating multiple data sources for clean, analysis-ready datasets.
  • Created executive dashboards and reports using Tableau and Excel to support data-driven decisions.
  • Key Project: Leadership Reporting Data Pipeline consolidating 5+ sources into analytics-ready tables.

Data Engineering Intern

Urpan Technologies Inc
Remote, USA · September 2024 – July 2025
  • Developed Python-SQL transformation layers standardizing data from multiple sources for analytics-ready tables.
  • Optimized GenAI assistant pipelines and caching for faster response times.
  • Key Project: GenAI Data Assistant Pipeline integrating business data with embeddings and caching to reduce redundant API calls.

Data Engineering & Analytics

Vimtra Ventures Pvt. Ltd
Chennai, India · On-site · December 2021 – July 2023
  • Built Python-SQL ETL pipelines processing 120K+ real estate assets, replacing manual spreadsheet workflows.
  • Developed NLP extraction pipelines converting unstructured investment reports into structured data.
  • Key Project: Investment Analytics Data Platform with star-schema warehouse simplifying analytics queries.