Prudhvi Raj

About Me

I'm a Data Engineer with a Master's in Computer Science from Pace University (3.7 GPA). I build scalable data systems and backend infrastructure using Python, SQL, and distributed computing. My experience spans ETL pipeline development, data warehouse design, and ML infrastructure, including pipelines that process 120K+ records, a 40% reduction in processing time, and production systems built from scratch. I specialize in Python (Pandas, PySpark), SQL (PostgreSQL, MySQL, BigQuery), Airflow, AWS, and Docker. I'm currently seeking full-time Data Engineer or Software Engineer roles where I can build reliable, scalable systems.

What Drives Me

I love solving complex problems with code. Whether it's building data pipelines that process 120K+ records, optimizing SQL queries to run 10x faster, or designing systems that eliminate hours of manual work, I'm driven by creating software that has real impact. My approach: write clean, maintainable code, obsess over data quality, and always think about scalability. I believe the best systems are the ones that run reliably, fail gracefully, and make someone's job easier.

  • Email: prudhvir1509@gmail.com
  • University: Pace University
  • Location: New York, USA
  • Degree: MS, Computer Science
  • Age: 22
  • Work Auth: F-1 OPT
  • Seeking: Full-time Data Engineer or Software Engineer roles
  • Available: Immediate start (F-1 OPT valid through 2028)

Interests

ETL/ELT Pipelines

Workflow Orchestration

Cloud Data Platforms

Data Warehousing

Data Quality & Validation

Dimensional Modeling

Batch Processing

Analytics Engineering

Data Pipeline Automation

ML Pipeline Engineering

GenAI Integration

Business Intelligence

Skills

Languages & Tools

Python

SQL

Git

PySpark

Data Engineering

Airflow

ETL/ELT

Data Warehousing

Dimensional Modeling

Data Quality

Batch Processing

Cloud & Infrastructure

AWS S3

AWS Lambda

Redshift

BigQuery

Docker

CI/CD

Databases

PostgreSQL

MySQL

Redshift

BigQuery

Data Analysis & Libraries

Pandas

NumPy

Matplotlib

Seaborn

BI Tools & APIs

Tableau

Power BI

REST APIs

FastAPI

ML Engineering (Supporting)

Scikit-learn

XGBoost

LSTM

SHAP/LIME

Certifications

  • AWS Educate Introduction to Generative AI (AWS)
  • Building LLM Applications With Prompt Engineering (Nvidia)
  • AWS Educate Machine Learning Foundations (AWS)
  • Machine Learning (COLUMBIA+)
  • Fundamentals of Data Analytics (NASSCOM)
  • Microsoft Certified: Azure Fundamentals (Microsoft)
  • Graph Data Science Certification (Neo4j)
  • AWS Academy Graduate - Cloud Architecting (AWS)
  • Oracle Cloud Infrastructure 2025 Certified Data Science Professional (Oracle)
  • Oracle Cloud Infrastructure 2025 Certified Generative AI Professional (Oracle)

Projects

GenAI
NLP
LLM
Muffin - GenAI Assistant

For Churn Explainability

Tech Stack
  • Gemini 2.0 Flash
  • SentenceTransformers
  • Streamlit
  • Python
  • NLP

AI chatbot that explains customer churn risk in plain language, built on Gemini 2.0 Flash and SentenceTransformers.

Data Science
Analytics
ML
Churn & LTV Intelligence

Full-Stack Analytics Platform

Tech Stack
  • SQL
  • Python
  • MySQL
  • Streamlit
  • ETL

Full-stack platform predicting customer churn and lifetime value with automated ETL pipelines.

ML
AWS
Fraud Detection
Fraud Detection Platform

Real-time ML on AWS

Tech Stack
  • XGBoost
  • PySpark
  • AWS SageMaker
  • MLflow
  • DynamoDB

Real-time AWS fraud detection system with 96% accuracy and <120ms response time.

ML
Forecasting
Regression
Real Estate Forecasting

ML Price Prediction App

Tech Stack
  • XGBoost
  • Random Forest
  • Streamlit
  • Python
  • Joblib

ML web app for Connecticut real estate price prediction with interactive user interface.

Data Science
Geospatial
Prediction
EV Prediction App

Washington State Analysis

Tech Stack
  • Streamlit
  • KeplerGL
  • Random Forest
  • Python
  • Pandas

Interactive EV analytics dashboard with geospatial mapping and 94% prediction accuracy.

Data Analysis
BI
Data Visualization
Job Market Dashboard

USA Hiring Trends Analysis

Tech Stack
  • Tableau
  • Data Analysis
  • Visualization
  • Statistics
  • Excel

Tableau dashboard analyzing 10K+ job postings with salary insights across data roles.

Education

Master of Science, Computer Science

Pace University, New York, USA
September 2023 – May 2025 · GPA: 3.7/4.0
  • Champion, Pace University Intercollegiate Chess Tournament - outplayed 10 participants across elimination rounds, demonstrating strategic thinking and competitive excellence.

Bachelor of Technology, Computer Science Engineering

KL University Hyderabad, Telangana, India
June 2019 – May 2023 · CGPA: 8.55/10.0
  • Best Vice-President, Cybersecurity Club - organized 12+ hands-on workshops, grew membership by 35%, and forged industry partnerships for real-world security experience.

Experience

Data Analyst

Tech Mynds Inc
United States · Hybrid · September 2025 – Present
  • Reduced monthly report generation from 6 hours to 45 minutes by building automated SQL infrastructure consolidating 3 data sources (PostgreSQL, MySQL, REST APIs)
  • Implemented automated data quality framework with 4 validation checks that caught upstream issues before they reached 12 production dashboards
  • Refactored 15+ queries into reusable modules, improving team query efficiency by 30%

Data Engineer

Urpan Technologies Inc
Remote, USA · September 2024 – July 2025
  • Reduced GenAI assistant latency from 1.2s to 800ms by implementing PostgreSQL caching layer, cutting API calls by 40%
  • Accelerated ML model training by 20% through vectorized Pandas operations on 50K+ customer records
  • Built Python transformation scripts standardizing data from 4 sources (PostgreSQL, CSV, REST APIs) into consistent schema

Data and Analytics Engineering

Vimtra Ventures Pvt. Ltd
Chennai, India · On-site · December 2021 – July 2023
  • Built automated ETL pipeline with PySpark processing 120K+ property records daily, eliminating 10-12 hours/week of manual work
  • Designed star-schema data warehouse supporting $2M+ investment analysis, simplifying queries from 6+ joins to single-table selects
  • Reduced data quality issues by 85% through automated validation framework in ETL workflows