About Me
I'm a Data Engineer with a Master's in Computer Science from Pace University (3.7 GPA). I build scalable data systems and backend infrastructure using Python, SQL, and distributed computing. My experience spans ETL pipeline development, data warehouse design, and ML infrastructure: processing 120K+ records daily, reducing processing time by 40%, and building production systems from scratch. I specialize in Python (Pandas, PySpark), SQL (PostgreSQL, MySQL, BigQuery), Airflow, AWS, and Docker. I'm currently seeking full-time Data Engineer or Software Engineer roles where I can build reliable, scalable systems.
What Drives Me
I love solving complex problems with code. Whether it's building data pipelines that process 120K+ records, optimizing SQL queries to run 10x faster, or designing systems that eliminate hours of manual work, I'm driven by creating software that has real impact. My approach: write clean, maintainable code, obsess over data quality, and always think about scalability. I believe the best systems are the ones that run reliably, fail gracefully, and make someone's job easier.
- Email: prudhvir1509@gmail.com
- University: Pace University
- Location: New York, USA
- Degree: MS, Computer Science
- Age: 22
- Work Auth: F-1 OPT
- Seeking: Full-time Data Engineer or Software Engineer roles
- Available: Immediate start (F-1 OPT valid through 2028)
Interests
ETL/ELT Pipelines
Workflow Orchestration
Cloud Data Platforms
Data Warehousing
Data Quality & Validation
Dimensional Modeling
Batch Processing
Analytics Engineering
Data Pipeline Automation
ML Pipeline Engineering
GenAI Integration
Business Intelligence
Skills
Languages
Python
SQL
Git
PySpark
Data Engineering
Airflow
ETL/ELT
Data Warehousing
Dimensional Modeling
Data Quality
Batch Processing
Cloud & Infrastructure
AWS S3
AWS Lambda
Redshift
BigQuery
Docker
CI/CD
Databases
PostgreSQL
MySQL
Redshift
BigQuery
Data Analysis & Libraries
Pandas
NumPy
Matplotlib
Seaborn
BI Tools & APIs
Tableau
Power BI
REST APIs
FastAPI
ML Engineering (Supporting)
Scikit-learn
XGBoost
LSTM
SHAP/LIME
Projects
Muffin - GenAI Assistant
For Churn Explainability
Tech Stack
- Gemini 2.0 Flash
- SentenceTransformers
- Streamlit
- Python
- NLP
AI chatbot that explains customer churn risk in plain language, built on Gemini 2.0 Flash and SentenceTransformers embeddings.
Churn & LTV Intelligence
Full-Stack Analytics Platform
Tech Stack
- SQL
- Python
- MySQL
- Streamlit
- ETL
Full-stack platform predicting customer churn and lifetime value with automated ETL pipelines.
Fraud Detection Platform
Real-time ML on AWS
Tech Stack
- XGBoost
- PySpark
- AWS SageMaker
- MLflow
- DynamoDB
Real-time AWS fraud detection system with 96% accuracy and <120ms response time.
Real Estate Forecasting
ML Price Prediction App
Tech Stack
- XGBoost
- Random Forest
- Streamlit
- Python
- Joblib
ML web app for Connecticut real estate price prediction with interactive user interface.
EV Prediction App
Washington State Analysis
Tech Stack
- Streamlit
- KeplerGL
- Random Forest
- Python
- Pandas
Interactive EV analytics dashboard with geospatial mapping and 94% prediction accuracy.
Job Market Dashboard
USA Hiring Trends Analysis
Tech Stack
- Tableau
- Data Analysis
- Visualization
- Statistics
- Excel
Tableau dashboard analyzing 10K+ job postings with salary insights across data roles.
Education
Master of Science, Computer Science
Pace University, New York, USA
September 2023 – May 2025 · GPA: 3.7/4.0
- Champion, Pace University Intercollegiate Chess Tournament: outplayed 10 participants across elimination rounds.
Bachelor of Technology, Computer Science Engineering
KL University Hyderabad, Telangana, India
June 2019 – May 2023 · CGPA: 8.55/10.0
- Best Vice-President, Cybersecurity Club: organized 12+ hands-on workshops, grew membership by 35%, and forged industry partnerships for real-world security experience.
Experience
Data Analyst
Tech Mynds Inc
United States · Hybrid · September 2025 – Present
- Reduced monthly report generation from 6 hours to 45 minutes by building automated SQL infrastructure consolidating 3 data sources (PostgreSQL, MySQL, REST APIs)
- Implemented automated data quality framework with 4 validation checks that caught upstream issues before they reached 12 production dashboards
- Refactored 15+ queries into reusable modules, improving team query efficiency by 30%
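As a rough illustration of the kind of pre-dashboard validation described above, here is a minimal sketch in plain Python. It is not the production framework: the four checks, the field names (`order_id`, `amount`, `updated_at`), and the 24-hour freshness threshold are all hypothetical, chosen only to show the pattern of running named checks and reporting which ones fail.

```python
from datetime import datetime, timedelta


def run_checks(rows, now, max_age_hours=24):
    """Run four illustrative checks and return {check_name: passed}.

    The check names and row fields are assumptions for this sketch.
    """
    ids = [r["order_id"] for r in rows]
    newest_allowed = now - timedelta(hours=max_age_hours)
    return {
        # The extract returned at least one row.
        "non_empty": len(rows) > 0,
        # No row is missing its amount.
        "no_null_amounts": all(r["amount"] is not None for r in rows),
        # Primary keys are not duplicated.
        "unique_ids": len(ids) == len(set(ids)),
        # The most recent row is newer than the freshness threshold.
        "fresh": bool(rows)
        and max(r["updated_at"] for r in rows) >= newest_allowed,
    }


# Hypothetical sample data; in practice rows would come from a SQL query.
now = datetime(2025, 1, 2, 12, 0)
rows = [
    {"order_id": 1, "amount": 10.0, "updated_at": datetime(2025, 1, 2, 9, 0)},
    {"order_id": 2, "amount": None, "updated_at": datetime(2025, 1, 2, 10, 0)},
    {"order_id": 2, "amount": 5.5, "updated_at": datetime(2025, 1, 2, 11, 0)},
]

results = run_checks(rows, now)
failed = sorted(name for name, ok in results.items() if not ok)
print(failed)  # the null amount and the duplicate id are flagged
```

Returning a dictionary of named results, rather than raising on the first failure, lets a pipeline log every failing check in one pass before deciding whether to block a dashboard refresh.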
Data Engineer
Urpan Technologies Inc
Remote, USA · September 2024 – July 2025
- Reduced GenAI assistant latency from 1.2s to 800ms by implementing a PostgreSQL caching layer, which also cut API calls by 40%
- Accelerated ML model training by 20% through vectorized Pandas operations on 50K+ customer records
- Built Python transformation scripts standardizing data from 4 sources (including PostgreSQL, CSV files, and REST APIs) into a consistent schema
Data and Analytics Engineering
Vimtra Ventures Pvt. Ltd
Chennai, India · On-site · December 2021 – July 2023
- Built automated ETL pipeline with PySpark processing 120K+ property records daily, eliminating 10-12 hours/week of manual work
- Designed star-schema data warehouse supporting $2M+ investment analysis, simplifying queries from 6+ joins to single-table selects
- Reduced data quality issues by 85% through automated validation framework in ETL workflows
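To make the star-schema bullet above concrete, here is a small sketch using Python's built-in `sqlite3` module. It is an illustration under stated assumptions, not the actual warehouse: the table names (`fact_sale`, `dim_property`, `dim_date`), the view `sales_flat`, and the sample rows are all hypothetical. It shows one way a fact table with dimension tables can be exposed as a flattened view so that analysis queries become single-table selects.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive attributes (hypothetical names).
    CREATE TABLE dim_property (
        property_id   INTEGER PRIMARY KEY,
        city          TEXT,
        property_type TEXT
    );
    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        year    INTEGER,
        month   INTEGER
    );

    -- The fact table holds measures plus keys into each dimension.
    CREATE TABLE fact_sale (
        sale_id     INTEGER PRIMARY KEY,
        property_id INTEGER REFERENCES dim_property(property_id),
        date_id     INTEGER REFERENCES dim_date(date_id),
        sale_price  REAL
    );

    INSERT INTO dim_property VALUES
        (1, 'Chennai', 'apartment'),
        (2, 'Chennai', 'villa');
    INSERT INTO dim_date VALUES
        (20230101, 2023, 1),
        (20230201, 2023, 2);
    INSERT INTO fact_sale VALUES
        (1, 1, 20230101, 100.0),
        (2, 2, 20230101, 250.0),
        (3, 1, 20230201, 120.0);

    -- The joins are written once here, so analysts query a single relation.
    CREATE VIEW sales_flat AS
    SELECT f.sale_id, p.city, p.property_type, d.year, d.month, f.sale_price
    FROM fact_sale AS f
    JOIN dim_property AS p ON p.property_id = f.property_id
    JOIN dim_date     AS d ON d.date_id = f.date_id;
    """
)

# A single-table select against the flattened view.
totals = conn.execute(
    "SELECT property_type, SUM(sale_price) "
    "FROM sales_flat GROUP BY property_type ORDER BY property_type"
).fetchall()
print(totals)  # [('apartment', 220.0), ('villa', 250.0)]
```

Keeping the joins inside a view is only one option; a materialized wide table would serve the same purpose when query speed matters more than storage.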