About Me
I'm a Data Engineer with a Master's in Computer Science from Pace University. I build scalable data pipelines, ETL workflows, and analytics platforms that power data-driven decisions. My expertise includes Python, SQL, Airflow, PySpark, AWS (S3, Lambda, Redshift), GCP (BigQuery), and Docker. I've processed 120K+ records, reduced query runtimes by 40%, and built end-to-end data systems from ingestion to visualization. I focus on building reliable, efficient, and production-ready data infrastructure.
Fun Fact
My superpower? Turning chaotic data into clean, production-ready pipelines. With Python in one hand and SQL in the other, I build ETL workflows that transform messy datasets into analytics-ready tables. Whether it's optimizing queries, orchestrating Airflow DAGs, or containerizing pipelines with Docker, I love making data systems that just work: reliably, efficiently, and at scale.
- Email: prudhvir1509@gmail.com
- University: Pace University
- Location: New York, USA
- Degree: MS, Computer Science
- Age: 22
- Work Auth: F1 OPT
- Strength: Building reliable, scalable data pipelines with a strong focus on data quality and system optimization.
- Weakness: A tendency to over-engineer solutions when a simple approach would work; I'm actively learning to balance robustness with pragmatism.
Interests
ETL/ELT Pipelines
Workflow Orchestration
Cloud Data Platforms
Data Warehousing
Data Quality & Validation
Dimensional Modeling
Batch Processing
Analytics Engineering
Data Pipeline Automation
ML Pipeline Engineering
GenAI Integration
Business Intelligence
Skills
Languages
Python
SQL
Git
PySpark
Data Engineering
Airflow
ETL/ELT
Data Warehousing
Dimensional Modeling
Data Quality
Batch Processing
Cloud & Infrastructure
AWS S3
AWS Lambda
Redshift
BigQuery
Docker
CI/CD
Databases
PostgreSQL
MySQL
Redshift
BigQuery
Data Analysis & Libraries
Pandas
NumPy
Matplotlib
Seaborn
BI Tools & APIs
Tableau
Power BI
REST APIs
FastAPI
ML Engineering (Supporting)
Scikit-learn
XGBoost
LSTM
SHAP/LIME
Certifications
Projects
Muffin - GenAI Assistant
For Churn Explainability
Tech Stack
- Gemini 2.0 Flash
- SentenceTransformers
- Streamlit
- Python
- NLP
AI chatbot that explains customer churn risk in plain language, built on Gemini 2.0 Flash and SentenceTransformers.
Churn & LTV Intelligence
Full-Stack Analytics Platform
Tech Stack
- SQL
- Python
- MySQL
- Streamlit
- ETL
Full-stack platform predicting customer churn and lifetime value with automated ETL pipelines.
Fraud Detection Platform
Real-time ML on AWS
Tech Stack
- XGBoost
- PySpark
- AWS SageMaker
- MLflow
- DynamoDB
Real-time AWS fraud detection system with 96% accuracy and <120ms response time.
Real Estate Forecasting
ML Price Prediction App
Tech Stack
- XGBoost
- Random Forest
- Streamlit
- Python
- Joblib
ML web app for Connecticut real estate price prediction with an interactive user interface.
EV Prediction App
Washington State Analysis
Tech Stack
- Streamlit
- KeplerGL
- Random Forest
- Python
- Pandas
Interactive EV analytics dashboard with geospatial mapping and 94% prediction accuracy.
Job Market Dashboard
USA Hiring Trends Analysis
Tech Stack
- Tableau
- Data Analysis
- Visualization
- Statistics
- Excel
Tableau dashboard analyzing 10K+ job postings with salary insights across data roles.
Education
Master of Science, Computer Science
Pace University, New York, USA
September 2023 - May 2025 · GPA: 3.7/4.0
- Champion, Pace University Intercollegiate Chess Tournament: outplayed 10 participants across elimination rounds, demonstrating strategic thinking and competitive excellence.
Bachelor of Technology, Computer Science Engineering
KL University, Hyderabad, Telangana, India
June 2019 - May 2023 · CGPA: 8.55/10.0
- Best Vice-President, Cybersecurity Club: organized 12+ hands-on workshops, grew membership by 35%, and forged industry partnerships that gave members real-world security experience.
Experience
Data Analyst
Tech Mynds Inc
United States · Hybrid · September 2025 – Present
- Built and optimized Python-SQL pipelines that integrate multiple data sources into clean, analysis-ready datasets.
- Created executive dashboards and reports using Tableau and Excel to support data-driven decisions.
- Key Project: Leadership Reporting Data Pipeline consolidating 5+ sources into analytics-ready tables.
Data Engineering Intern
Urpan Technologies Inc
Remote, USA · September 2024 – July 2025
- Developed Python-SQL transformation layers that standardize data from multiple sources into analytics-ready tables.
- Optimized GenAI assistant pipelines and caching for faster response times.
- Key Project: GenAI Data Assistant Pipeline integrating business data with embeddings and caching to reduce redundant API calls.
Data Engineering & Analytics
Vimtra Ventures Pvt. Ltd
Chennai, India · On-site · December 2021 – July 2023
- Built Python-SQL ETL pipelines processing 120K+ real estate assets, replacing manual spreadsheet workflows.
- Developed NLP extraction pipelines converting unstructured investment reports into structured data.
- Key Project: Investment Analytics Data Platform with star-schema warehouse simplifying analytics queries.