// Data Scientist & ML Engineer

Wasiq Bakhsh

MS Data Science @ University at Buffalo. I build end-to-end ML systems, intelligent analytics pipelines, and data-driven products — from raw data to deployed models.

wasiq.json
{
  "name": "Wasiq Bakhsh",
  "role": "Data Scientist",
  "location": "Buffalo, NY",
  "education": "MS @ UB (2027)",
  "experience": "3 years",
  "stack": ["Python", "SQL", "TensorFlow", "Scikit-learn", "Tableau", "Docker", "AWS", "MLflow"],
  "open_to": "internships",
  "status": "available"
}
1M+ Transactions Analyzed
600K+ News Headlines Processed
90%+ Model Accuracy Achieved
30% Avg. Efficiency Gains

Where I've worked

Jan 2026 — Present
University at Buffalo
Data Graduate Assistant
  • Built reusable ingestion & validation scripts (Python/SQL) standardizing data for 500+ international students; enforced schema checks, deduplication, and null thresholds.
  • Designed Tableau dashboards tracking retention, engagement, and service KPIs; reduced manual effort by 30%.
  • Prototyped predictive risk-flagging features using Python, Pandas, and Scikit-learn for student cohort analysis.
Jul 2024 — Jul 2025
Bank of Khyber
Data Science Intern
  • Developed fraud detection models (scikit-learn, TensorFlow) that improved detection accuracy by 30%, evaluated with precision, recall, and ROC-AUC.
  • Engineered features from 1M+ transactions (merchant risk, velocity, amount deviations); built ETL pipelines across SQL and AWS S3.
  • Reduced false positives by 20% via class-imbalance handling; automated dashboards cutting fraud response time by 15%.
Feb 2024 — Oct 2024
House of Brands Media
Data Scientist (Freelance)
  • Built marketing KPI datasets and A/B testing frameworks; improved lead generation by 25% and drove 30% revenue growth.
  • Delivered Salesforce-to-Tableau franchise analytics; automated data pipelines reducing latency by 50%.
Dec 2023 — May 2024
Alkhidmat Hospital
Data Scientist Intern
  • Developed ML models predicting patient readmission with 90%+ accuracy using Python, Scikit-learn, and TensorFlow.
  • Improved data accuracy by 20%, reduced reporting time by 30%, and saved staff 10 hrs/week through EHR workflow optimization.

Things I've built

🚀
End-to-End ML Deployment Pipeline

Complete ML system from data ingestion to cloud deployment. Ran 16 classification experiments tracked with MLflow/DagsHub, served the best model via FastAPI + Streamlit, and deployed the fully Dockerized app on Render.

Python · FastAPI · Docker · MLflow · DagsHub · Render
View on GitHub →
📈
Financial Sentiment & Stock Movement Prediction

Scalable pipeline aligning 600K+ financial news headlines with 7,000+ U.S. stocks. Extracted sentiment via TextBlob & FinBERT, engineered temporal features, applied SHAP explainability across XGBoost and LSTM models.

FinBERT · XGBoost · LSTM · SHAP · NLP
View on GitHub →
🤖
Multilingual Real-Time AI Chatbot

Led a 4-member team building a bilingual AI conversational agent for e-learning. Integrated OpenAI Whisper for speech-to-text (STT), Meta AI NLP models for dialogue, and ElevenLabs for text-to-speech (TTS), with low-latency real-time processing.

Whisper · TensorFlow · ElevenLabs · NLP · LLMOps
View on GitHub →
🧠
EEG Seizure Prediction

Intelligent classification system predicting epileptic seizures from a high-dimensional EEG dataset (150 × 1,025). Applied dimensionality reduction and normalization, then tuned SVM, Random Forest, and Logistic Regression models, achieving 78% accuracy.

Scikit-learn · EEG · SVM · Feature Engineering
View on GitHub →
💬
YouTube Comment Sentiment & Emotion Analysis

End-to-end NLP pipeline collecting comments via the YouTube Data API, then applying VADER for sentiment polarity and the NRC Emotion Lexicon for emotion tagging. Visualized emotion distributions across millions of comments.

NLTK · VADER · Python · Matplotlib · YouTube API
View on GitHub →
🗳️
Pakistan 2024 Elections Analytics Dashboard

Interactive Tableau dashboard analyzing national election results via Dawn API. Visualized voter turnout, candidate performance, regional vote share, and demographic breakdowns across all districts.

Tableau · Python · ETL · API · Data Viz
View Dashboard →

What I work with

⚙️   Languages & Querying
Python · SQL · R · Bash · MySQL · PostgreSQL
🤖   ML & Modeling
Scikit-learn · TensorFlow · XGBoost · PyTorch · LSTM · SHAP · Optuna · SMOTE
🧬   NLP & AI
FinBERT · OpenAI Whisper · NLTK · VADER · Generative AI · LLMOps · ElevenLabs
🚢   MLOps & Deployment
Docker · FastAPI · MLflow · DagsHub · Streamlit · Render · Git / GitHub
☁️   Cloud & Data Engineering
AWS S3 · GCP · Hadoop · Spark · ETL Pipelines · Pandas · NumPy
📊   Visualization & BI
Tableau · Power BI · Matplotlib · Seaborn · Excel

Certifications

🏅
Google Data Analytics Certificate
Google · Coursera
🏅
Fundamentals of Visualization with Tableau
Tableau · Coursera — With Honors
🏅
Data Analysis with Python
IBM · Coursera
🏅
Getting Started with Data Analytics on AWS
Amazon Web Services · Coursera

Let's connect

I'm actively looking for Data Science, Data Analysis, and Business Intelligence internships. If you have an opportunity or just want to talk data, reach out.

Send an Email →