Hi, I'm Himanshu Wagh
I'm a passionate and self-driven Data Science professional currently pursuing my Master's in Data Science at Michigan Technological University. With an undergraduate degree in Engineering and self-taught expertise in AI, Machine Learning, and Python, I've built a strong foundation in solving complex problems through data-driven solutions.
Now, as a graduate student, I'm deepening my expertise through research assistantships at Michigan Tech, where I contribute to cutting-edge projects, develop advanced models, and collaborate with experienced professors. My goal is to apply my skills in Data Science, AI, and Machine Learning to solve real-world challenges and create meaningful innovations.
I'm actively seeking opportunities to contribute to Data Science, AI, and ML — whether in industry or research. Let's connect and explore how we can work together to push the boundaries of what's possible with data!
I'm best reached via email. I'm always open to interesting conversations and collaboration.
Recent Publications
DyGAF: Dynamic Graph Attention Framework for COVID-19 Biomarker Identification
Bioinformatics and Biology Insights (2024)
- Developed an attention-based neural model for biomarker detection and COVID-19 diagnostics using gene expression data.
- Achieved 94.23% classification accuracy by combining deep learning with traditional ML feature selection methods.
- Integrated dual-model attention mechanisms to rank genes by diagnostic relevance, outperforming standard statistical approaches.
- Conducted KEGG, WikiPathways, and Gene Ontology analyses to validate gene significance in COVID-19 pathogenesis.
- Benchmarked against traditional models (e.g., DE analysis, Random Forest) and demonstrated superior performance.
- Made the model and codebase publicly available: GitHub Repository - DyGAF
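The core idea behind attention-based gene ranking can be sketched as follows. This is a minimal illustration, not the published DyGAF code; the toy data, weight vector, and function name are placeholders for demonstration only:

```python
import numpy as np

def attention_gene_scores(X, w):
    """Score genes with a softmax over attention-weighted evidence.
    X: (samples, genes) expression matrix; w: (genes,) learned attention weights."""
    logits = X.mean(axis=0) * w           # per-gene evidence, weighted by attention
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(0)
X = rng.random((10, 5))                   # toy expression data: 10 samples, 5 genes
w = rng.random(5)                         # stand-in for trained attention weights
scores = attention_gene_scores(X, w)
ranking = np.argsort(scores)[::-1]        # gene indices, highest diagnostic score first
```

The softmax turns raw attention-weighted evidence into a probability-like score per gene, so genes can be ranked and the top candidates passed on to pathway analyses.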
Professional Experience
Graduate Research Assistant
Michigan Technological University | Aug 2024 – Feb 2025
- Designed and implemented machine learning models to analyze complex genomic data, achieving 93% accuracy in predicting genetic interactions
- Preprocessed large-scale biomedical datasets from NCBI, applying techniques like TMM normalization and advanced feature engineering
- Collaborated with faculty and graduate teams to deliver scalable ML workflows for healthcare research
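As a flavor of the normalization step mentioned above, here is a counts-per-million (CPM) sketch — a simpler cousin of TMM normalization, shown for illustration only (full TMM additionally computes trimmed log-ratio scaling factors between samples):

```python
import numpy as np

def cpm(counts):
    """Counts-per-million library-size normalization for a genes x samples matrix."""
    lib_sizes = counts.sum(axis=0)        # total reads per sample
    return counts / lib_sizes * 1e6       # rescale so each column sums to one million

counts = np.array([[10, 20],
                   [90, 180]], dtype=float)  # toy counts: 2 genes x 2 samples
norm = cpm(counts)                           # each sample column now sums to 1e6
```

Library-size normalization makes expression values comparable across samples sequenced at different depths, which is a prerequisite for the downstream feature engineering.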
Graduate Research Assistant
- Developed DyGAF, an attention-based neural model for COVID-19 biomarker ranking — published in Bioinformatics and Biology Insights (2024)
- Applied Random Forest and dimensionality reduction to extract key genetic signals from RNA-seq data
- Led data cleaning and transformation pipelines, improving research reproducibility and ML model stability
Software Engineer
- Built an NLP pipeline for real-time receipt data extraction using Named Entity Recognition (NER), improving processing speed by 30%
- Contributed to backend architecture in Python and C++ for scalable document analysis
- Integrated AI models into production systems in collaboration with DevOps and product teams
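The extraction step of such a pipeline can be illustrated with a toy regex-based version — the production system used a trained NER model, and the field names and patterns below are illustrative assumptions:

```python
import re

def extract_receipt_fields(text):
    """Toy receipt-field extraction (stand-in for a trained NER model)."""
    total = re.search(r"TOTAL[:\s]+\$?([\d.]+)", text, re.I)  # grab the total amount
    date = re.search(r"\b(\d{2}/\d{2}/\d{4})\b", text)        # grab a MM/DD/YYYY date
    return {
        "total": float(total.group(1)) if total else None,
        "date": date.group(1) if date else None,
    }

fields = extract_receipt_fields("Store A 01/15/2024 TOTAL: $42.50")
# → {'total': 42.5, 'date': '01/15/2024'}
```

A learned NER model generalizes far better than fixed patterns across receipt layouts, but the interface — raw text in, structured fields out — is the same.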
Latest Learnings
This is a general-audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related products. It covers the full training stack of how the models are developed, along with mental models for thinking about them.
George Hotz: Comma.ai, OpenPilot, and Autonomous Vehicles | Lex Fridman Podcast #31
This is the most step-by-step spelled-out explanation of backpropagation and training of neural networks. It only assumes basic knowledge of Python and a vague recollection of calculus from high school.