Diya Vij
Berkeley, CA · (925) 785-4040 · diyavij@berkeley.edu · LinkedIn
Education
University of California, Berkeley
Expected May 2026
Bachelor's, Applied Mathematics and Statistics; minor, Data Science
Relevant coursework: Data Structures & Algorithms, Bayesian Statistics & Machine Learning, Principles & Techniques of Data Science, Probability, Regression Analysis, Design of Experiments, Abstract Linear Algebra, Numerical Analysis, Abstract Algebra, Stochastic Processes, Time Series, Game Theory, Human Contexts & Ethics of Data
Skills
Languages: SQL, Python, R, MATLAB
Applications & databases: Power BI, Tableau, Excel, MySQL, SQLite, Git, Google Cloud, R Shiny, Azure Data Services
Libraries, frameworks & tools: pandas, Matplotlib, TensorFlow, scikit-learn, SAS, React, Flask, FastAPI
Hackathons: Winner, BayHacks 2021; Best Use of Twilio, AthenaHacks (USC) 2023; LAHacks (UCLA); TreeHacks (Stanford)
Highlighted experience and projects
Data Science & Machine Learning Intern — Regeneron Pharmaceuticals, Tarrytown, NY
Summer 2025
- Processed high-dimensional accelerometer time series from patients with neurological disorders for downstream modeling.
- Built a retrieval-augmented generation (RAG) application combining GPT-4 and other LLMs with biomedical APIs (PubMed, ClinicalTrials.gov, UK Biobank) for literature review and cross-population analysis, using semantic search and retrieval-augmented workflows to support digital biomarker hypotheses.
- Trained and evaluated ML models on large time-series datasets to predict disease progression (85% accuracy), supporting go/no-go decisions in clinical development.
- Presented at three cross-functional meetings and collaborated with medical directors and technical working groups; selected to present at an internal symposium for 150+ faculty and postdocs.
UC Berkeley Physics 188 Final Group Project: Predicting Sleep Apnea (DREAMT)
Fall 2025
- Developed a hybrid CNN–Transformer model in TensorFlow on ~150 GB of multimodal time-series sensor data, achieving 78% accuracy and 0.86 ROC-AUC.
- Engineered nine predictive features from raw ECG and respiratory signals (FFT, bandpass filtering, peak detection); trained and compared four model types (CNN, Transformer, MLP, Random Forest) using cross-validation.
Data Services Intern — University of California Information Technology
Spring–Fall 2024
- Created, updated, and maintained Power BI reports for Facilities Financial Services across the UC system.
- Contributed to Power BI development and migrated SSIS ETL pipelines to Python for Administrative and Residential IT, using masked data for testing.
Data Science & Analytics Intern — 365Labs, Baton Rouge, LA
Summer 2023
- Analyzed data in Power BI, queried Microsoft SQL Server, and presented findings to law enforcement stakeholders.
- Cleaned and documented data; built data dictionaries to improve accessibility across 5+ applications.
- Worked with the Chief Software Architect on integrating Microsoft Fabric and LLMs into the data science workflow.
Leadership & extracurricular activities
Marketing Lead & Vice President — Hackathons @ Berkeley
Aug 2024 – present
- Helps run Cal Hacks (the world's largest collegiate hackathon) and the UC Berkeley AI Hackathon.
- Manages social media, content, and outreach to 25,000+ applicants, 4,000+ hackers, and sponsors.
- Organizes club events, Berkeley makerspace collaborations, and budget planning for a ~$1.25M annual program.
Officer — Mathematics Undergraduate Student Association
Aug 2024 – present
- Leads reading groups on applications of stochastic processes, game theory, and topics in pure mathematics.
- Undergraduate representative on the Berkeley Math Equity and Inclusion Committee; organizes events and math talks.
Student Consultant — UC & Haas Information Technology
Aug 2023 – present
- Provides ~20 hours/week of technical support to 400+ faculty and staff users for desktops, laptops, printers, and mobile devices.
- Handles standard configurations, inventory, imaging, and device setup.
Director of Operations — Data Science Club
May 2023 – Aug 2024
- Supported 150+ students across 30+ personal projects; taught exploratory data analysis and data science workflows.
- Managed team meetings, alumni relations, operations, scheduling, and room bookings.