I'm a Master's student in Data Science at Arizona State University, bridging full-stack software engineering and advanced machine learning — with a focus on Generative AI (RAG), MLOps, and statistical modeling.
A real-time road pothole detection system that uses a camera feed to classify road conditions and pins detected potholes on a live map. Won 1st place out of 200+ teams at Hack The League and received the Best Use of Public API Award from Postman.
Fine-tuned a pre-trained Xception CNN (ImageNet weights) on a custom web-scraped dataset for binary road classification. Integrated Google Maps API for real-time pothole mapping and built a Flutter mobile app with a Node.js backend. Incorporated federated learning concepts to reduce inference overhead.
A production-ready document Q&A system that lets you upload PDFs and ask questions in plain language — with every answer citing the exact page and document it came from, grounding responses to reduce the risk of hallucinated, unsupported claims.
Built a full RAG pipeline using LangChain, Qdrant (persistent vector store), and BAAI/bge-small-en-v1.5 embeddings for semantic retrieval. Runs Llama 3 locally via Ollama — no API costs or data leakage. Includes performance metrics tracking (retrieval time, LLM processing time) and chat history export. GPU/MPS acceleration cuts query time from ~20s to 2–3s.
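At its core, the retrieval step of a pipeline like this is nearest-neighbor search over chunk embeddings, with each chunk carrying its source document and page so answers can cite both. A minimal standard-library sketch, using toy 3-dimensional vectors and hypothetical chunk data in place of the real bge-small-en-v1.5 embeddings and Qdrant store:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Each chunk stores its source document and page so answers can cite both.
chunks = [
    {"text": "Refunds are processed in 5 days.", "doc": "policy.pdf", "page": 12, "vec": [0.9, 0.1, 0.0]},
    {"text": "Shipping takes 2-3 weeks.",        "doc": "faq.pdf",    "page": 3,  "vec": [0.1, 0.8, 0.2]},
]

def retrieve(query_vec, k=1):
    """Return the top-k chunks most similar to the query embedding."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [(c["text"], c["doc"], c["page"]) for c in ranked[:k]]

print(retrieve([0.85, 0.2, 0.05]))
```

The retrieved (text, document, page) triples are what get stuffed into the LLM prompt, which is how the citation guarantee falls out of the architecture rather than the model.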
A comparative study of full fine-tuning vs. LoRA (PEFT) for code generation — built to show that parameter-efficient methods can match expensive full-parameter training at a fraction of the compute cost.
Adapted GPT-2 (124M) on the 250k+ sample CodeXGLUE Python dataset. Went beyond standard BLEU/ROUGE metrics by building a custom Execution Pass Rate pipeline that actually runs generated code against unit tests — baseline achieved 39.58% pass rate and BLEU of 17.03 after 3 epochs. ASU group project; responsible for the evaluation pipeline and LoRA training configuration.
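The Execution Pass Rate idea reduces to: execute each generated snippet, run its unit test, and count successes. A toy sketch with hypothetical samples (the real pipeline evaluates CodeXGLUE generations with sandboxing, which this omits):

```python
def pass_rate(samples):
    """samples: list of (generated_code, test_code) string pairs."""
    passed = 0
    for code, test in samples:
        env = {}
        try:
            exec(code, env)   # define the generated function
            exec(test, env)   # run its unit test; any exception counts as failure
            passed += 1
        except Exception:
            pass
    return passed / len(samples)

samples = [
    ("def add(a, b):\n    return a + b", "assert add(2, 3) == 5"),
    ("def add(a, b):\n    return a - b", "assert add(2, 3) == 5"),  # wrong body fails
]
print(pass_rate(samples))  # → 0.5
```

Unlike BLEU, this metric cannot be fooled by code that merely resembles the reference, which is why it was worth building.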
An end-to-end recommendation system that goes beyond just building a model — it includes a rigorous A/B testing framework to statistically measure whether personalized recommendations actually improve engagement over a popularity baseline.
Implemented SVD-based collaborative filtering (matrix factorization) on the MovieLens 100k dataset with a popularity fallback for cold-start users. Built a simulation engine using held-out test data as ground truth, deterministic user bucketing via hashing, and Z-test statistical analysis to measure engagement lift. Containerized with Docker; interactive dashboard built with Streamlit + Plotly.
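Both the deterministic bucketing and the significance test fit in a few standard-library lines; the numbers below are illustrative, not the project's actual results, and the salt name is hypothetical:

```python
import hashlib
import math

def bucket(user_id, salt="ab-test-1"):
    """Hash-based assignment: the same user always lands in the same arm."""
    h = int(hashlib.md5(f"{salt}:{user_id}".encode()).hexdigest(), 16)
    return "treatment" if h % 2 else "control"

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion Z statistic for the difference in engagement rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

assert bucket("user-42") == bucket("user-42")  # deterministic across sessions
z = two_proportion_z(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(round(z, 2))  # → 1.96
```

Hashing instead of random assignment means no assignment table is needed: the bucket is a pure function of the user ID.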
A data science initiative investigating Arizona's housing crisis — with findings presented directly to the AZ Department of Housing under the supervision of ASU's Unit for Data Science and Analytics. Arizona has one of the highest eviction rates in the nation; in 2024 alone, nearly 90,000 eviction notices were filed in Maricopa County.
Applied K-Means clustering to U.S. Census and Maricopa County records to identify eviction hotspots across zip codes. Built a regression model achieving R² = 0.92, with median income and demographic composition emerging as the strongest predictors (p < 0.05). Mapped eviction trends revealing disproportionately high rates in the bottom 20–30% rent price tier — translating findings into concrete policy recommendations.
A rigorous statistical study of 353 federal civil penalties issued to educational institutions (2010–2019), examining whether institutional type — public, private non-profit, or for-profit — meaningfully predicts how severely schools are penalized for student aid violations.
Applied a Kruskal-Wallis H-test (non-parametric, robust to the skewed penalty distribution) and confirmed institutional sector as a significant predictor of penalty severity (p < 10⁻⁸). Benchmarked Linear Regression, SVM, and GLM (Gamma) using 5-fold cross-validation; OLS achieved R² = 0.66 in modeling the regulatory risk factors that drive high-value penalties.
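In practice `scipy.stats.kruskal` computes this test; as a sketch of what the statistic actually measures, here is a standard-library reimplementation of the H statistic (tie-aware average ranks, tie correction omitted), on toy data rather than the penalty dataset:

```python
def kruskal_h(*groups):
    """Kruskal-Wallis H statistic: do groups differ in their rank distributions?"""
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    values = [v for v, _ in pooled]
    rank_of = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):          # assign average ranks to runs of tied values
        j = i
        while j < len(pooled) and values[j] == values[i]:
            j += 1
        avg = (i + 1 + j) / 2       # average of 1-indexed ranks i+1 .. j
        for k in range(i, j):
            rank_of[k] = avg
        i = j
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    for (_, gi), r in zip(pooled, rank_of):
        rank_sums[gi] += r
    return 12 / (n * (n + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)

print(kruskal_h([1, 2, 3], [4, 5, 6]))
```

Because the test ranks the data instead of using raw dollar amounts, a handful of enormous penalties cannot dominate the result, which is exactly why it suits the skewed penalty distribution.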
A full ML system for predicting telecom customer churn — with a built-in business value layer that calculates ROI per customer and recommends whether a retention offer is worth sending, avoiding wasteful spend on low-risk customers.
Trained and compared Gradient Boosting, Random Forest, and Logistic Regression models on the IBM Telco Churn dataset (33 features). Built a Flask REST API for real-time predictions with automatic field mapping, feature engineering (service counts, charge ratios), and CLTV-based ROI recommendations. Includes full EDA visualizations and a web interface for live inference.
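The business value layer reduces to an expected-value comparison per customer. A minimal sketch with hypothetical numbers and an assumed `save_rate` parameter (the real system derives its inputs from the model's churn probability and the dataset's CLTV field):

```python
def retention_roi(churn_prob, cltv, offer_cost, save_rate=0.3):
    """Expected value of sending a retention offer vs. doing nothing.

    save_rate: assumed fraction of would-be churners the offer retains.
    """
    expected_saved_revenue = churn_prob * save_rate * cltv
    roi = expected_saved_revenue - offer_cost
    return roi, "send offer" if roi > 0 else "skip"

print(retention_roi(churn_prob=0.8, cltv=1200, offer_cost=50))   # high risk: worth sending
print(retention_roi(churn_prob=0.05, cltv=1200, offer_cost=50))  # low risk: skip
```

The skip branch is the point: a model alone flags risky customers, but only the ROI layer prevents spending offer budget on customers who were unlikely to leave anyway.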
A chat interface for querying industrial IoT databases in plain English — no SQL knowledge required — with role-based access control that restricts which tables each user type can query.
Built a LangChain SQL chain over three SQLite databases (sensor readings, maintenance logs, revenue) powered by Llama 3 via Ollama for fully local inference. Implemented four access roles (SensorViewer, MaintenanceManager, RevenueAnalyst, PlantDirector) with table-level RBAC enforcement. Gradio interface for interactive querying.
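Table-level RBAC of this kind boils down to checking the tables a generated query touches against a per-role allowlist before execution. A sketch using the four roles above, with illustrative table names and a hypothetical `authorize` helper (the real enforcement is wired into the LangChain SQL chain):

```python
ROLE_TABLES = {
    "SensorViewer":       {"sensor_readings"},
    "MaintenanceManager": {"sensor_readings", "maintenance_logs"},
    "RevenueAnalyst":     {"revenue"},
    "PlantDirector":      {"sensor_readings", "maintenance_logs", "revenue"},
}

def authorize(role, tables_in_query):
    """Reject the generated SQL if it touches any table outside the role's grant."""
    allowed = ROLE_TABLES.get(role, set())
    denied = set(tables_in_query) - allowed
    if denied:
        raise PermissionError(f"{role} may not query: {sorted(denied)}")
    return True

assert authorize("PlantDirector", ["revenue", "sensor_readings"])
```

Checking the table set rather than trusting the prompt means the LLM can generate whatever SQL it likes; unauthorized queries are blocked deterministically before they reach the database.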
A personal knowledge management system where every note, snippet, and bookmark becomes a node in an interactive D3.js knowledge graph — visually surfacing how your ideas connect, cluster, and build on each other.
Built a full-stack app with a GraphQL (Apollo) API, Prisma ORM (PostgreSQL), and Next.js 14 App Router. Features typed semantic relationships between entries (references, builds-on, contradicts), full-text search with color-coded tagging, a Chrome/Edge browser extension for one-click page saving, and JSON/Markdown export. Deployed to Vercel.
A modern file sharing platform with password protection, auto-expiration, download limits, and QR code generation — built with a production-grade async backend and type-safe frontend, with CI/CD pipelines running on every push.
Built a FastAPI async Python backend with SQLModel (Pydantic validation) and Argon2 password hashing for secure file protection. Frontend in React 19 + TypeScript using TanStack Router and TanStack Query for type-safe routing and server state. Includes in-browser preview for images and PDFs, dark/light theme, and separate GitHub Actions CI/CD pipelines for frontend and backend.
Contributions to production open source projects.
Added "sideEffects": false to the package.json of 5 packages in the OpenFeature JS SDK — a CNCF project and the open standard for feature flagging used across the industry. This signals to bundlers like Webpack that these packages are safe for tree-shaking, reducing final bundle sizes for all downstream consumers of the SDK.
Performed a full codebase audit confirming zero global side effects across all 5 packages. Verified with a successful build, all 35 Jest test suites, and 32 Angular tests before submitting. Reviewed and approved by a project maintainer.
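The flag itself is a one-line package.json entry (shown here in isolation, with the packages' other fields omitted):

```json
{
  "sideEffects": false
}
```

The small diff is deceptive: the audit is the real work, since the flag is only safe if no module in the package does anything observable at import time.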
Implemented a semantic version sorting utility for the cdnjs API server — the backend powering one of the world's largest open source CDNs, serving billions of asset requests globally. Previously, the /libraries/{name}?fields=versions endpoint returned versions in a random, unsorted order — a long-standing open issue affecting every API consumer.
Built src/utils/sort.js with a sortVersions helper handling three cases: valid semver (via semver.rcompare), coercible non-semver strings (pre-release tags), and fully non-parseable strings (fallback to localeCompare with numeric collation, so release-1.10 correctly sorts before release-1.9). Integrated into the libraryVersions function and added 10 new tests covering semver, pre-release, mixed, and edge cases. Full suite of 430 tests passes.
Fixed the in-process provider to use FLAGD_SYNC_PORT for port configuration in the OpenFeature Python SDK contrib repo — part of the CNCF OpenFeature project, the open standard for feature flagging adopted across the industry. Previously, the in-process provider incorrectly read FLAGD_PORT, the variable intended for RPC/remote mode, causing a spec misalignment with flagd and other OpenFeature SDKs.
Added an ENV_VAR_SYNC_PORT constant and updated the port resolution logic so the in-process provider prioritizes FLAGD_SYNC_PORT, falls back to FLAGD_PORT for backwards compatibility, and finally defaults to 8015 — while leaving RPC mode's FLAGD_PORT → 8013 behavior entirely unchanged. Updated the README configuration table to document the new variable. Added 5 new unit tests covering sync port usage, fallback, priority when both vars are set, RPC isolation, and default behavior. Full suite of 181 tests passes.
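The resolution order described above is a short chain of environment lookups. A simplified sketch of the priority logic, with the environment passed in as a plain dict for testability (variable names and defaults from the description; the actual SDK code differs):

```python
DEFAULT_SYNC_PORT = 8015  # in-process/sync default
DEFAULT_RPC_PORT = 8013   # RPC default, untouched by the fix

def resolve_sync_port(env):
    """FLAGD_SYNC_PORT wins; FLAGD_PORT is kept only for backwards compatibility."""
    for var in ("FLAGD_SYNC_PORT", "FLAGD_PORT"):
        if var in env:
            return int(env[var])
    return DEFAULT_SYNC_PORT

assert resolve_sync_port({}) == 8015
assert resolve_sync_port({"FLAGD_PORT": "9000"}) == 9000
assert resolve_sync_port({"FLAGD_SYNC_PORT": "9001", "FLAGD_PORT": "9000"}) == 9001
```

Keeping FLAGD_PORT as a fallback rather than removing it outright is what makes the fix safe to ship: existing deployments keep working while new ones align with the flagd spec.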