Code that ships.
Systems that scale.
// they call me ashmit
ML Systems · Computer Vision · India

I don't use models.
I break them.
Rebuild them
// better.

CS student · ML systems · builder.
Strong in math. Stronger in curiosity.

[ 01 ] Selected Work
001
Document Vision Pipeline
YOLOv8 · OpenCV · Tesseract OCR
~18% accuracy lift · Layout detection · Noisy scans
002
Claim Extractor
Scikit-learn · Tesseract · NLP
~34% F1 lift · Class imbalance · Deduplication
003
torch-surgeon
PyTorch · Python · Open Source
Sub-1% overhead · 94% test coverage · pip installable
004
Semantic Search
Sentence Transformers · FAISS
Sub-10ms latency · 10k-chunk corpus · Local deploy
005
Drug Safety Guard
Flutter · Node.js · PostgreSQL · Redis
~3× more interactions detected · Sub-12ms p99 · ● live demo
006
Claim Extractor
Scikit-learn · NLP · SMOTE
~34% F1 lift · in development
007
Churn Prediction
XGBoost · SHAP · Flask
SHAP explainability · in development
[ 02 ] Experience
Internship · 2025
Singsys Pvt Ltd
Mobile App Developer Intern

Worked on production Flutter flows — handling async data, state consistency, and UI reliability under real device constraints.

What it taught me: production is a different discipline from building.

[ 03 ] Lab / Thinking
ML Breakdown
Why Attention Collapses
Notes on attention sink phenomena in long-context transformers.
Published
Experiment
Loss Landscape Explorer
Visualizing loss landscapes for small networks using random-direction perturbation.
Live
CV Note
YOLOv8 vs the Real World
COCO-trained weights vs real document scans — what benchmarks don't surface.
In Progress
ML Breakdown
Tokenization is Lossy
What BPE tokenization silently destroys.
Published
Math Note
Linear Algebra in ML
The geometry behind PCA, SVD, and attention.
Live
Idea
OCR is a Systems Problem
Why structured extraction after recognition is harder than OCR accuracy.
Draft
[ 04 ] About
"I care about understanding why something works, not just that it works."

CS student, India. CGPA 8.90 (top 12% · 10.0 scale). Most of my learning happens outside the curriculum.

I know what real systems feel like when they break.

Languages: Python, C++, Java, Dart, SQL
ML / DL: PyTorch, Scikit-learn, Pandas, NumPy
Computer Vision: OpenCV, YOLOv8, Tesseract OCR
Tools: Git, Linux, Jupyter, Colab
Open Source: torch-surgeon · qr-scanner
Status: 2nd Year · Actively building
Open to ML / SWE internships — 2026
[ 05 ] — Let's talk

Got a model
that needs to
actually work.

Inference pipelines, training bugs, real-world CV. That's where I live.

ASHMIT
SINGH
engineer.
Machine Learning · Computer Vision · PyTorch · YOLOv8 · Python · OpenCV · Scikit-learn · Tesseract OCR · NumPy · Pandas · C++ · Flutter · SQL
[ 01 ]

Selected Work

001
Document Vision Pipeline
YOLOv8 · OpenCV · Tesseract OCR · Python
Layout detection · Structured extraction · Noisy scan handling · ~18% accuracy lift
↗ GitHub
OpenCV preprocessing lifted OCR accuracy on degraded scans by ~18% over a raw Tesseract baseline. Layout-aware extraction works where full-page OCR produced unusable output.
002
Claim Extractor
Scikit-learn · Tesseract · Pandas · Python
High precision · Multi-format inputs · Evaluated on real datasets
SMOTE + macro F1 training lifted minority-class recall by ~34% over a naive baseline. Semantic deduplication collapsed paraphrased claims that keyword matching can't see.
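A minimal sketch of the interpolation idea behind SMOTE, using only the standard library. This is not the project's code: the function name `smote_oversample`, its parameters, and the toy points are assumptions for illustration, and a real pipeline would more likely rely on an existing implementation such as the `SMOTE` class in imbalanced-learn.

```python
import random

def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        t = rng.random()
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-d minority-class feature vectors.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_oversample(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. Oversampling should be applied to the training split only, so that evaluation data stays untouched.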
003
torch-surgeon
PyTorch · Python · Open Source · v0.1.0
Gradient hooks · Per-layer attribution · Sub-1% overhead · 94% test coverage
↗ pip install torch-surgeon · GitHub ↗
Sub-1% training overhead via stats-in-hook design. Surfaces vanishing, exploding, and stagnant gradients per layer — in real time, before loss curves show anything.
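A rough sketch of what "stats in the hook" can mean: reduce each gradient to one scalar as soon as it arrives, keep only a short history of scalars, and label the layer from that history. This is not torch-surgeon's API. `GradientMonitor`, `classify_layer`, and every threshold below are hypothetical, and plain Python lists stand in for tensors so the example runs without PyTorch. In PyTorch, a callback like `hook` would typically be attached through `Tensor.register_hook` or `Module.register_full_backward_hook`.

```python
import math

def grad_norm(grad):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grad))

def classify_layer(norm_history, vanish_below=1e-6, explode_above=1e3,
                   stagnant_tol=1e-3):
    """Label a layer from its recent gradient norms (illustrative thresholds)."""
    latest = norm_history[-1]
    if latest < vanish_below:
        return "vanishing"
    if latest > explode_above:
        return "exploding"
    # Stagnant: the norm has barely moved across the recorded window.
    spread = max(norm_history) - min(norm_history)
    if len(norm_history) > 1 and spread <= stagnant_tol * latest:
        return "stagnant"
    return "healthy"

class GradientMonitor:
    """Keeps one float per layer per step, never the gradient itself."""

    def __init__(self, window=5):
        self.window = window
        self.history = {}

    def hook(self, layer_name, grad):
        # Reduce to a scalar inside the hook so nothing large is retained.
        norms = self.history.setdefault(layer_name, [])
        norms.append(grad_norm(grad))
        del norms[:-self.window]  # keep only the most recent `window` norms

    def report(self):
        return {name: classify_layer(norms)
                for name, norms in self.history.items()}
```

The design point is that the hook stores a float per call instead of a copy of the gradient, so memory and per-step cost stay small regardless of layer size.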
004
Semantic Search
Sentence Transformers · FAISS · Flask
Embedding-based retrieval · Sub-10ms latency · Local deployment
↗ GitHub
Sub-10ms retrieval over 10k-chunk corpus via FAISS flat index. Surfaces semantically relevant results for queries that return zero keyword matches.
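A flat index does exact, exhaustive search: every query is scored against every stored vector. The sketch below shows that idea in plain Python as a stand-in for FAISS. The `FlatIndex` class, the 3-d vectors, and the text payloads are all made up for illustration; real embeddings would come from a sentence encoder and have hundreds of dimensions.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

class FlatIndex:
    """Exhaustive inner-product search over unit-normalized vectors."""

    def __init__(self):
        self.vectors = []
        self.payloads = []

    def add(self, vec, payload):
        self.vectors.append(normalize(vec))
        self.payloads.append(payload)

    def search(self, query, k=2):
        q = normalize(query)
        scores = [sum(a * b for a, b in zip(q, v)) for v in self.vectors]
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return [(self.payloads[i], scores[i]) for i in ranked[:k]]

# Toy 3-d "embeddings" (hypothetical values).
index = FlatIndex()
index.add([0.9, 0.1, 0.0], "how to reset a password")
index.add([0.0, 0.2, 0.9], "quarterly revenue summary")
index.add([0.8, 0.3, 0.1], "recovering account access")

results = index.search([1.0, 0.2, 0.0], k=2)
```

Here the two account-related chunks rank first even though they share no keywords, because ranking depends only on vector direction. Exhaustive search is linear in corpus size, which is usually acceptable at the 10k-chunk scale.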
005
Drug Safety Guard
Flutter · Node.js · PostgreSQL · Redis · Cassandra · Docker
Ingredient-level detection · Real-time alerts · HIPAA audit trail · ~3× more interactions detected
↗ live demo · GitHub ↗
Ingredient-level detection catches ~3× more interactions than brand-name lookup. Cold path: 1,008 checks in ~340ms. Redis-cached hot path: under 1ms.
in development
006
Claim Extractor
Scikit-learn · NLP · SMOTE · Python
~34% F1 lift · Class imbalance · Semantic deduplication · in development
SMOTE + macro F1 training lifted minority-class recall by ~34% over a naive baseline. Semantic deduplication collapsed paraphrased claims that keyword matching can't see.
007
Churn Prediction
Scikit-learn · XGBoost · SHAP · Flask
Binary classification · SHAP explainability · Imbalance handling · in development
Gradient boosting + SHAP values surface why customers leave — not just whether they will. Contract type and tenure emerge as the dominant churn signals.
[ 02 ]

Experience

Internship · 2025
Singsys Pvt Ltd
Mobile App Developer Intern

Worked on production Flutter flows — handling async data, state consistency, and UI reliability under real device constraints.

What it taught me: production is a different discipline from building. Race conditions and state drift aren't edge cases. They're the contract.

Shipped
  • Designed and implemented multiple responsive Flutter screens adhering to Material Design guidelines — improving visual consistency across the app
  • Integrated RESTful APIs to fetch, parse, and display live backend data — managed async state and error conditions using Flutter's built-in patterns
  • Identified, diagnosed, and resolved UI and logic bugs through systematic testing across device configurations — contributing to a stable release build
[ 03 ]

Lab / Thinking

ML Breakdown
Why Attention Collapses
Notes on attention sink phenomena in long-context transformers — when keys cluster around early tokens and why positional bias compounds over longer sequences.
Published
Experiment
Loss Landscape Explorer
Visualizing loss landscapes for small networks using random-direction perturbation. Built to understand why identical architectures converge to very different minima.
Live
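A minimal sketch of random-direction perturbation: pick a random unit vector in parameter space and evaluate the loss along `params + alpha * direction`. The two-parameter network, the synthetic data, and all function names are assumptions chosen to keep the example dependency-free. A 2-d surface plot would use two random directions instead of one.

```python
import math
import random

def loss(params, data):
    """Mean squared error of a tiny network: y = w2 * tanh(w1 * x)."""
    w1, w2 = params
    return sum((w2 * math.tanh(w1 * x) - y) ** 2 for x, y in data) / len(data)

def random_unit_direction(dim, rng):
    """Gaussian sample scaled to unit length."""
    d = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in d))
    return [v / norm for v in d]

def loss_slice(params, data, direction, alphas):
    """Loss evaluated at params + alpha * direction for each alpha."""
    return [loss([p + a * d for p, d in zip(params, direction)], data)
            for a in alphas]

rng = random.Random(0)
xs = [i / 10.0 for i in range(-10, 11)]
# Synthetic targets generated by the parameters (1.5, 1.0), so the loss is zero there.
data = [(x, 1.0 * math.tanh(1.5 * x)) for x in xs]
params = [1.5, 1.0]
direction = random_unit_direction(len(params), rng)
alphas = [i / 10.0 for i in range(-10, 11)]
curve = loss_slice(params, data, direction, alphas)
```

Plotting `curve` against `alphas` gives a 1-d cross-section through the minimum. Repeating with different seeds shows how the same minimum can look sharp along one direction and flat along another.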
CV Note
YOLOv8 vs the Real World
What happens when you move from COCO-trained weights to real document scans — lighting variance, skew, and compression artifacts that benchmarks don't surface.
In Progress
ML Breakdown
Tokenization is Lossy
What BPE tokenization silently destroys — morphology, number arithmetic, code structure. The invisible ceiling most NLP systems hit without realizing it.
Published
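A toy illustration of how BPE merges fragment numbers. The encoder below applies a learned merge list in rank order, which is the usual BPE encoding step. The three merges are hypothetical, picked only to make the effect visible. No real tokenizer's vocabulary is implied.

```python
def bpe_encode(text, merges):
    """Apply BPE merges to text, always merging the lowest-rank pair present."""
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(text)
    while True:
        # Collect every adjacent pair that has a learned merge.
        pairs = [(rank[p], i)
                 for i, p in enumerate(zip(tokens, tokens[1:])) if p in rank]
        if not pairs:
            return tokens
        _, i = min(pairs)  # earliest-learned merge, leftmost on ties
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]

# Hypothetical merge list, in the order it was "learned".
merges = [("1", "2"), ("12", "3"), ("4", "5")]

a = bpe_encode("12345", merges)  # ['123', '45']
b = bpe_encode("1234", merges)   # ['123', '4']
c = bpe_encode("2345", merges)   # ['2', '3', '45']
```

The characters are all still recoverable, but the boundaries follow corpus frequency, not place value: the digit `4` is its own token in one number and fused with `5` in another. A model has to learn arithmetic on top of that inconsistent segmentation.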
Math Note
Linear Algebra in ML
Working through the geometry behind PCA, SVD, and attention — what these operations actually do to representation space, not just how to implement them.
Live
Idea
OCR is a Systems Problem
Why OCR accuracy is only half the problem — the harder part is structured extraction after recognition, and why naive pipelines fail on real-world variance.
Draft
Live Project
QR Scanner
Cross-platform Flutter QR scanner — deployed live on Vercel across Android, iOS, Linux, macOS, and Web from a single codebase.
Live · ↗ view live
[ 04 ]

How I Think

01 — First principles
Understand the math, then use the library.
I read papers before I run notebooks. When something breaks, I want to know why — not just which hyperparameter to tune. Gradient flow, information bottlenecks, the geometry of loss surfaces. Foundation, not abstraction.
02 — Failure modes first
A system that knows when it's wrong is more useful than one that's always confident.
Every pipeline I build has a failure path. Confidence thresholds, flagging logic, graceful degradation. Uncertainty is information. Silencing it to look cleaner is a design mistake.
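One way a failure path like this could look, as a hedged sketch: a three-way decision that accepts confident predictions, flags uncertain ones for review, and declines to answer below a floor. The `Decision` type, the `decide` function, and both thresholds are illustrative assumptions, not values taken from any of the projects above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    label: Optional[str]
    confidence: float
    status: str  # "accepted", "flagged", or "rejected"

def decide(scores, accept_at=0.85, reject_below=0.40):
    """Turn per-class scores into an explicit accept / flag / reject decision."""
    label, confidence = max(scores.items(), key=lambda kv: kv[1])
    if confidence >= accept_at:
        return Decision(label, confidence, "accepted")
    if confidence >= reject_below:
        # Uncertain: keep the guess but route it to review.
        return Decision(label, confidence, "flagged")
    # Too uncertain to guess: degrade gracefully by returning no label.
    return Decision(None, confidence, "rejected")

confident = decide({"invoice": 0.93, "receipt": 0.07})
unsure = decide({"invoice": 0.55, "receipt": 0.45})
lost = decide({"invoice": 0.34, "receipt": 0.33, "letter": 0.33})
```

Keeping the confidence on every `Decision`, including rejected ones, means the uncertainty is logged and available downstream instead of being discarded.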
03 — Metrics are decisions
What you measure is what you optimise for. Choose carefully.
Swapping accuracy for macro F1 on an imbalanced dataset isn't a technical detail — it's a statement about what the model should care about. The metric is the objective function.
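A worked example of the gap between the two metrics, in plain Python. The labels are synthetic: nine negatives, one positive, and a model that always predicts the majority class.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_class(y_true, y_pred, cls):
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0  # covers the cases where precision or recall is undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)

y_true = [0] * 9 + [1]
y_pred = [0] * 10  # always predict the majority class

acc = accuracy(y_true, y_pred)    # 0.9
macro = macro_f1(y_true, y_pred)  # about 0.47
```

Accuracy rewards the model for ignoring the minority class entirely. Macro F1 averages a per-class F1 of roughly 0.95 for the majority with 0.0 for the minority, which roughly halves the score and makes the failure visible.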
04 — Preprocessing is load-bearing
The difference between usable output and garbage usually happens before the model.
Deskewing, binarization, adaptive thresholding — these aren't boring steps. They determine whether downstream inference is possible at all. Garbage in, garbage out is a law, not a warning.
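A dependency-free sketch of why adaptive thresholding matters on unevenly lit scans. It compares a single global cutoff with a local-mean threshold on a one-row toy image. The pixel values, `radius`, and `offset` are made up for illustration. In an OpenCV pipeline the equivalent step would normally be `cv2.adaptiveThreshold`.

```python
def global_threshold(img, cutoff):
    """Mark a pixel as ink (1) when it is darker than one fixed cutoff."""
    return [[1 if px < cutoff else 0 for px in row] for row in img]

def adaptive_threshold(img, radius=1, offset=10):
    """Mark a pixel as ink (1) when it is darker than its local mean minus offset."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Neighbourhood window, clipped at the image border.
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if img[y][x] < local_mean - offset else 0
    return out

# Background brightens left to right (100 -> 220); ink sits at columns 1 and 5.
row = [100, 60, 140, 160, 180, 140, 220]

fixed = global_threshold([row], cutoff=128)  # [[1, 1, 0, 0, 0, 0, 0]]
local = adaptive_threshold([row])            # [[0, 1, 0, 0, 0, 1, 0]]
```

The fixed cutoff mistakes the dim background at column 0 for ink and misses the ink at column 5, where the surrounding page is bright. The local threshold recovers both ink pixels because it compares each pixel with its own neighbourhood.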
05 — Mistakes are load-bearing too
The bug that costs you a day teaches you more than the feature that ships cleanly.
I once spent three days chasing a model accuracy problem that turned out to be a label indexing error introduced in preprocessing — the model was learning perfectly, from the wrong signal. What it taught me: when something behaves wrongly but consistently, trust the data pipeline less than the model. I check inputs before I tune hyperparameters now.
[ 05 ]

About

// philosophy
"I care about understanding why something works, not just that it works. That's the only knowledge that transfers."

CS student at SRM IST, India. CGPA 8.90 (top 12% · 10.0 scale). Most of my learning happens outside the curriculum — reading papers, building small systems, understanding the math underneath the abstractions.

Background in ML fundamentals: probability, linear algebra, statistics. I work from first principles. I've shipped production code. I know what real systems feel like when they break — and what it takes to stop them breaking.

Open to ML / SWE internships — 2026
Languages: Python, C++, Java, Dart, SQL, R, Julia
ML / Data Science: PyTorch, Scikit-learn, Pandas, Polars, NumPy, SymPy, Matplotlib, Seaborn
Computer Vision: OpenCV, YOLOv8, Tesseract OCR, Image Segmentation
Mobile: Flutter, Dart, REST APIs, Material Design
Tools: Git, Linux, Jupyter, Colab, VS Code, LaTeX
Foundation: Linear Algebra, Probability, Statistics, Calculus, DSA
Open Source: torch-surgeon · qr-scanner · semantic-search
Status: 2nd Year SRM IST · Actively building
ASHMIT
ML · CV · Systems