Thoughts and notes
Long-form writing on ML engineering, model evaluation, responsible deployment, technical tutorials, and industry observations.
The Three Horizons of Epistemic Change
AI-generated text is epistemically different from human text, and detection methods face inherent limitations. Three distinct phases emerge as synthetic content accumulates in our information systems over time.
The Noise That Looks Like Signal
If AI-generated content is epistemically different from human content, can't we just detect and filter it? A look at why detection tools face fundamental challenges as language models keep improving.
Your Mistakes Are More Valuable Than You Think
A counterintuitive proposition: one of the most valuable properties of training data is human error — not random error, but the structured, systematic, informative errors Gerd Gigerenzer called 'good errors.'
The Photocopier Was the Wrong Metaphor
When people explain model collapse, they reach for the photocopy-of-a-photocopy analogy. It captures iterative degradation — but it frames the problem in a way that limits how we think about solutions.
We're Not Just Degrading AI. We're Reshaping Human Knowledge.
Model collapse, the gradual degradation of AI models trained on AI output, has drawn significant attention. But there's a more consequential question underneath it.
Domain Adaptation: Fine-Tune Pre-Trained NLP Models
A comprehensive guide to fine-tuning pre-trained NLP models for improved performance in specialized domains — covering theoretical frameworks, baseline evaluation, fine-tuning strategies, and result analysis.
Practical Introduction to Transformer Models: BERT
A hands-on tutorial on using BERT for sentiment analysis — walking through the transformer architecture and demonstrating practical implementation for text classification tasks.
6 Steps Towards a Successful Machine Learning Project
A structured framework for approaching machine learning projects end-to-end — from problem definition and data collection through model development, evaluation, and deployment.
Recommendation System in Python: LightFM
A practical walkthrough of building a book recommendation system using LightFM — covering data preparation, hybrid matrix factorization, model training, and generating personalized recommendations.
Topic Modeling in Python: Latent Dirichlet Allocation (LDA)
An end-to-end guide to topic modeling using LDA — covering the intuition behind generative probabilistic models and a practical implementation in Python with Gensim.
Evaluate Topic Models: Latent Dirichlet Allocation (LDA)
A framework for quantitatively evaluating topic models through topic coherence metrics — with code templates in Python for systematic model selection and validation.
Building Blocks: Text Pre-Processing
Foundational text pre-processing concepts for statistical NLP — tokenization, stemming, lemmatization, and stop-word removal — with practical Python implementations.
Language Models: N-Gram
An introduction to statistical language modeling: how n-gram models assign probabilities to word sequences, and why they serve as building blocks for modern NLP systems.
For writing updates, speaking notes, or ML engineering briefs, send a quick note.