🧠 My GPT — Built from Scratch

Assembled from the NeetCode ML course and built completely from scratch while learning the internals of modern transformer architectures and GPT models.

Built by Shriram Lahane

🚀 Overview

This repository documents my work through the NeetCode Machine Learning course, starting from the mathematical foundations of neural networks and ending with a functional GPT architecture built from scratch.

Every module, implementation, and experiment in this project was written and submitted by me while completing the course.

📂 Project Structure

model/
  attention.py
  multi_head_attention.py
  transformer.py
  gpt.py
  normalization.py
  batch_normalization.py
  rms_normalization.py
  embeddings.py
  positional_encoding.py
  kv_cache.py
  grouped_query_attention.py

data/
  tokenizer.py
  vocab.py
  loader.py
  dataset.py
  nlp_preprocessing.py
  tokenizer_utils.py

train.py
generate.py

foundations/
  neuron.py
  backprop.py
  mlp.py
  activations.py
  loss.py
  training_loop.py
  dead_relu_detector.py

⚡ Features

🧩 Neural Networks

Backpropagation
Gradient descent
MLP implementation
Activation functions
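
To make the backpropagation and gradient descent items concrete, here is a minimal sketch in plain Python. It is not taken from the repository's foundations/ code. The function name train_neuron, the toy dataset, and the hyperparameters are all assumptions chosen for illustration. It fits a single sigmoid neuron by applying the chain rule by hand and stepping against the gradient.

```python
import math


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron(data, lr=0.5, epochs=2000):
    """Fit y = sigmoid(w * x + b) to (x, target) pairs with squared error.

    Hypothetical helper for illustration only.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, target in data:
            y = sigmoid(w * x + b)          # forward pass
            dloss_dy = 2.0 * (y - target)   # derivative of (y - target)^2
            dy_dz = y * (1.0 - y)           # derivative of the sigmoid
            dloss_dz = dloss_dy * dy_dz     # chain rule (backpropagation)
            grad_w += dloss_dz * x          # dz/dw = x
            grad_b += dloss_dz              # dz/db = 1
        # Gradient descent step on the mean gradient
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data: negative inputs map to 0, positive inputs map to 1
data = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
w, b = train_neuron(data)
predictions = [sigmoid(w * x + b) for x, _ in data]
```

An MLP repeats the same pattern layer by layer: each layer passes its local gradient back to the one before it.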

🧠 Transformer Components

Self-attention
Multi-head attention
RMS normalization
KV Cache
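
As a rough illustration of two of the components above, the following sketch shows single-head causal self-attention and RMS normalization in plain Python. It is an assumption-laden simplification, not the repository's code: there are no learned query, key, or value projections, no batching, and no multiple heads, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal mask.

    q, k, v are lists of T vectors, each a list of floats.
    Position t attends only to positions 0..t.
    """
    d = len(q[0])
    out = []
    for t in range(len(q)):
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors seen so far
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


def rms_norm(x, gain=None, eps=1e-6):
    """Scale x so its root mean square is about 1, then apply a gain."""
    rms = math.sqrt(sum(xi * xi for xi in x) / len(x) + eps)
    gain = gain or [1.0] * len(x)
    return [g * xi / rms for g, xi in zip(gain, x)]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(x, x, x)
normed = rms_norm([3.0, 4.0])
```

Multi-head attention runs several such heads in parallel on separate projections and concatenates the results. A KV cache stores the k and v vectors of earlier positions so that generation only computes them for the newest token.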

🔤 NLP Pipeline

BPE tokenizer
Dataset preparation
Vocabulary generation
Text preprocessing
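
The core idea behind a BPE tokenizer can be sketched in a few lines: repeatedly find the most frequent adjacent pair of tokens and merge it into a new token. The version below is a character-level toy, not the code in data/tokenizer.py. The names train_bpe, most_frequent_pair, and merge_pair are assumptions, and it skips byte-level handling and word-boundary pre-tokenization.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair, or None if there are no pairs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of pair with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to num_merges merge rules, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge_pair(tokens, pair)
        merges.append(pair)
    return tokens, merges


text = "aaabdaaabac"
tokens, merges = train_bpe(text, 3)
```

The learned merges list, applied in order, is what lets the tokenizer encode new text the same way. The vocabulary is the set of base characters plus one entry per merge.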

🤖 GPT Model

Autoregressive generation
Transformer-based GPT
Training pipeline
Inference utilities
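
Autoregressive generation boils down to a loop: feed the current token IDs to the model, sample one next token from the returned logits, append it, and repeat. The sketch below illustrates that loop under stated assumptions. It is not the repository's generate.py: toy_model is a hypothetical stand-in for a trained GPT, and the parameter names context_len and temperature are illustrative.

```python
import math
import random


def sample_next(logits, temperature=1.0, rng=random):
    """Sample a token ID from logits after temperature scaling."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    # Unnormalized softmax weights; random.choices normalizes them
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(model, prompt_ids, max_new_tokens, context_len,
             temperature=1.0, rng=random):
    """Extend prompt_ids one token at a time."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        context = ids[-context_len:]   # crop to the model's context window
        logits = model(context)        # scores for the next token only
        ids.append(sample_next(logits, temperature, rng))
    return ids


def toy_model(context, vocab_size=5):
    """Hypothetical model: strongly prefers (last token + 1) mod vocab_size."""
    logits = [0.0] * vocab_size
    logits[(context[-1] + 1) % vocab_size] = 50.0
    return logits


out = generate(toy_model, [0], max_new_tokens=6, context_len=4,
               rng=random.Random(0))
```

Lower temperatures sharpen the distribution toward the highest-scoring token, while higher temperatures make the output more varied.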

🛠️ Quick Start

pip install -r requirements.txt

python train.py

python generate.py

📚 Course Reference

This project was built while completing the NeetCode ML Course. Topics covered:

Math foundations
Backpropagation
Neural networks
Attention mechanisms
Transformers
GPT architecture
Text generation

🎯 Why This Project Exists

Most GPT repositories abstract away the internals behind modern language models.

This repository takes the opposite approach: it implements the core mechanics by hand to show how transformers and GPT models work internally.

📈 Future Improvements

🚀 Scaling

Larger datasets
Bigger models
Distributed training

⚡ Optimization

Flash Attention
Mixed precision
Inference optimization

📊 Evaluation

Benchmarks
Model evaluation
Fine-tuning pipeline