🧠 My GPT — Built from Scratch

Assembled from the NeetCode ML course and built completely from scratch while learning the internals of modern transformer architectures and GPT models.

Built by Shriram Lahane

🚀 Overview

This repository documents my complete journey through the NeetCode Machine Learning course — starting from the mathematical foundations of neural networks and progressing all the way to building a functional GPT architecture from scratch.


Every module, implementation, and experiment in this project was written and submitted by me while completing the course.

📂 Project Structure

model/
  attention.py
  multi_head_attention.py
  transformer.py
  gpt.py
  normalization.py
  batch_normalization.py
  rms_normalization.py
  embeddings.py
  positional_encoding.py
  kv_cache.py
  grouped_query_attention.py

data/
  tokenizer.py
  vocab.py
  loader.py
  dataset.py
  nlp_preprocessing.py
  tokenizer_utils.py

train.py
generate.py

foundations/
  neuron.py
  backprop.py
  mlp.py
  activations.py
  loss.py
  training_loop.py
  dead_relu_detector.py

⚡ Features

🧩 Neural Networks

  • Backpropagation
  • Gradient descent
  • MLP implementation
  • Activation functions
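
As a rough illustration of the gradient descent and backpropagation items above, here is a minimal sketch of one linear neuron trained with hand-derived gradients of a mean squared error loss. The function name `train_neuron`, the toy data, and the hyperparameters are assumptions made for this example and are not taken from the files in foundations/.

```python
# Hypothetical toy example: one linear neuron, pred = w * x + b,
# trained with manually derived gradients of the mean squared error.
def train_neuron(xs, ys, lr=0.05, epochs=500):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y      # forward pass and error
            grad_w += 2 * err * x / n  # dLoss/dw via the chain rule
            grad_b += 2 * err / n      # dLoss/db via the chain rule
        # Gradient descent step: move against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_neuron(xs, ys)
```

After training, `w` and `b` should sit close to the slope and intercept used to generate the toy data.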

🧠 Transformer Components

  • Self-attention
  • Multi-head attention
  • RMS normalization
  • KV Cache
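
The components listed above can be sketched in a few lines of plain Python. The block below is an illustrative, single-head version of scaled dot-product self-attention with a causal mask (an assumption, since GPT-style models attend only to earlier positions), plus an RMS normalization helper. The names `causal_attention` and `rms_norm` are hypothetical and may not match the code in model/.

```python
import math


def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head attention. q, k, v are lists of T vectors (lists of floats)."""
    d = len(q[0])
    out = []
    for i in range(len(q)):
        # Causal mask: position i only scores positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out


def rms_norm(x, gain=1.0, eps=1e-6):
    # Scale the vector so that its root mean square is approximately 1.
    rms = math.sqrt(sum(val * val for val in x) / len(x) + eps)
    return [gain * val / rms for val in x]


x = [[1.0, 0.0], [0.0, 1.0]]
attn_out = causal_attention(x, x, x)
normed = rms_norm([3.0, 4.0])
```

With the causal mask, the first position can only attend to itself, so its output equals its own value vector.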

🔤 NLP Pipeline

  • BPE tokenizer
  • Dataset preparation
  • Vocabulary generation
  • Text preprocessing
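
To make the BPE tokenizer item concrete, here is a minimal sketch of a single byte-pair-encoding merge step: count adjacent token pairs, pick the most frequent one, and fuse every occurrence into a new token. The helper names and the sample string are assumptions for illustration and are not the API of data/tokenizer.py.

```python
from collections import Counter


def most_frequent_pair(tokens):
    # Count every adjacent pair of tokens in the sequence.
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    # Replace each non-overlapping occurrence of `pair` with one merged token.
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Start from characters (assumed toy input) and apply one merge.
tokens = list("aaabdaaabac")
pair = most_frequent_pair(tokens)
tokens = merge_pair(tokens, pair)
```

A full tokenizer would repeat this step until it reaches a target vocabulary size, recording each merge so the same merges can be replayed when encoding new text.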

🤖 GPT Model

  • Autoregressive generation
  • Transformer-based GPT
  • Training pipeline
  • Inference utilities
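
Autoregressive generation means producing one token at a time and feeding each new token back in as context. The sketch below shows that loop with greedy decoding. The `toy_model` function is a hypothetical stand-in for a trained GPT, and the signature of `generate` is an assumption rather than the interface of generate.py.

```python
def generate(next_token_logits, prompt, max_new_tokens, context_length):
    """Greedy autoregressive decoding over integer token ids."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # The model only sees the most recent `context_length` tokens.
        context = tokens[-context_length:]
        logits = next_token_logits(context)
        # Greedy choice: pick the id with the highest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_id)
    return tokens


def toy_model(context):
    # Hypothetical stand-in for a trained model: it always favours
    # (last token + 1) modulo a vocabulary of 5 ids.
    vocab_size = 5
    target = (context[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = generate(toy_model, prompt=[0], max_new_tokens=6, context_length=4)
```

A real sampler would usually replace the greedy choice with temperature or top-k sampling over the softmax of the logits.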

🛠️ Quick Start

pip install -r requirements.txt

python train.py

python generate.py

📚 Course Reference

This project was built while completing the NeetCode ML Course.


🎯 Why This Project Exists

Most GPT repositories abstract away the internals of modern language models.


This repository takes the opposite approach: it implements the core mechanics by hand to show how transformers and GPTs work internally.

📈 Future Improvements

🚀 Scaling

  • Larger datasets
  • Bigger models
  • Distributed training

⚡ Optimization

  • Flash Attention
  • Mixed precision
  • Inference optimization

📊 Evaluation

  • Benchmarks
  • Model evaluation
  • Fine-tuning pipeline