LLM-Powered Coder Assistants
On Friday, 23 May 2025, Evgenii Grigorev held an Open Mic session on the topic "LLM-Powered Coder Assistants". The video and presentation slides are included below.
Abstract
Code-generating LLMs are not wizards: they are sophisticated pattern matchers trained on terabytes of code. But how do they turn a prompt like “Sort this CSV by date and calculate weekly averages” into working Python? This session will demystify:
- Core mechanics: Transformers, attention layers, and tokenization.
- Training secrets: from GitHub scrapes to context-aware fine-tuning.
- Why they fail: hallucinations, hidden biases, and the “copy-paste paradox”.

Examples from data analysis (Pandas, SQL) will illustrate the key concepts; a sketch of the kind of code an assistant might produce follows below.
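To make the prompt example concrete, here is the kind of Python a code assistant might emit for “Sort this CSV by date and calculate weekly averages”. It is a minimal sketch, not from the session itself: the file name `sales.csv` and the columns `date` and `value` are hypothetical.

```python
import pandas as pd

# Hypothetical input: sales.csv with 'date' and 'value' columns.
df = pd.read_csv("sales.csv", parse_dates=["date"])

# Sort chronologically, then resample to weekly averages.
df = df.sort_values("date").set_index("date")
weekly_avg = df["value"].resample("W").mean()

print(weekly_avg)
```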
Outline
- Introduction to LLMs: Transformers, tokenization, and the “autocomplete on steroids” paradigm (see the sketch after this list).
- Tools Deep Dive: GitHub Copilot, ChatGPT, CodeWhisperer, and open-source alternatives (StarCoder, Llama 3).
- Under the Hood: training on GitHub data, context-window limitations, and safety guardrails.
- Pros vs. Cons: 55% faster coding (GitHub study) vs. 40% of generated code containing vulnerabilities (Stanford research).
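As a taste of the “autocomplete on steroids” paradigm from the outline, the toy sketch below predicts the next token greedily from bigram counts. It is an illustration only, with a made-up corpus: real assistants use subword tokenization and transformer attention over a long context window, not word-level bigrams.

```python
from collections import Counter, defaultdict

# Toy corpus of "code" tokens; real models train on terabytes of code.
corpus = "df = df.sort_values('date') ; df = df.reset_index()".split()

# Count bigram continuations: which token tends to follow which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_token(prev: str) -> str:
    # Greedy decoding: always pick the most frequent continuation.
    return counts[prev].most_common(1)[0][0]

# "Autocomplete": start from a prompt token and extend it step by step.
token, generated = "df", ["df"]
for _ in range(4):
    token = next_token(token)
    generated.append(token)

print(" ".join(generated))  # e.g. df = df.sort_values('date') ; df
```

A transformer replaces the bigram table with attention over every token in its context window; once a prompt exceeds that window, the oldest tokens are dropped, which is one root of the failure modes discussed in the session.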
The presentation slides are available at this link.