Nearly two years ago, Large Language Models were all the rage. To some extent they still are. Anyway, I wanted to see what the hype was about and ran a little experiment with ChatGPT: I tried to use it for a coding exercise. The outcome was a mixed bag, I wrote an article about it, and have basically ignored LLMs in my professional life ever since. Suffice it to say: my job has not been automated away just yet.
On the other hand, I keep hearing about productivity gains from colleagues, and I also see that the capabilities of the various models enable people who are not professional programmers to build useful things they would previously have considered out of their reach.
I remain sceptical, foremost because the unavoidable presence of hallucinations means the output cannot, at least in good conscience, be used in any (business-)critical system without thorough human review. And human review places an upper cap on productivity gains. An empirical case study from 2006 on the effectiveness of lightweight code reviews reports that review rates faster than 500 SLOC per hour, review sessions longer than 90 minutes, and more than 400 SLOC under review at once all make the process significantly less effective and cause critical defects to be missed.
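To make that cap concrete, here is a back-of-the-envelope sketch using the thresholds from the study. The number of review sessions per day is my own assumption, not a figure from the paper:

```python
# Back-of-the-envelope: how much code can one reviewer effectively cover per day?
# Rate, session length, and review size limits are the thresholds from the
# 2006 case study; SESSIONS_PER_DAY is an assumption for illustration.
MAX_RATE_SLOC_PER_HOUR = 500   # faster reviews become significantly less effective
MAX_SESSION_MINUTES = 90       # longer sessions become significantly less effective
MAX_REVIEW_SIZE_SLOC = 400     # larger changesets become significantly less effective
SESSIONS_PER_DAY = 2           # assumed, not from the study

# A single session is bounded both by the rate*time product and by the
# maximum effective changeset size.
sloc_per_session = min(MAX_RATE_SLOC_PER_HOUR * MAX_SESSION_MINUTES / 60,
                       MAX_REVIEW_SIZE_SLOC)
sloc_per_day = sloc_per_session * SESSIONS_PER_DAY

print(f"Effective review ceiling: {sloc_per_day:.0f} SLOC per reviewer per day")
# → Effective review ceiling: 800 SLOC per reviewer per day
```

However fast a model can generate code, under these assumptions a single conscientious reviewer can absorb only on the order of a few hundred lines per day, which bounds the net throughput of any generate-then-review workflow.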
I think Fred Brooks' famous assertion from his 1986 paper No Silver Bullet still holds true: "There is no single development, in either technology or management technique, which by itself promises even one order of magnitude improvement in productivity, in reliability, in simplicity."
But even if it is not an order of magnitude, if a consistent percentage improvement can be had, language models might become a tool like IDEs, autoformatters, linters and debuggers in the long run. As GitHub just today announced a free tier for their Copilot product, it might be worth repeating the experiment and gathering a few more anecdata...