What about coding?
People never get tired of announcing that AI will replace developers. Mostly salespeople. Microsoft claims that AI is already writing 30–50% of their code. They also admit that the quality of the generated C++ code is behind expectations. By 2030 they plan to reach a rate of 95%, though human oversight and authorship remain important. But they are silent about further details. I am not totally disagreeing, but I am pretty sure that there’s still a bumpy road ahead. You may argue: Well, then you don’t know how to properly operate AI. The problem usually sits in front of the screen, as they say. Fair enough. That’s why I want to share some thoughts and experiences.
AI increases my efficiency to a very high degree (I am referring to common LLM solutions here, but I prefer the term AI, knowing that it can be misleading). But it also requires good skills as a developer. And if you don’t keep your head up, you lose more time “playing around” than you gain by using AI.
Speaking of playing around: the other day I was doing just that. AI lets me do this more frequently. It’s so much easier to wrap your head around unfamiliar territory. And it’s fun, without crawling through endless Google search results or SO threads. So I had this idea for a simple side project, one of those little things I get stuck on, and I needed an SVG showing a bike. I asked several LLMs to generate one.
I’ll keep it short: they all failed grandiosely.
Gemini does not support SVG creation in its web interface. Claude does. And this was the first result (I am sparing you the detailed prompts; basically I just asked for a simple bike SVG, nothing extensive, and the follow-up prompts were quite brief, but detailed enough to get the point across):
OK. Close. What else you got?
We are getting closer. But still not a bike.
No. Last try?
Ok. Close. Grok. Your turn.
Expressionism. ChatGPT. Last chance to stand up for AI.
What’s that? You need a doctor?
Nope. Next try!
Yeah, that’s impressive! But wait. That’s not SVG. Caught you cheating. Use it as a template, try again!
What? Last try. You can do it!
Cheating. Again.
OK. Unfair comparison. I know. This is clearly not about coding, but it makes a point. I am not blaming an LLM for not being able to create a simple bike SVG, especially because SVG creation is not really something a developer does, and it’s not what LLMs are meant for in the first place. But it’s a perfect metaphor for how the process quickly goes sideways:
Those bots have an over-confident way of approaching things. They always try to flatter the user (I learned this wonderful word “sycophancy” the other day – and forgot it the next day). And they circle around problems. Did you notice how ChatGPT went back to the pixelated image after one unsuccessful try?
While they are great at solving “closed and recurring tasks”, they suffer from the “open-ended” problem. Microsoft just published a study about the jobs that can benefit most from AI. And most of them are exactly this: translators and interpreters. Salespeople - easy, just train a model on product data and you are good. And so on. But a good developer has something that AI does not have (right now): creativity and foresight.
(Don’t get me wrong, I don’t want to belittle the skills those jobs require!)
Another devastating example where Gemini-Pro failed on a large scale: I asked for a simple Python script to download an XML sitemap and parse a couple of URLs from it. Please trust me when I say I did the prompting correctly. Also, Python and XML are not exactly an exotic combination. However, after an hour, over 80 prompts, and I-don’t-know-how-many tokens, I gave up. Gemini wasn’t able to deliver (I am using gemini-cli for that). It kept circling around the same problem, ignoring my corrections, and promising “Now everything should work” many, many times. If you want to see more interesting examples of the mixed quality of LLMs at coding, check out this wonderful video.
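Just to put this into perspective, the task itself is tiny. Something along these lines is roughly all I was asking for (a minimal sketch, assuming a standard sitemap with the usual sitemaps.org namespace; the URL is a placeholder):

```python
# Minimal sketch: download an XML sitemap and print the URLs it contains.
# Assumes a standard sitemap using the sitemaps.org namespace; the URL is a placeholder.
import urllib.request
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://example.com/sitemap.xml"  # placeholder
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_sitemap_urls(url: str) -> list[str]:
    with urllib.request.urlopen(url) as resp:
        root = ET.fromstring(resp.read())
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]

if __name__ == "__main__":
    for u in fetch_sitemap_urls(SITEMAP_URL):
        print(u)
```

Roughly a dozen lines with the standard library. That’s the scale of the task we were circling around for an hour.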
If you are into troubleshooting or exploring, you may have experienced this: you ask the AI to fix a problem or explain how to achieve something. It provides you with a solution, which is usually quite overwhelming. You start implementing this “pamphlet” from step 1 to step 42 or so, and at step 2 you get stuck.
Why? The system you are working on returns an error. You have to rethink the solution and feed that back to the LLM. You can’t blame the machine, because the reason is usually: you did not provide enough context about the environment and circumstances. But it’s also fair to ask: how would you know which details are missing?
You can’t. Especially in troubleshooting, it’s part of the process. The process!
The process of troubleshooting is not having a page of steps to follow and being done after 10 minutes. If you are a system administrator, you know what troubleshooting means. It’s a complex back-and-forth, a rocky trial-and-error road. You get better with time; that’s what a good, seasoned, senior system administrator is: someone who has seen a lot of problems and knows how to approach them.
Sure: AI has seen a lot of problems, too. It’s literally in their “brains”, since they are trained on a lot of problems. Still, what they haven’t learned yet is guiding you through a process step by step. Or at least offering to! Instead, they waste 1,000 tokens on a 42-ish-step solution.
You may think it’s part of the business model: why should the bot efficiently guide you through a process if it can provide you with a 42-ish-step solution that won’t work, and then your context window is out of tokens. Purchase the pro plan or wait 6 hours to get this thing done.
I asked ChatGPT to look over my article and do a fact check. It claimed that this part sounds too conspiratorial and that I should mention that this is due to model design and use, not malice. So, disclaimer: my goal is not to imply a business model behind this phenomenon. I’m just kidding, okay?
Which finally leads me beyond ranting and complaining to this word of advice: end every prompt like this. It will save you a lot of time and tokens:
Guide me through the process step by step, don’t answer all at once, wait for my confirmation and feedback before you continue.
But what about coding in general? There are so many platforms offering perfect low-code or even no-code solutions. People telling you they built whole businesses without writing a single line of code.
Don’t believe it? Have you tried it yourself? Try.
I am not saying that AI/LLMs are not capable of, e.g., writing code. They are - totally, unironically!
But “real apps” are complex. They are not just an HTML one-pager with a nice parallax effect, some neat JavaScript interactions and CSS. There are backends, data models, APIs, authentication, security, data storage and so on.
Recently I was writing this little app that was supposed to visualize biking trips on a map. It got a little out of hand. Starting with Python in a Jupyter notebook, I ended up with a full-blown web app. And you know what? I did it almost the “vibe coding” way. See? I am also one of those guys with these impressive stories! I only spent time thinking about user experience and features.
And at some point I thought: “Hey buddy, what did you actually do there under the hood?”
Yes. AI was able to build this complex app (FWIW: it’s Claude 4 that currently gives me the best results for coding - I am not being paid to say that).
I am using an extensive pre-prompt instructions file, telling the agent to use a structure of separate files, descriptive function names and so on. Still, I was facing a huge code base after some hours of “vibe coding”. The main app file was over 5,000 lines of code. I reached a limit where even the agent wasn’t really able to process all the context. It was a mess. OK, blame me. My instructions probably weren’t that precise. “The problem sits in front of the screen”, as they say.
Well, we can still use the agent to refactor, can’t we?
Yes, we can. I did it. And it didn’t really work out. It was creating stubs and redundant functions and endpoints.
The end of the story: I had to crawl through the code myself. Like back in the day. And it wasn’t a difficult thing; I spent two or three hours, strategically worked through the codebase, identified redundant code, moved it out of the main file, and cleaned up the file structure. Like back in the day. Sure, with the help of AI, but this time only as a sidekick.
And this is what I am saying: You still need to be a good developer to understand code architecture and how code works. AI will not be able to deliver a fully fledged market-ready app for you. Not now.
However, this is not only a rant or my two cents, it’s also about “things I learned”, so let’s summarize:
- Always include something like “step by step” in your prompts when you are troubleshooting or exploring
- Use a pre-prompt instructions file to guide the agent!
- Split the work! Use one agent to draft the pre-prompt instructions, let another agent build you the first starting prompt, and so on - I’ve established a mixed toolset: Copilot on one hand, Gemini-CLI on the other, and some browser-based agents on the third hand
- I also learned that it is really helpful when the agent maintains a “protocol” that contains the initial task and then, in very short bullet points, the current status of the task – this way you can always ask the agent to continue with the last task without having to re-explain everything
- And this may not win me many fans: also separate tasks and codebases to keep the context window small. This is the rise of some kind of microservices-ish architecture, because “monolithic” apps can’t be eaten by the agent.
- Discipline when it comes to version management. Let the LLM fix a thing, then commit the changes. Fix. Commit. Fix. Commit. You catch my drift.
What about “non coding”?
Yeah, OK, this is a sweeping generalization. I also have an opinion here.
Do you remember, back in the day, when you were “Googling” for information? At first, people searched for whole phrases, like “how to build a web app with AI”. But the Google algorithm kept improving, and so did the queries. People reduced their search terms to “web app AI”. You don’t necessarily need filler words to understand the intention of a search. And not only filler words, but spelling and grammar as well.
Right now the same seems to be happening with AI. Sometimes I catch myself typing prompts full of typos and grammar mistakes. But the AI still does its job. I wonder whether this leads to a “degradation” or rather an “optimization” of our language. Have you heard of Gibberlink or Neuralese? They are supposed to be “languages” or “concepts” that AIs use to communicate with each other: simplified versions of human languages, without all the noise and “unnecessary” words. As we talk to AI more and more, what if, at some point, we adapt our language to the AI?
The second thing that will probably change is how we use our “computers”. There aren’t many kinds of user interfaces right now. We have the GUI, the graphical user interface: the colorful windows and buttons you see on the screen; you can touch them or use your mouse cursor to select text. We have the TUI, the text user interface, which is most common for CLI (command line interface) applications: you type commands and see the output as text. And we have the VUI, the voice user interface, which is becoming more and more common with the rise of smart speakers and voice assistants. All of them have their strengths and weaknesses. Building a PowerPoint presentation from the command line is probably quite difficult (side note: it’s possible, and it’s fun to automate data analytics pipelines with Python libraries and generate extensive PowerPoint slide decks!). Grepping through multiple servers’ request logs using voice or a mouse? No one does that.
There are exceptions, though. Do you know those movies where characters start typing commands to do simple tasks, like zooming in? They have the most expensive and advanced technologies, but somehow forgot to add a “Zoom” button, so they have to program this rare feature in real time—without even having a shell open. Absolute geniuses!
(Leaving out brain computer interfaces here, because they are still in the research phase and not widely used yet.)
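Coming back to that PowerPoint side note: building slides from code really is straightforward. Here is a minimal sketch, assuming the python-pptx library is installed; the file name and texts are just placeholders:

```python
# Minimal sketch: generate a one-slide deck with python-pptx (pip install python-pptx).
# File name and texts are placeholders.
from pptx import Presentation

prs = Presentation()
slide = prs.slides.add_slide(prs.slide_layouts[1])  # layout 1: title + content
slide.shapes.title.text = "Monthly bike trips"
slide.placeholders[1].text = "Kilometers, elevation and average speed go here"
prs.save("report.pptx")
```

Wrap that into a data pipeline and you get slide decks straight from the command line, no clicking involved.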
Right now, we are mostly using AI in two of these ways: TUI and VUI. And this is interesting, because in most cases it breaks the flow of work. You have to write long prompts, edit them, add more details. And after 5 minutes your context window is so full that you start all over again. The good thing: you are forced to think about what you want to achieve. Which should be a no-brainer (even pre-AI): think before you act! With AI it’s even more important. You also have to copy results back, interpret them, translate them into “graphical” interactions. And so on. Doesn’t really sound like the peak of possible efficiency, does it?
This is where MCP comes into play - the Model Context Protocol, which connects the agent to other systems, like - sticking with this example - PowerPoint. You prompt your task, the AI controls PowerPoint and builds stuff. But in the end, you still need to read, type or speak your commands.
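To make that a bit more concrete: an MCP server is essentially a small program that exposes “tools” an agent is allowed to call. A minimal sketch, assuming the official MCP Python SDK with its FastMCP helper; the slide tool below is a toy stand-in, not a real PowerPoint integration:

```python
# Minimal, illustrative MCP server sketch (assumes the official "mcp" Python SDK).
# The tool is a placeholder; a real server would call into python-pptx or PowerPoint itself.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("slide-tools")  # server name is a placeholder

@mcp.tool()
def add_slide(title: str, bullet_points: list[str]) -> str:
    """Pretend to add a slide to a deck and report what happened."""
    return f"Added slide '{title}' with {len(bullet_points)} bullet points"

if __name__ == "__main__":
    mcp.run()  # serves over stdio so an agent/client can discover and call the tool
```

The agent sees the tool, decides when to call it, and you keep prompting in plain language.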
What we are missing is a GUI for AI. And as long as we don’t have brain computer interfaces, we will have to find a way around this. Microsoft is trying to achieve something similar with their Recall feature, which raised a lot of anger and concerns about data privacy - which is totally understandable. Recording your screen and processing it on some server at some unknown location? Seriously?
But this will be the future, because it allows for reactive graphical user interfaces - and I am not saying dynamic ones. A user interface that changes because of some previous actions doesn’t sound very useful, right? Imagine your car swapping the functions of your brake and accelerator pedals. What I mean is an interface that “follows” the user instead of making the user follow the interface. Form follows function.
Sounds bold, I know. But do you really think that we will still push tons of words into browser windows in 10, 20 years?