A benchmark of 200 images found that OpenAI's elaborate "GeoGuessr" prompt did not improve o3's geolocation accuracy over a basic prompt—it performed slightly worse. The author warns against overestimating prompt engineering based on anecdotal success, and notes o3's geolocation skill has not carried over to newer GPT models.
30 items from seangoedecke-com
Prompts for AI coding tools are a form of technical debt that decays silently with each model upgrade, unlike code. The author advises against investing heavily in bespoke agentic setups, recommending instead using third-party tools with minimal configuration and keeping custom prompts limited to concrete project facts.
The "just-say-no engineer" who blocks changes to maintain quality was a product of the ZIRP era, when tech companies had bloated teams and little pressure to deliver. With ZIRP over, companies now prioritize shipping features, making this role less valued. The article argues this shift is often blamed on AI but would have happened regardless.
An engineer describes the shift from 2025 to 2026, where AI agents are now reliable enough to write entire PRs, diagnose most bugs, and handle testing and setup tasks. He avoids using LLMs for public communication like PR descriptions or Slack messages, and emphasizes finding the right balance between over- and under-utilizing agents.
DeepSeek-V4-Flash and DwarfStar 4 make LLM steering—directly manipulating model activations mid-inference—practical for local use. The author is skeptical about its utility, arguing most gains can be replicated with prompting or fine-tuning, but expects the next six months to reveal if steering has real applications.
The article examines Elon Musk's proposal to build AI datacenters in space. While cooling via heat radiation is feasible—requiring roughly 250,000 square meters of radiators for a 100MW facility—it remains wildly impractical overall due to the massive launch mass needed, repair difficulties, and other logistical challenges. Cooling itself is not the fundamental obstacle.
Thinking Machines released Interaction Models, a fully-duplex voice system using 200ms micro-turns for more natural conversation, with a separate reasoning model bolted on for complex tasks. Its main achievement is scaling a fully-duplex model to include video input and far more parameters than previous systems like Moshi.
The article outlines progressive arguments for AI: LLMs aid disabled and chronically ill people, level class-based communication barriers, provide equal access to tutoring, and could advance a left-wing technological utopia.
AI coding tools like Claude Code have raised the floor for weak software engineers, turning net-negative contributors into merely mediocre ones. Some engineers now act as thin wrappers around LLMs, adding little value themselves, which may threaten their long-term job security as companies assess what value engineers add beyond AI.
Most incidents are boring and resolve on their own; impulsive fixes often worsen them. The recommended first step is to do nothing, and effective actions are usually simple, like disabling a feature flag. While resolving incidents earns political credit, it does not build lasting power since executives cannot assess the effort involved.
AI progress hasn't slowed despite longer training horizons, possibly because of large FLOP-efficiency gains from fixing bugs, because human intuitions about near-human intelligence are unreliable, and because capabilities depend on traits beyond intelligence, such as persistence.
A staff engineer argues Will Larson's famous "staff engineer archetypes" are descriptively useful but bad goals. Instead of aiming for an archetype, engineers should focus on building trust and being useful to the company, delivering value regardless of circumstances.
Even if AI makes software engineers worse over time by reducing learning, they may be forced to use it for short-term gains or be outcompeted. The author compares this to pro athletes with short careers, suggesting software engineering may no longer be a lifetime career and engineers should plan accordingly.
Notes comparing historical Luddism to modern anti-AI movements. Luddites were skilled weavers with concrete local goals who enjoyed near-universal class solidarity but were ultimately crushed by the state. The author argues Luddism is a poor model because AI models are intangible information, not smashable machines.
The article examines historical Luddism as a decentralized 19th-century movement where skilled textile workers destroyed machines automating their jobs. It draws parallels to modern anti-AI activism but notes key differences: Luddism was local and targeted specific machines, while AI concerns are global and datacenters can affect jobs worldwide.
The article argues that only three types of AI products currently work effectively: chatbots like ChatGPT, completion-based products like GitHub Copilot, and agentic products like coding agents. It suggests AI-generated feeds and AI-based video games are promising but not yet successful product categories.
Evaluating new AI models takes months because standard benchmarks are unreliable and often gamed by companies. Real-world testing requires significant time and effort, while subjective "vibe checks" provide limited insight. This makes it difficult to determine if AI progress is stagnating or if models are genuinely improving.
The article provides strategies for software engineers to avoid work blockers, including working on multiple tasks, sequencing work to minimize blockers, maintaining reliable developer tooling, debugging outside one's area, building relationships with other teams, and leveraging senior managers for support.
Big tech companies produce sloppy code because engineers often work outside their expertise due to short tenures and frequent team changes. Most code changes are made by relative beginners unfamiliar with codebases, while experienced engineers are overloaded. Companies prioritize flexibility over long-term expertise, accepting bad code as a tradeoff.
AI detection tools cannot prove that text is AI-generated, as language models produce text similar to human writing. These tools can only make educated guesses with limited accuracy, and false positives can cause significant social harm. The billion-dollar AI detection industry often overstates the reliability of these tools.
Large software products become extremely complex as companies add features like self-hosting and enterprise controls. This complexity makes basic questions about what the software does difficult to answer, often requiring investigation. Engineers gain institutional power because they can reliably answer questions about how the software works.
Only engineers actively working on a large software system can meaningfully participate in its design, as effective design requires intimate knowledge of concrete codebase details. Generic software design advice is typically useless for practical problems in existing systems, though it can help with new projects or tie-breaking decisions.
The author argues that software engineers should maintain some cynicism to better understand how large organizations operate. He suggests that pragmatic engagement with organizational politics enables meaningful impact, while idealistic purity often masks deeper cynicism about corporate motives.
xAI's Grok image model is being widely used to generate nonconsensual lewd images of women on Twitter. Users are prompting the AI to create sexualized versions of women's photos, resulting in public harassment. While Grok refuses to generate nude images, it still produces obscene content that enables mass sexual harassment.
In 2025, the author published 141 blog posts, with 33 reaching the front page of Hacker News. The blog peaked at 1.3 million monthly views in August and gained over 2,500 email subscribers. The author was the third most popular blogger on Hacker News for the year.
The article discusses "The Dictator's Handbook," which presents a political theory where leaders maintain power through coalitions. It explores how this theory might apply to tech companies, noting that while coalition politics may dominate at the top levels, technical competence becomes more critical for success at middle management levels in engineering organizations.
Cryptocurrency coins $GAS and $RALPH have been created using the Bags platform, nominating AI developers Steve Yegge and Geoff Huntley as beneficiaries. The coins are technically unrelated to the developers' open-source AI projects Gas Town and Ralph Wiggum loop. This represents a new cryptocurrency airdrop tactic targeting open-source AI developers.
The author describes being addicted to being useful, which drives their enjoyment of software engineering despite industry challenges. They compare themselves to Gogol's character Akaky Akakievich, whose dysfunctional traits matched his terrible job. Many software engineers are motivated by internal compulsions like solving puzzles rather than external rewards.
The author argues that accurate software project estimation is impossible because work involves unpredictable unknowns. Instead, estimates serve as political tools for managers, and engineers should work backward from desired timelines to determine feasible technical approaches.
Software engineers must understand how tech companies operate to succeed, regardless of their career goals. This includes knowing organizational politics, project dynamics, and how to navigate company structures. The analogy is that you need to know how to drive the car to reach your destination, whatever that may be.