<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://adrianhall.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://adrianhall.github.io/" rel="alternate" type="text/html" /><updated>2026-04-21T00:13:43-07:00</updated><id>https://adrianhall.github.io/feed.xml</id><title type="html">Because Developers are Awesome</title><subtitle>Musings about cloud development.</subtitle><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><entry><title type="html">Tools for troubleshooting DNS</title><link href="https://adrianhall.github.io/posts/2026/2026-04-21-dns-tools.html" rel="alternate" type="text/html" title="Tools for troubleshooting DNS" /><published>2026-04-21T00:00:00-07:00</published><updated>2026-04-21T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/dns-tools</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-21-dns-tools.html"><![CDATA[<p>One of the best things about joining a new company is that you get to go to a series of training sessions for that company.  For Cloudflare, I get a technical bootcamp focused on Internet technologies.  I learned Internet technologies a long time ago, so it’s a chance to catch up and immerse myself in the improvements that have happened along the way.  My role has recently focused on developers (and it still does), but that doesn’t mean you should be oblivious to standard troubleshooting.</p>

<p><img src="/assets/images/2026/Apr21-banner.png" alt="Tools for troubleshooting DNS" /></p>

<p>Take DNS, for example - subject #1 in the Cloudflare tech bootcamp.  There is an old mantra when things go wrong on the Internet - “It’s always DNS”.  There’s <a href="https://www.joom.com/en/products/6937e8f2d855ad01027b49e8">even a t-shirt</a>. So, quite obviously, you will want to fix or rule out DNS quickly.  Fortunately, there are tools for that.  Unfortunately - some of them need DNS to work.</p>

<p>This post isn’t about learning DNS.  There are <a href="https://www.cloudflare.com/learning/dns/what-is-dns/">much better sites than mine for that</a>.</p>

<h2 id="looking-up-a-name---nslookup-dig-delv">Looking up a name - nslookup, dig, delv</h2>

<p>The most obvious thing you are going to need to do is look up something within DNS.  Every single system - Windows, Mac, Linux - has <a href="https://linux.die.net/man/1/nslookup"><code class="language-plaintext highlighter-rouge">nslookup</code></a> installed.  It allows you to do a query against a specific DNS resolver.  For instance, you might type the following:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>nslookup <span class="nt">-query</span><span class="o">=</span>a <span class="nt">-timeout</span><span class="o">=</span>10 cloudflare.com 1.1.1.1
Server:		1.1.1.1
Address:	1.1.1.1#53

Non-authoritative answer:
Name:	cloudflare.com
Address: 104.16.132.229
Name:	cloudflare.com
Address: 104.16.133.229
</code></pre></div></div>

<p>It gives you the information without any fuss.  If you want more information, however, you need a better tool.  That tool is <a href="https://www.commandinline.com/cheat-sheet/dig/"><code class="language-plaintext highlighter-rouge">dig</code></a>.  If I do the same query using <code class="language-plaintext highlighter-rouge">dig</code>, I get more information:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>dig cloudflare.com A @1.1.1.1

<span class="p">;</span> &lt;&lt;<span class="o">&gt;&gt;</span> DiG 9.10.6 &lt;&lt;<span class="o">&gt;&gt;</span> cloudflare.com A @1.1.1.1
<span class="p">;;</span> global options: +cmd
<span class="p">;;</span> Got answer:
<span class="p">;;</span> -&gt;&gt;HEADER<span class="o">&lt;&lt;-</span> <span class="no">opcode</span><span class="sh">: QUERY, status: NOERROR, id: 29931
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;cloudflare.com.			IN	A

;; ANSWER SECTION:
cloudflare.com.		134	IN	A	104.16.133.229
cloudflare.com.		134	IN	A	104.16.132.229

;; Query time: 24 msec
;; SERVER: 1.1.1.1#53(1.1.1.1)
;; WHEN: Mon Apr 20 09:11:10 BST 2026
;; MSG SIZE  rcvd: 75
</span></code></pre></div></div>

<p>Same result, but now I can see some details of the DNS protocol used behind the scenes.  DNS is hierarchical with multiple servers involved, so I may want to see why the resolution happened that way - I can add <code class="language-plaintext highlighter-rouge">+trace</code> to the command (although I would also add <code class="language-plaintext highlighter-rouge">+nodnssec</code> to avoid seeing too much).</p>
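
<p>For example, the following sketch walks the delegation from the root servers down to the authoritative name servers (I’ve omitted the output here - the referral path you see depends on your resolver and location):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ dig +trace +nodnssec cloudflare.com A
# Prints the referral received at each step of the hierarchy:
# the root servers, then the .com servers, then the
# cloudflare.com authoritative servers, then the final answer.
</code></pre></div></div>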

<p>Talking of DNSSEC, sometimes you are going to need to validate the chain of trust (the DNSSEC signatures) for your DNS query.  <code class="language-plaintext highlighter-rouge">dig</code> doesn’t quite work for that.  While older builds of <code class="language-plaintext highlighter-rouge">dig</code> offered <code class="language-plaintext highlighter-rouge">+sigchase</code> for this, <a href="https://kb.isc.org/docs/aa-01152"><code class="language-plaintext highlighter-rouge">delv</code></a> is better.  If you run <code class="language-plaintext highlighter-rouge">delv name</code> (where name is a signed zone), <code class="language-plaintext highlighter-rouge">delv</code> will report “fully validated”, giving you confidence that things are working.  <code class="language-plaintext highlighter-rouge">delv</code> is preferred because it validates records in much the same way a real validating resolver does.</p>
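
<p>A minimal sketch of a <code class="language-plaintext highlighter-rouge">delv</code> check against 1.1.1.1 (the exact wording of failure messages varies between versions):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ delv @1.1.1.1 cloudflare.com A
# For a correctly signed zone, delv prints "; fully validated"
# above the answer records.  For a broken chain you get a
# "resolution failed" error describing the validation problem.
</code></pre></div></div>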

<p>If you are on a Mac, you can install both <code class="language-plaintext highlighter-rouge">dig</code> and <code class="language-plaintext highlighter-rouge">delv</code> using <code class="language-plaintext highlighter-rouge">brew install bind</code>.  On Linux (and WSL on Windows), use your distribution’s BIND utility packages or <a href="https://www.isc.org/download/">the downloads from ISC</a>.  There are also web sites that will run these lookups for you.  However, you should install the tools locally so you can check <strong>YOUR</strong> resolver.</p>
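
<p>As a sketch, these are the usual package names - they vary a little between releases, so check your distribution if they don’t match:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># macOS (Homebrew) - installs dig, delv, and friends
brew install bind

# Debian / Ubuntu (older releases use the dnsutils package)
sudo apt install bind9-dnsutils

# Fedora / RHEL
sudo dnf install bind-utils
</code></pre></div></div>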

<h2 id="handling-modern-protocols---kdig-and-dog">Handling modern protocols - kdig and dog</h2>

<p>While <code class="language-plaintext highlighter-rouge">nslookup</code> and <code class="language-plaintext highlighter-rouge">dig</code> are great for standard DNS (which uses TCP or UDP port 53), they aren’t designed for newer encrypted protocols like DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT).  DoT operates on port 853 and creates a TLS tunnel for DNS traffic, making it easier for network admins to identify (and potentially block) compared to DoH.  <a href="https://www.knot-dns.cz/docs/latest/html/man_kdig.html"><code class="language-plaintext highlighter-rouge">kdig</code></a> is available from the knot-dnsutils package (or knot package for brew) for handling this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kdig +tls @1.1.1.1 www.google.com
<span class="p">;;</span> TLS session <span class="o">(</span>TLS1.3<span class="o">)</span>-<span class="o">(</span>ECDHE-X25519<span class="o">)</span>-<span class="o">(</span>ECDSA-SECP256R1-SHA256<span class="o">)</span>-<span class="o">(</span>AES-256-GCM<span class="o">)</span>
<span class="p">;;</span> -&gt;&gt;HEADER<span class="o">&lt;&lt;-</span> <span class="no">opcode</span><span class="sh">: QUERY; status: NOERROR; id: 34748
;; Flags: qr rd ra; QUERY: 1; ANSWER: 8; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; PADDING: 293 B

;; QUESTION SECTION:
;; www.google.com.              IN      A

;; ANSWER SECTION:
www.google.com.         298     IN      A       142.251.150.119
www.google.com.         298     IN      A       142.251.156.119
www.google.com.         298     IN      A       142.251.154.119
www.google.com.         298     IN      A       142.251.153.119
www.google.com.         298     IN      A       142.251.151.119
www.google.com.         298     IN      A       142.251.155.119
www.google.com.         298     IN      A       142.251.157.119
www.google.com.         298     IN      A       142.251.152.119

;; Received 468 B
;; Time 2026-04-20 10:04:54 BST
;; From 1.1.1.1@853(TLS) in 97.6 ms
</span></code></pre></div></div>

<p>Yes, it looks exactly like the output from <code class="language-plaintext highlighter-rouge">dig</code>, but it’s using DNS-over-TLS (see the last line for confirmation).  You can also use <code class="language-plaintext highlighter-rouge">dog</code>, which is a Rust-based DNS client that tends to be much more user-friendly for encrypted queries.  While <code class="language-plaintext highlighter-rouge">dog</code> is preferred by some practitioners for its JSON output, it’s more painful to install.  <code class="language-plaintext highlighter-rouge">kdig</code> has the advantage of being human readable and installable from the normal package managers you already use.</p>
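
<p>If you do have <code class="language-plaintext highlighter-rouge">dog</code> installed, the equivalent DoT query looks roughly like this (flag names are taken from the dog documentation and may differ between versions):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># DNS-over-TLS lookup with dog, against the same resolver
$ dog cloudflare.com A @1.1.1.1 --tls

# Add --json when you want machine-readable output to pipe into jq
$ dog cloudflare.com A @1.1.1.1 --tls --json
</code></pre></div></div>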

<p>DoH (DNS-over-HTTPS) wraps DNS queries inside standard HTTPS traffic on port 443.  This makes it nearly indistinguishable from regular web traffic, which is excellent for bypassing censorship but harder for enterprise monitoring.  You can use <code class="language-plaintext highlighter-rouge">kdig</code> and <code class="language-plaintext highlighter-rouge">dog</code> again:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>kdig @1.1.1.1 +https cloudflare.com
<span class="p">;;</span> TLS session <span class="o">(</span>TLS1.3<span class="o">)</span>-<span class="o">(</span>ECDHE-SECP256R1<span class="o">)</span>-<span class="o">(</span>ECDSA-SECP384R1-SHA384<span class="o">)</span>-<span class="o">(</span>AES-256-GCM<span class="o">)</span>
<span class="p">;;</span> HTTP session <span class="o">(</span>HTTP/2-POST<span class="o">)</span>-<span class="o">(</span>1.1.1.1/dns-query<span class="o">)</span>-<span class="o">(</span>status: 200<span class="o">)</span>
<span class="p">;;</span> -&gt;&gt;HEADER<span class="o">&lt;&lt;-</span> <span class="no">opcode</span><span class="sh">: QUERY; status: NOERROR; id: 0
;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 2; AUTHORITY: 0; ADDITIONAL: 1

;; EDNS PSEUDOSECTION:
;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR
;; PADDING: 389 B

;; QUESTION SECTION:
;; cloudflare.com.              IN      A

;; ANSWER SECTION:
cloudflare.com.         153     IN      A       104.16.133.229
cloudflare.com.         153     IN      A       104.16.132.229

;; Received 468 B
;; Time 2026-04-20 10:16:30 BST
;; From 1.1.1.1@443(HTTPS) in 128.9 ms
</span></code></pre></div></div>

<p>I prefer <code class="language-plaintext highlighter-rouge">kdig</code> to <code class="language-plaintext highlighter-rouge">dog</code> as it is more readily available on more platforms, being part of Knot DNS.</p>

<h2 id="getting-to-the-server---traceroute-and-mtr">Getting to the server - traceroute and mtr</h2>

<p>One of the common problems you have to solve is “can I get there from here?” - is the resolver (or name server) I am checking reachable for me?  For this, there are two tools - <a href="https://linux.die.net/man/8/traceroute"><code class="language-plaintext highlighter-rouge">traceroute</code></a> (<a href="https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/tracert"><code class="language-plaintext highlighter-rouge">tracert</code></a> on Windows) and <a href="https://linux.die.net/man/8/mtr"><code class="language-plaintext highlighter-rouge">mtr</code></a>.  You will probably already have traceroute, but mtr may not be installed by default.</p>

<p>Traceroute sends probes (UDP datagrams on most Unix-like systems, ICMP echoes on Windows) with ever-increasing time-to-live values and uses the ICMP “time exceeded” replies from each router to plot the route a packet takes to your specified destination.  Its functionality is basic:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>traceroute <span class="nt">-n</span> linux.die.net.
traceroute: Warning: linux.die.net. has multiple addresses<span class="p">;</span> using 172.67.69.187
traceroute to linux.die.net <span class="o">(</span>172.67.69.187<span class="o">)</span>, 64 hops max, 40 byte packets
 1  192.168.1.1  8.735 ms  6.944 ms  7.335 ms
 2  212.158.250.39  13.987 ms  13.826 ms  10.548 ms
 3  63.130.172.37  11.505 ms  11.517 ms <span class="k">*</span>
 4  90.255.251.37  17.378 ms  11.493 ms  14.453 ms
 5  162.158.32.9  15.367 ms  14.616 ms
    162.158.32.45  13.774 ms
 6  172.67.69.187  10.802 ms  14.140 ms  9.756 ms
</code></pre></div></div>

<p>Note the <code class="language-plaintext highlighter-rouge">*</code> entry at hop 3 - that probe went unanswered.  A full row of <code class="language-plaintext highlighter-rouge">* * *</code> means the router at that hop is not responding to the probes at all.  Either way, this is normal on the Internet and not a concern.  You can also see that hop 5 went through two different routers - again, relatively common.</p>

<p>On the internet, routes may be asymmetric - you may not take the same route back from a destination as you did to get to that destination.  Thus, in an ideal world, you would be able to do a <code class="language-plaintext highlighter-rouge">traceroute</code> from either end.  Unfortunately, it doesn’t work like that.  Fortunately, <a href="https://linux.die.net/man/8/mtr"><code class="language-plaintext highlighter-rouge">mtr</code></a> can sort of handle it.  <code class="language-plaintext highlighter-rouge">mtr</code> combines the functionality of <code class="language-plaintext highlighter-rouge">traceroute</code> and <code class="language-plaintext highlighter-rouge">ping</code> in a single network diagnostic tool.  While <code class="language-plaintext highlighter-rouge">traceroute</code> shows a single snapshot, <code class="language-plaintext highlighter-rouge">mtr</code> provides rolling statistics which makes it superior for catching intermittent packet loss.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">sudo </span>mtr google.com <span class="nt">-c</span> 10 <span class="nt">-r</span>
Start: 2026-04-21T09:33:36+0100
HOST: DC2K0HQTXH                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 162.158.73.16              0.0%    10   27.1  13.3  10.1  27.1   5.0
  2.|-- 162.158.73.16              0.0%    10   13.2  13.6  11.4  18.3   2.4
  3.|-- 104.28.0.0                 0.0%    10   11.6  13.1  11.5  16.3   1.7
  4.|-- 162.158.73.1               0.0%    10   22.2  18.8  12.1  39.5   8.1
  5.|-- 162.158.32.44              0.0%    10   13.5  18.9  12.7  35.2   8.0
  6.|-- man-b2-link.ip.twelve99.n  0.0%    10   21.1  18.6  13.8  33.1   5.7
  7.|-- dln-b6-link.ip.twelve99.n  0.0%    10   18.9  20.1  18.0  26.1   2.6
  8.|-- dln-b3-link.ip.twelve99.n 30.0%    10   16.3  17.0  15.9  20.5   1.7
  9.|-- 72.14.243.178              0.0%    10   23.7  18.2  15.9  23.7   2.7
 10.|-- lclhrb-in-f139.1e100.net   0.0%    10   28.6  30.5  23.8  65.0  12.4
</code></pre></div></div>

<p>Notice that there is packet loss at hop 8 - 30% means that 3 out of 10 probes didn’t make the round-trip.  Loss at an intermediate hop usually isn’t a real problem: routers often rate-limit or deprioritize the ICMP replies they generate themselves.  The number that matters is the final hop - it shows 0% loss, so the path to the destination is healthy.</p>

<p>You can install <code class="language-plaintext highlighter-rouge">mtr</code> using brew on a Mac or through the normal Linux package managers, and there is a Windows port called <a href="https://winmtr.net">WinMTR</a>.</p>
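
<p>A sketch of the usual install commands (package names may differ on your platform):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># macOS (Homebrew)
brew install mtr

# Debian / Ubuntu (mtr-tiny is the text-only build)
sudo apt install mtr-tiny

# Fedora / RHEL
sudo dnf install mtr
</code></pre></div></div>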

<blockquote>
  <p><strong>Note</strong>: <code class="language-plaintext highlighter-rouge">mtr</code> needs to create raw sockets to send ICMP or UDP packets, which is a privileged operation on most Unix-like systems.  You may need to run <code class="language-plaintext highlighter-rouge">mtr</code> with <code class="language-plaintext highlighter-rouge">sudo</code>, depending on how the package sets up permissions.</p>
</blockquote>
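
<p>If you would rather not run the whole tool as root, many Linux packages ship a separate <code class="language-plaintext highlighter-rouge">mtr-packet</code> helper that can be granted the raw-socket capability instead - a sketch, assuming the helper lives in <code class="language-plaintext highlighter-rouge">/usr/bin</code>:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Allow mtr-packet to open raw sockets without running mtr under sudo
sudo setcap cap_net_raw+ep /usr/bin/mtr-packet
</code></pre></div></div>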

<h2 id="use-another-resolver">Use another resolver</h2>

<p>If you work in an enterprise (or you have a solid home lab), you are probably running your own recursive resolver.  What if that is broken?  At that point, you will want to compare the results from your resolver against someone else’s resolver.  Fortunately, there are <a href="https://en.wikipedia.org/wiki/Public_recursive_name_server">plenty of those</a> that you can use for free:</p>

<ul>
  <li>Cloudflare has 1.1.1.1 (this is the one I recommend because it’s everywhere you are and privacy-focused)</li>
  <li>Google has 8.8.8.8 and 8.8.4.4</li>
  <li>Quad9 has 9.9.9.9, 9.9.9.10, 9.9.9.11</li>
  <li>Cisco OpenDNS has 208.67.222.222 and 208.67.220.220</li>
</ul>

<p>When you are wondering what the Internet sees, target a public resolver instead of your own resolver.</p>
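
<p>The quickest comparison is to run the same query against both and eyeball the answers - a minimal sketch, using <code class="language-plaintext highlighter-rouge">example.com</code> as a stand-in and assuming your own resolver is at 192.168.1.1:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># What does my resolver say?
$ dig +short example.com A @192.168.1.1

# What does the rest of the Internet see?
$ dig +short example.com A @1.1.1.1
$ dig +short example.com A @8.8.8.8

# If the answers differ, suspect a stale cache or a
# split-horizon (internal) zone on your own resolver.
</code></pre></div></div>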

<h2 id="web-sites-you-might-want-to-know">Web sites you might want to know</h2>

<p>One of the common things to watch for is mis-configured DNS.  It works - it’s just giving out the wrong information. Here are a few websites I have in my collection:</p>

<ul>
  <li><a href="https://mxtoolbox.com">MxToolbox</a> is focused on DNS for email transfer.  It will not only do DNS lookups, but it will analyze email headers, see if your SMTP outbound IP is on a blacklist, and perform basic SMTP diagnostics.  This is a useful site if your DNS problems stem from an email report.</li>
  <li><a href="https://dnschecker.org">DNSChecker</a> is a basic DNS lookup - much like <code class="language-plaintext highlighter-rouge">nslookup</code>, but using someone elses computer.  It’s main feature is being able to see whether your change has propagated out to the Internet or not.</li>
  <li><a href="https://dnstools.ws">DNSTools</a> provides access to all the tools that you can run on your local machine - but on someone elses machine.</li>
  <li><a href="https://who.is">who.is</a> provides access to the registrar information.  The main job here is to understand which name servers are authoritative for a specific domain.</li>
</ul>

<p>You also want to have specific tests for handling DNSSEC:</p>

<ul>
  <li><a href="https://dnssec-debugger.verisignlabs.com">Verisign Labs</a> provides a DNSSEC domain debugger.</li>
  <li><a href="https://dnsviz.net">DNSViz</a> visualizes the status of a DNS zone, explicitly providing a visual analysis of the DNSSEC chain.</li>
  <li><a href="https://caatest.co.uk">DNS CAA Tester</a> let’s you view the certificate authority authorization (CAA) embedded in your DNS records, which let’s you specify which certificate authorities are allows to issue certificates for the domains you own.</li>
</ul>
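
<p>You can also inspect CAA and DNSSEC records directly with the command-line tools covered earlier - a minimal sketch:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Which certificate authorities may issue for this domain?
$ dig +short example.com CAA @1.1.1.1

# Is the delegation signed, and does the chain validate?
$ dig +short example.com DS @1.1.1.1
$ delv @1.1.1.1 example.com A
</code></pre></div></div>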

<h2 id="final-thoughts">Final thoughts</h2>

<p>It’s always DNS, but it doesn’t have to be.  With these tools in your pocket, you can quickly and easily determine if the problem is actually DNS.  You still need to “learn DNS”, but there are lots of resources for that. Make sure the tools are available on all the machines you use to diagnose issues and that you have practiced how to use them before they are needed.  You need to know what “good” looks like before you can determine if the current state is good or bad.</p>

<p>DNS will never be problematic again with these tools, DNS know-how, and basic troubleshooting skills.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Tools" /><category term="dns" /><summary type="html"><![CDATA[One of the best things about joining a new company is that you get to go to a series of training for that company. For Cloudflare, I get a technical bootcamp focused on Internet technologies. I learned Internet technologies a long time ago, so it’s a chance to catch up and immerse myself in the improvements that have happened along the way. My role recently has focused on developers (and it still does), but that doesn’t mean you should be oblivious to standard trouble shooting. Take DNS, for example - subject #1 in the Cloudflare tech bootcamp. There is an old mantra when things go wrong on the Internet - “It’s always DNS”. There’s even a t-shirt. So, quite obviously, you will want to fix or rule out DNS quickly. Fortunately, there are tools for that. Unfortunately - some of them need DNS to work. This post isn’t about learning DNS. There are much better sites than mine for that. Looking up a name - nslookup, dig, delv The most obvious thing you are going to need to do is look up something within DNS. Every single system - Windows, Mac, Linux - has nslookup installed. It allows you to do a query against a specific DNS resolver. For instance, you might type the following: $ nslookup -query=a -timeout=10 cloudflare.com 1.1.1.1 Server: 1.1.1.1 Address: 1.1.1.1#53 Non-authoritative answer: Name: cloudflare.com Address: 104.16.132.229 Name: cloudflare.com Address: 104.16.133.229 It gives you the information without any fuss. If you want more information, however, you need a better tool. That tool is dig. If I do the same query using dig, I get more information: $ dig cloudflare.com A @1.1.1.1 ; &lt;&lt;&gt;&gt; DiG 9.10.6 &lt;&lt;&gt;&gt; cloudflare.com A @1.1.1.1 ;; global options: +cmd ;; Got answer: ;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 29931 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 1232 ;; QUESTION SECTION: ;cloudflare.com. IN A ;; ANSWER SECTION: cloudflare.com. 134 IN A 104.16.133.229 cloudflare.com. 134 IN A 104.16.132.229 ;; Query time: 24 msec ;; SERVER: 1.1.1.1#53(1.1.1.1) ;; WHEN: Mon Apr 20 09:11:10 BST 2026 ;; MSG SIZE rcvd: 75 Same result, but now I can see some details of the DNS protocol used behind the scenes. DNS is hierarchical with multiple servers involved, so I may want to see why the resolution happened that way - I can add +trace to the command (although I would also add +nodnssec to avoid seeing too much). Talking of DNSSEC, sometimes you are going to need to check the certificate chain of your DNS query. dig doesn’t quite work for that. While you can use dig +sigchase for this, delv is better. If you say delv name (where name is a signed zone), delv will report “fully validated”, giving you a confidence that things are working. delv is preferred because it works in a way that is much closer to what really happens inside a DNS server. If you are on a Mac, you can install both dig and delv using brew install bind. Linux (and WSL on Windows) can use the ISC package manager downloads. There are also web sites that will do this for you. However, you should download the tools so you can check YOUR resolver. 
Handling modern protocols - kdig and dog While nslookup and dig are great for standard DNS (which uses TCP or UDP port 53), they aren’t designed for newer encrypted protocols like DNS-over-HTTPS (DoH) and DNS-over-TLS (DoT). DoT operates on port 853 and creates a TLS tunnel for DNS traffic, making it easier for network admins to identify (and potentially block) compared to DoH. kdig is available from the knot-dnsutils package (or knot package for brew) for handling this: $ kdig +tls @1.1.1.1 www.google.com ;; TLS session (TLS1.3)-(ECDHE-X25519)-(ECDSA-SECP256R1-SHA256)-(AES-256-GCM) ;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY; status: NOERROR; id: 34748 ;; Flags: qr rd ra; QUERY: 1; ANSWER: 8; AUTHORITY: 0; ADDITIONAL: 1 ;; EDNS PSEUDOSECTION: ;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR ;; PADDING: 293 B ;; QUESTION SECTION: ;; www.google.com. IN A ;; ANSWER SECTION: www.google.com. 298 IN A 142.251.150.119 www.google.com. 298 IN A 142.251.156.119 www.google.com. 298 IN A 142.251.154.119 www.google.com. 298 IN A 142.251.153.119 www.google.com. 298 IN A 142.251.151.119 www.google.com. 298 IN A 142.251.155.119 www.google.com. 298 IN A 142.251.157.119 www.google.com. 298 IN A 142.251.152.119 ;; Received 468 B ;; Time 2026-04-20 10:04:54 BST ;; From 1.1.1.1@853(TLS) in 97.6 ms Yes, it looks exactly like the output from dig, but it’s use DNS-over-TLS (see the last line for confirmation). You can also use dog, which is a Rust-based DNS client that tends to be much more user-friendly for encrypted queries. While dog is preferred by practitioners for its JSON output, it’s more painful to install. kdig has the advantage of being human readable and installable from the normal package managers you use. DoH (DNS-over-HTTP) wraps DNS queries inside standard HTTP traffic on port 443. This makes it nearly indistinguishable from regular web traffic, which is excellent for bypassing censorship but harder for enterprise monitoring. You can use kdig and dog again: $ kdig @1.1.1.1 +https cloudflare.com ;; TLS session (TLS1.3)-(ECDHE-SECP256R1)-(ECDSA-SECP384R1-SHA384)-(AES-256-GCM) ;; HTTP session (HTTP/2-POST)-(1.1.1.1/dns-query)-(status: 200) ;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY; status: NOERROR; id: 0 ;; Flags: qr rd ra ad; QUERY: 1; ANSWER: 2; AUTHORITY: 0; ADDITIONAL: 1 ;; EDNS PSEUDOSECTION: ;; Version: 0; flags: ; UDP size: 1232 B; ext-rcode: NOERROR ;; PADDING: 389 B ;; QUESTION SECTION: ;; cloudflare.com. IN A ;; ANSWER SECTION: cloudflare.com. 153 IN A 104.16.133.229 cloudflare.com. 153 IN A 104.16.132.229 ;; Received 468 B ;; Time 2026-04-20 10:16:30 BST ;; From 1.1.1.1@443(HTTPS) in 128.9 ms I prefer kdig to dog as it is more readily available on more platforms, being part of Knot DNS. Getting to the server - traceroute and mtr One of the common problems you have to solve is “can I get there from here?” - is the resolver (or name server) I am checking reachable for me? For this, there are two tools - traceroute (tracert on windows) and mtr. You will probably have access to traceroute, but mtr may not be available and require to be installed. Traceroute uses ICMP packets with ever increasing time-to-live to plot the route a packet will take to your specified destination. It’s got a basic functionality: $ traceroute -n linux.die.net. traceroute: Warning: linux.die.net. 
has multiple addresses; using 172.67.69.187 traceroute to linux.die.net (172.67.69.187), 64 hops max, 40 byte packets 1 192.168.1.1 8.735 ms 6.944 ms 7.335 ms 2 212.158.250.39 13.987 ms 13.826 ms 10.548 ms 3 63.130.172.37 11.505 ms 11.517 ms * 4 90.255.251.37 17.378 ms 11.493 ms 14.453 ms 5 162.158.32.9 15.367 ms 14.616 ms 162.158.32.45 13.774 ms 6 172.67.69.187 10.802 ms 14.140 ms 9.756 ms Note the * * * entries. This means that it found a hop, but the router at that hop is not responding to ICMP packets. This is normal on the Internet and not a concern. You can also see that hop 5 went through two different routers - again, relatively common. On the internet, routes may be asymmetric - you may not take the same route back from a destination as you did to get to that destination. Thus, in an ideal world, you would be able to do a traceroute from either end. Unfortunately, it doesn’t work like that. Fortunately, mtr can sort of handle it. mtr combines the functionality of traceroute and ping in a single network diagnostic tool. While traceroute shows a single snapshot, mtr provides rolling statistics which makes it superior for catching intermittent packet loss. $ sudo mtr google.com -c 10 -r Start: 2026-04-21T09:33:36+0100 HOST: DC2K0HQTXH Loss% Snt Last Avg Best Wrst StDev 1.|-- 162.158.73.16 0.0% 10 27.1 13.3 10.1 27.1 5.0 2.|-- 162.158.73.16 0.0% 10 13.2 13.6 11.4 18.3 2.4 3.|-- 104.28.0.0 0.0% 10 11.6 13.1 11.5 16.3 1.7 4.|-- 162.158.73.1 0.0% 10 22.2 18.8 12.1 39.5 8.1 5.|-- 162.158.32.44 0.0% 10 13.5 18.9 12.7 35.2 8.0 6.|-- man-b2-link.ip.twelve99.n 0.0% 10 21.1 18.6 13.8 33.1 5.7 7.|-- dln-b6-link.ip.twelve99.n 0.0% 10 18.9 20.1 18.0 26.1 2.6 8.|-- dln-b3-link.ip.twelve99.n 30.0% 10 16.3 17.0 15.9 20.5 1.7 9.|-- 72.14.243.178 0.0% 10 23.7 18.2 15.9 23.7 2.7 10.|-- lclhrb-in-f139.1e100.net 0.0% 10 28.6 30.5 23.8 65.0 12.4 Notice that there is packet loss at hop 8. This is generally because there is a problem on the return leg. 30% suggests that 3 out of 10 packets didn’t make the round-trip. However, the final hop shows 0% packet loss, so it’s not a problem. You can install mtr using brew on Mac and through the normal Linux package managers and there is a Windows package called WinMTR for you. Note: mtr needs to create raw sockets to send ICMP or UDP packets, which is a privileged operation on most Unix-like systems. You may need to run mtr within sudo, depending on permissions. Use another resolver If you work in an enterprise (or you have a solid home lab), you are probably running your own recursive resolver. What if that is broken? At that point, you will want to do comparisons between the results from your resolver and someone elses resolver. Fortunately, there are plenty of those that you can use for free: Cloudflare has 1.1.1.1 (this is the one I recommend because it’s everywhere you are and privacy focused) Google has 8.8.8.8 and 8.8.4.4 Quad9 has 9.9.9.9, 9.9.9.10, 9.9.9.11 Cisco OpenDNS has 208.67.222.222 and 208.67.220.220 When you are wondering what the Internet sees, target a public resolver instead of your own resolver. Web sites you might want to know One of the common things to watch for is mis-configured DNS. It works - it’s just giving out the wrong information. Here are a few websites I have in my collection: MxToolbox is focused on DNS for email transfer. It will not only do DNS lookups, but it will analyze email headers, see if your SMTP outbound IP is on a blacklist, and perform basic SMTP diagnostics. 
This is a useful site if your DNS problems stem from an email report. DNSChecker is a basic DNS lookup - much like nslookup, but using someone elses computer. It’s main feature is being able to see whether your change has propagated out to the Internet or not. DNSTools provides access to all the tools that you can run on your local machine - but on someone elses machine. who.is provides access to the registrar information. The main job here is to understand which name servers are authoritative for a specific domain. You also want to have specific tests for handling DNSSEC: Verisign Labs provides a DNSSEC domain debugger. DNSViz visualizes the status of a DNS zone, explicitly providing a visual analysis of the DNSSEC chain. DNS CAA Tester let’s you view the certificate authority authorization (CAA) embedded in your DNS records, which let’s you specify which certificate authorities are allows to issue certificates for the domains you own. Final thoughts It’s always DNS, but it doesn’t have to be. With these tools in your pocket, you can quickly and easily determine if the problem is actually DNS. You still need to “learn DNS”, but there are lots of resources for that. Make sure the tools are available on all the machines you use to diagnose issues and that you have practiced how to use them before they are needed. You need to know what “good” looks like before you can determine if the current state is good or bad. DNS will never be problematic again with these tools, DNS know-how, and basic troubleshooting skills.]]></summary></entry><entry><title type="html">Setting up a project for Agentic AI</title><link href="https://adrianhall.github.io/posts/2026/2026-04-19-project-setup.html" rel="alternate" type="text/html" title="Setting up a project for Agentic AI" /><published>2026-04-19T00:00:00-07:00</published><updated>2026-04-19T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/project-setup</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-19-project-setup.html"><![CDATA[<p>In my last three articles, I produced a set of recipes and <a href="/posts/2026/2026-04-07-agentic-engineer-1.html">rules for doing AI-first development using Spec-Driven Design</a>.  These rules really get you started in how to think about development when the coding (the easy part) is done for you.  But how do you get started?</p>

<p><img src="/assets/images/2026/Apr19-banner.png" alt="Project Scaffolding for Agentic Development" /></p>

<p>When I am setting up a new project, my process has now changed.  It starts off the same as before, but there are additional files I need to write before I can start coding.  Here is what I do:</p>

<h2 id="1-scaffolding">1. Scaffolding</h2>

<p>Inevitably, I have some clue as to what framework or platform I am going to build on top of.  These days, I am doing a lot of development on top of the <a href="https://www.cloudflare.com/developer-platform/">Cloudflare Dev Platform</a>.  Disclaimer: I now work for Cloudflare, so you can see why this is my go-to platform.  My project scaffolding almost always starts the same way:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>npm create cloudflare@latest <span class="nt">--</span> my-project
<span class="nb">cd </span>my-project
</code></pre></div></div>

<p>This gives me a whole slew of files.  If you start with NextJS, ASP.NET Core, or Spring Boot - the effect is the same (see the sketch below).  You go to your repo storage directory, scaffold the app, and then change directory into the created project directory.  This is the bit that has not changed.</p>
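
<p>For comparison, a sketch of the equivalent scaffolding for a couple of other stacks (the project names here are just placeholders):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Next.js
npx create-next-app@latest my-project

# ASP.NET Core Web API
dotnet new webapi -o my-project

# Spring Boot: generate a starter from https://start.spring.io and unzip it
</code></pre></div></div>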

<h2 id="2-create-the-constitution">2. Create the constitution</h2>

<p>The constitution is a document describing the “rules” for developing your application. I’ve <a href="/posts/2026/2026-04-11-agentic-engineer-3.html">got a process</a> for creating this for any project. I place mine in <code class="language-plaintext highlighter-rouge">.spec/CONSTITUTION.md</code> and write it in Markdown.  For the Cloudflare Dev Platform, this is:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># The Cloudflare Platform Constitution</span>

These are the rules that you <span class="gs">**MUST**</span> follow for developing on the Cloudflare Platform.

<span class="gu">## 1. Edge-first, Always</span>

The primary goal is to minimize the distance between the user and the logic.
<span class="p">
*</span> <span class="gs">**Principle**</span>: if it can be done at the edge, it _must_ be done at the edge.
<span class="p">*</span> <span class="gs">**Mandate**</span>: Avoid "hairpinning" requests to centralized legacy origins (like an RFS database on AWS) unless absolutely necessary.  Use <span class="gs">**Hyperdrive**</span> or <span class="gs">**Durable Objects**</span> to manage connection overhead or state.

<span class="gu">## 2. Isolate-Native Design (Statelessness)</span>

Cloudflare Workers are built on V8 isolates, which are spun up and down instantly.
<span class="p">
*</span> <span class="gs">**Principle**</span>: Design for zero cold starts and ephemeral lifecycles.
<span class="p">*</span> <span class="gs">**Mandate**</span>: Never rely on global mutable state in a Worker script.  Every request should be treated as a fresh execution.  Use <span class="gs">**Workers KV**</span> for eventual consistency or <span class="gs">**Durable Objects**</span> for strong consistency.

<span class="gu">## 3. Binding over Requesting</span>

Cloudflare’s internal bus is faster than the public internet.
<span class="p">
*</span> <span class="gs">**Principle**</span>: Prefer internal "bindings" over external REST/HTTP calls.
<span class="p">*</span> <span class="gs">**Mandate**</span>: Services within the platform should communicate via Service Bindings. Accessing storage should use direct bindings to R2, D1, or KV, avoiding the overhead of authentication and network hops required by external APIs.

<span class="gu">## 4. Performance as a Functional Requirement</span>

On the edge, a 100ms delay is a failure.
<span class="p">
*</span> <span class="gs">**Principle**</span>: Latency is a bug.
<span class="p">*</span> <span class="gs">**Mandate**</span>: All Workers must stream responses using TransformStream to ensure a low Time to First Byte (TTFB). Large payloads must be streamed, not buffered in the 128MB memory limit.

<span class="gu">## 5. Type-Safe Contracts</span>

With a distributed architecture, small mismatches lead to global outages.
<span class="p">
*</span> <span class="gs">**Principle**</span>: End-to-end type safety is non-negotiable.
<span class="p">*</span> <span class="gs">**Mandate**</span>: 
<span class="p">  *</span> Use TypeScript with strict: true across all projects.
<span class="p">  *</span> Generate binding types automatically using wrangler types.
<span class="p">  *</span> Use Hono or similar lightweight, type-safe frameworks for routing.

<span class="gu">## 6. Observability by Default</span>

You cannot SSH into an isolate; if you can't see it, it doesn't exist.
<span class="p">
*</span> <span class="gs">**Principle**</span>: No code reaches production without structured tracing.
<span class="p">*</span> <span class="gs">**Mandate**</span>: Every Worker must have Workers Logs and Tail enabled. Export traces to a centralized provider (like Honeycomb or Sentry) using OpenTelemetry.

<span class="gu">## 7. Versioning via Compatibility Dates</span>

The platform evolves, but the code should not break.
<span class="p">
*</span> <span class="gs">**Principle**</span>: Stability is maintained through "Compatibility Dates."
<span class="p">*</span> <span class="gs">**Mandate**</span>: 
<span class="p">  *</span> Workers must specify a compatibility_date in wrangler.jsonc. 
<span class="p">  *</span> Updating this date is a breaking change that requires a full regression test.
</code></pre></div></div>

<p>You may have additional rules.  These rules are the “must-haves” for the project.  I’ve started collecting constitutions that I’ve used - I’ve got one for an ASP.NET Core application, one for an iPhone app, and so on.</p>

<h2 id="3-set-up-opencode">3. Set up OpenCode</h2>

<p>I use <a href="https://opencode.school/">OpenCode</a> these days, so my first stop is to set up OpenCode.  I’ve got a “standard” <code class="language-plaintext highlighter-rouge">opencode.jsonc</code> file for each project type that provides default permissions.  For example, I encode everything I do in a <code class="language-plaintext highlighter-rouge">package.json</code> file so most things (like deployment, builds, tests) are codified there.</p>

<p>My standard config looks like this:</p>

<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
  </span><span class="nl">"$schema"</span><span class="p">:</span><span class="w"> </span><span class="s2">"https://opencode.ai/config.json"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"autoupdate"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
  </span><span class="nl">"default_agent"</span><span class="p">:</span><span class="w"> </span><span class="s2">"plan"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"small_model"</span><span class="p">:</span><span class="w"> </span><span class="s2">"anthropic/claude-haiku-4.6"</span><span class="p">,</span><span class="w">
  </span><span class="nl">"compaction"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"auto"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w">
    </span><span class="nl">"prune"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
  </span><span class="p">},</span><span class="w">

  </span><span class="err">//</span><span class="w"> </span><span class="err">MCP</span><span class="w"> </span><span class="err">Servers</span><span class="w">
  </span><span class="nl">"mcp"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="err">//</span><span class="w"> </span><span class="err">Put</span><span class="w"> </span><span class="err">your</span><span class="w"> </span><span class="err">MCP</span><span class="w"> </span><span class="err">servers</span><span class="w"> </span><span class="err">here</span><span class="w">
  </span><span class="p">},</span><span class="w">

  </span><span class="err">//</span><span class="w"> </span><span class="err">Global</span><span class="w"> </span><span class="err">permissions:</span><span class="w"> </span><span class="err">permissive</span><span class="w"> </span><span class="err">reads</span><span class="p">,</span><span class="w"> </span><span class="err">ask</span><span class="w"> </span><span class="err">for</span><span class="w"> </span><span class="err">writes</span><span class="w"> </span><span class="err">and</span><span class="w"> </span><span class="err">bash</span><span class="w">
  </span><span class="nl">"permission"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"read"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"glob"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"grep"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"edit"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ask"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"bash"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ask"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"skill"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"task"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
    </span><span class="nl">"webfetch"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="w">
  </span><span class="p">},</span><span class="w">

  </span><span class="err">//</span><span class="w"> </span><span class="err">Agent</span><span class="w"> </span><span class="err">specific</span><span class="w"> </span><span class="err">permissions</span><span class="w">
  </span><span class="nl">"agent"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">"plan"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"model"</span><span class="p">:</span><span class="w"> </span><span class="s2">"anthropic/claude-opus-4-7"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"effort"</span><span class="p">:</span><span class="w"> </span><span class="s2">"xhigh"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"permission"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"read"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"webfetch"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"glob"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"grep"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"edit"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"bash"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
          </span><span class="nl">"*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"wc *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"cat *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"echo *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"find *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"grep *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"ls *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"head *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"tail *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"which *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git status*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git log*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git diff*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"npm run *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="w"> 
        </span><span class="p">}</span><span class="w">
      </span><span class="p">}</span><span class="w">
    </span><span class="p">},</span><span class="w">
    </span><span class="nl">"build"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
      </span><span class="nl">"model"</span><span class="p">:</span><span class="w"> </span><span class="s2">"anthropic/claude-sonnet-4-6"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"effort"</span><span class="p">:</span><span class="w"> </span><span class="s2">"max"</span><span class="p">,</span><span class="w">
      </span><span class="nl">"permission"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="nl">"read"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"webfetch"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"glob"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"grep"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"edit"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
        </span><span class="nl">"bash"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
          </span><span class="nl">"*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"ask"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"mkdir *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"cp *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git checkout*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git branch*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git add*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git commit*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git status*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git log*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git diff*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"npm install*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"npm ci*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"npm run *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"npx *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"wc *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"cat *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"echo *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"find *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"grep *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"ls *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"head *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"tail *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"which *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"allow"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git push*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git rebase*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"git reset --hard*"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="p">,</span><span class="w">
          </span><span class="nl">"rm -rf *"</span><span class="p">:</span><span class="w"> </span><span class="s2">"deny"</span><span class="w">
        </span><span class="p">}</span><span class="w">
      </span><span class="p">}</span><span class="w">      
    </span><span class="p">}</span><span class="w">
  </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>

<p>I pre-define the foreseeable commands I may want to use.  This allows you to concentrate your efforts on the commands that matter and ensures you don’t get “confirmation fatigue” where you just confirm that the LLM is allowed to do something without considering it.  The permission list is an ever-growing list for me.  When an LLM asks me about a permission, I’ll think about whether I want to be asked every time or not.  If I would just approve it anyway, I add it to the permissions block.</p>

<p>The other thing to note is the models:</p>

<ul>
  <li>A good thinking model for planning - I’m using the latest Claude Opus model all the time.</li>
  <li>A good doing model for building - I’m using either Claude Sonnet or GPT Codex, and trial these as new versions come out.</li>
  <li>A “small model” for doing things like generating titles.</li>
</ul>

<h2 id="4-set-up-a-code-reviewer-agent">4. Set up a Code Reviewer Agent</h2>

<p>My personal setup uses sub-agents to parallelize things.  Since I am using Claude Sonnet for coding, I am NOT going to use it for code review.  Right now, I’m using OpenAI GPT-5.4 and Gemini 3.1 Pro Preview as code review models.  I also have a separate security reviewer.</p>

<p>Agents are defined in Markdown files within <code class="language-plaintext highlighter-rouge">.opencode/agents</code>.  For example, here is my <code class="language-plaintext highlighter-rouge">code-reviewer-1.md</code> file:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Reviews code for quality, correctness, and best practices</span>
<span class="na">mode</span><span class="pi">:</span> <span class="s">subagent</span>
<span class="na">model</span><span class="pi">:</span> <span class="s">openai/gpt-5.4</span>
<span class="na">temperature</span><span class="pi">:</span> <span class="m">0.1</span>
<span class="na">color</span><span class="pi">:</span> <span class="s">accent</span>
<span class="na">permission</span><span class="pi">:</span>
  <span class="na">edit</span><span class="pi">:</span> <span class="s">deny</span>
  <span class="na">bash</span><span class="pi">:</span>
    <span class="s2">"</span><span class="s">*"</span><span class="err">:</span> <span class="s">deny</span>
    <span class="s">"git diff*"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"git log*"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"git show*"</span><span class="err">:</span> <span class="s">allow</span>
  <span class="na">webfetch</span><span class="pi">:</span> <span class="s">deny</span>
<span class="nn">---</span>

You are Code Reviewer #1 for the Ensemble project, a multi-agent coding orchestration system built on the Cloudflare developer platform.

Before reviewing, read AGENTS.md to understand the project conventions.

Focus your review on:
<span class="p">
-</span> <span class="gs">**Correctness**</span>: Does the code do what it claims? Are there logic errors or off-by-one mistakes?
<span class="p">-</span> <span class="gs">**TypeScript quality**</span>: Proper typing, no <span class="sb">`any`</span> escapes, correct use of generics and utility types.
<span class="p">-</span> <span class="gs">**Cloudflare Workers patterns**</span>: Correct use of Durable Object RPC, Artifacts bindings, AI Gateway calls, WebSocket Hibernation API.
<span class="p">-</span> <span class="gs">**Error handling**</span>: Are failures handled gracefully? Are errors propagated correctly across DO RPC boundaries?
<span class="p">-</span> <span class="gs">**Naming and clarity**</span>: Do names match the Ensemble conventions (not "Squad")? Is the code self-documenting?
<span class="p">-</span> <span class="gs">**Testing**</span>: Are edge cases covered? Are Durable Objects tested via RPC? Are AI Gateway responses mocked?

Provide specific, actionable feedback with file paths and line numbers. Do not make changes directly.
</code></pre></div></div>

<p>These are short and to-the-point.  Note that this is a subagent.  It isn’t used directly.  Let’s look at my actual code reviewer definition:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nn">---</span>
<span class="na">description</span><span class="pi">:</span> <span class="s">Runs all code and security reviews in parallel, then synthesizes findings</span>
<span class="na">mode</span><span class="pi">:</span> <span class="s">primary</span>
<span class="na">model</span><span class="pi">:</span> <span class="s">anthropic/claude-sonnet-4-6</span>
<span class="na">variant</span><span class="pi">:</span> <span class="s">max</span>
<span class="na">temperature</span><span class="pi">:</span> <span class="m">0.1</span>
<span class="na">color</span><span class="pi">:</span> <span class="s2">"</span><span class="s">#e06c75"</span>
<span class="na">permission</span><span class="pi">:</span>
  <span class="na">edit</span><span class="pi">:</span> <span class="s">deny</span>
  <span class="na">bash</span><span class="pi">:</span>
    <span class="s2">"</span><span class="s">*"</span><span class="err">:</span> <span class="s">deny</span>
    <span class="s">"git diff*"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"git log*"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"git show*"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"git status*"</span><span class="err">:</span> <span class="s">allow</span>
  <span class="na">task</span><span class="pi">:</span>
    <span class="s2">"</span><span class="s">*"</span><span class="err">:</span> <span class="s">deny</span>
    <span class="s">"code-reviewer-1"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"code-reviewer-2"</span><span class="err">:</span> <span class="s">allow</span>
    <span class="s">"security-reviewer"</span><span class="err">:</span> <span class="s">allow</span>
<span class="nn">---</span>

You are the Review Coordinator for the Ensemble project.

Your job is to orchestrate a thorough multi-perspective code review by delegating to three specialized reviewers <span class="gs">**in parallel**</span>, then synthesizing their findings into a single actionable report.

<span class="gu">## Workflow</span>
<span class="p">
1.</span> <span class="gs">**Understand the scope.**</span> Look at the changes the user wants reviewed. Use <span class="sb">`git diff`</span>, <span class="sb">`git log`</span>, or <span class="sb">`git status`</span> to understand what has changed. If the user points you at specific files, read those.
<span class="p">
2.</span> <span class="gs">**Delegate to all three reviewers simultaneously.**</span> Always launch all three as parallel tasks using the Task tool:
<span class="p">   -</span> <span class="sb">`@code-reviewer-1`</span> -- Reviews correctness, TypeScript quality, Workers patterns, error handling, testing
<span class="p">   -</span> <span class="sb">`@code-reviewer-2`</span> -- Reviews architecture alignment, state management, scalability, API design, dependency hygiene
<span class="p">   -</span> <span class="sb">`@security-reviewer`</span> -- Reviews security: webhook verification, sandboxing, secrets, prompt injection, access control

   Give each reviewer the same context: which files changed, what the purpose of the change is, and any relevant background from the spec.
<span class="p">
3.</span> <span class="gs">**Synthesize the results.**</span> After all three reviewers complete, produce a unified review report:

   ### Report Format

   <span class="gs">**Summary**</span>: One paragraph overview of the review findings.

   <span class="gs">**Critical/High Issues**</span> (must fix before merge):
<span class="p">   -</span> List each issue with: source reviewer, file:line, description, suggested fix

   <span class="gs">**Medium Issues**</span> (should fix):
<span class="p">   -</span> Same format

   <span class="gs">**Low Issues / Suggestions**</span> (nice to have):
<span class="p">   -</span> Same format

   <span class="gs">**Consensus**</span>: Note where multiple reviewers flagged the same concern (these deserve extra attention).

   <span class="gs">**Verdict**</span>: APPROVE, REQUEST CHANGES, or NEEDS DISCUSSION -- with a brief rationale.

<span class="gu">## Rules</span>
<span class="p">
-</span> Always delegate to all three reviewers. Never skip one.
<span class="p">-</span> Always run the three reviews in parallel, not sequentially.
<span class="p">-</span> Do not make code changes yourself. You are read-only.
<span class="p">-</span> If reviewers disagree, present both perspectives and let the developer decide.
</code></pre></div></div>

<p>It runs the subagents I’ve defined in parallel, then gives me a solid report that can be actioned.</p>
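
<p>In practice, I switch to this agent in OpenCode and give it a short prompt.  Something like the following works well (the change described here is purely illustrative):</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Review the diff between main and this branch.  The change adds Turnstile
verification to the comment API; use .spec/spec.md as background for the
reviewers.
</code></pre></div></div>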

<h2 id="5-build-skills">5. Build Skills</h2>

<p>Now that you’ve got your agents set up, you are ready to go, right?  Not so fast.  You probably need to write a few skills. A <strong>skill</strong> is a modular bridge that connects the LLM to your codebase and tools.  The LLM knows <em>how</em> to write code whereas the skill gives it the <em>authority</em> and <em>instruction manual</em> to perform a specific task.  Skills should be atomic (each does one thing well), type-safe (each explicitly defines what data it expects and what it returns), and verbosely documented.</p>

<p>Skills are placed in <code class="language-plaintext highlighter-rouge">.opencode/skills/skill-name/SKILL.md</code> and written in Markdown.  You can learn more about agent skills from <a href="https://agentskills.io/home">Anthropic</a> or <a href="https://opencode.school/lessons/skills/">OpenCode School</a>.  You can find examples at <a href="https://officialskills.sh">officialskills.sh</a>.</p>
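
<p>Structurally, a <code class="language-plaintext highlighter-rouge">SKILL.md</code> is just a Markdown document: a name, a description, any constraints, and the steps to follow.  As a rough sketch (the content here is illustrative), the <code class="language-plaintext highlighter-rouge">quality-gate</code> skill referenced later might look like this:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Skill: Quality Gate

## Description
Runs the project quality gate and reports any failures.

## Context &amp; Constraints
- Requires `npm` and the project dev dependencies to be installed.
- Never lower coverage thresholds or disable lint rules to get to green.

## Workflow Steps
1. Run `npm run quality-gate` (types, lint, tests, coverage, bundle size).
2. If it fails, summarize the failures by file and propose fixes.
3. Re-run after each fix; if it still fails after 3 iterations, stop and report.
</code></pre></div></div>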

<p>The good news is you don’t have to start from nothing.  You can add skills using <code class="language-plaintext highlighter-rouge">npx skills</code>:</p>

<ul>
  <li>Cloudflare: <code class="language-plaintext highlighter-rouge">npx skills add https://github.com/cloudflare/skills</code></li>
  <li>Replicate: <code class="language-plaintext highlighter-rouge">npx skills add replicate/skills</code></li>
  <li>Frontend Design: <code class="language-plaintext highlighter-rouge">npx skills add https://github.com/anthropics/skills --skill frontend-design</code></li>
</ul>

<p>You can also find skills on GitHub quite readily.  One skill I tend to come back to often is “how should the LLM handle a GitHub issue?”  I’ve got what’s known as a “Workflow skill” that encodes my process:</p>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gh"># Skill: GitHub Issue to PR Orchestrator</span>

<span class="gu">## Description</span>
A high-level workflow skill that automates the lifecycle of a feature or bug fix: from issue ingestion to Pull Request creation, using Git worktrees for environment isolation.

<span class="gu">## Context &amp; Constraints</span>
<span class="p">-</span> <span class="gs">**Platform:**</span> Cloudflare Dev Platform.
<span class="p">-</span> <span class="gs">**Environment:**</span> Requires <span class="sb">`gh`</span> (GitHub CLI) and <span class="sb">`git`</span> installed and authenticated.
<span class="p">-</span> <span class="gs">**Isolation:**</span> Always use <span class="sb">`git worktree`</span> to prevent polluting the main development branch or losing uncommitted local work.
<span class="p">-</span> <span class="gs">**Quality Gate:**</span> This skill depends on the <span class="sb">`quality-gate`</span> skill.

<span class="gu">## Workflow Steps</span>

<span class="gu">### 1. Ingestion</span>
<span class="p">-</span> <span class="gs">**Command:**</span> <span class="sb">`gh issue view &lt;issue_number&gt; --json title,body,labels`</span>
<span class="p">-</span> <span class="gs">**Action:**</span> Summarize the requirement. If the issue is a bug, look for reproduction steps.

<span class="gu">### 2. Planning</span>
<span class="p">-</span> <span class="gs">**Action:**</span> Before writing code, output a <span class="sb">`PLAN.md`</span> in the root (temporary).
<span class="p">-</span> <span class="gs">**Review:**</span> Wait for a momentary internal check: Does this plan violate the <span class="p">[</span><span class="nv">Cloudflare Constitution</span><span class="p">](</span><span class="sx">.spec/CONSTITUTION.md</span><span class="p">)</span>?

<span class="gu">### 3. Environment Setup (The Worktree)</span>
<span class="p">-</span> <span class="gs">**Branch Naming:**</span> <span class="sb">`feat/issue-&lt;number&gt;`</span> or <span class="sb">`fix/issue-&lt;number&gt;`</span>.
<span class="p">-</span> <span class="gs">**Command:**</span> 
<span class="p">  -</span> <span class="sb">`git worktree add ~/worktrees/issue-&lt;number&gt; -b &lt;branch_name&gt;`</span>
<span class="p">  -</span> <span class="sb">`cd ~/worktrees/issue-&lt;number&gt;`</span>
<span class="p">  -</span> <span class="sb">`npm install`</span>

<span class="gu">## 4. Implementation</span>
<span class="p">-</span> <span class="gs">**Action**</span>: Perform the code changes as outlined in the <span class="sb">`PLAN.md`</span>.
<span class="p">-</span> <span class="gs">**Constraint**</span>: Follow the <span class="sb">`AGENTS.md`</span> rules (Hono, Drizzle, etc.).

<span class="gu">## 5. Quality Gate</span>
<span class="p">-</span> <span class="gs">**Execution**</span>: Trigger <span class="sb">`invoke-skill("quality-gate")`</span>.
<span class="p">-</span> <span class="gs">**Requirement**</span>:
<span class="p">  -</span> <span class="sb">`npm run quality-gate`</span> must pass
<span class="p">  -</span> <span class="sb">`npm run test:coverage`</span> must pass with 80% code coverage.
<span class="p">-</span> <span class="gs">**Rollback**</span>:
<span class="p">  -</span> If the gate fails and cannot be fixed in 3 iterations, stop and report.

<span class="gu">## 6. Commit &amp; PR</span>
<span class="p">-</span> <span class="gs">**Commit Style**</span>: Conventional Commits (e.g. <span class="sb">`feat(worker): add auth middleware (closes #&lt;number&gt;)`</span>)
<span class="p">-</span> <span class="gs">**Command**</span>:
<span class="p">  -</span> <span class="sb">`git add -A`</span>
<span class="p">  -</span> <span class="sb">`git commit -m "&lt;message&gt;"`</span>
<span class="p">  -</span> <span class="sb">`git push -u origin &lt;branch-name&gt;`</span>
<span class="p">  -</span> <span class="sb">`gh pr create --title "&lt;issue-title&gt;" --body "Closes #&lt;number&gt;.  &lt;commit-message&gt;"`</span>

<span class="gu">## 7. Cleanup</span>
<span class="p">-</span> <span class="sb">`cd &lt;original-dir&gt;`</span>

Do not remove the worktree.  This will be done separately.

<span class="gu">## Error handling</span>
<span class="p">
-</span> If <span class="sb">`gh`</span> is not authenticated, stop and request the user to run <span class="sb">`gh auth login`</span>.
<span class="p">-</span> If a worktree already exists for that issue, ask to resume or delete.
</code></pre></div></div>
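
<p>The skill leans on <code class="language-plaintext highlighter-rouge">npm run quality-gate</code>, which I keep in <code class="language-plaintext highlighter-rouge">package.json</code> so the same gate runs for humans, agents, and CI.  Assuming a vitest + eslint + Wrangler stack, the backing scripts block might look something like this (the script names and flags are illustrative, not prescriptive):</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{
  "scripts": {
    "typecheck": "tsc --noEmit",
    "lint": "eslint .",
    "test:coverage": "vitest run --coverage",
    "build:check": "wrangler deploy --dry-run --outdir dist",
    "quality-gate": "npm run typecheck &amp;&amp; npm run lint &amp;&amp; npm run test:coverage &amp;&amp; npm run build:check"
  }
}
</code></pre></div></div>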

<p>I can now provide the following prompt: <code class="language-plaintext highlighter-rouge">Use skill github-issue to implement issue 1234</code>.  The LLM will then use this skill to run the workflow - basically, your entire software development lifecycle for issues - in a separate worktree.  You can run a good terminal multiplexer (like <a href="https://tmux.info/">tmux</a>) to run multiple OpenCode sessions, allowing concurrent development to take place.</p>

<h2 id="6-set-up-agentsmd">6. Set up AGENTS.md</h2>

<p>If you are using Claude Code, this will be <code class="language-plaintext highlighter-rouge">CLAUDE.md</code> instead.  When an AI coding assistant opens your repository, it has general knowledge of coding, but it lacks the specific context of <em>your</em> project.  An <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file serves three primary purposes:</p>

<ul>
  <li><strong>Setting boundaries</strong>:  Along with the <code class="language-plaintext highlighter-rouge">CONSTITUTION.md</code>, it explicitly tells the AI what <em>not</em> to do.  (e.g. “Do not use Node.js core modules like <code class="language-plaintext highlighter-rouge">fs</code> or <code class="language-plaintext highlighter-rouge">path</code> because this runs in a V8 isolate”).</li>
  <li><strong>Defining the stack</strong>: It prevents hallucinations by declaring your exact tools.  I normally include a “library list” of allowed and denied libraries complete with versions (e.g. “Use Drizzle ORM with Cloudflare D1, not Prisma”).</li>
  <li><strong>Standardizing Workflows</strong>: It tells the AI how to test, deploy, and format code so its output matches your human development standards.</li>
</ul>

<p>I generally start with an AI assisted version:</p>

<blockquote>
  <p>Read my <code class="language-plaintext highlighter-rouge">package.json</code>, <code class="language-plaintext highlighter-rouge">wrangler.jsonc</code>, and <code class="language-plaintext highlighter-rouge">tsconfig.json</code>.  Based on these files, generate a comprehensive <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file in the root directory.  Include sections for our specific stack context, hard constraints from the <code class="language-plaintext highlighter-rouge">.spec/CONSTITUTION.md</code>, database patterns, and preferred routing framework.</p>
</blockquote>

<p>Once you have this, you can iterate whenever your AI makes a contextual mistake.  For example, if it tries to install <code class="language-plaintext highlighter-rouge">axios</code> instead of using the native <code class="language-plaintext highlighter-rouge">fetch</code> API, you can correct the AI.  Again, there are <a href="https://github.com/search?q=path%3AAGENTS.md+NOT+is%3Afork+NOT+is%3Aarchived&amp;type=code">lots of examples on GitHub</a> that you can use as a starting point.</p>
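
<p>To make that concrete, here is a trimmed sketch of what the resulting <code class="language-plaintext highlighter-rouge">AGENTS.md</code> might contain for a project like this one - treat the specifics as illustrative rather than prescriptive:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code># AGENTS.md

## Stack
- Cloudflare Workers (V8 isolates) with Hono for routing
- Drizzle ORM on D1; Workers KV for caching; R2 for objects
- TypeScript with `strict: true`; vitest for tests

## Hard constraints (see .spec/CONSTITUTION.md)
- Do NOT use Node.js core modules (`fs`, `path`) - this code runs in a V8 isolate.
- Do NOT add HTTP client libraries (e.g. axios); use the native `fetch` API.
- Do NOT add non-D1 database drivers.

## Workflow
- `npm run quality-gate` must pass before any commit.
- All commit messages follow the Conventional Commits specification.
</code></pre></div></div>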

<h2 id="final-thoughts">Final thoughts</h2>

<p>Congratulations.  If you made it through all that, you are now ready to develop your new project or work on your existing project using OpenCode and an LLM coding assistant.  It may seem like a lot, but the time you invest up front will prevent hallucinations and make your code more maintainable.</p>

<p>Happy coding!</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[In my last three articles, I produced a set of recipes and rules for doing AI-first development using Spec-Driven Design. These rules really get you started in how to think about development when the coding (the easy part) is done for you. But how do you get started? When I am setting up a new project, my process has now changed. It starts off the same, but the additional files I need to write before I can start coding have changed. Here is what I do: 1. Scaffolding Inevitably, I have some clue as to what framework or platform I am going to build on top of. These days, I am doing a lot of development on top of the Cloudflare Dev Platform. Disclaimer: I now work for Cloudflare, so you can see how this is my go to platform. My project scaffolding is inevitably the same starting point: npm create cloudflare@latest -- my-project cd my-project This gives me a whole slew of files. If you start with NextJS, ASP.NET Core, or Spring Boot - the effect is the same. You go to your repo storage directory, scaffold the app, and then change directory into the created project directory. This is the bit that has not changed. 2. Create the constitution The constitution is a document describing the “rules” for developing your application. I’ve got a process for creating this for any project. I place mine in .spec/CONSTITUTION.md and write it in Markdown. For the Cloudflare Dev Platform, this is: # The Cloudflare Platform Constitution These are the rules that you **MUST** follow for developing on the Cloudflare Platform. ## 1. Edge-first, Always The primary goal is to minimize the distance between the user and the logic. * **Principle**: if it can be done at the edge, it _must_ be done at the edge. * **Mandate**: Avoid "hairpinning" requests to centralized legacy origins (like an RFS database on AWS) unless absolutely necessary. Use **Hyperdrive** or **Durable Objects** to manage connection overhead or state. ## 2. Isolate-Native Design (Statelessness) Cloudflare Workers are built on V8 isolates, which are spun up and down instantly. * **Principle**: Design for zero cold starts and ephemeral lifecycles. * **Mandate**: Never rely on global mutable state in a Worker script. Every request should be treated as a fresh execution. Use **Workers KV** for eventual consistency or **Durable Objects** for strong consistency. ## 3. Binding over Requesting Cloudflare’s internal bus is faster than the public internet. * **Principle**: Prefer internal "bindings" over external REST/HTTP calls. * **Mandate**: Services within the platform should communicate via Service Bindings. Accessing storage should use direct bindings to R2, D1, or KV, avoiding the overhead of authentication and network hops required by external APIs. ## 4. Performance as a Functional Requirement On the edge, a 100ms delay is a failure. * **Principle**: Latency is a bug. * **Mandate**: All Workers must stream responses using TransformStream to ensure a low Time to First Byte (TTFB). Large payloads must be streamed, not buffered in the 128MB memory limit. ## 5. Type-Safe Contracts With a distributed architecture, small mismatches lead to global outages. * **Principle**: End-to-end type safety is non-negotiable. * **Mandate**: * Use TypeScript with strict: true across all projects. * Generate binding types automatically using wrangler types. 
* Use Hono or similar lightweight, type-safe frameworks for routing. ## 6. Observability by Default You cannot SSH into an isolate; if you can't see it, it doesn't exist. * **Principle**: No code reaches production without structured tracing. * **Mandate**: Every Worker must have Workers Logs and Tail enabled. Export traces to a centralized provider (like Honeycomb or Sentry) using OpenTelemetry. ## 7. Versioning via Compatibility Dates The platform evolves, but the code should not break. * **Principle**: Stability is maintained through "Compatibility Dates." * **Mandate**: * Workers must specify a compatibility_date in wrangler.jsonc. * Updating this date is a breaking change that requires a full regression test. You may have additional rules. These rules are the “must-haves” for the project. I’ve started collecting constitutions that I’ve used - I’ve got one for an ASP.NET Core application, one for an iPhone app, and so on. 3. Set up OpenCode I use OpenCode these days, so my first stop is to set up OpenCode. I’ve got a “standard” opencode.jsonc file for each project type that provides default permissions. For example, I encode everything I do in a package.json file so most things (like deployment, builds, tests) are codified there. My standard config looks like this: { "$schema": "https://opencode.ai/config.json", "autoupdate": true, "default_agent": "plan", "small_model": "anthropic/claude-haiku-4.6", "compaction": { "auto": true, "prune": true }, // MCP Servers "mcp": { // Put your MCP servers here }, // Global permissions: permissive reads, ask for writes and bash "permission": { "read": "allow", "glob": "allow", "grep": "allow", "edit": "ask", "bash": "ask", "skill": "allow", "task": "allow", "webfetch": "allow" }, // Agent specific permissions "agent": { "plan": { "model": "anthropic/claude-opus-4-7", "effort": "xhigh", "permission": { "read": "allow", "webfetch": "allow", "glob": "allow", "grep": "allow", "edit": "deny", "bash": { "*": "deny", "wc *": "allow", "cat *": "allow", "echo *": "allow", "find *": "allow", "grep *": "allow", "ls *": "allow", "head *": "allow", "tail *": "allow", "which *": "allow", "git status*": "allow", "git log*": "allow", "git diff*": "allow", "npm run *": "allow" } } }, "build": { "model": "anthropic/claude-sonnet-4-6", "effort": "max", "permission": { "read": "allow", "webfetch": "allow", "glob": "allow", "grep": "allow", "edit": "allow", "bash": { "*": "ask", "mkdir *": "allow", "cp *": "allow", "git checkout*": "allow", "git branch*": "allow", "git add*": "allow", "git commit*": "allow", "git status*": "allow", "git log*": "allow", "git diff*": "allow", "npm install*": "allow", "npm ci*": "allow", "npm run *": "allow", "npx *": "allow", "wc *": "allow", "cat *": "allow", "echo *": "allow", "find *": "allow", "grep *": "allow", "ls *": "allow", "head *": "allow", "tail *": "allow", "which *": "allow", "git push*": "deny", "git rebase*": "deny", "git reset --hard*": "deny", "rm -rf *": "deny" } } } } } I pre-define the foreseeable commands may want to use. This allows you to concentrate your efforts on the commands that matter and ensures you don’t get “confirmation fatigue” where you just confirm that the LLM is allowed to do something without considering it. The permission list is an ever-growing list for me. When an LLM asks me about a permission, I’ll think about whether I want to be asked every time or not. If I would just approve it anyway, I add it to the permissions block. 
The other thing to note is the models: A good thinking model for planning - I’m using the latest Claude Opus model all the time. A good doing model for building - I’m using either Claude Sonnet or GPT Codex, and trial these as new versions come out. A “small model” for doing things like generating titles. 4. Set up a Code Reviewer Agent My personal setup uses sub-agents to parallelize things. Since I am using Claude Sonnet for coding, I am NOT going to use it for code review. Right now, I’m using OpenAI GPT-5.4 and Gemini 3.1 Pro Preview as code review models. I also have a separate security reviewer. Agents are defined in Markdown files within .opencode/agents. For example, here is my code-reviewer-1.md file: --- description: Reviews code for quality, correctness, and best practices mode: subagent model: openai/gpt-5.4 temperature: 0.1 color: accent permission: edit: deny bash: "*": deny "git diff*": allow "git log*": allow "git show*": allow webfetch: deny --- You are Code Reviewer #1 for the Ensemble project, a multi-agent coding orchestration system built on the Cloudflare developer platform. Before reviewing, read AGENTS.md to understand the project conventions. Focus your review on: - **Correctness**: Does the code do what it claims? Are there logic errors or off-by-one mistakes? - **TypeScript quality**: Proper typing, no `any` escapes, correct use of generics and utility types. - **Cloudflare Workers patterns**: Correct use of Durable Object RPC, Artifacts bindings, AI Gateway calls, WebSocket Hibernation API. - **Error handling**: Are failures handled gracefully? Are errors propagated correctly across DO RPC boundaries? - **Naming and clarity**: Do names match the Ensemble conventions (not "Squad")? Is the code self-documenting? - **Testing**: Are edge cases covered? Are Durable Objects tested via RPC? Are AI Gateway responses mocked? Provide specific, actionable feedback with file paths and line numbers. Do not make changes directly. These are short and to-the-point. Note that this is a subagent. It isn’t used directly. Let’s look at my actual code reviewer definition: --- description: Runs all code and security reviews in parallel, then synthesizes findings mode: primary model: anthropic/claude-sonnet-4-6 variant: max temperature: 0.1 color: "#e06c75" permission: edit: deny bash: "*": deny "git diff*": allow "git log*": allow "git show*": allow "git status*": allow task: "*": deny "code-reviewer-1": allow "code-reviewer-2": allow "security-reviewer": allow --- You are the Review Coordinator for the Ensemble project. Your job is to orchestrate a thorough multi-perspective code review by delegating to three specialized reviewers **in parallel**, then synthesizing their findings into a single actionable report. ## Workflow 1. **Understand the scope.** Look at the changes the user wants reviewed. Use `git diff`, `git log`, or `git status` to understand what has changed. If the user points you at specific files, read those. 2. 
**Delegate to all three reviewers simultaneously.** Always launch all three as parallel tasks using the Task tool: - `@code-reviewer-1` -- Reviews correctness, TypeScript quality, Workers patterns, error handling, testing - `@code-reviewer-2` -- Reviews architecture alignment, state management, scalability, API design, dependency hygiene - `@security-reviewer` -- Reviews security: webhook verification, sandboxing, secrets, prompt injection, access control Give each reviewer the same context: which files changed, what the purpose of the change is, and any relevant background from the spec. 3. **Synthesize the results.** After all three reviewers complete, produce a unified review report: ### Report Format **Summary**: One paragraph overview of the review findings. **Critical/High Issues** (must fix before merge): - List each issue with: source reviewer, file:line, description, suggested fix **Medium Issues** (should fix): - Same format **Low Issues / Suggestions** (nice to have): - Same format **Consensus**: Note where multiple reviewers flagged the same concern (these deserve extra attention). **Verdict**: APPROVE, REQUEST CHANGES, or NEEDS DISCUSSION -- with a brief rationale. ## Rules - Always delegate to all three reviewers. Never skip one. - Always run the three reviews in parallel, not sequentially. - Do not make code changes yourself. You are read-only. - If reviewers disagree, present both perspectives and let the developer decide. It runs the subagents I’ve defined in parallel, then gives me a solid report that can be actioned. 5. Build Skills Now that you’ve got your agents set up, you are ready to go, right? Not so fast. You probably need to write a few skills. A skill is a modular bridge that connects the LLM to your codebase and tools. The LLM knows how to write code whereas the skill gives it the authority and instruction manual to perform a specific task. Skills should be atomic (it should do one thing well), type-safe (it should explicitly define what data it expects and what it returns), and verbosely documented. Skills are placed in .opencode/skills/skill-name/SKILL.md and written in Markdown. You cn learn more about agent skills from Anthropic or OpenCode School. You can find examples at officialskills.sh. The good news is you don’t have to start from nothing. You can add skills using npx skills: Cloudflare: npx skills add https://github.com/cloudflare/skills Replicate: npx skills add replicate/skills Frontend Design: npx skills add https://github.com/anthropics/skills --skill frontend-design You can also find skills on GitHub quite readily. One skill I tend to come back to often is “how should the LLM handle a GitHub issue?” I’ve got what’s known as a “Workflow skill” that does my process: # Skill: GitHub Issue to PR Orchestrator ## Description A high-level workflow skill that automates the lifecycle of a feature or bug fix: from issue ingestion to Pull Request creation, using Git worktrees for environment isolation. ## Context &amp; Constraints - **Platform:** Cloudflare Dev Platform. - **Environment:** Requires `gh` (GitHub CLI) and `git` installed and authenticated. - **Isolation:** Always use `git worktree` to prevent polluting the main development branch or losing uncommitted local work. - **Quality Gate:** This skill depends on the `quality-gate` skill. ## Workflow Steps ### 1. Ingestion - **Command:** `gh issue view &lt;issue_number&gt; --json title,body,labels` - **Action:** Summarize the requirement. If the issue is a bug, look for reproduction steps. ### 2. 
Planning - **Action:** Before writing code, output a `PLAN.md` in the root (temporary). - **Review:** Wait for a momentary internal check: Does this plan violate the [Cloudflare Constitution](.spec/CONSTITUTION.md)? ### 3. Environment Setup (The Worktree) - **Branch Naming:** `feat/issue-&lt;number&gt;` or `fix/issue-&lt;number&gt;`. - **Command:** - `git worktree add ~/worktrees/issue-&lt;number&gt; -b &lt;branch_name&gt;` - `cd ~/worktrees/issue-&lt;number&gt;` - `npm install` ## 4. Implementation - **Action**: Perform the code changes as outlined in the `PLAN.md`. - **Constraint**: Follow the `AGENTS.md` rules (Hono, Drizzle, etc.). ## 5. Quality Gate - **Execution**: Trigger `invoke-skill("quality-gate")`. - **Requirement**: - `npm run quality-gate` must pass - `npm run test:coverage` must pass with 80% code coverage. - **Rollback**: - If the gate fails and cannot be fixed in 3 iterations, stop and report. ## 6. Commit &amp; PR - **Commit Style**: Conventional Commits (e.g. `feat(worker): add auth middleware (closes #&lt;number&gt;)`) - **Command**: - `git add -A` - `git commit -m "&lt;message&gt;"` - `git push -u origin &lt;branch-name&gt;` - `gh pr create --title "&lt;issue-title&gt;" --body "Closes #&lt;number&gt;. &lt;commit-message&gt;"` ## 7. Cleanup - `cd &lt;original-dir&gt;` Do not remove the worktree. This will be done separately. ## Error handling - If `gh` is not authenticated, stop and request the user to run `gh auth login`. - If a worktree already exists for that issue, ask to resume or delete. I can now provide the following prompt: Use skill github-issue to implement issue 1234. The LLM will then use this skill to run the workflow - basically, your entire software development lifecycle for issues - in a separated worktree. You can run a good terminal multiplexor (like tmux) to run multiple OpenCode sessions, allowing concurrent development to take place. 6. Set up AGENTS.md If you are using Claude Code, this will be CLAUDE.md instead. When an AI coding assistant opens your repository, it has general knowledge of coding, but it lacks the specific context of your project. An AGENTS.md file serves three primary purposes: Setting boundaries: Along with the CONSTITUTION.md, it explicitly tells the AI what not to do. (e.g. “Do not use Node.js core modules like fs or path because this runs in a V8 isolate”). Defining the stack: It prevents hallucinations by declaring your exact tools. I normally include a “library list” of allowed and denied libraries complete with versions. (e.g.”Use Drizzle ORM with Cloudflare D1, not Prisma”). Standardizing Workflows: It tells the AI how to test, deploy, and format code so its output matches your human development standards. I generally start with an AI assisted version: Read my package.json, wrangler.jsonc, and tsconfig.json. Based on these files, generate a comprehensive AGENTS.md file in the root directory. Include sections for our specific stack context, hard constraints from the .spec/CONSTITUTION.md, database patterns, and preferred routing framework. Once you have this, you can iterate whenever your AI makes a contextual mistake. For example, if it tries to install axios instead of using the native fetch API, you can correct the AI. Again, there are lots of examples on GitHub that you can use as a starting point. Final thoughts Congratulations. If you made it through all that, you are now ready to develop your new project or work on your existing project using OpenCode and an LLM coding assistant. 
It may seem a lot, but the time you invest up front will prevent hallucination and make your code more maintainable. Happy coding!]]></summary></entry><entry><title type="html">Effective prompts for agentic engineering</title><link href="https://adrianhall.github.io/posts/2026/2026-04-11-agentic-engineer-3.html" rel="alternate" type="text/html" title="Effective prompts for agentic engineering" /><published>2026-04-11T00:00:00-07:00</published><updated>2026-04-11T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/agentic-engineer-3</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-11-agentic-engineer-3.html"><![CDATA[<p>This is the third and final part of my series on agentic engineering.  If you want to read the first two articles, see the following links:</p>

<ul>
  <li><a href="/posts/2026/2026-04-07-agentic-engineer-1.html">Ten rules for spec-driven design</a></li>
  <li><a href="/posts/2026/2026-04-09-agentic-engineer-2.html">Using Jobs-to-be-done to improve your agentic spec</a></li>
</ul>

<p>In the previous article, I covered the spec, which documents the “what” and the “why” of what you want to build, using <a href="https://www.productplan.com/glossary/jobs-to-be-done-framework/">JTBD</a> as the primary product requirements framework.  This is recommended because the alternatives rely on a conversation between a product manager and an engineer to fill in the gaps - an agentic developer can’t rely on engineering intuition to answer all the edge cases you are going to uncover, so you have to specify them.  By focusing on the objective, you can use best practices and actually imbue the agent with a lot of that intuition.</p>

<p><img src="/assets/images/2026/Apr11-banner.png" alt="Effective prompts for agentic engineering" /></p>

<p>While a spec will tell an agent what to build, the agent’s primary objective is to satisfy the spec with the least resistance, frequently sacrificing the “invisible” pillars of professional software - for example, observability, security, and maintainability - to get to a visual result faster.  By codifying these requirements, you bridge the intuition gap.  You are effectively hard-coding the senior-level rigor that a human engineer would apply instinctively - such as error handling, schema migrations, and unit tests - ensuring that the agent doesn’t just deliver code that works, but code that is production-grade and maintainable.</p>

<p>An agent without a constitution is like a junior dev on three energy drinks: they’ll move fast, but you’ll spend the next week cleaning up the mess.  The constitution gives them the “Senior” conscience they weren’t born with.</p>

<h2 id="generating-the-constitution-with-moscow">Generating the Constitution with MoSCoW</h2>

<p>Left to themselves, agents often “lazy-code” (for example, skipping error handling or comments) because they are optimized for completion speed.  A constitution is a set of rules that prevents agents from taking shortcuts for the sake of speed.  The framework I most often reach for is the <a href="https://en.wikipedia.org/wiki/MoSCoW_method"><strong>MoSCoW</strong></a> framework.  MoSCoW stands for:</p>

<ul>
  <li>Must-haves</li>
  <li>Should-haves</li>
  <li>Could-haves</li>
  <li>Won’t-haves</li>
</ul>

<p>Anyone who has read an Internet RFC will be familiar here: <strong>MUST</strong>, <strong>SHOULD</strong>, <strong>MAY</strong>, and <strong>MUST NOT</strong> map to the same concepts.  I have a basic prompt:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Review the spec in @.spec/spec.md.  Create a constitution in .spec/constitution.md using the MoSCoW framework.  This constitution must define the technical standards and engineering intuition that a senior developer work use.

* **Must Have**: List critical Cloudflare-specific standards (for example, D1 migrations, drizzle type-safety, turnstile verification, vitest coverage &gt;80%, and structured logging).
* **Should/Could Have**: List technical enhancements (e.g. KV caching, R2 image optimization)
* **Rules of Engagement**: Explicitly state that the agent MUST NOT skip observability, security, maintainability, or error handling to save tokens.
</code></pre></div></div>

<p>This will produce another living document (don’t forget <em>Rule 10: The spec is a living document</em>), and it’s your responsibility to read and edit it.  Some of the things I normally end up adding include:</p>

<ul>
  <li><strong>MUST HAVE</strong>: Abstract service interfaces as close to the service as possible.
    <ul>
      <li>Adding an abstraction at the service level is a great way to simplify mocking for unit tests.  This addition is something a senior dev would likely do automatically irrespective of whether a PM asked for it - it just makes sense once you’ve been in the industry for a while.</li>
    </ul>
  </li>
  <li><strong>SHOULD HAVE</strong>: Automated cache-aside pattern using Cloudflare Workers KV.
    <ul>
      <li>While a “Must Have” would be the raw D1 database connection, a “Should Have” like this ensures the system isn’t hitting the database for every single page load.  This tells the agent that while the system needs to work, it should be optimized for the edge.  If the agent is running low on context (or time), it can prioritize the Drizzle logic first, but it knows the architectural expectation is to include a KV caching layer for a read-heavy operation.</li>
    </ul>
  </li>
  <li><strong>COULD HAVE</strong>: Real-time ‘User is typing’ indicators via Cloudflare Durable Objects.
    <ul>
      <li>This adds polish to an experience, but it isn’t required for a functional comment validation and hosting system.  By labeling this as “Could have”, you prevent the agent from over-engineering the initial state-management logic.  It signals to the agent: “If we have extra budget in our work breakdown, this is the direction we want to head” without letting it distract from the security of the API.</li>
    </ul>
  </li>
  <li><strong>WON’T HAVE</strong>: Cross-platform (non-D1) database drivers.
    <ul>
      <li>Explicitly stating that the system will not support PostgreSQL, MySQL, or other external databases.  This is a crucial guardrail.  Without it, an agent might try to write “agnostic” code or suggest heavy libraries that work in any environment.  This “Won’t have” forces the agent to stay lean and hyper-focused on my Cloudflare native stack, ensuring the code remains lightweight and specific to my architecture.</li>
    </ul>
  </li>
</ul>

<p>It’s this selective ignorance that keeps the agent from hallucinating complex integrations or pulling in unnecessary NPM packages that “might be useful later”.</p>
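
<p>Rendered into <code class="language-plaintext highlighter-rouge">.spec/constitution.md</code>, those additions end up as ordinary MoSCoW sections.  Here is a trimmed sketch (the wording is illustrative - yours will differ):</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Must Have
- Abstract service interfaces as close to the service as possible (simplifies mocking in unit tests).
- D1 migrations, Drizzle type-safety, Turnstile verification, structured logging, vitest coverage &gt;80%.

## Should Have
- Cache-aside pattern for read-heavy endpoints using Cloudflare Workers KV.

## Could Have
- Real-time "user is typing" indicators via Durable Objects.

## Won't Have
- Cross-platform (non-D1) database drivers - no PostgreSQL, MySQL, or "agnostic" abstractions.
</code></pre></div></div>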

<h2 id="generating-the-work-breakdown-with-gist">Generating the work breakdown with GIST</h2>

<p>Now that I have the tech stack, the constitution, and the spec available, I have everything I need to start building.  Complex projects, however, will exhaust the context pretty quickly.  Our spec and constitution are not small and they cost tokens.  Breaking the work down into logical phases is another key skill that senior developers use often prior to touching the keyboard.  The key framework here is GIST:</p>

<ul>
  <li>GOAL</li>
  <li>IDEA</li>
  <li>STEP-PROJECT</li>
  <li>TASK</li>
</ul>

<p>I focus on the STEP-PROJECT and TASK (or more appropriately, “Phase” and “Task”), where each task can be considered a single GitHub issue.  This is what allows the LLM to focus its attention on the right part of the project.  Fortunately, LLMs (particularly MoE thinking LLMs like Opus 4.6) are really good at this step.  I use the following prompt:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Convert the @.spec/spec.md ionto a work breakdown using the GIST framework.  Use the @.spec/constitution.md and @.spec/stack.md to frame the requirements for the Cloudflare dev platform.  

For each step-project:

1. **Single File**: Create a file in `.spec/work` for the step-project, named `xxx-short-title.md` where xxx is an incrementing zero-padded number.
2. **Atomic Scope**: Each step must be small enough to be completed in a single LLM chat session without compacting the context.
3. **Definition of Done**: Each task must include a definition of done. 
4. **Indicate Parallelization**: Indicate which tasks can be parallelized within a step-project.
5. **Quality Gate**: At the end of each step-project:
  - Additional unit tests for new code should be written
  - TypeScript type check must be clean
  - Eslint should show no errors
  - All tests should pass
  - Minimum 80% test coverage
  - Wrangler dry-run should ensure no more than 1MB bundle size
</code></pre></div></div>

<p>Step-projects protect the context window, ensuring the agent doesn’t (for example) forget the database schema while trying to write the CSS.  What the LLM will produce is a flight map - a set of files you can now execute through chat or feed into Squad and tell it to go ahead and build the project.  By forcing the agent to output each step-project into its own zero-padded file, you are creating a chain of custody for the project evolution.  It also prevents the agent from getting “lazy” or “helpful” by trying to cram a 40-task breakdown into a single message.</p>

<p>Let’s look at what one of these files looks like:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>File: 0002-auth-turnstile.md
Prerequisites: 0001-db-setup.md
Status: Pending
Goal: Implement Cloudflare Turnstile verification and basic user identity logic to ensure only humans can post.
Alignment:
- Must use wrangler secret for keys
- Must include vitest integration tests for the verification worker.

Tasks:

1. Infrastructure &amp; Secrets
  - [ ] Provision a Cloudflare Turnstile site key and secret via Terraform
  - [ ] Run `wrangler secret put TURNSTILE_SECRET_KEY` for the dev environment
  - [ ] Update `wrangler.jsonc` to include the Turnstile site key as a public variable
2. Worker Implementation
  - [ ] Create a utility function `verifyTurnstileToken(token, ip)` using the fetch API to hit the Turnstile siteverify endpoint
  - [ ] Modify `POST /comment` endpoint to extract the cf-turnstile-response header
  - [ ] Implement a 403 Forbidden response if the token is missing or the verification fails
3. Constitution checks
  - [ ] Error Handling: Ensure the worker doesn't crash if the Turnstile API is unreachable; default to fail-closed
  - [ ] Observability: Log successful and failed attempts, including Turnstile error code
  - [ ] Tests: Write a vitest suite that mocks the Turnstile API response to test both success and failure scenarios

Definition of done:

- [ ] curl request to the endpoint without a token returns a 403
- [ ] curl request with a valid (mocked) token returns the expected next-step response
- [ ] All tests pass with 80% statement coverage
</code></pre></div></div>

<p>Because this file is focused and small, it’s a great context anchor.  You aren’t concerning the LLM with all the other things it COULD be doing - just do this one thing.  This keeps the context smaller, even when it gets fed the entire spec, stack, and constitution to figure out what this task means.  Finally, there’s the checkbox logic - agents are remarkably good at following markdown checklists, and the checkboxes provide a clear success metric for the session.</p>
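
<p>When I’m ready to execute one of these, the prompt is correspondingly small.  Something like this (referencing the example file above):</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Execute the step-project in @.spec/work/0002-auth-turnstile.md.  Use
@.spec/constitution.md and @.spec/stack.md for the rules.  Work through the
tasks in order, checking each box as you complete it, and stop when the
definition of done is met.
</code></pre></div></div>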

<h2 id="some-final-cleanup">Some final cleanup</h2>

<p>Don’t forget to include an <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file or a skill that you use to execute a step-project.  This allows you to integrate a process that the agent MUST follow.  Mine includes:</p>

<ul>
  <li><strong>ALWAYS</strong> create a new, dedicated git worktree (e.g. <code class="language-plaintext highlighter-rouge">git worktree add ../00X-branch 00X-branch</code>)</li>
  <li><strong>ALWAYS</strong> commit, following the Conventional Commits specification</li>
  <li><strong>ALWAYS</strong> run <code class="language-plaintext highlighter-rouge">npm run quality-gate</code> - fix any errors found and ensure a minimum of 80% code coverage</li>
  <li><strong>ALWAYS</strong> push the branch and generate a PR description that references the original JTBD from .spec/spec.md once the quality-gate is green</li>
</ul>

<p>Obviously, there is more that goes into a skill or AGENTS.md file than this, but putting the process at the front ensures that MULTIPLE agents can operate together and that all the check-ins and pull requests will be generated for you.  You become the code reviewer (unless you want to delegate that to an LLM as well).</p>

<h2 id="final-thoughts">Final thoughts</h2>

<p>When I started this series, the goal was to find a way to move from vibe-coding to agentic engineering - to make AI agents more than just sophisticated copy-paste machines.  We’ve moved through four distinct layers of discipline:</p>

<ol>
  <li>The Spec - replacing vague user stories with Jobs-to-be-done to give the agent a deep understanding of why the product exists.</li>
  <li>The Constitution - using MoSCoW to codify the senior engineer intuition that agents naturally lack, forcing commonly neglected concerns like observability, security, and maintainability into the project “must-haves”.</li>
  <li>The Work Breakdown - leveraging GIST to break the work into verifiable step-projects that keep the agent grounded in a manageable context window.</li>
  <li>The Cleanup - creating a skill that codifies the process of writing code, allowing parallelization of effort.</li>
</ol>

<p>The transition from vibe-coder to an agentic engineer isn’t about writing less code; it’s about writing better instructions.  We are shifting our primary output from lines of syntax to the creation of high-fidelity blueprints.</p>

<p>By applying these product management frameworks to our technical specs, we aren’t just making it easier for the agent to execute; we are making it possible for us to scale our own expertise.  We’ve turned the intuition gap into a documented standard.  Whether you (like me) are building a native comment system on Cloudflare or a complex enterprise backend, the lesson remains the same.</p>

<ul>
  <li>If you can’t specify it, the agent can’t build it.</li>
  <li>If you can’t put in basic guardrails, you can’t trust the agent to lead.</li>
</ul>

<p>This framework doesn’t remove the human from the loop; it elevates the human to the role of architect.</p>

<p>One thing I’ve found useful is to have a robust repository template that is AI-focused.  By including the technology stack and constitution along with a solid set of skills and agent definitions for YOUR environment, you’ll find the process becomes “write the spec and let the agent do the work”.  The repository template is both technology stack and organization specific.  What works for the Cloudflare dev platform won’t work for a mobile app, and we can’t expect it to.  However, what works for your group of engineers in a company does not necessarily translate to someone else’s group either.  I suspect we’ll get to the point where the “Cloudflare dev platform AI repository” template can be created and then modified on a per-organization basis.</p>

<p>But that’s something for another time.</p>

<h3 id="ai-disclosure">AI Disclosure</h3>

<p>I wrote this article (as I do all my articles) by hand.  However, the images are generated by Google Gemini.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[This is the third and final part of my series on agentic engineering. If you want to read the first two articles, see the following links: Ten rules for spec-driven design Using Jobs-to-be-done to improve your agentic spec In the previous article, I covered the spec, which documents the “what” and the “why” of what you want to build using JTBD as a primary product requirements framework to use. This is recommeneded because the alternatives rely on a conversation between a product manager and an engineer to fill in the gaps - the agentic developer can’t rely on engineer intuition to determine the answers to all the edge cases you are going to uncover - you have to specify them. By focusing on the objective, you can use best practices and actually imbue the agent with a lot of that intuition. While a spec will tell an agent what to build, the agent has a primary objective to satisfy the spec with the least resistance, frequently sacrificing the “invisible” pillars of professional software - for example, observability, security, and maintainability - to get to a visual result faster. By codifying these requirements, you bridge the intuition gap. You are effectively hard-coding tje senior-level rigor that a human engineer would apply instinctively - such as error handling, schema migrations, and unit tests - ensuring that the agent doesn’t just deliver code that works, but code that is production-grade and maintainable. An agent without a constitution is like a junior dev on three energy drinks: they’ll move fast, but you’ll spend the next week cleaning up the mess. The constitution gives them the “Senior” conscience they weren’t born with. Generating the Constitution with MoSCoW Left to themselves, agents often “lazy-code” (for example, skipping error handling or comments) because they are optimized for completion speed. A constitution is a set of rules that prevents agents from taking short cuts for the sake of speed. The framework I most often reach for is the MoSCoW framework. MoSCoW stands for: Must-haves Should-haves Could-haves Won’t-have Anyone who has read an Internet RFC will be familiar here: DO, SHOULD, *SHOULDN’T, and **DON’T map to the same concepts. I have a basic prompt: Review the spec in @.spec/spec.md. Create a constitution in .spec/constitution.md using the MoSCoW framework. This constitution must define the technical standards and engineering intuition that a senior developer work use. * **Must Have**: List critical Cloudflare-specific standards (for example, D1 migrations, drizzle type-safety, turnstile verification, vitest coverage &gt;80%, and structured logging). * **Should/Could Have**: List technical enhancements (e.g. KV caching, R2 image optimization) * **Rules of Engagement**: Explicitly state that the agent MUST NOT skip observability, security, maintainability, error handling to save tokens. This will produce another living document (don’t forget Rule 10: The spec is a living document) and that it’s your responsibility to read and edit it. Some of the things I normally end up adding include: MUST HAVE: Abstract service interfaces as close to the service as possible. Adding an abstraction at the service level is a great way to simplify mocking for unit tests. 
This addition is something a senior dev would likely do automatically irrespective of if a PM asked or not - it just makes sense once you’ve been in industry for a while. SHOULD HAVE: Automated cache-aside pattern using Cloudflare Workers KV. While a “Must Have” would be the raw D1 database connection, a “Should Have” like this ensures the system isn’t hitting the database for every single page load. This tells the agent that while the system needs to work, it should be optimized for the edge. If the agent is running low on context (or time), it can priorize the drizzle logic first, but it knows the architectural expectation is to include a KV caching layer for a read-heavy operation. COULD HAVE: Real-time ‘User is typing’ indicators via Cloudflare durable objects. This adds polish to an experience, but it isn’t required for a functional comment validation and hosting system. By labeling this as “Could have”, you prevent the agent from over-engineering the initial state-management logic. It signals to the agent: “If we have extra budget in our work breakdown, this is the direction we want to head” without letting it distract from the security of the API. WONT HAVE: Cross-platform (non-D1) database drivers. Explicitly stating that the system will not support PostgreSQL, MySQL, or other external databases. This is a crucial guardrail. Without it, an agent might try to write “agnostic” code or suggest heavy libraries that work in any environment. This “Won’t have” forces the agent to stay lean and hyper-focused on my Cloudflare native stack, ensuring the code remains lightweight and specific to my architecture. It’s the selective ignorance that keeps the agent from hallucinating complex integrations or pulling in unnecessary NPM packages that “might be useful later”. Generating the work breakdown with GIST Now that I have the tech stack, the constitution, and the spec available, I have everything I need to start building. Complex projects, however, will exhaust the context pretty quickly. Our spec and constitution are not small and they cost tokens. Breaking the work down into logical phases is another key skill that senior developers use often prior to touching the keyboard. The key framework here is GIST: GOAL IDEA STEP-PROJECT TASK I focus in on the STEP-PROJECT and TASK (or more appropriately, “Phase” and “Task”) where each task can be considered a single GitHub issue. This is what allows the LLM to focus their attention on the right part of the project. Fortunately, LLMs (particularly MoE thinking LLMs like Opus 4.6) are really good at this step. I use the following prompt: Convert the @.spec/spec.md ionto a work breakdown using the GIST framework. Use the @.spec/constitution.md and @.spec/stack.md to frame the requirements for the Cloudflare dev platform. For each step-project: 1. **Single File**: Create a file in `.spec/work` for the step-project, named `xxx-short-title.md` where xxx is an incrementing zero-padded number. 2. **Atomic Scope**: Each step must be small enough to be completed in a single LLM chat session without compacting the context. 3. **Definition of Done**: Each task must include a definition of done. 4. **Indicate Parallelization**: Indicate which tasks can be parallelized within a step-project. 4. 
**Quality Gate**: At the end of each step-project: - Additional unit tests for new code should be written - TypeScript type check must be clean - Eslint should show no errors - All tests should pass - Minimum 80% test coverage - Wrangler dry-run should ensure no more than 1MB bundle size Step-projects protect the context window, ensuring the agent doesn’t (for example) forget the database schema while trying to write the CSS. What the LLM will produce is a flight map - a set of files you can now execute through chat or feed into Squad and tell it to go ahead and build the project. By forcing the agent to output each step-project into its own zero-padded file, you are creating a chain of custody for the project evolution. It also prevents the agent from getting “lazy” or “helpful” by trying to cram a 40-task breakdown into a single message. Let’s look at what one of these files looks like: File: 0002-auth-turnstile.md Prerequisites: 001-db-setup.md Status: Pending Goal: Implement Cloudflare Turnstile verification and basic user identity logic to ensure only humans can post. Alignment: - Must use wrangler secret for keys - Must include vitest integration tests for the verification worker. Tasks: 1. Infrastructure &amp; Secrets - [ ] Provision a Cloudflare Turnstile site key and secret via Terraform - [ ] Run `wrangler secret put TURNSTILE_SECRET_KEY` for the dev environment - [ ] Update `wrangler.jsonc` to include the Turnstile site key as a public variable 2. Worker Implementation - [ ] Create a utility function `verifyTurnstileToken(token, ip)` using the fetch API to hit Turnstile siteverify endpoint - [ ] Modify `POST /comment` endpoint to extract the cf-turnstile-response header - [ ] Implement a 403 Forbidden response if the token is missing or the verification fails 3. Constitution checks - [ ] Error Handling: Ensure the worker doesn't crash if the Turnstile API is unreachable; default to fail-closed - [ ] Observability: Log successful and failed attempts, including Turnstile error code - [ ] Tests: Write a vitest suite that mocks the Turnstile API respone to test both success and failed scenarios Definition of done: - [ ] curl request to the endpoint without a token returns a 403 - [ ] curl request with a valid (mocked) token returns the expected next-step response - [ ] All tests pass with 80% statement coverage Because this file is focused and small, it’s a great context anchor. You aren’t concerning the LLM with all the other things it COULD be doing - just do this one thing. This keeps the context smaller, even when it gets fed the entire spec, stack, and constitution to figure out what this task means. Finally, checkbox logic - agents are remarkably good at following markdown checklists. It provides a clear success metric for the session. Some final cleanup Don’t forget to include a AGENTS.md or a skill that you use to execute a step-project. This allows you to integrate a process that the agent MUST follow. Mine includes: ALWAYS create a new, dedicated git worktree (eg. 
git worktree add ../00X-branch 00X-branch) ALWAYS commit, following the Conventional Commits specification ALWAYS run npm run quality-gate - fix any errors found and ensure a minimum of 80% code coverage ALWAYS push the branch and generate a PR description that references the original JTBD from .spec/spec.md once the quality-gate is green Obviously, there is more that goes into a skill or AGENTS.md file than this, but putting the process at the front ensures that MULTIPLE agents can be operating together and that all the check-ins and pull-requests will be generated for you. You become the code reviewer (unless you want to delegate that to an LLM as well). Final thoughts When I started this series, the goal was to find a way to move from vibe-coding to agentic engineering - to make AI agents more than just sophisticated copy-paste machines. We’ve moved through the distinct layers of discipline: The Spec - replacing vague user stories with Jobs-to-be-done to give the agent a deep understanding of why the product exists. The Constitution - using MoSCoW to codify the senior engineer intuition that agents naturally lack, forcing common concerns like observability, security, and maintainability into the project “must-haves”. The Work Breakdown - leveraging GIST to break the work into verifiable step-projects that keep the agent grounded in a manageable context window. The Cleanup - creating a skill that codifies the process of writing code, allowing parallelization of effort. The transition from vibe-coder to an agentic engineer isn’t about writing less code; it’s about writing better instructions. We are shifting our primary output from lines of syntax to the creation of high-fidelity blueprints. By applying these product management frameworks to our technical specs, we aren’t just making it easier for the agent to execute; we are making it possible for us to scale our own expertise. We’ve turned the intuition gap into a documented standard. Whether you (like me) are building a native comment system on Cloudflare or a complex enterprise backend, the lesson remains the same. If you can’t specify it, the agent can’t build it. If you can’t put in basic guard rails, you can’t trust the agent to lead. This framework doesn’t remove the human from the loop; it elevates the human to the role of architect. One thing I’ve found useful is to have a robust repository template that is AI-focused. By including the technology stack and constitution along with a solid set of skills and agent definitions for YOUR environment, you’ll find the process becomes “write the spec and let the agent do the work”. The repository template is both technology stack and organization specific. What works for the Cloudflare dev platform won’t work for a mobile app, and we can’t expect it to. However, what works for your group of engineers in a company does not necessarily translate to someone else’s group either. I suspect we’ll get to the point where the “Cloudflare dev platform AI repository” template can be created and then modified on a per-organization basis. But that’s something for another time. AI Disclosure I wrote this article (as I do all my articles) by hand. 
However, the images are generated by Google Gemini.]]></summary></entry><entry><title type="html">Using Jobs-to-be-done to improve your agentic spec</title><link href="https://adrianhall.github.io/posts/2026/2026-04-09-agentic-engineer-2.html" rel="alternate" type="text/html" title="Using Jobs-to-be-done to improve your agentic spec" /><published>2026-04-09T00:00:00-07:00</published><updated>2026-04-09T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/agentic-engineer-2</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-09-agentic-engineer-2.html"><![CDATA[<p>In <a href="/posts/2026/2026-04-07-agentic-engineer-1.html">my last article</a>, I introduced ten rules to help you move from vibe-coder to AI Product Architect by building specs.  However, I glossed over a few of the details.  The basic problem is that most people write specs for themselves - a human.  You aren’t writing for a human, so why do you think it would look the same? In this article, I’m going in depth into using product management techniques to write a spec that agents love.</p>

<p><img src="/assets/images/2026/Apr09-banner.png" alt="Transition to JTBD Image" /></p>

<p>Rule 1 (refined) is “<strong>Don’t just tell the agent what to build; tell it the circumstance in which the user is struggling.</strong>  The “Job” is the prompt; the “Code” is the solution to that job.”</p>

<p>Product managers use frameworks to help organize data from users so that the spec is more meaningful.  We often use “user stories”, which get put into JIRA or another issue tracking system, to communicate the requirements.  Weeks of research are done to make these really concise.  Some of the common PM frameworks that you shouldn’t use include:</p>

<ul>
  <li>The <a href="https://productstrategy.co/working-backwards-the-amazon-prfaq-for-product-innovation/"><strong>PR-FAQ</strong></a> - a favorite at Amazon; it starts with a really good press release that describes the customer benefits from the product or feature and follows on with a number of frequently asked questions that would describe sales scenarios.  This is <strong>NOT</strong> a good fit for agentic development as it doesn’t go into enough detail about the product or feature to be reasonable, even with tight writing.  It’s more at home as a mechanism to decide if a project should be approved.</li>
  <li><a href="https://medium.com/@tlowdermilk/customer-driven-engineering-part-1-the-culture-97601b5f65ed"><strong>Hypothesis Progression Framework (HPF)</strong></a> - a favorite at Microsoft Developer Division; it starts with a hypothesis and then you run experiments (data analysis, customer interviews, user studies, etc.) to disprove the hypothesis.  If it’s standing at the end, it’s a good hypothesis.  These are great for gathering information about features, but lack finesse when directing an LLM.</li>
  <li><a href="https://www.productplan.com/glossary/user-story/"><strong>User Stories</strong></a> - definitely useful, and I started down this path myself.  The classic “As a [user], I want to [do something], so that [value occurs]”.  User stories are for humans.  They are great for a JIRA board where a human can fill in the gaps with intuition and discussion with a product manager.  But an LLM lacks intuition; it only has context. The AI focuses on the <strong>Feature</strong>.</li>
</ul>

<p>So, what <strong>SHOULD</strong> you use?</p>

<ul>
  <li><a href="https://www.productplan.com/glossary/jobs-to-be-done-framework/"><strong>Jobs-to-be-done</strong></a> (JTBD) - “When I am [situation], I want [outcome] so I can [progress]” - this is the central framework I recommend for agentic specifications.  The LLM focuses on the <strong>Situation</strong> and <strong>Outcome</strong>, instead of a basic understanding of a feature.</li>
  <li><a href="https://www.productplan.com/glossary/circles-method/"><strong>CIRCLES</strong></a> (Comprehend context, identify customers, report needs, cut through priorities, list solutions, evaluate tradeoffs, summarize).  While we use JTBD for the ‘Why,’ we use the <strong>Comprehend Context</strong> step of CIRCLES to define ‘Who’ (which is rule 3). Knowing the ‘Who’ and the ‘Why’ together creates an unbreakable context for the LLM. The <strong>Evaluate Tradeoffs</strong> step is the secret sauce that makes it useful for agentic specifications.  If you explicitly list tradeoffs in your spec (e.g. “Prioritize developer readability over micro-optimizations”), the AI will stop using overly complex code and stick to clean maintainable patterns.</li>
</ul>

<p>These are the ones worth knowing while writing an agentic specification.  Frameworks like <a href="https://en.wikipedia.org/wiki/MoSCoW_method">MoSCoW</a> and <a href="https://www.productplan.com/glossary/gist-planning/">GIST</a> are better suited for the constitution and work breakdown, which I will tackle in future articles.</p>

<p>Let’s take a couple of examples.  I am currently building an AI-driven comment system for <a href="https://blog.cloudflare.com/emdash-wordpress/">EmDash - the Cloudflare Wordpress replacement</a>.</p>

<h2 id="example-1-submitting-the-comment">Example 1: Submitting the comment</h2>

<p>I might write this as a user story: <code class="language-plaintext highlighter-rouge">As [a reader], I want to [write comments] so I can [engage with the community]</code>.  It’s snappy and tells the story.  The software engineer will generally discuss the feature design with the PM - how comments flow through the system - and come up with the right thing.  The LLM will build a standard CRUD form and add a submit button.</p>

<p>Using <strong>Jobs to be done</strong>: <code class="language-plaintext highlighter-rouge">When I [finish a compelling article], I want to [share my rebuttal immediately without losing my flow], so that [I feel my voice is heard in real-time].</code>  It’s a completely different vibe to the user story.  The LLM can act on this completely differently.  This feature needs streaming responses or optimistic UI because “flow” and “real-time” are the priorities.</p>
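
<p>To make that concrete, here is a minimal sketch of the kind of optimistic UI this job nudges the agent toward.  The <code class="language-plaintext highlighter-rouge">/api/comments</code> endpoint and the <code class="language-plaintext highlighter-rouge">Comment</code> shape are my own illustrations, not part of the spec:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: optimistic comment submission so the reader never waits on the network.
// The '/api/comments' endpoint and the Comment shape are illustrative, not part of any spec.
type Comment = { id: string; body: string; status: 'pending' | 'confirmed' | 'failed' };

async function submitComment(body: string, render: (c: Comment) =&gt; void): Promise&lt;void&gt; {
  const optimistic: Comment = { id: crypto.randomUUID(), body, status: 'pending' };
  render(optimistic); // show the comment immediately to preserve the reader's flow

  try {
    const res = await fetch('/api/comments', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ body }),
    });
    render({ ...optimistic, status: res.ok ? 'confirmed' : 'failed' });
  } catch {
    render({ ...optimistic, status: 'failed' }); // keep the text on screen rather than silently dropping it
  }
}
</code></pre></div></div>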

<h2 id="example-2-rejecting-spam-comments">Example 2: Rejecting Spam Comments</h2>

<p>Again, let’s write a user story: <code class="language-plaintext highlighter-rouge">As [an author], I want to [reject spammy comments] so [my readers feel safe engaging with the community].</code>  Again, it’s snappy and tells the story.  The LLM will add an <code class="language-plaintext highlighter-rouge">IsSpam</code> boolean to the database and add a delete button to the comments.  Technically, it meets the requirement.  We don’t get to discuss the expectations with an LLM - it reads the requirement and acts.</p>

<p>Let’s re-write this using <strong>Jobs to be done</strong>: <code class="language-plaintext highlighter-rouge">When I [receive a submitted comment], I want to [automatically classify the comment as spam or not-spam and quarantine the spam comments], so that [my readers are not overwhelmed by irrelevant comments]</code>.  Here, the goal is “automate triage”.  The agent will be incentivized to create a pre-save hook, inject a classification step, and route noise to a quarantine table.</p>
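
<p>As a rough sketch (not the agent’s actual output), that triage hook might look like this in a Cloudflare Worker.  The <code class="language-plaintext highlighter-rouge">classifyComment()</code> helper and the table names are assumptions I’m using for illustration:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: pre-save triage in a Cloudflare Worker.  The classifyComment() helper and
// the table names are hypothetical; D1Database comes from @cloudflare/workers-types.
interface Env { DB: D1Database }
type Verdict = 'approved' | 'quarantined' | 'manual_review';
declare function classifyComment(env: Env, body: string): Promise&lt;Verdict&gt;;

async function handleSubmittedComment(env: Env, comment: { id: string; body: string }): Promise&lt;Verdict&gt; {
  const verdict = await classifyComment(env, comment.body);

  if (verdict === 'quarantined') {
    // Route noise to a quarantine table instead of showing it to readers.
    await env.DB.prepare('INSERT INTO quarantined_comments (id, body) VALUES (?, ?)')
      .bind(comment.id, comment.body).run();
  } else {
    await env.DB.prepare('INSERT INTO comments (id, body, status) VALUES (?, ?, ?)')
      .bind(comment.id, comment.body, verdict).run();
  }
  return verdict;
}
</code></pre></div></div>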

<h2 id="example-3-using-state-machines">Example 3: Using State Machines</h2>

<p>We can also illustrate the state machine for this feature.  When you add a state machine to the spec, you are essentially telling the agent “The comment doesn’t just exist; it moves through a lifecycle.” and that can help frame the HOW in addition to the WHY.  Instead of just asking for an Insert function, you define the transitions:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>stateDiagram-v2
    [*] --&gt; New: Reader Submits
    New --&gt; Triage: classifyComment()
    
    state Triage {
        [*] --&gt; Analyzing
        Analyzing --&gt; Approved: High Signal
        Analyzing --&gt; Quarantined: Low Signal
        Analyzing --&gt; ManualReview: Fail/Ambiguous
    }
    
    Approved --&gt; UI_Live: Optimistic Sync
    ManualReview --&gt; UI_Pending: Notify Reader
    Quarantined --&gt; [*]: Silent Remove
</code></pre></div></div>

<p>As you can see, there can be a lot of transitions to discuss when you consider a full lifecycle of a comment.  I haven’t handled all of them - manual review can also move to quarantined or approved through a button press - what happens then? Defining the flow via state machines is great for nailing the logic when it matters most.</p>

<p>If you are looking for a better tool to represent state machines, use the <a href="https://mermaid.js.org/syntax/stateDiagram.html">mermaid diagram language</a> (as I have here).  It supports state machines, is AI-readable, and renders readily inside documents for human readers.  It also forces logical rigor (unlike a drawing tool like Figma or LucidChart).  If there is a logic error, the mermaid chart won’t compile into an image.</p>
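
<p>The same diagram can also be mirrored as data that the code and tests enforce.  Here is a minimal TypeScript sketch - the state names loosely follow the diagram above, and the extra <code class="language-plaintext highlighter-rouge">manualReview</code> transitions are the open question I just raised:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: the diagram above expressed as a transition map that code and tests can enforce.
// State names loosely follow the diagram; the extra manualReview outcomes are the open question.
type CommentState =
  | 'new' | 'analyzing' | 'approved' | 'quarantined' | 'manualReview' | 'live' | 'pending';

const transitions: Record&lt;CommentState, CommentState[]&gt; = {
  new: ['analyzing'],                                     // classifyComment()
  analyzing: ['approved', 'quarantined', 'manualReview'],
  approved: ['live'],                                     // optimistic sync
  manualReview: ['pending', 'approved', 'quarantined'],   // human presses a button
  quarantined: [],                                        // silent remove (terminal)
  live: [],
  pending: [],
};

function canTransition(from: CommentState, to: CommentState): boolean {
  return transitions[from].includes(to);
}
</code></pre></div></div>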

<h2 id="example-4-trade-offs">Example 4: Trade-offs</h2>

<p>If you don’t give the LLM some guidance in the spec, it will likely re-use other parts of the spec or previous projects (via memory) when building the comments entry box.  You might end up with a heavyweight markdown editor instead of a simple, clean text box.  Using the “Evaluate Tradeoffs” step from the CIRCLES framework is a great way to put some guard rails on this.</p>

<ul>
  <li>We prioritize <strong>Speed (Low Latency)</strong> and <strong>Data Integrity</strong> over <strong>Rich Text Features</strong>.  The agent should focus on optimistic UI updates and instant Cloudflare D1 writes; markdown support is a ‘Should-Have’, not a ‘Must-Have’.</li>
</ul>

<p>The agent now knows that if it has to choose between a heavy library for emojis or a lightweight library for speed, it must choose speed.</p>

<h2 id="final-thoughts">Final thoughts</h2>

<p>When you shift from a user story to a JTBD, you move from requesting a feature to requesting a system behavior.  You provide more context.  LLMs do a better job when you provide context rather than expecting engineering intuition.  You move from prescriptive specs to descriptive architectures.</p>

<p>If you did start with user stories, all is not lost.  Remember rule 10: <em>The spec is a living document</em>.</p>

<p>Vibe-coding way: “Actually, don’t build the form that way.  Start with an AI-driven spam classification middleware” during a chat session.</p>

<p>Product Architect way:</p>

<ul>
  <li>Stop the chat session</li>
  <li>Revert the change</li>
  <li>Fix the intent by putting a better Job-to-be-done in the spec.</li>
</ul>

<p>Most of the time, the agent isn’t dumb; it’s just working towards the wrong job. Fixing the job ensures that the agent doesn’t re-hallucinate the exact same form next time because the job has been updated.</p>

<h3 id="ai-disclosure">AI Disclosure</h3>

<p>I wrote this article (as I do all my articles) by hand.  However, the images are generated by Google Gemini.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[In my last article, I introduced ten rules to help you move from vibe-coder to AI Product Architect by building specs. However, I glossed over a few of the details. The basic problem is that most people write specs for themselves - a human. You aren’t writing for a human, so why do you think it would look the same? In this article, I’m going in depth into using product management techniques to write a spec that agents love. Rule 1 (refined) is “Don’t just tell the agent what to build; tell it the circumstance in which the user is struggling. The “Job” is the prompt; the “Code” is the solution to that job.” Product managers use frameworks to help organize data from users so that the spec is more meaningful. We often use “user stories”, which get put into JIRA or another issue tracking system, to communicate the requirements. Weeks of research are done to make these really concise. Some of the common PM frameworks that you shouldn’t use include: The PR-FAQ - a favorite at Amazon; it starts with a really good press release that describes the customer benefits from the product or feature and follows on with a number of frequently asked questions that would describe sales scenarios. This is NOT a good fit for agentic development as it doesn’t go into enough detail about the product or feature to be reasonable, even with tight writing. It’s more at home as a mechanism to decide if a project should be approved. Hypothesis Progression Framework (HPF) - a favorite at Microsoft Developer Division; it starts with a hypothesis and then you run experiments (data analysis, customer interviews, user studies, etc.) to disprove the hypothesis. If it’s standing at the end, it’s a good hypothesis. These are great for gathering information about features, but lack finesse when directing an LLM. User Stories - definitely useful, and I started down this path myself. The classic “As a [user], I want to [do something], so that [value occurs]”. User stories are for humans. They are great for a JIRA board where a human can fill in the gaps with intuition and discussion with a product manager. But an LLM lacks intuition; it only has context. The AI focuses on the Feature. So, what SHOULD you use: Jobs-to-be-done (JTBD) - “When I am [situation], I want [outcome] so I can [progress]” - this is the central framework I recommend for agentic specifications. The LLM focuses on the Situation and Outcome, instead of a basic understanding of a feature. CIRCLES (Comprehend context, identify customers, report needs, cut through priorities, list solutions, evaluate tradeoffs, summarize). While we use JTBD for the ‘Why,’ we use the Comprehend Context step of CIRCLES to define ‘Who’ (which is rule 3). Knowing the ‘Who’ and the ‘Why’ together creates an unbreakable context for the LLM. The Evaluate Tradeoffs step is the secret sauce that makes it useful for agentic specifications. If you explicitly list tradeoffs in your spec (e.g. “Prioritize developer readability over micro-optimizations”), the AI will stop using overly complex code and stick to clean maintainable patterns. These are the ones worth knowing while writing an agentic specification. 
Frameworks like MoSCoW and GIST are better suited for the constitution and work breakdown, which I will tackle in future articles. Let’s take a couple of examples. I am currently building an AI-driven comment system for EmDash - the Cloudflare Wordpress replacement. Example 1: Submitting the comment I might write this as a user story: As [a reader], I want to [write comments] so I can [engage with the community]. It’s snappy, tells the story. The software engineer will generally have a discussion with the PM asking about design of the feature and the nature of how comments flow through the system and come up with the right thing. The LLM will build a standard CRUD form and add a submit button. Using Jobs to be done: When I [finish a compelling article], I want to [share my rebuttal immediately without losing my flow], so that [I feel my voice is heard in real-time]. It’s a completely different vibe to the user story. The LLM can act on this completely differently. This feature needs streaming responses or optimistic UI because “flow” and “real-time” are the priorities. Example 2: Rejecting Spam Comments Again, let’s write a user story: As [an author], I want to [reject spammy comments] so [my readers feel safe engaging with the community]. Again, it’s snappy and tells the story. The LLM will add an IsSpam boolean to the database and add a delete button to the comments. Technically, it meets the requirement. We don’t get to discuss the expectations with an LLM - they read the requirement and act. Let’s re-write this using Jobs to be done: When I [receive a submitted comment], I want to [automatically classify the comment as spam or not-spam and quarantine the spam comments], so that [my readers are not overwhelmed by irrelevant comments]. Here, the goal is “automate triage”. The agent will be incentivized to create a pre-save hook, inject a classification step, and route noise to a quarantine table. Example 3: Using State Machines We can also illustrate the state machine for this feature. When you add a state machine to the spec, you are essentially telling the agent “The comment doesn’t just exist; it moves through a lifecycle.” and that can help frame the HOW in addition to the WHY. Instead of just asking for an Insert function, you define the transitions: stateDiagram-v2 [*] --&gt; New: Reader Submits New --&gt; Triage: classifyComment() state Triage { [*] --&gt; Analyzing Analyzing --&gt; Approved: High Signal Analyzing --&gt; Quarantined: Low Signal Analyzing --&gt; ManualReview: Fail/Ambiguous } Approved --&gt; UI_Live: Optimistic Sync ManualReview --&gt; UI_Pending: Notify Reader Quarantined --&gt; [*]: Silent Remove As you can see, there can be a lot of transitions to discuss when you consider a full lifecycle of a comment. I haven’t handled all of them - manual review can also move to quarantined or approved through a button press - what happens then? Defining the flow via state machines is great for nailing the logic when it matters most. If you are looking for a better tool to represent state machines, use the mermaid diagram language (as I have here). It supports state machines, is AI readable, but renders inside the documents readily for human readers. It also forces logical rigor (unlike a drawing tool like Figma or LucidChart). If there is a logic error, the mermaid chart won’t compile into an image. 
Example 4: Trade-offs If you don’t give the LLM some guidance in the spec, it will likely re-use other parts of the spec or previous projects (via memory) when building the comments entry box. You might end up with a heavy weight markdown editor instead of a simple clean text box. Using the “Evaluate Tradeoffs” from the CIRCLES framework is a great way to put some guard rails on this. We priortize Speed (Low Latency) and Data Integrity over Rich Text Features. The agent should focus on optimistic UI updates and instant Cloudflare D1 writes; markdown support is a ‘Should-Have’, not a ‘Must-Have’. The agent now knows that if it has to choose between a heavy library for emojis or a lightweight library for speed, it must choose speed. Final thoughts When you shift from a user story to a JTBD, you move from requesting a feature to requesting a system behavior. You provide more context. LLMs do a better job when you provide context over expecting engineering intuition. You move from prescriptive specs to descriptive architectures. If you did start with user stories, all is not lost. Remember rule 10: The spec is a living document. Vibe-coding way: “Actually, don’t build the form that way. Start with an AI-driven spam classification middleware” during a chat session. Product Architect way: Stop the chat session Revert the change Fix the intent by putting a better Job-to-be-done in the spec. Most of the time, the agent isn’t dumb; it’s just working towards the wrong job. Fixing the job ensures that the agent doesn’t re-hallucinate the exact same form next time because the job has been updated. AI Disclosure I wrote this article (as I do all my articles) by hand. However, the images are generated by Google Gemini.]]></summary></entry><entry><title type="html">Ten rules for spec-driven design</title><link href="https://adrianhall.github.io/posts/2026/2026-04-07-agentic-engineer-1.html" rel="alternate" type="text/html" title="Ten rules for spec-driven design" /><published>2026-04-07T00:00:00-07:00</published><updated>2026-04-07T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/agentic-engineer-1</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-07-agentic-engineer-1.html"><![CDATA[<p>Recently, I posted an article on the <a href="/posts/2026/2026-04-03-ai-discussion.html">AI Maturity Model</a>.  In that article, I proposed six stages of growth - from tactician or accidental editor to the agentic engineer.  The biggest difference between the vibe coder and the agentic engineer is the transition to spec-driven design.</p>

<p>But what does that really mean?</p>

<p>The “vibe-coder” sends one massive prompt and hopes for a miracle.  The agentic engineer understands that an LLM is a reasoning engine, not a magician.  Spec-driven design provides three distinct documents: a <strong>Constitution</strong>, a <strong>Specification</strong>, and a <strong>Work Breakdown</strong>.</p>

<p><img src="/assets/images/2026/Apr07-agentic-engineer-1.png" alt="The three pillars of agentic engineering" /></p>

<ul>
  <li>Without the Spec: The agent will build a “technically correct” feature that doesn’t actually solve the user’s problem.</li>
  <li>Without the Constitution: The agent will suggest a library you hate or use an insecure pattern.</li>
  <li>Without the Breakdown: The agent tries to write thousands of lines of code at once and hallucinates halfway through.</li>
</ul>

<p>It doesn’t matter if you are using <a href="https://openspec.dev/">OpenSpec</a>, <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">SpecKit</a>, or <a href="https://kiro.dev/">AWS Kiro</a> for your spec-driven design.  The concepts and the requirements are the same.</p>

<p>The real insight here is that you have to be a cross between an architect and a product manager to be really good at this.  Technical Product Managers (and I have been one for many years) are great at this stuff.  I believe <em>Product Architect</em> is the role that will be advertised for this specific skill set in the future.</p>

<h2 id="pillar-1-the-spec-the-intent">Pillar 1: The Spec (The Intent)</h2>

<p>This is the “North Star” for the agent, defining the specific feature or product value and behaviour.</p>

<h3 id="rule-1-use-a-pm-framework">Rule 1: Use a PM Framework</h3>

<p>Before defining a single UI element, anchor the spec in the user’s struggle.  Use a framework like <strong>Jobs-to-be-Done</strong> to explain the context: “When I am [X], I want to [Y], so I can [Z]”.  When the agent understands the intent, it makes better autonomous decisions when it hits a logical fork in the road that the spec didn’t explicitly cover.</p>

<p>This is so important that I’m going to <a href="/posts/2026/2026-04-09-agentic-engineer-2.html">follow up on this topic in a future article</a>.</p>

<h3 id="rule-2-explicit-state-machine-mapping">Rule 2: Explicit State Machine Mapping</h3>

<p>Agents excel at “happy path” code but struggle with edge cases.  This is why vibe-coding a proof of concept works wonderfully but taking it to production fails in so many cases.  You must explicitly define the feature states, like idle, loading, success, and error.  By mapping these transitions in the spec, you force the agent to write defensive code and robust error handling rather than optimistic, fragile snippets.</p>
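
<p>As an illustration (the payload shapes here are placeholders, not a prescription), a simple discriminated union makes those states impossible for the agent to ignore:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: an explicit state model forces the agent to handle every branch, not just the happy path.
// The payload shapes are placeholders.
type FetchState&lt;T&gt; =
  | { kind: 'idle' }
  | { kind: 'loading' }
  | { kind: 'success'; data: T }
  | { kind: 'error'; message: string; retryable: boolean };

function render(state: FetchState&lt;string[]&gt;): string {
  switch (state.kind) {
    case 'idle':    return 'Nothing requested yet';
    case 'loading': return 'Loading...';
    case 'success': return state.data.join(', ');
    case 'error':   return state.retryable ? `Retrying: ${state.message}` : `Failed: ${state.message}`;
  }
}
</code></pre></div></div>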

<h3 id="rule-3-the-user-persona-context">Rule 3: The “User Persona” Context</h3>

<p>An agent building a tool for a “Senior devops engineer” should write different code (and UI) than one building for a “First-time retail customer.”  Providing the persona ensures the agent defaults to the correct level of complexity, terminology, and UX friction without you having to prompt for it every time.</p>

<h2 id="pillar-2-the-constitution-the-constraints">Pillar 2: The Constitution (The Constraints)</h2>

<p>These are the “global rules” that ensure the agent stays within your specific tech stack and architectural style.</p>

<h3 id="rule-4-the-single-source-of-truth-schema">Rule 4: The “Single Source of Truth” Schema</h3>

<p>My first spec forgot to define the data schema.  I ended up with one table with an auto-incrementing <code class="language-plaintext highlighter-rouge">user_id</code> and another with a <code class="language-plaintext highlighter-rouge">unique_user_id</code> that was a UUID.  Needless to say, things broke.  Including the Drizzle <code class="language-plaintext highlighter-rouge">schema.ts</code> directly would have solved this.</p>

<p>Your constitution should define the data shapes before any logic is written.  By grounding the agent, you prevent logic drift where the agent hallucinates database columns or variable types that don’t exist.</p>
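
<p>For illustration, here is a minimal Drizzle <code class="language-plaintext highlighter-rouge">schema.ts</code> of the kind I mean.  The table and column names are hypothetical; the point is that the agent reads the shapes instead of inventing them:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: a minimal Drizzle schema.ts acting as the single source of truth.
// Table and column names are illustrative - the agent reads these shapes instead of inventing them.
import { sqliteTable, text, integer } from 'drizzle-orm/sqlite-core';

export const users = sqliteTable('users', {
  id: text('id').primaryKey(),                    // one id shape everywhere - no second unique_user_id
  email: text('email').notNull().unique(),
  createdAt: integer('created_at', { mode: 'timestamp' }).notNull(),
});

export const comments = sqliteTable('comments', {
  id: text('id').primaryKey(),
  userId: text('user_id').notNull().references(() =&gt; users.id),
  body: text('body').notNull(),
  status: text('status', { enum: ['new', 'approved', 'quarantined'] }).notNull().default('new'),
});
</code></pre></div></div>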

<h3 id="rule-5-explicit-tech-stack-and-dependency-declaration">Rule 5: Explicit Tech Stack and Dependency Declaration</h3>

<p>Specify your stack and versions (e.g. <em>Next.js 15, Drizzle ORM, Cloudflare D1</em>).  If the constitution is vague, the agent will default to its oldest training data.  Hard-coding your stack preferences ensures the agent doesn’t introduce hallucinated libraries or outdated patterns that break your build.</p>

<p>You should also provide coding style decisions - e.g. JSDoc requirements, one class per file, or whatever else your organization (or you) decides is important.</p>

<h3 id="rule-6-the-out-of-scope-list">Rule 6: The “Out of Scope” List</h3>

<p>Telling an agent what <strong>not</strong> to do is often more important than telling it what to do.  Use the constitution to set “negative constraints” - such as <em>No external CSS libraries</em> or <em>No Node.js built-ins (v8 isolates only)</em> (which is great for my current focus in Cloudflare development).  This keeps the agent output lean and compatible with your tech stack.</p>

<h2 id="pillar-3-the-work-breakdown">Pillar 3: The Work Breakdown</h2>

<p>This is the “Task Graph” that prevents the agent from losing its place during complex builds.</p>

<h3 id="rule-7-modular-decomposition">Rule 7: Modular Decomposition</h3>

<p>Never ask an agent to build a “Page”.  Ask it to build a “Module”.  Break your feature into atomic units - schema, data access layer, and UI components.  This prevents context collapse, where the agent’s reasoning degrades because the context size has exceeded its cognitive window.</p>

<h3 id="rule-8-the-definition-of-done-verification">Rule 8: The Definition of Done (Verification)</h3>

<p>The work breakdown must include explicit acceptance criteria for every task.  This allows the agent (or a secondary reviewer agent) to verify the work.  If the agent can’t test its output against your criteria, it isn’t finished.</p>

<h3 id="rule-9-error-handling-and-edge-cases">Rule 9: Error handling and Edge Cases</h3>

<p>Every task in the breakdown should account for failure states.  By itemizing specific edge cases (e.g. “what if the D1 query times out?”), you ensure the agent builds a resilient system rather than a “vibe-based” prototype that only works under perfect conditions.</p>
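
<p>As a hedged sketch of what itemizing that edge case buys you - the query, table, and two-second budget are illustrative:</p>

<div class="language-typescript highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A sketch only: a defensive D1 read that answers "what if the D1 query times out?".
// The query, table, and two-second budget are illustrative; D1Database comes from @cloudflare/workers-types.
interface Env { DB: D1Database }

async function getApprovedComments(env: Env, postId: string): Promise&lt;{ ok: boolean; rows: unknown[] }&gt; {
  const timeout = new Promise&lt;never&gt;((_, reject) =&gt;
    setTimeout(() =&gt; reject(new Error('D1 query timed out')), 2000));

  try {
    const result = await Promise.race([
      env.DB.prepare('SELECT * FROM comments WHERE post_id = ? AND status = ?')
        .bind(postId, 'approved')
        .all(),
      timeout,
    ]);
    return { ok: true, rows: result.results };
  } catch (err) {
    console.error('comment query failed', err); // observability: log it, then degrade gracefully
    return { ok: false, rows: [] };             // render the page without comments rather than a 500
  }
}
</code></pre></div></div>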

<h2 id="rule-10-the-living-document">Rule 10: The Living Document</h2>

<p>Spec-Driven Design is not “waterfall” development.  As the agent codes, it will discover ambiguities or technical blockers.  Rule 10 dictates that you <strong>update the spec or constitution first</strong>, then let the agent retry.  Never patch the code manually to fix a logic error; fix the instruction so the agentic loop remains clean and reproducible.</p>

<p>You are responsible for each stage.  It’s alright to defer some of the work to a reasoning LLM, but you are responsible for the output.  Each document should be as concise as possible while still meeting the requirements for the document.  I have found that LLMs love to be verbose.  On one project, the LLM decided to say “Support RFC x” and then include the RFC in the document.  Remember, size is context and context is token cost.</p>

<p>Of course, after every change, you should ask the LLM to analyze the work breakdown, particularly the things that have not been started yet, to ensure that other changes are not needed.</p>

<h2 id="final-thoughts">Final thoughts</h2>

<p>As I was writing this, I realized that this is just scraping the surface of this topic, and I will go into detail in future articles, especially the in-depth analysis for the rules around the spec itself.  Thinking like a product manager does not come naturally to most people, so it’s time I wrote it down.</p>

<p>If you are doing this right, you will be spending much more time on these documents than on running the AI.  A recent project I completed is a good example.  I spent over three weeks working through all the issues with the spec.  Once I was happy, the product was done in less than two days, and could have been done faster if I had enabled a squad to help me orchestrate the work.</p>

<p>Product thinking is not free, but it is the difference between a vibe-coded proof-of-concept and a maintainable production-ready product.</p>

<h3 id="ai-disclosure">AI Disclosure</h3>

<p>I wrote this article (as I do all my articles) by hand.  However, the images are generated by Google Gemini.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[Recently, I posted an article on the AI Maturity Model. In that article, I proposed six stages of growth - from tactician or accidental editor to the agentic engineer. The biggest difference between the vibe coder and the agentic engineer is the transition to spec-driven design. But what does that really mean? The “vibe-coder” sends one massive prompt and hopes for a miracle. The agentic engineer understands that an LLM is a reasoning engine, not a magician. Spec-driven design provides three distinct documents: a Constitution, a Specification, and a Work Breakdown. Without the Spec: The agent will build a “technically correct” feature that doesn’t actually solve the user’s problem. Without the Constitution: The agent will suggest a library you hate or use an insecure pattern. Without the Breakdown: The agent tries to write thousands of lines of code at once and hallucinates halfway through. It doesn’t matter if you are using OpenSpec, SpecKit, or AWS Kiro for your spec-driven design. The concepts and the requirements are the same. The real insight here is that you have to be a cross between an architect and a product manager to be really good at this. Technical Product Managers (and I have been one for many years) are great at this stuff. I believe Product Architect is the role that will be advertised for this specific skill set in the future. Pillar 1: The Spec (The Intent) This is the “North Star” for the agent, defining the specific feature or product value and behaviour. Rule 1: Use a PM Framework Before defining a single UI element, anchor the spec in the user’s struggle. Use a framework like Jobs-to-be-Done to explain the context: “When I am [X], I want to [Y], so I can [Z]”. When the agent understands the intent, it makes better autonomous decisions when it hits a logical fork in the road that the spec didn’t explicitly cover. This is so important that I’m going to follow up on this topic in a future article. Rule 2: Explicit State Machine Mapping Agents excel at the “happy path” code but struggle with edge cases. This is why vibe-coding a proof of concept works wonderfully but taking it to production fails in so many cases. You must explicitly define the feature states, like idle, loading, success, and error. By mapping these transitions in the spec, you force the agent to write defensive code and robust error handling rather than optimistic, fragile snippets. Rule 3: The “User Persona” Context An agent building a tool for a “Senior devops engineer” should write different code (and UI) than one building for a “First-time retail customer.” Providing the persona ensures the agent defaults to the correct level of complexity, terminology, and UX friction without you having to prompt for it every time. Pillar 2: The Constitution (The Constraints) These are the “global rules” that ensure the agent stays within your specific tech stack and architectural style. Rule 4: The “Single Source of Truth” Schema My first spec forgot to define the data schema. I ended up with one table with an auto-incrementing user_id and another with a unique_user_id that was a UUID. Needless to say, things broke. Including the Drizzle schema.ts directly would have solved this. 
You constitution should define the data shapes before any logic is written. By grounding the agent, you prevent logic drift where the agent hallucinates database columns or variable types that don’t exist. Rule 5: Explicit Tech Stack and Dependency Declaration Specify your version (e.g. Next.js 15, Drizzle ORM, Cloudflare D1). If the constitution is vague, the agent will default to its oldest training data. Hard-coding your stack preferences ensures the agent doesn’t introduce hallucinated libraries or outdated patterns that break your build. You should also provide coding style decisions - e.g. JSDoc requirements, one class per file, or whatever else your organization (or you) decides is important. Rule 6: The “Out of Scope” List Telling an agent what not to do is often more important than telling it what to do. Use the constitution to set “negative constraints” - such as No external CSS libraries or No Node.js built-ins (v8 isolates only) (which is great for my current focus in Cloudflare development). This keeps the agent output lean and compatible with your tech stack. Pillar 3: The Work Breakdown This is the “Task Graph” that prevents the agent from losing its place during complex builds. Rule 7: Modular Decomposition Never ask an agent to build a “Page”. Ask it to build a “Module”. Break your feature into atomic units - schema, data access layer, and UI components. This prevents context collapse, where the agents reasoning degrades because the context size has exceeded its cognitive window. Rule 8: The Definition of Done (Verification) The work breakdown must include explicit acceptance criteria for every task. This allows the agent (or a secondary reviewer agent) to verify the work. If the agent can’t test its output against your criteria, it isn’t finished. Rule 9: Error handling and Edge Cases Every task in the breakdown should account for failure states. By itemizing specific edge cases (e.g. “what if the D1 query times out?”), you ensure the agent builds a resilient system rather than a “vibe-based” prototype that only works under perfect conditions. Rule 10: The Living Document Spec-Driven Design is not “waterfall” development. As the agent codes, it will discover ambiguities or technical blockers. Rule 10 dictates that you update the spec or constitution first, then let the agent retry. Never patch the code manually to fix a logic error; fix the instruction so the agentic loop remains clean and reproducible. You are responsible for each stage. It’s alright to defer some of the work to a reasoning LLM, but you are responsible for the output. Each document should be as concise as possible while still meeting the requirements for the document. I have found that LLMs love to be verbose. One project I was on, the LLM decided to say “Support RFC x” and then include the RFC in the document. Remember, size is context and context is token cost. Of course, after every change, you should ask the LLM to analyze the work breakdown, particularly the things that have not been started yet, to ensure that other changes are not needed. Final thoughts As I was writing this, I realized that this is just scraping the surface of this topic, and I will go into detail in future articles, especially the in-depth analysis for the rules around the spec itself. Thinking like a product manager does not come naturally to most people, so it’s time I wrote it down. If you are doing this right, you will be spending much more time on these documents than on running the AI. 
A recent project I completed is a good example. I spent over three weeks working through all the issues with the spec. Once I was happy, the product was done in less than two days, and could have been done faster if I had enabled a squad to help me orchestrate the work. Product thinking is not free, but it is the difference between a vibe-coded proof-of-concept and a maintainable production-ready product. AI Disclosure I wrote this article (as I do all my articles) by hand. However, the images are generated by Google Gemini.]]></summary></entry><entry><title type="html">Are we in an AI bubble or an AI revolution?</title><link href="https://adrianhall.github.io/posts/2026/2026-04-05-ai-bubble.html" rel="alternate" type="text/html" title="Are we in an AI bubble or an AI revolution?" /><published>2026-04-05T00:00:00-07:00</published><updated>2026-04-05T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/ai-bubble</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-05-ai-bubble.html"><![CDATA[<p>I read a lot on the Internet from the prognosticators about whether AI is terrible or awesome - are we in a bubble or a revolution in the way we work?</p>

<p>Why not both?</p>

<p><img src="/assets/images/2026/Apr05-ai-bubble-or-revolution.png" alt="AI Bubble or Revolution?" /></p>

<p>Perhaps the best way to explain this is by - as everyone else is - drawing parallels to the other bubble / revolution that we all experienced - the dot-com crash of 2000.  The year 2000 represents the moment where the financial excitement (the bubble) outpaced the actual infrastructure (the revolution).</p>

<h2 id="the-bubble-financial-euphoria--irrational-exuberance">The Bubble: Financial euphoria &amp; irrational exuberance</h2>

<p>The “bubble” aspect was driven by a disconnect between stock prices and business fundamentals.  In 2000, investors were terrified of missing out on the new economy.  Simply adding “.com” to a company name could double its stock price overnight, regardless of whether a company had a path to profitability.</p>

<p>Companies like pets.com and Webvan spent millions on Super Bowl ads and massive warehouses before they had a stable customer base.  Success was measured in “clicks” rather than cash flow.  Between 1995 and 2000, the NASDAQ composite spiked 800%.</p>

<p>When the Fed raised interest rates and capital dried up, the index crashed by 78%, wiping out $5 trillion in market value.  The bubble burst.</p>

<h2 id="the-revolution-the-silent-laying-of-the-infrastructure">The Revolution: The silent laying of the infrastructure</h2>

<p>While the stock market was collapsing, the “revolution” was quietly succeeding.  The bubble provided the capital necessary to build the world we live in now.</p>

<p>During the boom, companies laid thousands of miles of fiber-optic cable.  When they went bankrupt, the dark fiber remained in the ground, making high-speed internet incredibly cheap and accessible for the next generation of startups (like YouTube and Netflix).</p>

<p>The late 90s saw the birth of Amazon (1994) and Google (1998).  While their stocks definitely took a hit in 2000, their underlying technologies - scalable e-commerce and algorithmic search - were fundamentally sound.  More importantly, they had already moved on from clicks to cash flow.</p>

<p>The revolution wasn’t just code; it was a shift in human psychology.  By 2000, the idea of buying a book online or sending an email had moved from “fringe” to “inevitable”.</p>

<h2 id="the-ai-parallel">The AI parallel</h2>

<p>The similarities are striking:</p>

<ul>
  <li>In 2000, there was massive VC funding with no revenue models.  Today, billions are poured into LLM startups with massive compute costs and no sustainable plan for monetization.</li>
  <li>In 2000, the big infrastructure play was fiber-optic cables and early servers.  Today’s companies are building out GPU clusters and massive data centers.</li>
  <li>In 2000, the play was to sell whatever the idea was on the web instead of in bricks-and-mortar stores.  Today, we are putting “AI” into every toaster, refrigerator, and toothbrush.</li>
  <li>In 2000, there was a shift from physical to digital information and commerce.  Today, there is a shift from manual creation to generative intelligence.</li>
</ul>

<h2 id="final-thoughts">Final thoughts</h2>

<p>In 2000, the “Internet” was right, but many “Internet companies” were wrong.  The revolution survived the bubble because the utility of the technology was real, even if the valuations of the companies providing it were a fantasy.</p>

<p>Today, “AI” is right, but many of today’s products and companies will fail because AI is being shoe-horned into products that don’t need it.  Just like in 2000, don’t expect the early companies to be the ones that win the day.  I don’t expect OpenAI or Anthropic to survive in their current form.  The cash burn is too large for the monetization event.</p>

<p>What will survive?  I suspect the infrastructure - the GPU clusters needed to drive AI - will still be needed, so expect the hyperscalers to come out - not unscathed - but with the facilities to really take advantage.  You can also bet that the underlying utility plays will do ok.  Think the comment system that automatically regulates itself, or the fraud scanners that are suddenly better at their job.</p>

<p>What will emerge will affect every single industry.  And how we do work will have changed forever.</p>

<h2 id="ai-disclosure">AI Disclosure</h2>

<p>All my blogs are written by hand.  The images are produced using Google Gemini.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[I read a lot on the Internet from the prognosticators about whether AI is terrible or awesome - are we in a bubble or a revolution in the way we work? Why not both? Perhaps the best way to explain this is by - as everyone else is - drawing parallels to the other bubble / revolution that we all experienced - the dot-com crash of 2000. The year 2000 represents the moment where the financial excitement (the bubble) outpaced the actual infrastructure (the revolution). The Bubble: Financial euphoria &amp; irrational exuberance The “bubble” aspect was driven by a disconnect between stock prices and business fundamentals. In 2000, investors were terrified of missing out on the new economy. Simply adding “.com” to a company name could double its stock price overnight, regardless of whether a company had a path to profitability. Companies like pets.com and Webvan spent millions on Super Bowl ads and massive warehouses before they had a stable customer base. Success was measured in “clicks” rather than cash flow. Between 1995 and 2000, the NASDAQ composite spiked 800%. When the fed raised interest rates and capital dried up, the index crashed by 78% wiping out $5 trillion in market value. The bubble burst. The Revolution: The silent laying of the infrastructure While the stock market was collapsing, the “revolution” was quietly succeeding. The bubble provided the capital necessary to build the world we live in now. During the boom, companies laid thousands of miles of fiber-optic cable. When they went bankrupt, the dark fiber remained in the ground, making high-speed internet incredibly cheap and accessible for the next generation of startups (like YouTube and Netflix). The late 90s saw the birth of Amazon (1994) and Google (1998). While their stocks definitely took a hit in 2000, their underlying technologies - scalable e-commerce and algorithmic search - were fundamentally sound. More importantly, they had already moved on from clicks to cash flow. The revolution wasn’t just code; it was a shift in human psychology. By 2000, the idea of buying a book online or sending an email had moved from “fringe” to “inevitable”. The AI parallel The similarities are striking: In 2000, there was massive VC funding with no revenue models. Today, billions are poured into LLM startups with massive compute cost and no sustainable plan on monetization. In 2000, the big infrastructure play was fiber-optic cables and early servers. Todays companies are building out GPU clusters and massive data centers. In 2000, the play was to sell whatever the idea was on the web instead of bricks and mortar stored. Today, we are putting “AI” into every toaster, refridgerator, and toothbrush. In 2000, there was a shift from physical to digitial information and commerce. Today, there is a shift from manual creation to generative intelligence. Final thoughts In 2000, the “Internet” was right, but many “Internet companies” were wrong. The revolution survived the bubble because the utility of the technology was real, even if the valuations of the companies providing it were a fantasy. Today, “AI” is right, but many of todays products and companies will fail because AI is being shoe-horned into products that don’t need it. 
Just like in 2000, don’t expect the early companies to be the ones that win the day. I don’t expect OpenAI or Anthropic to survive in their current form. The cash burn is too large for the monetization event. What will survive? I suspect the infrastructure - the GPU clusters needed to drive AI - will still be needed, so expect the hyperscalers to come out - not unscathed - but with the facilities to really take advantage. You can also bet that the underlying utility plays will do ok. Think the comment system that automatically regulates itself, or the fraud scanners that are suddenly better at their job. What will emerge will affect every single industry. And how we do work will have changed forever. AI Disclosure All my blogs are written by hand. The images are produced using Google Gemini.]]></summary></entry><entry><title type="html">The AI Maturity Model</title><link href="https://adrianhall.github.io/posts/2026/2026-04-03-ai-discussion.html" rel="alternate" type="text/html" title="The AI Maturity Model" /><published>2026-04-03T00:00:00-07:00</published><updated>2026-04-03T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2026/ai-discussion</id><content type="html" xml:base="https://adrianhall.github.io/posts/2026/2026-04-03-ai-discussion.html"><![CDATA[<p>I have a confession to make.  I don’t write code any more, and I haven’t written any code since August of last year.  I wrote a little about my journey to AI nirvana (see my posts on <a href="/posts/2025/2025-08-01-ai-editors.html">AI Editors</a>, <a href="/posts/2025/2025-08-01-oss-ai-editors.html">OSS AI Editors</a>, or <a href="/posts/2025/2025-12-06-spec-kit.html">SpecKit</a>).  I’m guessing that everyone goes through the same journey - from distrusting that AI will do a good job to using AI for everything.</p>

<p>The good news is that software development has not really changed.  The job was never about the code, even though the code was the tangible output of the work.  It was about understanding problems, designing systems, thinking about edge cases and failure modes, and ensuring that a user has a great experience when using the product.  None of that has changed (and yes, AI has made even this bit easier - but not replaced it yet).</p>

<p>I came up with an <em>AI Maturity Model</em> - you can map your experience onto it and thus determine what you should be investigating to take better advantage of AI facilities.</p>

<p><img src="/assets/images/2026/Apr03-ai-maturity-model.jpg" alt="The AI Maturity Model" /></p>

<h2 id="stage-1-the-nano-assitant">Stage 1: The Nano Assitant</h2>

<ul>
  <li>Trust Level: You trust the AI to finish a sentence.</li>
  <li>Control Level: You are the driver; it’s the power steering.</li>
</ul>

<p>My journey started in Visual Studio Code.  GitHub Copilot was installed and, all of a sudden, the in-editor prompts improved.  Yeah - that’s AI doing that.  It’s not in your face.  Sometimes it was a single line; sometimes it was a whole function.</p>

<p>The problem, as I saw it when I was in this phase, is that the AI should have been reading my mind.  However, it’s just predicting the next word and so it got it wrong as often as it got it right.  It also didn’t write it exactly as I would.  I didn’t trust it, so I spent a lot of time pressing escape to substitute my own code.</p>

<p>No, you aren’t vibe-coding yet.  The editor is just introducing a more intelligent helper.</p>

<h2 id="stage-2-the-junior-consultant">Stage 2: The Junior Consultant</h2>

<ul>
  <li>Trust Level: You trust the AI to explain a block of code or refactor a function.</li>
  <li>Control Level: Side-bar chat.  You provide snippets; it provides advice.</li>
</ul>

<p>At some point, you give in and try the chat function.  After all, it’s always sitting there begging to be used.  You start with some basic stuff.  Mine was with my <a href="https://github.com/CommunityToolkit/Datasync">OSS Project - the Datasync Community Toolkit</a>.  There is a pretty hairy piece of logic for synchronizing data.  I figured it can’t hurt.  It walked me through what was happening.  At this point, I could see the bug and proceeded to correct it myself (with some help for AI-assisted auto-complete).</p>

<p>GitHub Copilot had added ask and edit mode, so I did do a few sessions where I highlighted a piece of code, and told it what was going on.  It then told me what the code should be, and I just told it to implement it.</p>

<p>You are still not vibe-coding, but you are developing trust in the code that the AI writes.</p>

<h2 id="stage-3-the-project-navigator">Stage 3: The Project Navigator</h2>

<ul>
  <li>Trust Level: You trust the AI to find things across your whole repository.</li>
  <li>Control Level: It’s answering complex questions and doing multi-file editing, but you still have an opt-out and review all the code it writes.</li>
</ul>

<p>So, you download <a href="https://cursor.com">Cursor</a> or add in a new plugin (maybe <a href="https://www.continue.dev/">continue.dev</a> or <a href="https://cline.bot/">cline</a>).  These all index your source code, so you can start asking more complex questions (like “how does authentication work in my repo?”) and doing multi-file edits (like “implement rate-limiting on the API surface”).  The AI will dutifully determine what is going on and make all the edits for you.  You can cycle through each change and decide whether to accept it or not.</p>

<p>You’ll get more and more trust here, and probably decide not to babysit the AI any more.  If it works and the tests pass, why bother?  This is the point at which you decide the code is not important.</p>

<p>Congratulations, you are now vibe-coding.  Coincidentally, this is also the time you are likely to buy a subscription to a coding AI service like Anthropic.</p>

<h2 id="stage-4-the-autonomous-operator">Stage 4: The Autonomous Operator</h2>

<ul>
  <li>Trust Level: You trust the AI to run commands, fix bugs, and execute “Plan -&gt; Act -&gt; Observe” loops.</li>
  <li>Control Level: You give a high level goal; it navigates the files and runs tests until it’s fixed.</li>
</ul>

<p>At some point, you’ll wonder why you are in the editor at all.  After all, the AI is doing all the work.  You are just doing some prompting; you’ve learned that context is king, so your prompts become files.  You want to do more because the AI is actually helping now.</p>

<p>This is the point at which you learn about <a href="https://code.claude.com/docs/en/overview">Claude Code</a> or <a href="https://opencode.ai/">OpenCode</a>.  You live in the terminal so you can run multiple sessions at the same time.  You learn about <a href="https://git-scm.com/docs/git-worktree">git worktrees</a> to manage independent work streams.  You are likely trusting the AI to review the code in a pull request.</p>

<p>Yes, you are still vibe-coding.  Vibe-coding is when you write everything in a prompt and allow the AI to go at it until complete.  It will miss edge cases, be badly designed, and not maintainable.  However, it’s great for a proof-of-concept.</p>

<h2 id="stage-5-the-spec-driven-architect">Stage 5: The Spec-Driven Architect</h2>

<ul>
  <li>Trust Level: You trust the AI to interpret a blueprint rather than a prompt.</li>
  <li>Control Level: You write the blueprint; the AI writes the code; you review the code.</li>
</ul>

<p>You’ll have a bad experience vibe-coding that will require you to undo hours of work.  You will feel frustrated, but you are a software engineer.  You need to re-assert your design mandate.  Enter spec-driven design, most notably via <a href="https://github.blog/ai-and-ml/generative-ai/spec-driven-development-with-ai-get-started-with-a-new-open-source-toolkit/">SpecKit</a> or <a href="https://openspec.dev/">OpenSpec</a>.</p>

<p>You are still in the terminal, but also are back in the editor.  You start with the prompt for the idea, but - instead of telling the AI to just go do it - you ask it to create a blueprint instead.  SpecKit and OpenSpec work pretty much the same way.  The blueprint becomes a contract, work breakdown is done, edge cases are discussed.  The files are much larger.  But no code is being written.</p>

<p>When you are happy with the spec, you can now determine how many agents are going to work on it.  Some work will have predecessors, but some work will be able to be done in parallel.  Once the work is done, you’ve got a product with tests, CI/CD, and a user manual.  The smaller context that this technique provides allows for more precision.  You aren’t writing code, but you are directing and managing the work.</p>

<p>You’ve entered the realm of agentic engineering and can now call yourself an AI Engineer or Agentic AI Engineer on your resume.</p>

<h2 id="stage-6-the-squad-leader">Stage 6: The Squad Leader</h2>

<ul>
  <li>Trust Level: You hire a team of specialized agents to collaborate.</li>
  <li>Control Level: You are a manager of intelligence - you define the vision, set the constraints, and verify the final integration.</li>
</ul>

<p>This is the last step in your journey.  In Stage 5, you were the organizer of the agents, kicking off each agent in turn in separate terminals.  In Stage 6, you delegate that to an orchestrator agent and it decides which agents to fire up.  You can, quite literally, spend a couple of hours refining your spec on a Friday afternoon and then leave the agent squad to it over the weekend.</p>

<p>The tools here are <a href="https://bradygaster.github.io/squad/">Squad</a> or <a href="https://crewai.com/">CrewAI</a> - they work in slightly different ways, but the ultimate goal is the same - you are the architect and the AI agents are your development team.  Let them do their job.</p>

<p>And yes, this is where I am now.  I don’t know if there will be another stage, but I’m definitely more productive now than I was six months ago, despite not writing code.  You just have to remember why you became a software engineer in the first place - to build compelling products.</p>

<h2 id="final-thoughts">Final thoughts</h2>

<p>You’ll notice that there are pairs of stages here, each with an early and a late stage:</p>

<ul>
  <li>The Tactician, using AI as a tool.</li>
  <li>The Lead Dev, delegating coding to the AI.</li>
  <li>The Architect, orchestrating multiple agents to fulfill a vision.</li>
</ul>

<p>No matter where you are in the journey, it’s about the level of trust you have in the AI to do the right thing, and the level of control you give up to become more productive.</p>

<p>I now spend my time thinking up ideas.  Half the time, those ideas are similar enough to someone elses idea that I have to decide whether I still want to do it.  But when I do want to do something, I’ve got the tools and the skills to do a good job.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[I have a confession to make. I don’t write code any more, and I haven’t written any code since August of last year. I wrote a little about my journey to AI nirvana (see my posts on AI Editors, OSS AI Editors, or SpecKit). I’m guessing that everyone goes through the same journey - from distrusting that AI will do a good job to using AI for everything. The good news is that software development has not really changed. The job was never about the code, even though the code was the tangible output of the work. It was about understanding problems, designing systems, thinking about edge cases and failure modes, and ensuring that a user has a great experience when using the product. None of that has changed (and yes, AI has made even this bit easier - but not replaced it yet). I came up with an AI Maturity Model - you can map your experience onto it and thus determine what you should be investigating to take better advantage of AI facilities. Stage 1: The Nano Assitant Trust Level: You trust the AI to finish a sentence. Control Level: You are the driver; it’s the power steering. My journey started in Visual Studio Code. GitHub Copilot was installed and all of a sudden, the in-editor prompts improve. Yeah - that’s AI doing that. It’s not in your face. Sometimes it was a single line; sometimes it was a whole function. The problem, as I saw it when I was in this phase, is that the AI should have been reading my mind. However, it’s just predicting the next word and so it got it wrong as often as it got it right. It also didn’t write it exactly as I would. I didn’t trust it, so I spent a lot of time pressing escape to substitute my own code. No, you aren’t vibe-coding yet. The editor is just introducing a more intelligent helper. Stage 2: The Junior Consultant Trust Level: You trust the AI to explain a block of code or refactor a function. Control Level: Side-bar chat. You provide snippets; it provides advice. At some point, you give in and try the chat function. After all, it’s always sitting there begging to be used. You start with some basic stuff. Mine was with my OSS Project - the Datasync Community Toolkit. There is a pretty hairy piece of logic for synchronizing data. I figured it can’t hurt. It walked me through what was happening. At this point, I could see the bug and proceeded to correct it myself (with some help for AI-assisted auto-complete). GitHub Copilot had added ask and edit mode, so I did do a few sessions where I highlighted a piece of code, and told it what was going on. It then told me what the code should be, and I just told it to implement it. You are still not vibe-coding, but you are developing trust in the code that the AI writes. Stage 3: The Project Navigator Trust Level: You trust the AI to find things across your whole repository. Control Level: It’s answering complex questions and doing multi-file editing, but you still have an opt-out and review all the code it writes. So, you download Cursor or add in a new plugin (maybe continue.dev or cline). 
These all index your source code, so you can start asking more complex questions (like “how does authentication work in my repo?”) and doing multi-file edits (like “implement rate-limiting on the API surface”). The AI will dutifully determine what is going on and make all the edits for you. You can cycle through each change and decide whether to accept it or not. You’ll get more and more trust here, and probably decide not to babysit the AI any more. If it works and the tests pass, why bother? This is the point at which you decide the code is not important. Congratulation, you are now vibe-coding. Coincidentally, this is also the time you are likely to buy a subscription to a coding AI service like Anthropic. Stage 4: The Autonomous Operator Trust Level: You trust the AI to run commands, fix bugs, and execute “Plan -&gt; Act -&gt; Observe” loops. Control Level: You give a high level goal; it navigates the files and runs tests until it’s fixed. At some point, you’ll wonder why you are in the editor at all. After all, the AI is doing all the work. You are just doing some prompting; you’ve learned that context is king, so your prompts become files. You want to do more because the AI is actually helping now. This is the point at which you learn about Claude Code or OpenCode. You live in the terminal so you can run multiple sessions at the same time. You learn about git worktrees to manage independent work streams. You are likely trusting the AI to review the code in a pull request. Yes, you are still vibe-coding. Vibe-coding is when you write everything in a prompt and allow the AI to go at it until complete. It will miss edge cases, be badly designed, and not maintainable. However, it’s great for a proof-of-concept. Stage 5: The Spec-Driven Architect Trust Level: You trust the AI to interpret a blueprint rather than a prompt. Control Level: You write the blueprint; the AI writes the code; you review the code. You’ll have a bad experience vibe-coding that will require you to undo hours of work. You will feel frustrated, but you are a software engineer. You need to re-assert your design mandate. Enter spec-driven design, most notably via SpecKit or OpenSpec. You are still in the terminal, but also are back in the editor. You start with the prompt for the idea, but - instead of telling the AI to just go do it - you ask it to create a blueprint instead. SpecKit and OpenSpec work pretty much the same way. The blueprint becomes a contract, work breakdown is done, edge cases are discussed. The files are much larger. But no code is being written. When you are happy with the spec, you can now determine how many agents are going to work on it. Some work will have predecessors, but some work will be able to be done in parallel. Once the work is done, you’ve got a product with tests, CI/CD, and a user manual. The smaller context that this technique provides allows for more precision. You aren’t writing code, but you are directing and managing the work. You’ve entered the realm of agentic engineering and can now call yourself an AI Engineer or Agentic AI Engineer on your resume. Stage 6: The Squad Leader Trust Level: You hire a team of specialized agents to collaborate. Control Level: You are a manager of intelligence - you define the vision, set the constraints, and verify the final integration This is the last step in your journey. In Stage 5, you were the organizer of the agents, kicking off each agent in turn in separate terminals. 
In Stage 6, you delegate that to an orchestrator agent and it decides which agents to fire up. You can, quite literally, spend a couple of hours refining your spec on a Friday afternoon and then leave the agent squad to it over the weekend. The tools here are Squad or CrewAI - they work in slightly different ways, but the ultimate goal is the same - you are the architect and the AI agents are your development team. Let them do their job. And yes, this is where I am now. I don’t know if there will be another stage, but I’m definitely more productive now than I was six months ago, despite not writing code. You just have to remember why you became a software engineer in the first place - to build compelling products. Final thoughts You’ll notice that there are pairs of stages here, with an early and late stage: The Tactician, using AI as a tool. The Lead Dev, delegating coding to the AI. The Architect, orchestrating multiple agents to fulfill a vision. No matter where you are in the journey, it’s about the level of trust you have in the AI to do the right thing, and the level of control you give up to become more productive. I now spend my time thinking up ideas. Half the time, those ideas are similar enough to someone elses idea that I have to decide whether I still want to do it. But when I do want to do something, I’ve got the tools and the skills to do a good job.]]></summary></entry><entry><title type="html">Using SpecKit with multiple AI agents</title><link href="https://adrianhall.github.io/posts/2025/2025-12-06-spec-kit.html" rel="alternate" type="text/html" title="Using SpecKit with multiple AI agents" /><published>2025-12-06T00:00:00-08:00</published><updated>2025-12-06T00:00:00-08:00</updated><id>https://adrianhall.github.io/posts/2025/spec-kit</id><content type="html" xml:base="https://adrianhall.github.io/posts/2025/2025-12-06-spec-kit.html"><![CDATA[<p>I wanted to put down my workflow and all the details for using <a href="https://speckit.org/">SpecKit</a> using a superior reasoning and planning LLM (like <a href="https://www.anthropic.com/claude/opus">Opus 4.5</a> via <a href="https://code.claude.com/docs/en/setup">Claude Code</a>, or <a href="https://deepmind.google/models/gemini/pro/">Gemini 3</a> via the <a href="https://geminicli.com/">Gemini CLI</a>) for the initial phases, but switching to an agentic AI IDE like <a href="https://cursor.com/">Cursor</a> or <a href="https://code.visualstudio.com/docs/copilot/overview">GitHub Copilot in VS Code</a> for the final coding phases.  I’m using <a href="https://git-scm.com/">Git</a> and the <a href="https://cli.github.com/">GitHub CLI</a> for repository actions.</p>

<p>This workflow does not include “authenticating”.  You still need to configure and log in to each tool separately.  They’ll generally prompt you when you need to do so.</p>
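
<p>For reference, here is a quick sketch of what that sign-in looks like for the tools used in this workflow (the exact flow may vary by version):</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># GitHub CLI - browser-based device flow, then confirm you are signed in
gh auth login
gh auth status

# Claude Code - start a session, then authenticate from inside it
claude
/login
</code></pre></div></div>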

<h2 id="install-dependencies">Install dependencies</h2>

<p>You need to install the following tools.  I use <a href="">Winget</a> where I can, since I’m on a Windows machine.  These instructions are pretty much the same on MacOS or Linux; however, the method of installation is different.</p>

<ul>
  <li><strong>Python 3.11+</strong> for running the SpecKit CLI.</li>
  <li><strong>UV</strong> is the Python Package Manager that SpecKit prefers.</li>
  <li><strong>GitHub CLI</strong> for repository management.</li>
  <li><strong>Claude Code</strong> or <strong>Gemini CLI</strong> for running the planning phase.</li>
  <li><strong>Visual Studio Code</strong> or <strong>Cursor</strong> for running the execution phase.</li>
  <li><strong>SpecKit Companion</strong> (if using VS Code) for running SpecKit inside the editor.</li>
</ul>

<p>You can install all this using the following:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">Python.Python.3.12</span><span class="w">
</span><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">astral-sh.uv</span><span class="w">
</span><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">GitHub.CLI</span><span class="w">
</span><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">Anthropic.ClaudeCode</span><span class="w">
</span><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">Microsoft.VisualStudioCode</span><span class="w">
</span><span class="n">winget</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nt">--id</span><span class="w"> </span><span class="nx">Microsoft.VisualStudioCode.CLI</span><span class="w">
</span></code></pre></div></div>

<p>I prefer GitHub Copilot over Cursor, but you can swap that out.  <a href="https://marketplace.visualstudio.com/items?itemName=alfredoperez.speckit-companion">SpecKit Companion</a> is available on the Visual Studio Code extension marketplace.</p>

<p>Once you’ve done all that, you’ll also need to install SpecKit with the following command:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">uv</span><span class="w"> </span><span class="nx">tool</span><span class="w"> </span><span class="nx">install</span><span class="w"> </span><span class="nx">specify-cli</span><span class="w"> </span><span class="nt">--from</span><span class="w"> </span><span class="nx">git</span><span class="o">+</span><span class="nx">https://github.com/github/spec-kit.git</span><span class="w">

</span><span class="c"># Verify installation</span><span class="w">
</span><span class="n">specify</span><span class="w"> </span><span class="nt">--version</span><span class="w">
</span></code></pre></div></div>

<p>Check out the <a href="https://speckit.org/">Quick Start Guide</a> for Spec Kit to familiarize yourself with the application and also to verify that the install instructions are still correct.</p>

<h2 id="initialize-a-project">Initialize a project</h2>

<p>Start by using the GitHub CLI to create a new remote repository and Spec Kit to scaffold the SDD (Spec Driven Development) files locally:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create a remote repository on GitHub (private or public as desired)</span><span class="w">
</span><span class="n">gh</span><span class="w"> </span><span class="nx">repo</span><span class="w"> </span><span class="nx">create</span><span class="w"> </span><span class="nx">my-project</span><span class="w"> </span><span class="nt">--public</span><span class="w"> </span><span class="nt">--clone</span><span class="w"> </span><span class="nt">--confirm</span><span class="w">

</span><span class="c"># Change directory into the new project folder</span><span class="w">
</span><span class="n">cd</span><span class="w"> </span><span class="nx">my-project</span><span class="w">
</span></code></pre></div></div>

<p>At this point, I like to write a <code class="language-plaintext highlighter-rouge">README.md</code> that quickly describes the project as a whole.  However, this is optional.  Once done, use the installed specify command to create the necessary templates and structure:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">specify</span><span class="w"> </span><span class="nx">init</span><span class="w"> </span><span class="o">.</span><span class="w">
</span></code></pre></div></div>

<p>This will generate the <code class="language-plaintext highlighter-rouge">.specify</code> folder in your project.</p>

<h2 id="specification-and-planning-phase">Specification and Planning phase</h2>

<p>Use <strong>Claude Code</strong> for the high-level reasoning and planning phases.  Claude’s large context window and strong reasoning (Opus 4.5) excel at creating accurate, unambiguous artifacts.  You could also use Gemini 3 here (I haven’t, so can’t comment on quality).</p>

<h3 id="establish-a-constitution">Establish a constitution</h3>

<p>The constitution defines the non-negotiable rules for your project (e.g., tech stack, testing standards).</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start the claude code interactive session</span><span class="w">
</span><span class="n">claude</span><span class="w">

</span><span class="c"># You may need to sign in with /login</span><span class="w">
</span><span class="n">/login</span><span class="w">

</span><span class="c"># Run the constitution command with your prompt</span><span class="w">
</span><span class="n">/speckit.constitution</span><span class="w"> </span><span class="nx">Create</span><span class="w"> </span><span class="nx">principles</span><span class="w"> </span><span class="nx">focused</span><span class="w"> </span><span class="nx">on</span><span class="w"> </span><span class="nx">using</span><span class="w"> </span><span class="nx">dotnet</span><span class="w"> </span><span class="nx">10</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">Blazor</span><span class="w"> </span><span class="nx">and</span><span class="w"> </span><span class="nx">MudBlazor</span><span class="p">,</span><span class="w"> </span><span class="nx">90</span><span class="o">%</span><span class="w"> </span><span class="nx">unit</span><span class="w"> </span><span class="nx">test</span><span class="w"> </span><span class="nx">coverage</span><span class="w"> </span><span class="nx">for</span><span class="w"> </span><span class="nx">all</span><span class="w"> </span><span class="nx">logic</span><span class="p">,</span><span class="w"> </span><span class="nx">mermaid</span><span class="w"> </span><span class="nx">for</span><span class="w"> </span><span class="nx">specification</span><span class="w"> </span><span class="nx">diagrams</span><span class="w">
</span></code></pre></div></div>

<p>This generates a file <code class="language-plaintext highlighter-rouge">.specify/memory/constitution.md</code> - you should review this file and make any changes you need.</p>

<h3 id="create-the-specification">Create the specification</h3>

<p>The specification is the “what” and “why” of your project or feature.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Specify the feature name</span><span class="w">
</span><span class="nv">$</span><span class="nn">env</span><span class="p">:</span><span class="nv">SPECIFY_FEATURE</span><span class="o">=</span><span class="s2">"001-mvp"</span><span class="w">

</span><span class="c"># Run Claude Code</span><span class="w">
</span><span class="n">claude</span><span class="w">

</span><span class="c"># Build the specification</span><span class="w">
</span><span class="n">/speckit.specify</span><span class="w"> </span><span class="nx">Build</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">task</span><span class="w"> </span><span class="nx">management</span><span class="w"> </span><span class="nx">app</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">user</span><span class="w"> </span><span class="nx">authentication</span><span class="p">,</span><span class="w"> </span><span class="nx">real-time</span><span class="w"> </span><span class="nx">collaboration</span><span class="p">,</span><span class="w"> </span><span class="nx">and</span><span class="w"> </span><span class="nx">mobile</span><span class="w"> </span><span class="nx">support.</span><span class="w"> </span><span class="nx">Users</span><span class="w"> </span><span class="nx">should</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="nx">able</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">create</span><span class="w"> </span><span class="nx">projects</span><span class="p">,</span><span class="w"> </span><span class="nx">assign</span><span class="w"> </span><span class="nx">tasks</span><span class="p">,</span><span class="w"> </span><span class="nx">and</span><span class="w"> </span><span class="nx">track</span><span class="w"> </span><span class="nx">progress</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">Kanban</span><span class="w"> </span><span class="nx">boards.</span><span class="w">
</span></code></pre></div></div>

<p>This generates the file <code class="language-plaintext highlighter-rouge">specs/001-mvp.md</code> - the core blueprint.  The specification translates your high-level natural language prompt into a structured document containing: <strong>Goal</strong>, <strong>User Scenarios</strong>, <strong>Functional Requirements</strong>, and <strong>Acceptance Criteria</strong>.  You must ensure the agent accurately captures all of your intent and doesn’t introduce unwanted complexity or omit necessary features.</p>

<p><strong>Review checklist:</strong></p>

<ul>
  <li><strong>Completeness</strong>: Does the specification cover everything you intended for the feature?</li>
  <li><strong>Accuracy</strong>: Are the requirements correctly stated?</li>
  <li><strong>Unambiguity</strong>: Is every requirement clear?</li>
  <li><strong>Adherence to constitution</strong>: Does the specification violate any rule you set in the constitution?</li>
</ul>

<p>Now that you have a base specification, you can clarify and refine the specification using generative AI:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">/speckit.clarify</span><span class="w">
</span></code></pre></div></div>

<p>The clarify command doesn’t generate a new file.  Instead, it generates direct output in the Claude Code terminal (or chat window), allowing you to further refine the specification through a question and answer session.</p>

<p><strong>Review checklist:</strong></p>
<ul>
  <li><strong>Understanding</strong>: Does the question highlight a genuine missing piece of information?</li>
  <li><strong>Completeness</strong>: Your responsibility as the human is to then go back and edit the spec file to answer these questions directly.</li>
  <li><strong>Iteration</strong>: You may run <code class="language-plaintext highlighter-rouge">/speckit.clarify</code> multiple times until the agent returns minimal or no questions, indicating the specification is robust and complete.</li>
</ul>

<p>If the clarify stage reveals a fundamental misunderstanding, you should not just edit the spec file manually.  Re-run the specify command with a more detailed prompt to regenerate the baseline structure before refining it.</p>

<p>After final edits, you should do one last check on the specification file before committing it.</p>

<h3 id="create-the-technical-plan">Create the technical plan</h3>

<p>The plan is the “how” of the project or feature.</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">/speckit.plan</span><span class="w"> </span><span class="nx">Build</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="nx">ASP.NET</span><span class="w"> </span><span class="nx">Core</span><span class="w"> </span><span class="nx">application</span><span class="w"> </span><span class="nx">with</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">health</span><span class="w"> </span><span class="nx">endpoints.</span><span class="w">
</span></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">/speckit.plan</code> command instructs the AI agent to read the approved specification and the constitution and generate a technical plan.  This is placed in <code class="language-plaintext highlighter-rouge">specs/001-mvp/plan.md</code>.  Depending on the complexity of the prompt, the agent may also create supplementary files, such as:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">data-model.md</code></li>
  <li><code class="language-plaintext highlighter-rouge">api-spec.md</code></li>
  <li><code class="language-plaintext highlighter-rouge">architecture.png</code></li>
</ul>

<p><strong>Review checklist:</strong></p>

<p>The plan is where the AI defines the architecture, so your review must focus on technical suitability and adherence to standards.</p>

<ul>
  <li><strong>Architecture &amp; Design</strong>: Does the plan use the correct design patterns?  Is the proposed solution over-engineers or under-engineered for the feature size?</li>
  <li><strong>Tech Stack Adherence</strong>: Does the plan strictly follow the rules laid out in your <code class="language-plaintext highlighter-rouge">constitution.md</code>?  For example, if you forbade ORMs or required a specific testing framework, the plan must reflect this.</li>
  <li><strong>Data Models</strong>: Are the proposed database schemas (tables, fields, relationships) correct, efficient, and complete for the feature’s needs?</li>
  <li><strong>API Contracts</strong>: Are the proposed API endpoints, request/response bodies, and error codes logical and consistent with existing system contracts?</li>
  <li><strong>Constraints</strong>: Does the plan account for non-functional requirements from the specification, such as performance targets, security, and scalability?</li>
</ul>

<h3 id="break-the-plan-into-granular-tasks">Break the plan into granular tasks</h3>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">/speckit.tasks</span><span class="w"> </span><span class="nx">Break</span><span class="w"> </span><span class="nx">the</span><span class="w"> </span><span class="nx">plan</span><span class="w"> </span><span class="nx">into</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">atomic</span><span class="p">,</span><span class="w"> </span><span class="nx">testable</span><span class="w"> </span><span class="nx">tasks.</span><span class="w">
</span></code></pre></div></div>

<p>The output is the <code class="language-plaintext highlighter-rouge">.specify/tasks.md</code> file, which is a sequential checklist of small work items (e.g. <code class="language-plaintext highlighter-rouge">T001: Create data model</code>).</p>

<p>This concludes the planning phase; exit Claude Code and commit the final artifacts before switching:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">git</span><span class="w"> </span><span class="nx">add</span><span class="w"> </span><span class="o">.</span><span class="nf">specify</span><span class="nx">/</span><span class="w">
</span><span class="n">gh</span><span class="w"> </span><span class="nx">commit</span><span class="w"> </span><span class="nt">-m</span><span class="w"> </span><span class="s2">"feat: [MVP] MVP Specification and technical plan"</span><span class="w">
</span></code></pre></div></div>

<h2 id="implementation-phase">Implementation phase</h2>

<p>Now switch to the AI IDE that you prefer for implementation speed.  Spec Kit’s artifacts provide all the context necessary to complete each task in the task list.</p>

<h3 id="set-up-the-environment">Set up the environment</h3>

<p>Open the project in  your chosen IDE (VS Code for Copilot, or the Cursor IDE).  Ensure the respective AI tool is active and authenticated, and that your chosen language / runtime has been installed and configured correctly.  This is also a good time to ensure your testing environment is available (e.g. if you are using TestContainers, then start the Docker engine).</p>
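
<p>As an illustration, a pre-flight check for the stack in the constitution example above (dotnet with Blazor, plus Docker for TestContainers) might look something like this:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Confirm the runtime and repo tooling are available
dotnet --version
gh auth status

# TestContainers needs a running Docker engine
docker info

# Open the project in VS Code (or launch the Cursor IDE instead)
code .
</code></pre></div></div>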

<h3 id="implement-tasks">Implement tasks</h3>

<p>You now need to instruct the agent to execute the tasks that were generated:</p>

<p><strong>GitHub Copilot Chat</strong>:</p>

<ol>
  <li>Open the <strong>Copilot Chat</strong> window</li>
  <li>Ask it to implement the tasks with the prompt: <code class="language-plaintext highlighter-rouge">Read .specify/tasks.md and implement task 1 following the plan in .specify/plan.md</code>.</li>
</ol>

<p><strong>Cursor</strong>:</p>

<ol>
  <li>Open the <strong>Cursor Agent/Chat</strong> window.</li>
  <li>Instruct it in Max Mode: <code class="language-plaintext highlighter-rouge">Using the tasks defined in .specify/tasks.md, implement task 1 following the plan in .specify/plan.md</code></li>
</ol>

<p>After each task, you should review the new code (the “compare window” is a good choice here), run any tests, and then commit the code.</p>
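
<p>In practice, that per-task loop is only a few commands.  This sketch assumes the dotnet stack from the constitution example - swap in your own test runner:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Review what the agent changed
git diff

# Run the tests before accepting the work
dotnet test

# Commit the completed task
git add .
git commit -m "feat: complete T001 - create data model"
</code></pre></div></div>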

<h2 id="configuring-cusor-and-github-copilot">Configuring Cusor and GitHub Copilot</h2>

<p>When you see <code class="language-plaintext highlighter-rouge">/speckit.implement</code> in tutorials, it is usually referring to a pre-configured “slash command” that pastes a specific, complex prompt into the AI’s chat window.  It tells the AI “<em>Read the plan and tasks files, then write the code for the active task.</em>”  Since this workflow uses two different LLMs, you don’t necessarily get the same slash commands, but you can emulate them.</p>

<h3 id="cursor">Cursor</h3>

<p>Cursor’s <strong>Agent Mode</strong> (“Composer”) can run terminal commands and edit files directly.  You can create a “rule” or “command” that acts exactly the same way as the <code class="language-plaintext highlighter-rouge">/implement</code> command:</p>

<ol>
  <li>Create a new file in your project at <code class="language-plaintext highlighter-rouge">.cursor/rules/implement.mdc</code></li>
  <li>Paste the following prompt into that file.  This teaches Cursor what to do:</li>
</ol>

<div class="language-markdown highlighter-rouge"><div class="highlight"><pre class="highlight"><code>  # Spec Kit Implementation Rule

  When I ask you to "implement" or use "/implement":
<span class="p">  
  1.</span> <span class="gs">**Read Context**</span>:
<span class="p">    *</span> Read <span class="sb">`.specify/memory/constitution.md`</span> (for rules)
<span class="p">    *</span> Read <span class="sb">`.specify/specs/*.md`</span> (for requirements)
<span class="p">    *</span> Read <span class="sb">`.specify/plan.md`</span> (for architecture)
<span class="p">    *</span> Read <span class="sb">`.specify/tasks.md`</span> (for the to-do list)
<span class="p">
  2.</span> <span class="gs">**Determine Active Task**</span>:
<span class="p">    *</span> Look at <span class="sb">`tasks.md`</span>.  Find the first unchecked task (e.g., <span class="sb">`[ ] T001...`</span>).
<span class="p">    *</span> <span class="gs">**Goal**</span>: You goal is to complete ONLY this one task.
<span class="p">
  3.</span> <span class="gs">**Execute**</span>:
<span class="p">    *</span> Write the code necessary to complete the task.
<span class="p">    *</span> Create new files if the plan dictates it.
<span class="p">    *</span> <span class="gs">**Constraint**</span> Do not deviate from the <span class="sb">`plan.md`</span>.  If the plan is impossible, stop and ask me.
<span class="p">
  4.</span> <span class="gs">**Update State**</span>:
<span class="p">    *</span> After the code is written and verified, update <span class="sb">`tasks.md`</span> by marking the task as <span class="sb">`[x]`</span>.
</code></pre></div></div>

<p>Now you can use this command inside composer or chat; e.g. <code class="language-plaintext highlighter-rouge">Run /implement</code>.  Cursor will read your rule, find the next task, write the code, and check off the box automatically.</p>

<h3 id="github-copilot">GitHub Copilot</h3>

<p>GitHub Copilot does not natively support custom “slash commands” that execute multi-step logic unless you use a specific extension or the “Prompt Files” feature.  I use the <a href="https://marketplace.visualstudio.com/items?itemName=alfredoperez.speckit-companion">SpecKit Companion</a>.</p>

<p>The extension adds a “SpecKit” UI panel.  Instead of typing <code class="language-plaintext highlighter-rouge">/implement</code>, you simply click the “Implement Next Task” button in the extension sidebar.  This will automatically feed the correct context to Copilot.</p>
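
<p>If you would rather use the Prompt Files route than the extension, one approach is to drop a prompt file into <code class="language-plaintext highlighter-rouge">.github/prompts/</code> that mirrors the Cursor rule above.  The file name and wording below are my own illustration, and the Prompt Files feature may need to be enabled in your VS Code settings:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create a reusable prompt file that Copilot Chat can run as a slash command
New-Item -ItemType Directory -Force -Path .github/prompts | Out-Null
@'
Read .specify/memory/constitution.md, .specify/plan.md and .specify/tasks.md.
Find the first unchecked task in tasks.md, implement ONLY that task
following the plan, then mark it as done in tasks.md.
'@ | Set-Content .github/prompts/implement.prompt.md
</code></pre></div></div>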

<h2 id="final-words">Final words</h2>

<p>Of course, it’s easier if you use the same LLM for both planning and execution of an agentic coding session.  However, I find that using a great reasoning LLM for the planning and a different coding focused LLM for the execution gets better code results.  It’s not much more work, with SpecKit doing the majority of the heavy lifting for you.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[I wanted to put down my workflow and all the details for using SpecKit using a superior reasoning and planning LLM (like Opus 4.5 via Claude Code, or Gemini 3 via the Gemini CLI) for the initial phases, but switching to an agentic AI IDE like Cursor or GitHub Copilot in VS Code for the final coding phases. I’m using Git and the GitHub CLI for repository actions. This workflow does not include “authenticating”. You still need to configure and log in to each tool separately. They’ll generally prompt you when you need to do so. Install dependencies You need to install the following tools. I use Winget where I can, since I’m on a Windows machine. These instructions are pretty much the same on MacOS or Linux; however, the method of installation is different. Python 3.11+ for running the SpecKit CLI. UV is the Python Package Manager that SpecKit prefers. GitHub CLI for repository management. Claude Code or Gemini CLI for running the planning phase. Visual Studio Code or Cursor for running the execution phase. SpecKit Companiion (if using VS Code) for running SpecKit inside the editor. You can install all this using the following: winget install --id Python.Python.3.12 winget install --id astral-sh.uv winget install --id GitHub.CLI winget install --id Anthropic.ClaudeCode winget install --id Microsoft.VisualStudioCode winget install --id Microsoft.VisualStudioCode.CLI I prefer GitHub Copilot over Cursor, but you can swap that out. SpecKit Companion is available on the Visual Studio Code extension marketplace. Once you’ve done all that, you’ll also need to install SpecKit with the following command: uv tool install specify-cli --from git+https://github.com/github/spec-kit.git # Verify installation specify --version Check out the Quick Start Guide for Spec Kit to familiarize yourself with the application and also to verify that the install instructions are still correct. Initialize a project Start by using the GitHub CLI to create a new remote repository and Spec Kit to scaffold the SDD (Spec Driven Development) files locally: # Create a remote repository on GitHub (private or public as desired) gh repo create my-project --public --clone --confirm # Change directory into the new project folder cd my-project At this point, I like to write a README.md which describes the project as a whole quickly. However, this is optional. Once done, use the installed specify command to create the necessary templates and structure: specify init . This will generate the .specify folder in your project. Specification and Planning phase Use Claude Code for the high-level reasoning and planning phases. Claude’s large context window and strong reasoning (Opus 4.5) excel at creating accurate, unambiguous artifacts. You could also use Gemini 3 here (I haven’t, so can’t comment on quality). Establish a constitution The consitution defines the non-negotiable rules for your project (e.g., tech stack, testing standards). 
# Start the claude code interactive session claude # You may need to sign in with /login /login # Run the constitution command with your prompt /speckit.constitution Create principles focused on using dotnet 10 with Blazor and MudBlazor, 90% unit test coverage for all logic, mermaid for specification diagrams This generates a file .specify/memory/constitution.md - you should review this file and make any changes you need. Create the specification The specification is the “what” and “why” of your project or feature. # Specify the feature name $env:SPECIFY_FEATURE="001-mvp" # Run Claude Code claude # Build the specification /speckit.specify Build a task management app with user authentication, real-time collaboration, and mobile support. Users should be able to create projects, assign tasks, and track progress with Kanban boards. This generates the file specs/001-mvp.md - the core blueprint. The specification translates your high-level natural language prompt into a structured document containing: Goal, User Scenarios, Functional Requirements, and Acceptance Criteria. You must ensure the agent accurately captures all of your intent and doesn’t introduce unwanted complexity or omit necessary features. Review checklist: Completeness: Does the specification cover everything you intended for the feature? Accuracy: Are the requirements correctly stated? Unambiguity: Is every requirement clear? Adherence to constitution: Does the specification violate any rule you set in the constitution? Now that you have a base specification, you can clarify and refine the specification using generative AI: /speckit.clarify The clarify command doesn’t generate a new file. Instead, it generates direct output in the Claude Code terminal (or chat window), allowing you to further refine the specification through a question and answer session. Review checklist: Understanding: Does the question highlight a genuine missing piece of information? Completeness: Your human responsibility is to then go back and directly edit the spec file to answer these questions directly. Iteration: You may run /speckit.clarify multiple times until the agent returns minimal or no questions, indicating the specification is robust and complete. If the clarify stage reveals a fundamental misunderstanding, you should not just edit the spec file manually. Re-run the specify command with a more detailed prompt to regenerate the baseline structure before refining it. After final edits (and before committing), you should do one final check on the specification file before committing it. Create the technical plan The plan is the “how” of the project or feature. /speckit.plan Build the base ASP.NET Core application with the health endpoints. The /speckit.plan command instructs the AI agent to read the approved specification and the constitution and generate a technical plan. This is placed in specs/001-mvp/plan.md. Depending on the complexity of the prompt, the agent may also create supplementary files, such as: data-model.md api-spec.md architecture.png Review checklist: The plan is where the AI defines the architecture, so your review must focus on technical suitability and adherence to standards. Architecture &amp; Design: Does the plan use the correct design patterns? Is the proposed solution over-engineers or under-engineered for the feature size? Tech Stack Adherence: Does the plan strictly follow the rules laid out in your constitution.md? For example, if you forbade ORMs or required a specific testing framework, the plan must reflect this. 
Data Models: Are the proposed database schemas (tables, fields, relationships) correct, efficient, and complete for the features needs? API Contracts: Are the proposed API endpoints, request/response bodies, and error codes logical and consistent with existing system contracts? Constraints: Does the plan account for non-functional requirements from the specification, such as performance targets, security, and scalability? Break the plan into granular tasks /speckit.tasks Break the plan into a list of atomic, testable tasks. The output is the .specify/tasks.md file, which is a sequential checklist of small work items (e.g. T001: Create data model). This concludes the planning phase; exit Claude Code and commit the final artifacts before switching: git add .specify/ gh commit -m "feat: [MVP] MVP Specification and technical plan" Implementation phase Now switch to the AI IDE that you prefer for implementation speed. Spec Kits artifacts provide all the context necessary to complete each task in the task list. Set up the environment Open the project in your chosen IDE (VS Code for Copilot, or the Cursor IDE). Ensure the respective AI tool is active and authenticated, and that your chosen language / runtime has been installed and configured correctly. This is also a good time to ensure your testing environment is available (e.g. if you are using TestContainers, then start the Docker engine). Implement tasks You now need to instruct the agent to execute the tasks that were generated: GitHub Copilot Chat: Open the Copilot Chat window Ask it to implement the tasks with the prompt: Read .specify/tasks.md and implement task 1 following the plan in .specify/plan.md. Cursor: Open the Cursor Agent/Chat window. Instruct it in Max Mode: Using the tasks defined in .specify/tasks.md, implement task 1 following the plan in .specify/plan.md After each task, you should review the new code (the “compare window” is a good choice here), run any tests, and then commit the code. Configuring Cusor and GitHub Copilot When you see /speckit.implement in tutorials, it is usually referring to a pre-configured “slash command” that pastes a specific, complex prompt into the AI’s chat window. It tells the AI “Read the plan and tasks files, then write the code for the active task.” Since this workflow uses two different LLMs, you don’t necessarily get the same slash commands, but you can emulate them. Cursor Cursors Agent Mode (“Composer”) can run terminal commands and edit files directly. You can create a “rule” or “command” that acts exactly the same way as the /implement commmand: Create a new file in your project at .cursor/rules/implement.mdc Paste the following prompt into that file. This teaches Cursor what to do: # Spec Kit Implementation Rule When I ask you to "implement" or use "/implement": 1. **Read Context**: * Read `.specify/memory/constitution.md` (for rules) * Read `.specify/specs/*.md` (for requirements) * Read `.specify/plan.md` (for architecture) * Read `.specify/tasks.md` (for the to-do list) 2. **Determine Active Task**: * Look at `tasks.md`. Find the first unchecked task (e.g., `[ ] T001...`). * **Goal**: You goal is to complete ONLY this one task. 3. **Execute**: * Write the code necessary to complete the task. * Create new files if the plan dictates it. * **Constraint** Do not deviate from the `plan.md`. If the plan is impossible, stop and ask me. 4. **Update State**: * After the code is written and verified, update `tasks.md` by marking the task as `[x]`. 
Now you can use this command inside composer or chat; e.g. Run /implement. Cursor will read your rule, find the next task, write the code, and check off the box automatically. GitHub Copilot GitHub Copilot does not natively support custom “slash commands” that execute multi-step logic unless you use a specific extension or the “Prompt Files” feature. I use the SpecKit Companion. The extension adds a “SpecKit” UI panel. Instead of typing /implement, you simply click the “Implement Next Task” button in the extension sidebar. This will automatically feed the correct context to Copilot. Final words Of course, it’s easier if you use the same LLM for both planning and execution of an agentic coding session. However, I find that using a great reasoning LLM for the planning and a different coding focused LLM for the execution gets better code results. It’s not much more work, with SpecKit doing the majority of the heavy lifting for you.]]></summary></entry><entry><title type="html">AI Assisted Editors: A Comparison (Part 1)</title><link href="https://adrianhall.github.io/posts/2025/2025-08-01-ai-editors.html" rel="alternate" type="text/html" title="AI Assisted Editors: A Comparison (Part 1)" /><published>2025-08-01T00:00:00-07:00</published><updated>2025-08-01T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2025/ai-editors</id><content type="html" xml:base="https://adrianhall.github.io/posts/2025/2025-08-01-ai-editors.html"><![CDATA[<p>You can’t go a lap about development these days without bumping into AI assisted editors.  Most of the AI tooling companies I see are heavily investing in Visual Studio Code based editors, but you have some others out there as well.  I’m going to break the comparison I’m doing into three articles - one on the paid offerings I’ve personally used, one on the “free” offerings I’ve used, and one on the quirks of the LLMs that are in common use.  Hopefully, by the end of this, you’ll understand the strengths and weaknesses of each one and be able to decide what to invest in.</p>

<!-- more -->

<h2 id="what-is-an-ai-assisted-editor">What is an AI Assisted Editor?</h2>

<p>As a developer, you open up an editor and write code.  AI Assisted editors generally come in the form of plugins that allow a large language model to take over the writing of the code for you.  They also generally provide information on the code base through a chat interface.  This allows you to come up to speed on a new code base quickly by allowing the AI to focus your attention on the important bits.</p>

<p>I’ve personally used three AI assisted code editor plugins to create the same project (a web UI for Azurite) three times.  The projects ended up very different even if they used the same AI models underneath.  That’s not unusual and is a natural part of the development process with AI.  If you don’t like the solution, you can ask the AI to change it, or you can just wipe it out and start over.</p>

<p>I started off writing the project myself without AI.  This took me a month, mostly because I’m not the best at designing UI and TailwindCSS is complex.  But that’s a baseline.</p>

<h2 id="the-paid-plan-editors">The paid-plan editors</h2>

<p>So, who are the contenders?</p>

<ul>
  <li><a href="https://github.com/features/copilot">GitHub Copilot</a></li>
  <li><a href="https://cursor.com/en">Cursor</a></li>
  <li><a href="https://kiro.dev/">AWS Kiro</a></li>
</ul>

<p>All of these have a free plan (at least, right now) that is VERY limited, and all of them charge about $20/month (US pricing) to get any reasonable use out of them.  They all support the use of <a href="https://www.anthropic.com/claude/sonnet">Claude Sonnet</a> v3.7 and v4 as the underlying model; largely considered the gold standard in AI models in this space.  After that, things diverge.</p>

<h2 id="github-copilot">GitHub Copilot</h2>

<p>Pros:  Large selection of models; Enterprise feature set <br />
Cons:  Complex prompting; limited free version</p>

<p><img src="/assets/images/2025/08/20250801-copilot-agent-dropdown.png" alt="GitHub Copilot Chat Agent Mode" /></p>

<p>I’ve used GitHub Copilot in Visual Studio Code and Visual Studio.  It’s also available for Xcode, JetBrains, Neovim, Eclipse, and via GitHub Issues.  It’s installed by default in Visual Studio Code and Visual Studio these days.  The likelihood is you have access to it.  I found the free version (which only supports Claude Sonnet 3.5 and OpenAI GPT 4.1 among the base models) to be good only for about 30 minutes of coding.  It’s enough to get a flavor but not enough to complete even a basic project.</p>

<p>In terms of model availability, it’s got Claude Sonnet 3.7, Claude Sonnet 4, Gemini 2.5 Pro, and several versions of the OpenAI models available in the paid version.  There is also a “Pro+” plan for $40/month that provides access to reasoning models (like Claude Opus 4 and OpenAI o3) plus access to GitHub Spark (which is a competitor to v0.dev and similar prototyping tools). It also provides access to local models hosted via <a href="https://ollama.com/">Ollama</a>, which is great if you dislike sending your code to someone else.  This expands the list to include models like Qwen2.5-coder or stable-coder. This list is by far the most expansive of any of the paid versions.  (Note: using local models doesn’t help with pricing - you still pay for completions).</p>

<p>GitHub Copilot Chat (which is your basic interface) has three modes: Ask (where the model doesn’t make any changes to your code), Edit (where the model is allowed to make changes, but doesn’t use tools), and Agent (where the model uses tools).  This is a super-important distinction.  All the models (especially Claude Sonnet 4) like to change your code.  Sometimes, you only want an AI discussion to formulate a plan - Ask mode is perfect for this.  The Ask and Edit modes also have access to reasoning models that cannot run tools, allowing you to get different responses.</p>

<p>When using GitHub Copilot Agent Mode, you can configure “auto-approved” tools.  Let’s say, for example, you are ok with the agent running “npm test” whenever it wants.  You can add that to the settings (which has an allow list and deny list) and the tool will auto-run.</p>

<p>Since GitHub Copilot is produced by (ultimately) Microsoft, it’s got a very enterprise-centric view of things.  While this doesn’t make any difference to the average user, it allows enterprises to control which models are available and to turn off features for their developers.  The other contenders don’t allow this.</p>

<p>So, how does GitHub Copilot work in practice?  My application was written using a “plan and act” type of formula.  First, I wrote a simple synopsis of the product I wanted to write, then I asked the agent to write the requirements, followed by the design, and finally the tasks needed to complete the work.  Once I had the task list, I asked the agent to complete each task in turn.  In between each one, I started a new chat session to avoid or minimize hallucination.  This process required me to write specific prompts (which I placed in <code class="language-plaintext highlighter-rouge">.github/prompts</code>) and to write a <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> with project-specific instructions for the LLM to consider when writing code.  The instructions contained information on my tech stack, testing preferences, project structure, and so on.</p>
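
<p>For the curious, my <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> was nothing fancy.  The sketch below shows the general shape; the specific stack and structure lines are illustrative rather than a template to copy:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Write project-wide context that Copilot reads on every request
@'
# Project instructions

## Tech stack
- React + TypeScript front end, styled with TailwindCSS
- Use happy-dom when testing components

## Project structure
- src/components - React components
- src/api - client code for talking to Azurite

## Working agreements
- Write tests alongside every new component
- Do not add new dependencies without asking first
'@ | Set-Content .github/copilot-instructions.md
</code></pre></div></div>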

<p>So, where does it fall down?</p>

<ul>
  <li>All the documentation shows the <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> as a single file, which isn’t ideal.  There is always a split between organizational requirements and project requirements.  Fortunately, you can split this file, but good luck finding the instructions.</li>
  <li>The other entries in this list have a nicer way of approving the tool usage.  Having to edit settings to auto-approve tools is bad.</li>
  <li>The agent will often do <code class="language-plaintext highlighter-rouge">cd somewhere &amp;&amp; do-something</code>.  Even if <code class="language-plaintext highlighter-rouge">do-something</code> is approved, it won’t automatically do it because of the directory change (see the sketch after this list).</li>
  <li>There are many instances where the agent will not read the instructions, forget where it is, or other similar problems.</li>
</ul>
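
<p>Here is the sketch referenced above - the package path and script are made up for the example:</p>

<div class="language-powershell highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Auto-approved, because "npm test" is on the allow list
npm test

# Not auto-approved - the directory change turns it into a different compound command
cd packages/web &amp;&amp; npm test
</code></pre></div></div>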

<p>Overall, GitHub Copilot does the job, but it’s far from ideal.   It does have the widest range of models, largest catalog of tools supported, and it works in the largest number of editors. It’s kind of like IKEA furniture - it does the job, but it’s not necessarily the best option for all projects.</p>

<p>I completed the GitHub Copilot project within 2 weeks.  The savings primarily came from the fact that the React components and TailwindCSS took at most a few minutes to write whereas they took half a day without AI.  However, GitHub Copilot (in common with the other contenders) would then get itself into a loop trying to write and fix tests, add linting rules, and generally do things I had not asked for.</p>

<h2 id="cursor">Cursor</h2>

<p>Pros: Model auto-selection; background agents; Planning mode<br />
Cons: Yet Another Editor install; no ask mode; limited free version</p>

<p><img src="/assets/images/2025/08/20250801-cursor-agent.svg" alt="Cursor Editor Chat Agent Mode" /></p>

<p>Cursor provides an installable editor.  The editor itself is based on the OSS version of Visual Studio Code, so you have access to most of the extensions in the Visual Studio Code marketplace.  Want to use Cursor inside your existing editor?  That’s unavailable.  I was able to use the free version for about two hours before my tab completions ran out.  I was able to use the agent for about another hour.  Just like GitHub Copilot, the Pro version is $20/month, but there is another “Ultra” version that is $200/month.  There are also teams and enterprise versions (although I haven’t used those modes, they look similar to the controls in GitHub Copilot).</p>

<p>In terms of model availability, it’s very similar to GitHub Copilot.  You have Grok 3 models in addition to the Claude and OpenAI models.  If you want to bring your own model, you can also use AWS, Azure OpenAI, Google, Anthropic, or OpenAI.  You can’t use local models, however - everything is cloud based.  Just like GitHub Copilot, you will still pay for using Cursor as all the requests are routed through Cursor backends.</p>

<p>Cursor really likes “agent” mode.  I found this frustrating - there are times I don’t want the agent to be editing code.  I just want a discussion.  Having to continually switch into ask mode was frustrating.  There is no “edit” mode.  Edit mode in GitHub Copilot expands the models available by allowing models that don’t have tool support (such as Claude Opus or o3).  However, you have background agents.  Let’s say you have five independent tasks to do and they are all working on different parts of the codebase.  You can make four of them “background agents”.  Behind the scenes, the agent will work in an independent branch, clone the repo, do the work, run tools, and check in the code for you.  Unfortunately, I found this to be more aggravation than helpful in my project.  I suspect a larger codebase and more mature product team would find this useful.  GitHub Copilot on GitHub Issues is a similar sort of thing and I found the ability to assign an issue to a GitHub Copilot agent was a better implementation of this concept. Cursor also allows you to auto-approve tools for the agent to use.  You can turn on auto-run mode and then explicitly provide the allow-list and deny-list.</p>

<p>Cursor has two automatic features that replace the <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> - the first is codebase indexing.  This is a semantic analysis of your code base that allows the agent to provide context-aware suggestions.  The other is “memories” - persistent storage of project context and decisions from past conversations.  Together, these provide the automatic context capabilities that mean you don’t need to worry about providing instructions.  However, I would prefer this to be explicit and stored with the repository.  I found getting Cursor to obey my project requirements (such as using happy-dom when testing components) relatively hard.</p>

<p>One of the nicer features that I used a lot is the “planning” tool.  The agent can break down complex tasks into a structured to-do list and manage the execution accordingly.  This made it easier to, for example, describe the home page dashboard and then have the task broken down further.  It’s not capable of breaking down the entire project, though.  I still had to get the agent to “plan and act” and pause after each step for review.  In fact, going through this was more frustrating - with no ask mode, I was forever telling the model to pause after it did the work I asked for.  However, it also produced a better result - the requirements and design were better thought through, with the agent routing feature selecting a more appropriate model for each step.</p>

<p>Overall, Cursor and GitHub Copilot do the same things.  I found Cursor to be more polished and easier to use.  I specifically liked the fact that I didn’t have to think about the model being used.  However, I disliked that I had to use agent mode and didn’t have an option for turning off edits to just have a design conversation.</p>

<p>I completed the Cursor project within 2 weeks.  It was about the same as GitHub Copilot, and the savings came from the same place.  I ended up having to do more re-work in the design phase because I was unable to discuss options without the agent trying to make changes.</p>

<h2 id="aws-kiro">AWS Kiro</h2>

<p>Pros: Spec-driven design; tool auto-approve<br />
Cons: Yet Another Editor install; currently in preview; limited models</p>

<p><img src="/assets/images/2025/08/20250801-kiro-features.png" alt="Kiro Editor Agent Mode" /></p>

<p>Cursor and GitHub Copilot are out in the wild and fully available.  Kiro, by contrast, is currently in preview, so I’m making allowances for the fact that not all features are available yet.  Just like Cursor, it’s based on the OSS version of Visual Studio Code.  Only Claude Sonnet 3.7 and 4 are available - no other models.  It’s got an “agent” mode (auto-pilot turned on) and an “ask” mode (auto-pilot turned off), so you might think this is a “me too” offering from AWS.</p>

<p>Not so fast.  Kiro has a nice sidebar with a feature to analyze the repository and produce “steering documents” - the equivalent of the <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> file in GitHub Copilot.  These are broken into “tech”, “structure”, and “product” documents.  Breaking things into multiple documents opens the way for feeding enterprise standards into them - something I expect AWS to incorporate in the future.  Additionally, Kiro is big on “spec-driven design”.  You enter a request (e.g. “I want to build a home page dashboard”) into chat and tell it to create the spec.  It will create a directory for the spec, then create a <code class="language-plaintext highlighter-rouge">requirements.md</code> - user stories with acceptance criteria.  You can edit this file before moving on to the next step, which is design.  The <code class="language-plaintext highlighter-rouge">design.md</code> has everything you’d expect, from sequence diagrams to architecture to expected interfaces.  Once you’re happy with the design, it will generate tasks - another document.  You can edit each of these documents either with AI chat or just by editing the files.</p>
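<p>To make that concrete, here’s a stripped-down sketch of the kind of <code class="language-plaintext highlighter-rouge">requirements.md</code> you end up editing.  The directory name and wording below are illustrative rather than Kiro’s verbatim output, but the user-story-plus-acceptance-criteria shape is the important part:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># illustrative spec layout - your directory and feature names will differ
cat specs/home-dashboard/requirements.md

## User story
As an Azurite user, I want a dashboard on the home page so that I can
see the state of the local emulator at a glance.

## Acceptance criteria
- WHEN the home page loads THEN the dashboard SHALL show container, queue, and table counts.
- WHEN the emulator is unreachable THEN the dashboard SHALL show a clear error state.
</code></pre></div></div>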

<p>Next, you can execute the tasks - there is a literal “Start Task” button next to each one.  It reminds me of a Jupyter notebook in style.  Every single task is executed in its own chat session. The model will take into account the steering documents and the design plus other specs you point at.  It checks against the acceptance criteria at the end.</p>

<p>I completed the Kiro-driven project within 10 days.  It was easily the fastest.</p>

<p>So, where were the problems?  First, Kiro didn’t follow instructions.  I had added a paragraph to the <code class="language-plaintext highlighter-rouge">tech.md</code> steering document telling it to run <code class="language-plaintext highlighter-rouge">npm run check</code> and <code class="language-plaintext highlighter-rouge">npm test</code> after each task was complete.  It didn’t.  It also ran things multiple times unnecessarily.  Since these runs consume tokens and time, this was a problem for me; I spent a lot of time hitting cancel.  Also, as with Cursor, Claude Sonnet 4 likes to ignore instructions (like “use happy-dom, not jest-dom”), which caused some frustration.</p>

<p>There were also issues associated with the preview nature of the product.  There isn’t a pause button.  There are limited models.  The models are sometimes slow because Kiro is bumping into scale issues.  On the other hand, there are standout features that I want the other contenders to adopt.  The first is the way auto-approval of commands is handled.  When the chat gets to a command, you get the option to auto-approve it, and you’re given two or three scopes to choose from.  For example, if the command is <code class="language-plaintext highlighter-rouge">npx vitest run --project unit</code>, then you’ll be offered the option to auto-approve the whole command, and then maybe <code class="language-plaintext highlighter-rouge">npx vitest *</code> and <code class="language-plaintext highlighter-rouge">npx *</code> as alternatives.  The second feature I really hope gets adopted widely is hooks - agent actions that get triggered when something happens.  As an example, I told it that whenever <code class="language-plaintext highlighter-rouge">src/**/*.ts</code> was saved, it should ensure the file has comprehensive JSDoc comments and update them if not.</p>

<h2 id="wrap-up">Wrap-up</h2>

<p>I like all three products here, and all three provided significant time savings.  Each one had its own drawbacks, though, so there isn’t a clear winner.</p>

<ul>
  <li>If you like absolute control and the widest possible options, use GitHub Copilot.</li>
  <li>If you prefer choices to be made for you when possible, use Cursor.</li>
  <li>AWS Kiro is something to keep an eye on.  I certainly appreciate the spec-driven design process.</li>
</ul>

<p>They all have agent mode; they all use the same base models; they all allow you to run tools;  they all allow you to use MCP servers.  You can ask all of them to refactor code, identify bugs, fix eslint errors, and write new code.  They will all get into loops and require manual intervention sometimes.  In all cases, you’ll be paying $20/mo. for any reasonable usage (beyond a couple of hours a month).</p>

<p>For right now, I’m using Kiro extensively.  That’s more a function of the preview mode (and the fact that it is free at the moment) than any qualitative feature that it may have, although I am enjoying the spec-driven design.  That may change in the future, however, as this is a rapidly evolving area.</p>

<p>I said this was part 1.  Part 2 will cover the free and local options you have available.  Until then, happy editing.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[You can’t go a lap about development these days without bumping into AI assisted editors. Most of the AI tooling companies I see are heavily investing in Visual Studio Code based editors, but you have some others out there as well. I’m going to break the comparison I’m doing into three articles - one on the paid offerings I’ve personally used, one on the “free” offerings I’ve used, and one on the quirks of the LLMs that are in common use. Hopefully, by the end of this, you’ll understand the strengths and weaknesses of each one and be able to decide what to invest in. What is an AI Assisted Editor? As a developer, you open up an editor and write code. AI Assisted editors generally come in the forms of plugins that allow a large language model to take over the writing of the the code for you. They also generally provide information on the code base through a chat interface. This allows you to come up to speed on a new code base quickly by allowing the AI to focus your attention on the important bits. I’ve personally used three AI assisted code editor plugins to create the same project (a web UI for Azurite) three times. The projects ended up very different even if they used the same AI models underneath. That’s not unusual and a natural part of the development process with AI. If you don’t like the solution, you can ask the AI to change or you can just wipe it out and start over. I started off writing the project myself without AI. This took me a month, mostly because I’m not the best at designing UI and TailwindCSS is complex. But that’s a baseline. The paid-plan editors So, who are the contenders? GitHub Copilot Cursor AWS Kiro All of these have a free plan (at least, right now) that is VERY limited, and all of them charge about $20/month (US pricing) to get any reasonable use out of them. They all support the use of Claude Sonnet v3.7 and v4 as the underlying model; largely considered the gold standard in AI models in this space. After that, things diverge. GitHub Copilot Pros: Large selection of models; Enterprise feature set Cons: Complex prompting; limited free version; I’ve used GitHub Copilot in Visual Studio Code and Visual Studio. It’s also available for XCode, JetBrains, Neovim, Eclipse, and via GitHub Issues. It’s installed by default in Visual Studio Code and Visual Studio these days. The likelyhood is you have access to it. I found the free version (which only supports Claude Sonnet 3.5 and OpenAI GPT 4.1 among the base models) to be good only for about 30 minutes of coding. It’s enough to get a flavor but not enough to complete even a basic project. In terms of model availability, it’s got Claude Sonnet 3.7, Claude Sonnet 4, Gemini 2.5 Pro, and several version of the OpenAI models available in the paid veersion. There is also a “Pro+” plan for $40/month that provides access to reasoning models (like Claude Opus 4 and OpenAI o3) plus access to GitHub Spark (which is a competitor to v0.dev and similar prototyping tools). It also provides access to local models like Ollama hosted models, which is great if you dislike sending your code to someone else. This expands the list to include models like Qwen2.5-coder or stable-coder. 
This list is by far the most expansive of any of the paid versions. (Note: using local models doesn’t help with pricing - you still pay for completions). GitHub Copilot Chat (which is your basic interface) has three modes: Ask (where the model doesn’t do any changes to your code), Edit (where the model is allowed to make changes, but doesn’t use tools), and Agent (where the model uses tools). This is a super-important distinction. All the models (but explicitly Claude Sonnet 4) like to change your code. Sometimes, you only want an AI discussion to formulate a plan - Ask mode is perfect for this. The Ask and Edit modes also have access to reasoning models that cannot run tools, allowing you to get different responses. When using GitHub Copilot Agent Mode, you can configure “auto-approved” tools. Let’s say, for example, you are ok with the agent running “npm test” whenever it wants. You can add that to the settings (which has an allow list and deny list) and the tool will auto-run. Since GitHub Copilot is produced by (ultimately) Microsoft, it’s got a very enterprise-centric view of things. While this doesn’t make any difference to the average user, it allows enterprises to control which models are available and to turn off features for their developers. The other contenders don’t allow this. So, how does GitHub Copilot work in practice? My application was written using a “plan and act” type of formula. First, I wrote a simple synopsis of the product I wanted to write, then I asked the agent to write the requirements, followed by the design, and finally the tasks needed to complete the task. Once I had the task list, I asked the agent to complete each task in turn. In between each one, I started a new chat session to avoid or minimize hallucination. This process required me to write specific prompts (which I placed in .github/prompts) and to write a copilot-instructions.md with project specific instructions for the LLM to consider when writing code. The instructions contained information on my tech stack, testing preferences, project structure, and so on. So, where does it fall down? All the documentation shows the copilot-instructions.md as a single file, which isn’t ideal. There are always a split between organizational requirements and project requirements. Fortunately, you can split this file, but good luck finding the instructions. The other entries in this list have a nicer way of approving the tool usage. Having to edit settings to auto-approve tools is bad. The agent will often do cd somewhere &amp;&amp; do-something. Even if do-something is approved, it won’t automatically do it because of the directory change. There are many instances where the agent will not read the instructions, forget where it is, or other similar problems. Overall, GitHub Copilot does the job, but it’s far from ideal. It does have the widest range of models, largest catalog of tools supported, and it works in the largest number of editors. It’s kind of like IKEA furniture - it does the job, but it’s not necessarily the best option for all projects. I completed the GitHub Copilot project within 2 weeks. The savings primarily came from the fact that the React components and TailwindCSS took at most a few minutes to write whereas they took a half a day without AI. However, GitHub Copilot (in common with the other contenders) would then get itself into a loop trying to write and fix tests, add linting rules, and generally do things I had not asked. 
Cursor Pros: Model auto-selection; background agents; Planning mode Cons: Yet Another Editor install; no ask mode; limited free version Cursor provides an installable editor. The editor itself is based on the OSS version of Visual Studio Code, so you have access to most of the extensions in the Visual Studio Code marketplace. Want to use Cursor inside your existing editor? That’s unavailable. I was able to use the free version for about two hours before my tab completions ran out. I was able to use the agent for about another hour. Just like GitHub Copilot, the Pro version is $20/month, but there is another “Ultra” version that is $200/month. There is also teams and enterprise versions (although I haven’t used it in those modes but it looks similar to the controls on GitHub Copilot). In terms of model availability, it’s very similar to GitHub Copilot. You have Grok 3 models in addition to the models from Claude and OpenAI. IF you want to bring your own model, you can also use AWS, Azure OpenAI, Google, Anthropic, or OpenAI. You can’t use local models, however - everything is cloud based. Just like GitHub Copilot, you will still pay for using Cursor as all the requests are routed through Cursor backends. Cursor really likes “agent” mode. I found this frustrating - there are times I don’t want the agent to be editing code. I just want a discussion. Having to continually switch into ask mode was frustrating. There is no “edit” mode. Edit mode in GitHub Copilot expands the models available by allowing models that don’t have tool support (such as Claude Opus or o3). However, you have background agents. Let’s say you have five independent tasks to do and they are all working on different parts of the codebase. You can make four of them “background agents”. Behind the scenes, the agent will work in an independent branch, clone the repo, do the work, run tools, and check in the code for you. Unfortunately, I found this to be more aggravation than helpful in my project. I suspect a larger codebase and more mature product team would find this useful. GitHub Copilot on GitHub Issues is a similar sort of thing and I found the ability to assign an issue to a GitHub Copilot agent was a better implementation of this concept. Cursor also allows you to auto-approve tools for the agent to use. You can turn on auto-run mode and then explicitly provide the allow-list and deny-list. Cursor has two automatic features that replace the copilot-instructions.md - the first is codebase indexing. This is a semantic analysis of your code base that allows the agent to provide context-aware suggestions. The other is “memories” - persistent storage of project context and decisions from past conversations. Together, these provide the automatic context capabilities that mean you don’t need to worry about providing instructions. However, I would prefer this to be explicit and stored with the repository. I found getting Cursor to obey my project requirements (such as using happy-dom when testing components) relatively hard. One of the nicer features that I used a lot is the “planning” tool. The Agent can break down complex tasks into a structured to-do list and manage the execution accordingly. This feature made it easier to write, for example, the home page dashboard with a description and then get the task broken down further. It’s not capable of breaking down the entire project, though. I still had to get the agent to “plan and act” and pause after each step for review. 
In fact, going through this was more frustrating - no ask mode meant that I was forever telling the model to pause after doing the work I asked. However, it was also produced a better result - the requirements and design were better thought through with the agent routing feature selecting a more appropriate model for each step. Overall, Cursor and GitHub Copilot do the same things. I found Cursor to be more polished and easier to use. I specifically liked the fact that I didn’t have to think about the model being used. However, I disliked that I had to use agent mode and didn’t have an option for turning off edits to just have a design conversation. I completed the Cursor project within 2 weeks. It was about the same as GitHub Copilot, and the savings came from the same place. I ended up having to do more re-work in the design phase because I was unable to discuss options without the agent trying to make changes. AWS Kiro Pros: Spec-driven design; tool auto-approve Cons: Yet Another Editor install; currently in preview; limited models Cursor and GitHub Copilot are out in the wild and fully available. Kiro, by contrast, is in preview mode now. I’m making allowances for the fact that it is in preview which means that not all features are available yet. Just like Cursor, it’s based on the OSS version of Visual Studio Code. Only Claude Sonnet 3.7 and 4 are available - no other models. It’s got an “agent” mode (auto-pilot is turned on) and a “ask” mode (when it’s turned off), so you might think this is a “me too” offering from AWS. Not so fast. Kiro has a nice sidebar that has a feature to analyze the repository and provide “steering documents” - the equivalent to the copilot-instructions.md file in GitHub Copilot. These are broken into “tech”, “structure”, and “product” documents. Breaking things down into multiple documents opens up the way for providing enterprise standards to these documents - something I expect AWS to incorporate in the future. Additionally, Kiro is big into “spec-driven design”. You enter a request (e.g. “I want to build a home page dashboard”) into chat and tell it to create the spec. It will create a directory for the spec, then create a requirements.md - user stories with acceptance criteria. You can edit this file before moving onto the next step, which is design. The design.md has everything you’d expect, from sequence diagrams to architecture, to expected interfaces. Once you’re happy with the design, it will generate tasks - another document. You can edit each of these documents either with AI chat or just by editing the files. Next, you can execute the tasks - there is a literal “Start Task” button next to each one. It reminds me of a Jupyter notebook in style. Every single task is executed in its own chat session. The model will take into account the steering documents and the design plus other specs you point at. It checks against the acceptance criteria at the end. I completed the Kiro-driven project within 10 days. It was easily the fastest. So, where were the problems? First, Kiro didn’t follow instructions. I had added a paragraph to the tech.md steering document to tell it to run npm run check and npm test after each task was complete. It didn’t. It also ran things multiple times unnecessarily. Since these runs consume tokens and time, this was a problem for me. I spent so much time hitting cancel. Also, like Cursor, Claude Sonnet 4 likes to ignore instructions (like “use happy-dom, not jest-dom”), which caused some frustration. 
There were also issues which are associated with the Preview nature. There isn’t a pause button. There are limited models. The models are sometimes slow because Kiro is bumping into scale issues. There are also standout features that I want the other contenders to adopt. The first is the model they use for auto-approving commands. When the chat gets to a command, you get to auto-approve the command. It gives you two or three options. For example, if the command is npx vitest run --project unit, then you’ll be given the option to auto-approve the whole command and then maybe npx vitest * and npx * as alternatives. The second feature I really hope gets adopted widely is hooks. These are agent actions that get triggered when something happens. As an example, I told it that when src/**/*.ts was saved, it should ensure the file has comprehensive JSDoc comments and update them if not. Wrap-up I like all three products here, and all three provided a significant savings in time. Each one had its own drawbacks, though, so there isn’t a clear winner. If you like absolute control and the widest possible options; use GitHub Copilot If you prefer choices to be made for you when possible; use Cursor. AWS Kiro is something to keep an eye on. I certainly appreciate the spec-driven design process. They all have agent mode; they all use the same base models; they all allow you to run tools; they all allow you to use MCP servers. You can ask all of them to refactor code, identify bugs, fix eslint errors, and write new code. They will all get into loops and require manual intervention sometimes. In all cases, you’ll be paying $20/mo. for any reasonable usage (beyond a couple of hours a month). For right now, I’m using Kiro extensively. That’s more a function of the preview mode (and the fact that it is free at the moment) than any qualitative feature that it may have, although I am enjoying the spec-driven design. That may change in the future, however, as this is a rapidly evolving area. I said this was part 1. Part 2 will cover the free and local options you have available. Until then, happy editing.]]></summary></entry><entry><title type="html">AI Assisted Editors: A Comparison (Part 2)</title><link href="https://adrianhall.github.io/posts/2025/2025-08-01-oss-ai-editors.html" rel="alternate" type="text/html" title="AI Assisted Editors: A Comparison (Part 2)" /><published>2025-08-01T00:00:00-07:00</published><updated>2025-08-01T00:00:00-07:00</updated><id>https://adrianhall.github.io/posts/2025/oss-ai-editors</id><content type="html" xml:base="https://adrianhall.github.io/posts/2025/2025-08-01-oss-ai-editors.html"><![CDATA[<p>In <a href="/posts/2025/2025-08-01-ai-editors.html">my last article</a>, I provided a break down of the three commercial AI assisted editors that I use on a regular basis.  However, not everyone can afford a monthly subscription ($20/month is the standard), or maybe you prefer different models or a more local development environment.  There are still options that you can use to get AI assisted editing yet avoid the subscription charge.</p>

<!-- more -->

<h2 id="what-is-an-ai-assisted-editor">What is an AI Assisted Editor?</h2>

<p>As a developer, you open up an editor and write code.  AI Assisted editors generally come in the form of plugins that allow a large language model to take over the writing of the code for you.  They also generally provide information on the code base through a chat interface.  This allows you to come up to speed on a new code base quickly by allowing the AI to focus your attention on the important bits.</p>

<p>I’ve personally used three AI assisted code editor plugins to create the same project (a web UI for Azurite) three times.  The projects ended up very different even though they used the same AI models underneath.  That’s not unusual and a natural part of the development process with AI.  If you don’t like the solution, you can ask the AI to change it, or you can just wipe it out and start over.</p>

<p>I started off writing the project myself without AI.  This took me a month, mostly because I’m not the best at designing UI and TailwindCSS is complex.  But that’s a baseline.</p>

<h2 id="the-plugins">The plugins</h2>

<p>So, who are the contenders?</p>

<ul>
  <li><a href="https://github.com/features/copilot">GitHub Copilot</a> - yes, again!</li>
  <li><a href="https://cline.bot/">Cline</a></li>
  <li>Others I’ve tried.</li>
</ul>

<p>These are all available as extensions on Visual Studio Code.  If you want to get going with local models, you’re also going to need <a href="https://ollama.com/">Ollama</a> running on a GPU-enabled desktop with plenty of memory, and a model.  I used qwen2.5-coder:1.5b for my tests, but you will need to explicitly choose a model.</p>

<p>You can easily install Ollama like this:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>winget <span class="nb">install </span>ollama.ollama
</code></pre></div></div>

<p>Then install and run the model:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ollama run qwen2.5-coder:1.5b
</code></pre></div></div>
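
<p>Before pointing any of these extensions at Ollama, it’s worth a quick sanity check that the model is actually being served.  A minimal sketch, assuming Ollama’s default local REST endpoint on port 11434 and the qwen2.5-coder model pulled above:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># list the models Ollama has available locally
ollama list

# request a one-shot, non-streaming completion from the local REST API
curl http://localhost:11434/api/generate \
  -d '{"model": "qwen2.5-coder:1.5b", "prompt": "Write a TypeScript function that reverses a string.", "stream": false}'
</code></pre></div></div>

<p>If that call is slow or times out on your hardware, the editor extensions will be too - they are talking to the same local API.</p>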

<p>Unlike the subscription services, not one of these options allowed me to complete my project without a significant amount of coding.  The OSS AI assistants were more helpers on the side than serious AI coding partners.</p>

<blockquote>
  <p><strong>A note to the Enterprise IT Admins who turn off options in Visual Studio Code</strong></p>

  <p>Why?</p>

  <p>If you allow downloading extensions, but turn off features of GitHub Copilot, you are just asking your developers to work around your restrictions.  They will do what is needed to get their job done.</p>
</blockquote>

<h2 id="github-copilot">GitHub Copilot</h2>

<p>Didn’t we do this last time?  Well, yes - I did talk about the subscription services last time, but I glossed over the experience when you go local.  When you use the drop-down (err - drop-up?) to edit the model, you get this:</p>

<p><img src="/assets/images/2025/08/2025-08-03-copilot-models.png" alt="Model selector in GitHub Copilot Chat" /></p>

<p>Notice the “Manage Models” option.  That allows you to add models from providers other than GitHub Copilot.  If you are running Ollama, you can select it and see the list of models available to you.  Once selected, you can then run Ask, Edit, or Agent mode.  If you have Azure credits, you can run a model in Azure AI Foundry and connect to it using the Azure option.  There is also support for OpenRouter (which is a pay-as-you-go option for consuming LLMs).  You can also choose models from any of the major frontier providers if you have an API key (and hence are paying directly for them).</p>

<p>I’ve picked Ollama and qwen2.5-coder as my model.  So, how did it fare?  Well, let’s split this into the experience and the model accuracy.  I started by wondering if it could do something as simple as write unit tests for a function.  As soon as I entered the prompt, I heard the fans on my desktop shift into high gear - it was working hard!  My GPU was pegged.  I waited.</p>

<p>And waited.</p>

<p>And waited.</p>

<p>Seven minutes later, my simple prompt produced an answer.  It had worked, but something that produced an almost immediate response took an eternity when running locally.  What’s worse, the results left a lot to be desired.</p>

<p>I didn’t have enough horsepower in my desktop to run the latest qwen3-coder models, and stable-coder (another model suggested as acceptable in this scenario) was not any better than qwen2.5-coder.  I had asked the model to create unit tests for a single file that I added as context.  It produced code that didn’t work and added extra tests for methods in other files that I had not asked for.</p>

<p>My learning from this - you either need a beefy box that costs several thousand dollars (due to the GPU requirements) so that you can run the best models available or you are stuck using remote models.</p>

<p>As to the rest of the experience - it was exactly the same as GitHub Copilot with a remote model.  You have the three modes (ask, edit, and agent).  You can configure and use tools when the model you are using supports tool use.  You can add context and use <code class="language-plaintext highlighter-rouge">copilot-instructions.md</code> just like the subscription version.</p>

<h2 id="cline">Cline</h2>

<p>Next, I installed <a href="https://cline.bot">Cline</a>.  If you ever tried Linux after being on a Windows PC, you were likely overwhelmed by the sheer number of “knobs” you can turn to configure it.  I had the same shock when I tried Cline.  It exposes more knobs than any of the other extensions I tried.  It also provides more feedback than any of the other extensions I tried.  This was properly OSS land.</p>

<p>It also failed more often than any of the other extensions I tried.</p>

<p>My first step was to just try it out.  I signed up for a Cline account.  They give you $0.50 worth of credits to play with (enough for 3-4 small tasks), and then you can add more credits as you wish.  The Cline account suggests you use Claude Sonnet, and it works just like you would expect - very well.</p>

<p>I then switched over to Ollama.  Despite having a good experience with the same configuration in GitHub Copilot, I could not get the setup to work in Cline.  I tried multiple models (in case it was a model incompatibility) with no success.  The API call would just time out.</p>

<p><img src="/assets/images/2025/08/2025-08-03-cline-models.png" alt="Cline model selection" /></p>

<p>Cline did have the largest model provider selection of all the extensions I tried.  In addition to Cline’s own model routing, you can try LiteLLM (for a local flavor), OpenRouter (for a paid option), and VS Code itself (to run Cline on top of the Copilot models).  There is a plethora of options you can use directly from OpenAI, Google, AWS, Azure, Mistral, and others.  You will find YOUR model in this set.</p>

<p>As you can see above, you also get to set the Model Context Window and the Request Timeout.  Qwen 2.5 supports a 256K window and Llama 3.1 supports a 128K window - you can take advantage of those.</p>
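
<p>One caveat if Ollama is serving the model: the local server applies its own context limit per model, so a big number in Cline doesn’t help unless the model is served with a matching window.  A minimal sketch of raising it, assuming Ollama’s <code class="language-plaintext highlighter-rouge">num_ctx</code> Modelfile parameter and the qwen2.5-coder model from earlier:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># build a variant of the model that is served with a 32K context window
cat &gt; Modelfile &lt;&lt;'EOF'
FROM qwen2.5-coder:1.5b
PARAMETER num_ctx 32768
EOF

ollama create qwen2.5-coder-32k -f Modelfile
ollama run qwen2.5-coder-32k
</code></pre></div></div>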

<p>Let’s switch over to the experience though.  Like all the other extensions, you can just chat.  However, Cline also has a “Plan and Act” mode.  This is most similar to the Kiro spec-driven design model (although Kiro is much more polished even in preview).  It did a reasonable job of turning my project description into an actionable plan and then executing each task. I have no doubt that if I were using Cline accounts and the Claude Sonnet 4 model, I’d be able to complete the project.  I was after a “free” option though, and the models I had access to were not up to the task.  The other good thing I found in the UI was the “MCP marketplace”.  Admittedly, this is heavily skewed towards AWS usage.  I did appreciate the one-click install of the <a href="https://github.com/github/github-mcp-server">GitHub MCP Server</a>, for example.  I think I spent an hour just scrolling through the list figuring out if any of the other MCP servers were useful to me.</p>

<p>The other place Cline falls down is a big one.  It’s just too complex for the average person.  You can adjust absolutely everything.  However, understanding why you would want to use a bigger context window or a specific model is not something the average developer even wants to know unless they are trying to become an expert on AI topics.  The cognitive load is high without an obvious benefit.</p>

<h2 id="other-tools">Other tools</h2>

<p>I’m going to lump a lot of tools together here, since I tried a bunch when researching this article.  Here is a partial list:</p>

<ul>
  <li>CodeGPT</li>
  <li>Ollama Chat</li>
  <li>Ollama Dev Companion</li>
  <li>Kilo Code AI Agent</li>
  <li>Zencoder</li>
</ul>

<p>They all had “something” a little extra for themselves.  CodeGPT was a paid alternative to GitHub Copilot, but it wasn’t really that much different.  Ollama Chat was a nice way to interact with the model outside of an IDE, but it didn’t seem to offer anything beyond what you could already do in GitHub Copilot, or in the terminal for that matter.  Kilo Code AI Agent is a combination of Cline with Roo - it was an easier combined setup but didn’t really offer anything not already within Cline, and I didn’t see the benefit of combining them.  It did give you $20 in tokens, though.  Zencoder was another sign-up-for-a-paid-service offering that provided access to Claude Sonnet.  Its little extra was indexing your code base and providing specific agents (e.g. for Q&amp;A, coding, test generation, etc.), which is just another layer of complexity.  I tried them all, but none of them offered anything that I couldn’t do elsewhere.</p>

<p>Another problem is really in the usability area, and it stems from a habit I’ve developed when using AI assistance: I expect the chat interface to be in the secondary chat area, to the right of my editor.  I’m almost programmed to look for it there.  Instead, all of these tools put the chat in the primary editor space.  This means you can see the chat OR the tests OR the file browser.  It was surprising to me how much of a difference this makes to usability.  I wanted to see the tests and type something in chat while looking at the test output, or drag and drop files onto the context area.</p>

<h2 id="wrap-up">Wrap-up</h2>

<p>So, would I use any of these tools?</p>

<p>Nope.</p>

<p>For the amount of coding I do, I’m going to pay for one of the subscription services.  I’ll get a better model, faster responses, and less frustration.</p>

<p>Using local models may be fine for simple tasks, but it’s not worth the delays and inaccuracies you need to deal with.  The cost of the GPU you need will fund several years of subscription costs, and the best models are all subscription based, not local models.</p>

<p>OSS and non-core extensions all come with their own issues and annoyances.  For me, the benefits are not worth the frustration.</p>

<p>This got me thinking about what an ideal “AI assistant” should be:</p>

<ul>
  <li>Be everywhere I am, not in a new application (Winner: GitHub Copilot)</li>
  <li>Live in the secondary chat area to the right of the editor (Winners: Copilot, Kiro, Cursor)</li>
  <li>Transparently use the best model unless I want to specify it (Winner: Cursor)</li>
  <li>Minimize the prompt engineering I need to do, allowing me to focus on the job (Winner: Kiro)</li>
  <li>Allow the use of local models, OpenRouter, and direct API connections (Winner: Copilot)</li>
  <li>Provide recommendations for MCP servers (No winner), ideally from a marketplace (Winner: Cline)</li>
  <li>Allow me to auto-approve tools while working (Winner: Kiro)</li>
  <li>Have a separate mode for pull request-like reviews - aka “review my changes” (No winner)</li>
  <li>Have automatic memory, so it learns my style and requirements as we work together (Winner: Cursor)</li>
</ul>

<p>As you can see, there is no “best candidate” that does everything - each of these tools has a bit of what I want.  Who knows, the next hot new editor may just provide the AI assistant I need.  Do you have something you really want your editor to help you with?  Let me know in the comments.</p>

<p>In the next (and final) article, I’m going to give my thoughts on using the models available through the subscription services. I’ll also give you the model I most often use (and why).  It’s not the latest Claude model.</p>]]></content><author><name>Adrian Hall</name><email>photoadrian@outlook.com</email></author><category term="Devtools" /><category term="ai" /><summary type="html"><![CDATA[In my last article, I provided a break down of the three commercial AI assisted editors that I use on a regular basis. However, not everyone can afford a monthly subscription ($20/month is the standard), or maybe you prefer different models or a more local development environment. There are still options that you can use to get AI assisted editing yet avoid the subscription charge. What is an AI Assisted Editor? As a developer, you open up an editor and write code. AI Assisted editors generally come in the forms of plugins that allow a large language model to take over the writing of the the code for you. They also generally provide information on the code base through a chat interface. This allows you to come up to speed on a new code base quickly by allowing the AI to focus your attention on the important bits. I’ve personally used three AI assisted code editor plugins to create the same project (a web UI for Azurite) three times. The projects ended up very different even if they used the same AI models underneath. That’s not unusual and a natural part of the development process with AI. If you don’t like the solution, you can ask the AI to change or you can just wipe it out and start over. I started off writing the project myself without AI. This took me a month, mostly because I’m not the best at designing UI and TailwindCSS is complex. But that’s a baseline. The plugins So, who are the contenders? GitHub Copilot - yes, again! Cline Others I’ve tried. These are all available as extensions on Visual Studio Code. If you want to get going with local models, you’re also going to need Ollama running on a GPU-enabled desktop with plenty of memory, and a model. I used qwen2.5-coder:1.5b for my tests, but you will need to explicitly choose a model. You can easily install Ollama like this: winget install ollama.ollama Then install and run the model: ollama run qwen2.5-coder:1.5b Unlike the subscription services, not one of the options allowed me to complete my project without a significant amount of coding. The OSS AI assistants were more helpers on the side than a serious AI coding partner. A note to the Enterprise IT Admins who turn off options in Visual Studio Code Why? If you allow downloading extensions, but turn off features of GitHub Copilot, you are just asking your developers to work around your restrictions. They will do what is needed to get their job done. GitHub Copilot Didn’t we do this last time? Well, yes - I did talk about the subscription services last time, but I glossed over the experience when you go local. When you use the drop-down (err - drop-up?) to edit the model, you get this: Notice the “Manage Models” option. That allows you to install new models from providers that aren’t GitHub Copilot. If you are running Ollama, you can select that and see the list of models that are available to you. Once selected, you can then run Ask, Edit, or Agent mode. If you have Azure credits, you can run a model in the Azure AI Factory and connect to it using the Azure option. There is also support for OpenRouter (which is a pay-as-you-go option for consuming LLMs). 
You can also choose models from any of the major frontier providers if you have an API key (and hence are paying directly for them). I’ve picked Ollama and qwen2.5-coder as my model. So, how did it fare? Well, let’s split this into the experience and the model accuracy. I started by wondering if it could do something as simple as write unit tests for a function. As soon as I entered the prompt, I heard the fans on my desktop shift into high gear - it was working hard! My GPU was pegged. I waited. And waited. And waited. Seven minutes later, my simple prompt produced an answer. It had worked, but something that produced an almost immediate response took an eternity when running locally. What’s worse, the results left a lot to be desired. I didn’t have enough horsepower in my desktop to run the latest qwen3-coder models, and stable-coder (another model that was suggested as being acceptable in this scenario) was not any better than the qwen2.5-coder. I had asked the model to create unit tests for a single test file that I added as context. It produced code that didn’t work and added in extra tests for methods in other files that I had not asked for. My learning from this - you either need a beefy box that costs several thousand dollars (due to the GPU requirements) so that you can run the best models available or you are stuck using remote models. As to the rest of the experience - it was exactly the same as GitHub Copilot with a remote model. You have the three modes (ask, edit, and agent). You can configure and use tools when the model you are using supports tool use. You can add context and use copilot-instructions.md just like the subscription version. Cline Next, I installed Cline. If you ever tried Linux after being on a Windows PC, you were likely overwhelmed by the sheer number of “knobs” you can turn to configure it. I had the same shock when I tried Cline. It exposes more knobs than any of the other extensions I tried. It also provides more feedback than any of the other extensions I tried. This was properly OSS land. It also failed more often than any of the other extensions I tried. My first step was to just try it out. I signed up for a Cline account. They give you $0.50 worth of credits to play with (enough for 3-4 small tasks), and then you can add more credits as you wish. The Cline account suggests you use Claude Sonnet, and it works just like you would expect - very well. I then switched over to Ollama. Despite having good experience with the configuration in GitHub Copilot, the same setup would not work in Cline. I tried multiple models (in case it was a model incompatibility) with no success. The API call would just time out. Cline did have the largest model provider selection of all the extensions I tried. In addition to model routers at Cline, you can also try LiteLLM (for a local flavor), OpenRouter (for a paid option), and VS Code itself (to use Cline on top of the Copilot models). There is a plethora of options you can use directly from OpenAI, Google, AWS, Azure, Mistral, and others. You will find YOUR model in this set. As you can see above, you also get to set the Model Context Window and the Request Timeout. Qwen 2.5 supports a 256K window and LLama 3.1 supports a 128K window - you can take advantage of those. Let’s switch over to the experience though. Like all the other extensions, you can just chat. However, Cline also has a “Plan and Act” mode. This is most similar to the Kiro spec-driven design model (although Kiro is much more polished even in preview). 
It did a reasonable job of turning my project description into an actionable plan and then executing each task. I have no doubt that if I were using Cline accounts and the Claude Sonnet 4 model, I’d be able to complete the project. I was after a “free” option though, and the models I had access to were not up to the task. The other good thing I found in the UI was the “MCP marketplace”. Admittedly, this is heavily skewed towards AWS usage. I did appreciate the one-click install of the GitHub MCP Server, for example. I think I spent an hour just scrolling through the list figuring out if any of the other MCP servers were useful to me. The other place Cline falls down is a big one. It’s just too complex for the average person. You can adjust absolutely everything. However, understanding why you would want to use a bigger context window or a specific model is not something the average developer even wants to know unless they are trying to become an expert on AI topics. The cognitive load is high without an obvious benefit. Other tools I’m going to lump a lot of tools here, since I tried a bunch when researching this article. Here is a partial list: CodeGPT Ollama Chat Ollama Dev Companion Kilo Code AI Agent Zencoder They all had “something” a little extra for themselves. CodeGPT was a paid alternative to GitHub Copilot, but it wasn’t really that much different. Ollama Chat was a nice way to interact with the model outside of an IDE, but it didn’t seem to offer anything beyond what you could already do in GitHub Copilot, or in the terminal for that matter. Kilo Code AI Agent is a combination of Cline with Roo - it was an easier combined set up but didn’t really offer anything not already within Cline, and I didn’t see the benefit of combining them. It did give you $20 in tokens though. Zencoder was another sign up for a paid service. It provided access to Claude Sonnet. It’s little extra was indexing your code base and providing specific agents (e.g. for Q&amp;A, Coding, Test generation, etc), which is just another layer of complexity. I tried them all, but none of them offered anything that I couldn’t do elsewhere. Another problem - really in the usability area, and one I’ve gotten used to when using AI assitance. I expect the chat interface to be in the secondary chat area - to the right of my editor. I’m almost programmed to look for it there. Instead, all of these tools put the chat in the primary editor space. This means you can see the chat OR the tests OR the file browser. It was surprising to me how much of a difference this makes the usability. I wanted to see the tests and type something in chat while I’m looking at the test output, or I wanted to drag and drop the files onto the context area. Wrap-up So, would I use any of these tools? Nope. For the amount of coding I do, I’m going to pay for one of the subscription services. I’ll get a better model, faster responses, and less frustration. Using local models may be fine for simple tasks, but it’s not worth the delays and inaccuracies you need to deal with. The cost of the GPU you need will fund several years of subscription costs, and the best models are all subscription based, not local models. Using OSS or non-core extensions all come with their own issues and annoyances. For me, the benefits are not worth the frustration. This got me thinking on what an ideal “AI assitant” should be: Be everywhere I am - not in a new application. (Winner: GitHub Copilot) Live in the secondary chat area to the right of the editor. 
(Winners: Copilot, Kiro, Cursor) Transparently use the best model unless I want to specify it (Winner: Cursor) Minimize the prompt engineering I need to do, allowing me to focus on the job (Winner: Kiro) Allow the use of local models, OpenRouter, and direct API connections (Winner: Copilot) Provide recommendations for MCP servers (No winner) from a marketplace (Winner: Cline) It should allow me to auto-approve while working (Winner: Kiro) It should have a separate mode for pull request-like reviews - aka “review my changes” (No winner) It should have automatic memory, so it learns my style and requirements as we work together (Winner: Cursor) As you can see, there is no “best candidate” that does everything - it’s sort of a bit of all of them. Who knows, the next hot new editor may just provide the AI assistant I need. Do you have something you really want the next hot new editor to help you with? Let me know in the comments. In the next (and final) article, I’m going to give my thoughts on using the models available through the subscription services. I’ll also give you the model I most often use (and why). It’s not the latest Claude model.]]></summary></entry></feed>