<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Jujutsu on Jonathan's Blog</title><link>https://jonathan-frere.com/tags/jujutsu/</link><description>Recent content in Jujutsu on Jonathan's Blog</description><generator>Hugo</generator><language>en-gb</language><copyright>This work is licensed under CC BY-SA 4.0</copyright><lastBuildDate>Thu, 27 Feb 2025 00:00:00 +0000</lastBuildDate><atom:link href="https://jonathan-frere.com/tags/jujutsu/index.xml" rel="self" type="application/rss+xml"/><item><title>Why are Jujutsu's ID Prefixes So Short?</title><link>https://jonathan-frere.com/posts/jujutsu-shortest-ids/</link><pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate><guid>https://jonathan-frere.com/posts/jujutsu-shortest-ids/</guid><description>&lt;p>&lt;a href="https://jj-vcs.github.io/jj/latest/">Jujutsu&lt;/a> is a relatively new version control system that is designed around working directly with commit-like objects called “changes”&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>. Each change has an ID, and these IDs are important because they’re the main way of referring to different changes&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup>. To make referencing these IDs a bit easier, Jujutsu often shows ID prefixes rather than the whole ID when displaying commits (such as in the &lt;code>jj log&lt;/code> command).&lt;/p>
&lt;p>This is similar to how Git can show an abbreviated commit hash, (typically 7-8 characters), but in Jujutsu, the ID prefixes are often 1-2 characters long. But a repository in Jujutsu can have thousands of changes in it. How can the prefixes stay so small?&lt;/p></description><content:encoded><![CDATA[<p><a href="https://jj-vcs.github.io/jj/latest/">Jujutsu</a> is a relatively new version control system that is designed around working directly with commit-like objects called “changes”<sup id="fnref:1"><a href="#fn:1" class="footnote-ref" role="doc-noteref">1</a></sup>. Each change has an ID, and these IDs are important because they’re the main way of referring to different changes<sup id="fnref:2"><a href="#fn:2" class="footnote-ref" role="doc-noteref">2</a></sup>. To make referencing these IDs a bit easier, Jujutsu often shows ID prefixes rather than the whole ID when displaying commits (such as in the <code>jj log</code> command).</p>
<p>This is similar to how Git can show an abbreviated commit hash, (typically 7-8 characters), but in Jujutsu, the ID prefixes are often 1-2 characters long. But a repository in Jujutsu can have thousands of changes in it. How can the prefixes stay so small?</p>
<h2 id="anatomy-of-a-change-id">
  <a href="#anatomy-of-a-change-id">Anatomy of a Change ID</a>
</h2>

<p>Every change in Jujutsu has a change ID. If you run <code>jj log</code>, you’ll see an abbreviated set of changes, and at the start of each entry in the log will be the change’s ID. This will usually be printed out something like this:</p>
<blockquote>
<p><strong>l</strong>nwzpvxn<br>
<strong>sp</strong>xyzulm<br>
<strong>sw</strong>llwvql<br>
<strong>o</strong>qumxxyr</p>
</blockquote>
<p>A change ID is always made up from the letters k-z, which explains the “back half of the alphabet” feeling that these IDs have. In the code, this is sometimes described as <a href="https://github.com/jj-vcs/jj/blob/5b5a9e71c38d9e037c2d18f9a950552a553b7d7d/lib/src/backend.rs#L57"><code>reverse_hex</code></a>, because it’s the same as hexadecimal notation, just starting from the back of the alphabet instead of the front.</p>
<p>Change IDs are generally 16 bytes long. This is actually configurable based on the backend, but the Git backend (which is the one most people use) has this value set to <a href="https://github.com/jj-vcs/jj/blob/5b5a9e71c38d9e037c2d18f9a950552a553b7d7d/lib/src/git_backend.rs#L90">16 bytes</a>, so that’s what you’re most likely to see. When displayed in the UI, though, the <a href="https://github.com/jj-vcs/jj/blob/5b5a9e71c38d9e037c2d18f9a950552a553b7d7d/cli/src/config/templates.toml#L175">default template</a> only shows the first eight bytes (or eight characters of the hexadecimal format).</p>
<p>Showing only the first eight characters of the ID is fine, because when referencing a change ID (for example when doing something like <code>jj new -r &lt;id&gt;</code>), we don’t need to write the entire ID, we just need to write a unique prefix. As long as no other change starts with the same sequence that we’ve used, Jujutsu knows which commit we’re referring to. Given that there are roughly 280 trillion possible eight-byte sequences, the short ID is almost certainly unique, even for very large repositories.</p>
<p>Jujutsu goes a step further, though. The eight byte prefix is probably unique, but for every ID, Jujutsu also displays the specific prefix that is guaranteed to be unique for that change. When showing the ID in the terminal (and in the example IDs above), this is the part of the ID in bold.</p>
<p>How does this work? And why, for most of the changes that you’ll use regularly, is the prefix mostly only one or two characters?</p>
<h2 id="indexes-in-jujutsu">
  <a href="#indexes-in-jujutsu">Indexes in Jujutsu</a>
</h2>

<p>Jujutsu has a concept of indexes. Actually, Jujutsu has a number of indexes. If I’m honest, the indexing code is complicated to navigate around — I think this is partly the use of traits which make figuring out starting points more difficult, and partly because the indexing code is not very well documented right now. As a result, I will try and explain things as best I can, and I’ll let you know when I get stuck!</p>
<p>We can start by asking what indexes are used for. From what I can tell, the indexes resolve and identify prefixes for commit and change IDs. Roughly, we can imagine a big long list of IDs, all sorted alphabetically based on the ID itself. When we want to resolve a prefix for a change ID, we can do a fairly-efficient binary search on the data to identify the full ID, and we can use that ID then as an index into other stores.</p>
<p>The result might look something like this:</p>
<blockquote>
<p>00138948d32bee69319bf096a00d2b7c<br>
17a7166c862e79d7efe5b36d8b58a90b<br>
b704840fc5605d522ae6a7a5d09a38fd<br>
bc4e63c300bb28e899e31f11fe54f67b<br>
be3a486b7697d5d24f15e1634ed37fe8<br>
d29b12e490af660a2f2533088389f737<br>
d2d3105278d430876c07ea9767c7a28d</p>
</blockquote>
<p>Here, if I wanted to look up the prefix “d29”, I would start in the middle (with “bc4e…”), and see that my ID is greater than that number (because “d” &gt; “b” in hexadecimal notation). I can repeat this process, starting with the ID that’s halfway between the end of the list and the ID I just looked at (in this case “d29b…”). This starts with the prefix that I was looking up, so I know I’ve found the right ID<sup id="fnref:3"><a href="#fn:3" class="footnote-ref" role="doc-noteref">3</a></sup>.</p>
<p>The other side of this is, if we know the ID of a change, we can find out the shortest unambiguous ID by first looking for the ID, and then looking for its neighbours. For example, if I want to find the shortest possible prefix for the commit “bc4e…” in the above list, I can look at its neighbours and see that, while they both begin with “b”, neither begins with “bc”, and therefore I know that I can display “bc” as an unambiguous ID.</p>
<p>This is the simple version. In practice, there are multiple indexes. They all use the principles above (I believe), but in different ways.</p>
<ul>
<li>The “readonly” index stores all the IDs in a binary file. It knows how many IDs are stored in the file, and how long each ID is, so it can perform an efficient binary search on the data. Unfortunately, the format of the file does not appear to be well-documented, but I believe it contains separate indexes for commit IDs and change IDs. There also appears to be a graph-like data structure, but I’m not sure what this is for.</li>
<li>The “mutable” index stores commits and changes in two in-memory B-trees. The mutable index can also reference a parent readonly index, which I assume is for the case where you start with an existing readonly index, and want to only store the changes made to that index, and not duplicate it entirely.</li>
<li>The “composite” index is made up of multiple nested index segments. I believe this is an optimisation technique: rather than have one single index file with every single ID every made in it, you instead split the files up and check each file individually when trying to do a lookup.</li>
</ul>
<p>This all explains (somewhat) how Jujutsu can efficiently look up changes based on a prefix, and how Jujutsu can show you what the unique prefix for any change is. Unfortunately, it doesn’t answer the original question.</p>
<p>In a repository with thousands of commits (such as that of Jujutsu itself), it’s very likely that most change IDs will require 4-5 character prefixes. If you clone the Jujutsu repository and run <code>jj log &quot;all()&quot;</code> (i.e. showing all changes in the repository), you’ll see most of the highlighted prefixes are quite long. But run <code>jj log</code> without any arguments, and you’ll see that the highlighted prefixes are typically only one or two characters long. What’s going on?</p>
<h2 id="the-secret-fourth-index">
  <a href="#the-secret-fourth-index">The Secret Fourth Index</a>
</h2>

<p>The trick is that there’s actually a fourth index in Jujutsu, but it’s separate from the other indexes. This is the <a href="https://github.com/jj-vcs/jj/blob/6f1d15bd1a85aed635a64bd85c1a87467ef5132d/lib/src/id_prefix.rs#L147C1-L149C2"><em>ID prefix</em></a> index. This index is a partial index — it only covers a specific subset of IDs, and if it can’t find a given index, it delegates to the main repository index (that contains all the possible IDs).</p>
<p>The ID prefix index is initialised with a <em>revset</em>, which is like an SQL query but for Jujutsu changes. For example, you might write <code>bookmarks() &amp; author(me@me.me)</code>, which would resolve to all changes in the repository that (a) are labelled directly by a bookmark, <strong>and</strong> (b) have “me@me.me” as the change author. This revset language is a powerful way to describe different sets of changes, and it’s used extensively in Jujutsu. Commands that take an ID (e.g. <code>jj new -r &lt;id&gt;</code>) generally also take a revset that can describe multiple commits (e.g. <code>jj new -r 'all: bookmarks() &amp; local()'</code> will create a new commit that merges together all your local bookmarks — a surprisingly powerful and useful technique called a <a href="https://v5.chriskrycho.com/journal/jujutsu-megamerges-and-jj-absorb/">megamerge</a>)<sup id="fnref:4"><a href="#fn:4" class="footnote-ref" role="doc-noteref">4</a></sup>.</p>
<p>Revsets are also used a lot for configuration. When running <code>jj log</code>, only a subset of changes get shown — typically any local branches, the currently active change, and the trunk change (i.e. main/master/trunk). By default, this subset is defined in the <a href="https://github.com/jj-vcs/jj/blob/539cd75f902234d46327495f5da33e41b896470d/cli/src/config/revsets.toml#L10C1-L10C74">default Jujutsu configuration</a> as the following revset:</p>
<pre tabindex="0"><code>present(@)
  | ancestors(immutable_heads().., 2)
  | present(trunk())
</code></pre><p>Any change that matches this definition will get shown in the log<sup id="fnref:5"><a href="#fn:5" class="footnote-ref" role="doc-noteref">5</a></sup>.</p>
<p>The ID prefix index gets passed a revset like the log one. In fact, by default, it <em>is</em> the log revset. This is usually very convenient: the default log revset shows the changes in the repository that you’re currently working on, so it makes sense to want to quickly reference these changes. This flow of checking the log to see the current change IDs, and then using <code>jj squash</code> or similar commands to manipulate them is so common that the bare <code>jj</code> command is by default aliased to <code>jj log</code>. However, if you want to change it, you can directly set the <code>revsets.short-prefixes</code> configuration setting to the revset query of your choice.</p>
<h2 id="conclusion">
  <a href="#conclusion">Conclusion?</a>
</h2>

<p>To sum up, most of the short ID prefixes that you will use in Jujutsu can be short, because Jujutsu creates an index out of only the active changes, and consults this index first when looking up ID prefixes. The definition of “active changes” can be defined using a query language, and is configurable under <code>revsets.short-prefixes</code>.</p>
<p>About halfway through writing this article, I realised that the answer already exists in the documentation: in the section on &ldquo;<a href="https://jj-vcs.github.io/jj/latest/config/#display-of-commit-and-change-ids">Display of commit and change ids</a>”, there is the line &ldquo;To get shorter prefixes for certain revisions, set <code>revsets.short-prefixes</code>&rdquo;, followed by an example that prioritises only the current change and its “branch”.</p>
<p>But I enjoyed my journey to finding out the answer more, and I’ve ended up a lot more familiar with the Jujutsu codebase, and how Jujutsu works as a result.</p>
<div class="footnotes" role="doc-endnotes">
<hr>
<ol>
<li id="fn:1">
<p>There&rsquo;s an important distinction between <em>commits</em> and <em>changes</em>, in that <em>commits</em> are immutable and serve as a kind of backing store, whereas <em>changes</em> are mutable and represent a logical set of code changes. Each change is linked to a commit (the current immutable snapshot of that change), but if the contents of the change is updated, it will instead point to a different commit. Importantly, each change retains a history of the previous commits it pointed to — these commits can be seen by running <code>jj evolog</code>.&#160;<a href="#fnref:1" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:2">
<p>Go read Steve Klabnik’s “<a href="https://steveklabnik.com/writing/against-names/">Against Names</a>” post which specifically uses this fact about Jujutsu as an example for why avoiding names by design can sometimes make things much easier to use.&#160;<a href="#fnref:2" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:3">
<p>Technically, I might have tried to look up an ambiguous prefix. For example, if I’d tried to look up just “d2”, then I’d have found two IDs that start with this prefix. So once an ID has been found, there’s a bit of extra logic that ensures that the ID really is unique.&#160;<a href="#fnref:3" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:4">
<p>Besides the megamerge thing, there’s a couple of interesting things going on here. Firstly, a merge in Jujutsu is just a change with multiple parents. This is true in Git as well, but it’s not always obvious that this is the case! Secondly, the <code>all:</code> part of the revset is necessary here, because for many commands, Jujutsu requires that the user be explicit when referencing multiple changes. The <code>all:</code> is me explicitly saying “this revset may resolve to multiple changes, use them all”.&#160;<a href="#fnref:4" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
<li id="fn:5">
<p>This query translates roughly to “the current change, if it exists, OR the first two ancestors of any mutable change, OR the trunk/main/master change, if it exists”. Note that <code>X..</code> means “all changes not in X”, which is how we can define all mutable changes as being <code>immutable_heads()..</code>. The concept of immutable and mutable changes could be a blog post in its own right, but it’s discussed briefly in the documentation <a href="https://jj-vcs.github.io/jj/latest/config/#set-of-immutable-commits">here</a>.&#160;<a href="#fnref:5" class="footnote-backref" role="doc-backlink">&#x21a9;&#xfe0e;</a></p>
</li>
</ol>
</div>
]]></content:encoded></item></channel></rss>