Incredibly awesome, but with overlength

Joel David Hamkins answering Daniel Rubin’s questions is incredible. I just had to write this post. Both are great: Joel is friendly and explains extremely well, and Daniel is direct, honest, and engaging in a funny way. And they really talk with each other.

Joel talks with Daniel

For example, Daniel asks “How do you explain what it is that you do to a layperson?” and Joel’s answer at 9:48 goes “… of course this is connected with set theory, and large cardinals, and forcing, and different universes of set theory …” then Daniel interrupts “you don’t use those terms right off the bat with the layperson” and Joel admits “probably not, no …”

When Joel talks about infinite chess and positions worth omega, or omega+omega, Daniel interrupts: “We have to go backwards a little bit, I think … I feel like I know what cardinality is, but I never really understood ordinals, what is going on there …” and Joel replies “… it is really quite easy … we’re gonna set aside any ultrafinitist objections … this is how you count to omega^2 … I can define a linear order like that” and Daniel at 17:50 “I understand what you mean – at least – the way I’m thinking of it is subsets of the real line …”

Now I wondered: which ordinals can be represented by a subset of the real line? I understand where Daniel is coming from: Cantor invented ordinals during his analysis of the convergence of Fourier series, and here subsets of the real line somehow played an important role (but they occurred in a different role … where you can also ask which ordinals will be relevant).

(Zeb answered that those are exactly the countable ordinals. And this is true for both roles, because Cantor proved that every closed subset of the real line can be uniquely written as the disjoint union of a perfect set and a countable set.)
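To make this concrete for a small ordinal (my own illustration, not from the conversation or from Zeb’s answer): the countable ordinal omega^2 embeds into the real line by sending the ordinal omega*(n-1)+(m-1) to the point n - 1/m. A short Python check that the real-number order on these points agrees with the ordinal order:

```python
# A subset of the real line with order type omega^2:
# the point n - 1/m (for n, m >= 1) plays the role of the
# ordinal omega*(n-1) + (m-1).
points = {(n, m): n - 1.0 / m for n in range(1, 6) for m in range(1, 6)}

# The real-number order on these points agrees with the
# lexicographic (i.e. ordinal) order on the pairs (n, m).
sorted_by_value = sorted(points, key=points.get)
assert sorted_by_value == sorted(points)
```

Of course only an initial fragment fits into a finite computation, but the same formula works for all n, m.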

Daniel inquired about sets and classes and related paradoxes. At 1:10:36 Joel explains “And this is a picture of how the set theoretic universe grows. And if you have that picture, then you shouldn’t believe in general comprehension. … But then, the collection of all x such that phi(x), so if those x’es had arrived unboundedly in the hierarchy, there was no stage where you have them all, and so you never got the set. That’s what a proper class is, that’s the picture.”

Other conversations of Daniel and Joel

Daniel Rubin has a Playlist with many other conversations. Another conversation I liked was Modularity of Elliptic Curves | Math PhDs Outside Academia (Jeff Breeding-Allison). At 1:46:00 Daniel starts to talk about his grievances, and at 1:50:00 he starts to really express his exasperation, engaging in a funny way.

Joel David Hamkins has his own playlists too. More importantly, he had a similar session before the one with Daniel, answering Theodor Nenu’s questions. Here too, at 28:27, Joel explains “And if you have this picture how sets are forming into a cumulative universe, then there is no support for the general comprehension principle.” The version of this explanation for Daniel was a bit more detailed and delivered slightly better, but of course it remains the same explanation. Now I wonder: what is the picture for NFU (Quine’s New Foundations with urelements)? Some of Joel’s explanations in this session are more advanced than in his conversation with Daniel. For example, at 1:03:20 he mentions Suslin lines, and then explains Suslin’s problem.

Not bad overall, but nowhere near as incredible or awesome as the conversation with Daniel. It was actually the first conversation in Theodor Nenu’s Philosophical Trials Podcast playlist. He certainly has interesting guests. I quickly browsed the latest episode, and got the impression that he has become more relaxed and better.

Other awesome stuff with overlength

In his conversation with Jeff Breeding-Allison, Daniel at 17:13 says “lost again, kept alive by some muslims just copied greek texts and finally by the renaissance it made its way back to Europe and Pierre de Fermat has a copy of Diophantos”. I am currently reading “Pathfinders: The Golden Age of Arabic Science” by Jim Al-Khalili (or rather the German edition “Im Haus der Weisheit: Die arabischen Wissenschaften als Fundament unserer Kultur”, “In the House of Wisdom: The Arabic Sciences as the Foundation of Our Culture”), and he paints a very different picture from “some muslims just copied greek texts”. He wrote that book after producing the three-part BBC series “Science and Islam”:
Science and Islam – Islamic Knowledge (part 1)
Science and Islam – Ibn al-Haytham & Optics (part 2)
Science and Islam – Medieval Islam Influences (part 3)

Science & Islam (Full) | by Jim Al-Khalili | BBC Documentary (EN)

The ultimate overlength interviews on YouTube can be found at Web of Stories – Life Stories of Remarkable People – Playlists. I watched Edward Teller, Susan Blackmore, Hans Bethe, Freeman Dyson, and Murray Gell-Mann. But except for Susan Blackmore, I watched them before finding that global overview of the available playlists.

Looks like I watched a lot of physicists. For a bit more diversity, I regularly listen to
Sean Carroll – Mindscape Podcast
There is a wide variety of guests, and I try to respect that variety and listen to everybody, independent of background and subject.

A very special episode was Sean Carroll being interviewed by David Zierler of the American Institute of Physics’ Oral History project. It turns out that this oral history project has an impressive collection of interviews, for example:

N. David Mermin – interviewed by David Zierler

Werner Heisenberg – interviewed by Thomas S. Kuhn

Nicely produced “answer this nice question” sessions of a totally different kind are
Robert Lawrence Kuhn – Closer To Truth interviews


This post (or rather some of its links) has existed for a long time. It also contained links to Joe Rogan’s interviews of Sean Carroll, Jimmy Dore, Neil deGrasse Tyson, and maybe others. But most of those links were defunct, and I decided not to try to replace them. (I did try to recover them, but I have the impression that the material is no longer publicly available in its original form.)

Let me instead mention a logician and philosopher, who seems to consistently produce incredibly awesome overlength material: Walter Dean. I still need to read his latest paper on informal rigour. I really enjoyed his previous paper on consistency and existence in mathematics.

Posted in Uncategorized | Leave a comment

Fields and total orders are the prime objects of nice categories

A field is also a commutative ring, so it is an object in the category of commutative rings. A total order is also a partial order, so it is an object in the category of partially ordered sets. Neither are the prime object of those (nice) categories.

Fields are not the prime objects in the category of commutative rings, because some objects (for example the ring of integers) cannot be decomposed into a product of fields. Total orders are not the prime objects in the category of partial orders, because some objects (for example a non-total partial order on a set with three elements) cannot be decomposed into a product of total orders. At least not for the product (in the sense of category theory) arising with respect to the usual morphisms in the category of partially ordered sets.

Fields are the prime objects in the category of commutative inverse rings. Total orders are the prime objects in the Bool-category of partial orders. Those categories will be defined later, and their names will be explained or at least motivated.

But why do we claim that those are nice categories, or at least nicer than the category of fields and the category of total orders? At least they have products and (meaningful) subobjects, and are natural in various ways. The categories of fields and total orders have (meaningful) subobjects too (and are sufficiently natural), but they don’t have products.

Well, talking about prime objects in a category without products is sort of pointless. But in this case, the missing products are also a symptom of categories having so few morphisms that, besides isomorphisms, automorphisms, and subobjects, not much interesting structure is left in the categorical structure.

The Bool-category of partial orders

The dimension of a partial order is the smallest number of total orders whose intersection gives rise to the partial order. This is the idea of how an arbitrary partial order arises as a product of total orders. So the task is to find a category where the product of partial orders (X,\leq_X) and (Y,\leq_Y) is given by (X\cap Y, \leq_X \land\leq_Y). Or more generally, since a binary product is not enough for our purposes, the product of a family (X_i,\leq_{X_i}) of partial orders should be given by (\bigcap_i X_i,\bigwedge_i\leq_{X_i}).
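As a small sanity check (my own sketch, not part of the construction), the componentwise order on a grid of pairs is the intersection of two total orders, matching the standard fact that its order dimension is 2:

```python
from itertools import product

# The componentwise (dominance) order on pairs is the intersection
# of two total orders: lexicographic on (a, b) and lexicographic on (b, a).
X = list(product(range(3), repeat=2))

le1 = {(p, q) for p in X for q in X if p <= q}                        # lex on (a, b)
le2 = {(p, q) for p in X for q in X if (p[1], p[0]) <= (q[1], q[0])}  # lex on (b, a)

dominance = {(p, q) for p in X for q in X if p[0] <= q[0] and p[1] <= q[1]}
assert le1 & le2 == dominance  # this partial order has dimension 2
```

Here all three relations live on the same carrier X, so the intersection of carriers is invisible; the next section’s product handles the general case.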

The objects of the category will be pairs (X,\leq_X) where X is a set and \leq_X is a reflexive, antisymmetric, and transitive binary relation on X.  A Bool-category is a category where there is at most one morphism from A to B for any objects A and B. In our case, we define that there is a morphism from (X,\leq_X) to (Y,\leq_Y) exactly if X \subseteq Y and \leq_X\ \subseteq\ \leq_Y. (The binary relation \leq_X is a set in the sense that it is a subset of X \times X.)

So this is the Bool-category of partial orders. Or maybe not yet, because it should be a concrete category. So we still need to define its forgetful functor U to the category of sets. It is U(\ (X,\leq_X)\ )=X on objects and U(\ m\ ) = f with f(x) = x on morphisms m. This may all seem very abstract, but it is basically just the subcategory of the category of partial orders where the morphisms have been restricted to those f:X \to Y whose domain X is a subset of their codomain Y, and which are the identity on their domain.

Theorem The Bool-category of partial orders has products (in the sense of category theory) for arbitrary families (X_i,\leq_{X_i}), given by (\bigcap_i X_i,\bigwedge_i\leq_{X_i}). Any partial order is such a product of a family of total orders.

Proof One just has to check the categorical definition of a product. (See non-existing Appendix A, or a follow-up post.) For the second part, the principle that any partial order can be extended to a total order is needed. This follows from the axiom of choice. Let (P,\leq_P) be a given partial order. Then for any two elements a,b with \lnot(a\leq_P b \lor b\leq_P a), take the transitive closure of the partial order where (a,b) is added and the transitive closure of the partial order where (b,a) is added, and extend both to total orders on P. The product of all those total orders (two for each a,b with \lnot(a\leq_P b \lor b\leq_P a)) gives the partial order \leq_P.
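The second part of the proof can be illustrated by brute force on a tiny example (my own sketch): intersecting all linear extensions of a partial order recovers the partial order.

```python
from itertools import permutations

# The three-element poset with a <= b and no other comparabilities.
X = ['a', 'b', 'c']
le = {('a', 'a'), ('b', 'b'), ('c', 'c'), ('a', 'b')}

# The total order determined by a sequence, as a set of pairs.
def order_of(seq):
    return {(x, y) for i, x in enumerate(seq) for y in seq[i:]}

# Linear extensions: total orders containing le.
extensions = [order_of(p) for p in permutations(X) if le <= order_of(p)]

# Intersecting all linear extensions recovers the partial order.
common = set.intersection(*extensions)
assert common == le
```

For this poset there are three linear extensions (a before b, with c anywhere), and already their intersection kills every comparability not forced by le.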

This theorem is the precise way of stating that total orders are the prime objects in the Bool-category of partial orders. Why call them “prime objects”? Because we can see total orders as the simple building blocks of the more complicated partial orders. And a product (in the sense of category theory) is about the simplest construction for putting building blocks together.

The explanation or motivation for the name of the category is that it is the canonical category enriched over Bool. Being enriched over Bool means that the only information in the morphisms is whether there is a morphism from object A to object B or not. The name Bool-category suppresses the fact that this is also a concrete category and a subcategory of the category of partial orders. (Established names for such a category are posetal category or thin category.) But Bool- is so short and sweet (and the post was already written when I learned about the established names), so it seems to be a good name nevertheless.

Note that any Bool-category trivially has equalizers. Since our Bool-category has products, it automatically has all limits. It doesn’t have coproducts, but the closely related Bool-category of preorders has both products and (small) coproducts. If we denote the transitive closure of a binary relation R by t(R), then the coproduct of a family (X_i,\leq_{X_i}) of preorders is given by (\bigcup_i X_i,t(\bigvee_i\leq_{X_i})) (note that the family must be small, i.e. a set). (See non-existing Appendix B, or a follow-up post.) Any Bool-category also trivially has coequalizers, so the Bool-category of preorders has all limits and all small colimits.
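The coproduct formula for preorders can also be made concrete (my own sketch, with a naive fixed-point computation of the transitive closure):

```python
# Coproduct of two preorders on overlapping carriers: the union of the
# carriers together with the transitive closure of the union of the relations.
def transitive_closure(rel):
    closure = set(rel)
    while True:
        extra = {(x, w) for (x, y) in closure for (z, w) in closure if y == z}
        if extra <= closure:
            return closure
        closure |= extra

leq1 = {(1, 1), (2, 2), (1, 2)}   # preorder on {1, 2}
leq2 = {(2, 2), (3, 3), (2, 3)}   # preorder on {2, 3}

coproduct = transitive_closure(leq1 | leq2)
assert (1, 3) in coproduct        # 1 <= 2 and 2 <= 3 force 1 <= 3
assert (3, 1) not in coproduct
```

The new pair (1, 3) shows why the transitive closure is needed: the plain union of the two relations is not transitive.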

The category of commutative inverse rings

A semigroup S is a set S together with a binary operation \cdot:S\times S \to S which is associative: \forall x,y,z\in S\quad x\cdot(y\cdot z)=(x\cdot y)\cdot z. To simplify notation, concatenation is used instead of \cdot and parentheses are omitted.
An inverse semigroup is a semigroup S with a unary operation ()^{-1}:S \to S such that: \forall x,y\in S\quad x=xyx\land y=yxy \ \leftrightarrow \ y=x^{-1}. An element y satisfying the left side is called an inverse element of x, so in other words this equivalence says that inverse elements exist and are unique.

An inverse ring is a ring whose multiplicative semigroup is an inverse semigroup.
A commutative inverse ring is an inverse ring whose multiplicative semigroup is commutative. A homomorphism between (commutative) inverse rings is just a homomorphism between the underlying rings. The operation ()^{-1} is preserved automatically due to its characterization purely in terms of multiplication.

The category of (commutative) inverse rings has those (commutative) inverse rings as its objects and those ring homomorphisms between them as morphisms. One sense in which those are nice categories is that they are a variety of algebras or equational class. This means they are the class of all algebraic structures of a given signature satisfying a given set of identities.

This is well known, but not obvious from the definition given above. The second post on this blog, on Algebraic characterizations of inverse semigroups and strongly regular rings, included such a characterization by the identities: x=xx^{-1}x \ \land \ x^{-1}=x^{-1}xx^{-1} and xx^{-1}\cdot y^{-1}y \ = \ y^{-1}y\cdot xx^{-1}. The first identity says that x^{-1} is an inverse element of x, and the second identity allows one to prove that idempotents commute.
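The existence and uniqueness of inverse elements can be checked by brute force in small rings (my own sketch): Z/6, being isomorphic to F_2 x F_3, is a commutative inverse ring, while Z/4 is not, since 2 has no inverse element there.

```python
# Brute-force check of the inverse-element condition in Z/n:
# every x should have a unique y with x = x*y*x and y = y*x*y.
def inverse_elements(x, n):
    return [y for y in range(n)
            if x * y * x % n == x and y * x * y % n == y]

# Z/6 is a commutative inverse ring (it is the product F_2 x F_3):
assert all(len(inverse_elements(x, 6)) == 1 for x in range(6))

# Z/4 is not: x = 2 has no inverse element, since 2*y*2 = 4*y = 0 mod 4.
assert inverse_elements(2, 4) == []
```

In Z/6 the inverse element of 2 is 2 itself (2*2*2 = 8 = 2 mod 6), which shows that ()^{-1} is a pseudo-inverse, not an ordinary multiplicative inverse.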

Computations like \frac{xy}{x+y}=\frac{1}{(y/y)/x+(x/x)/y}\neq\frac{1}{1/x+1/y} raise the question of how easy or difficult it is to compute in commutative inverse rings. A partial answer is that the equational theory is decidable, but NP-hard nevertheless. (See non-existing Appendix C, or a follow-up post.) This is neither nice nor ugly.

Theorem Every commutative inverse ring is a subdirect product of a family of fields. And every inverse ring is a subdirect product of a family of skew fields.

Proof Here is a proof by Benjamin Steinberg: “It is an old result that any ring whose multiplicative reduct is completely regular is a subdirect product of division rings.” (Here division ring is just another word for skew field.) But because he was unable to find the old reference, he just wrote down his own proof.

This theorem is the precise way of stating that fields are the prime objects in the category of commutative inverse rings. And we also learn that skew fields are the prime objects in the category of inverse rings. This property of the non-commutative version of the category further increases the niceness of the commutative version, in a certain sense.

What about the names of those categories? An existing name for inverse ring is strongly (von Neumann) regular ring (and commutative (von Neumann) regular ring for commutative inverse ring). Those are long names, and regular already has multiple other meanings for rings. Jan Bergstra coined the name meadow for commutative inverse ring and skew meadow for inverse ring. An advantage of those names is that they highlight the close connection to fields and skew fields. In fact, the multiplicative semigroup of an inverse ring is automatically a Clifford semigroup, which is an inverse semigroup with an especially simple structure. An advantage of the name inverse ring is that it highlights that the definition is just the canonical one.

Why think and write about that stuff?

My reference for first order logic was Heinz-Dieter Ebbinghaus, Jörg Flum, Wolfgang Thomas: “Einführung in die mathematische Logik” (“Introduction to Mathematical Logic”). They defined the notion of homomorphism mostly for universal Horn theories only. Or maybe not; the wording was more like: “11.2.1 Theorem If the term interpretation is a model of the given first order theory, then it is also a free model, i.e. if we define homomorphism like this, then …”. And then they had: “11.2.5 Corollary For a given first order theory that is consistent and universal Horn, the term interpretation is a model”.

(For total orders, the axiom x \leq y \lor y \leq x, that every element is comparable with every other element, is not universal Horn. For fields, the axiom x \neq 0 \to x^{-1}x = 1, that non-zero elements are invertible, is not universal Horn. Those axioms also prevent the term interpretation from being a total order, respectively a field.)

Still, the important point is that the most appropriate notion of homomorphism for fields and for total orders remained unclear. Why should h:A\to B be a homomorphism between models A and B exactly if R^A(a_1,\dots,a_n) \Rightarrow R^B(h(a_1),\dots,h(a_n)) and h(f^A(a_1,\dots,a_n))=f^B(h(a_1),\dots,h(a_n)) for all relational symbols R and all function symbols f?

This gave rise to the idea to find nice categories (where the appropriate notion of homomorphism is obvious) in which fields and total orders are embedded. But for this, the category of commutative rings and the category of partial orders would have been good enough. No need to talk about prime objects and using obscure categories (without well established names). The explanation for this part is that the simplest guess for the structure of inverse rings is that they are just subdirect products of skew fields. This turned out to be true, and since finite fields are closely related to prime numbers, talking about prime objects (instead of the established subdirectly irreducible terminology) was attractive.

A long time ago, I read about fields on the nLab, especially the discussion of “Constructive notions”. It presents many possible notions (of field) with no clear winner. (I plan to read Mike Shulman’s Linear logic for constructive mathematics at some point, because I guess it will make it clearer which constructive notion is best in which context.) It felt quite complicated to me, especially considering that fields are such an important and basic notion in mathematics. The axiom x \leq y \lor y \leq x for total orders can also be problematic for constructive notions (intuitionistic and classical logic interpret the “logical or” differently). Because I found that confusing, I asked a question at MathOverflow about it. That question received comments that this is an interesting construction, but not really a question. So it was closed (and later deleted automatically). I knew that my question also contained the suggestion that total orders might be prime objects for partial orders, but I didn’t remember whether a construction was included, and what it was.

So some years later, I tried to remember what I had in mind when suggesting that total orders might be prime objects. The construction for the dimension of a partial order seemed to fit best what I remembered, also because it was similar to a trick I once used related to orders and products. It certainly didn’t include the construction of a category, because I was not good at that stuff back then.

The reason why I got better at that category theory stuff is the Applied Category Theory Munich meetup group. (One motivation for me to finish this post was that in the last meeting, Massin mentioned that fields don’t form a nice category.) We first read Brendan Fong and David Spivak’s Seven Sketches in Compositionality: An Invitation to Applied Category Theory. It was easy to read, but introduced me to incredibly many interesting and new ideas. Because that was so nice and easy (except for the last chapter, which still has many typos and other indications of missing reader feedback), we continued with Tom Leinster’s Basic Category Theory. It was in response to exercise 1.1.12 that I first managed to construct a category of partial orders where the categorical product was given by the intersection of the binary relations. (It was not yet a nice category, because all partial orders had to be defined on the same underlying set.) It is also an impressively well written book, but it goes far deeper into technical details than Fong & Spivak (where we covered the seven chapters in only eight meetings). For Leinster, we had two meetings for chapter 1, four meetings for chapter 2, and will again have multiple meetings for chapter 4.


This was the post I had planned to write next, after Defining a natural number as a finite string of digits is circular. It was expected to be a short post, but difficult to write. After I discovered the nice Bool-category of partial orders, it was no longer difficult to write, but it was no longer short either. (It would have been even longer with the appendices, but they were not written yet, and they invited discussions of additional unrelated concepts, so the appendices have been postponed for now. They may appear in a follow-up post.)

The reason for writing this post next was that the preorder on the non-standard natural numbers constructed in that post on circularity (based on provability of whether one number is smaller than or equal to another) is a concrete example of a constructive preorder given as the intersection of non-constructive total preorders. (The total preorders arise as the order between numbers in any actual model of the axioms.) This was unexpected for me, both that this phenomenon occurs naturally, and that the characterization as prime objects does not help at all to mitigate the non-constructive character of total orders.

Posted in inverse semigroups | Tagged , | Leave a comment

Prefix-free codes and ordinals

Very nice. I like that the ordinal based construction allows for quite some freedom, while still ensuring that every number can be represented uniquely, and that lexicographically greater codewords map to bigger numbers.

Allowing non-binary digits would provide even more freedom. I guess the only thing which would change in the ordinal based construction is that (2n-1+b) would be replaced by (rn-r+1+d), where d is an r-ary digit. And the encoding of 1^n0x would have to be adapted too.
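To make my guess concrete, here is one way to read the (rn-r+1+d) recurrence (my own reading of the construction, not taken from the post being discussed): writing n = v + 1 turns the update into v -> r*v + 1 + d, under which finite r-ary digit strings correspond bijectively to natural numbers.

```python
# A guess at the r-ary version of the (2n-1+b) scheme: reading the digits d
# of a codeword via n -> r*n - r + 1 + d, equivalently v -> r*v + 1 + d with
# v = n - 1, gives a bijection between finite r-ary strings and naturals.
def decode(digits, r):
    v = 0
    for d in digits:
        v = r * v + 1 + d
    return v

def encode(v, r):
    digits = []
    while v > 0:
        v, d = divmod(v - 1, r)   # invert v -> r*v + 1 + d
        digits.append(d)
    return digits[::-1]

r = 3
assert all(decode(encode(v, r), r) == v for v in range(200))
```

For r = 2 and digits shifted by one this is the familiar bijective binary numeration; whether this matches the post’s intended adaptation of the 1^n0x encoding, I can only guess.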

I wonder what motivated you to write this post. Was it just the obvious motivation to construct a connection between ordinals and natural number representations?

What Immanuel Kant teach you

Consider the problem of representing a number in computer memory, which is idealized as a sequence of zeros and ones. The binary number system is a well-known solution to this problem — for example, the sequence “01101” represents $latex 0 \cdot 2^4 + 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 2^0 = 13$. But there’s a problem: You don’t expect the entire computer to just be used to represent one number; you expect it to have other things stored afterwards. So how do you tell where the number ends? If the sequence begins $latex 01101001\dots$ does this represent the number $latex 011_2$ or $latex 01101_2$ or $latex 0110100_2$?

The solution to this problem most commonly used in practice is to declare in advance a fixed number of bits that will be used to represent the number, usually 32 bits or 64 bits. For…

View original post 1,814 more words

Posted in Uncategorized | Leave a comment

Isomorphism of labeled uniqueness trees

The paper Deep Weisfeiler Leman by Martin Grohe, Pascal Schweitzer, and Daniel Wiebking introduces a framework that allows the design of purely combinatorial graph isomorphism tests that are more powerful than the well-known Weisfeiler-Leman algorithm. This is a major achievement; see for example the beginning of an email I wrote to Jonathan Gorard (author of the paper Uniqueness Trees: A Possible Polynomial Approach to the Graph Isomorphism Problem) on June 25, 2016:

Dear author,

you recently proposed a purely combinatorial method towards determining isomorphism of graphs, which you called uniqueness trees. For purely combinatorial methods, the most interesting question is how they compare to Weisfeiler-Lehman, and “not being subsumed” by Weisfeiler-Lehman would be considered to be a major achievement, even if the method would not yield a polynomial time algorithm for checking graph isomorphism.

I never received an answer. This is not uncommon, as I wrote in another place:

I sometimes write the authors of such papers my thoughts, but the typical reaction is to totally ignore my email, such that I don’t even know whether a spam filter eliminated it before reaching the author; the best reaction is a “thanks for your kind words, I’m used to much more insulting feedback”. Being totally ignored feels bad, but maybe it is an appropriate reaction to “proof refutation”?

But in this post, I want to elaborate another point which I mentioned in that email:

Rooted (colored) trees can be canonically labeled, which allows to use your method for iterative color refinement just like Weisfeiler Lehman. Your uniqueness trees on the other hand are also labeled with the original vertices, but isomorphism of rooted labeled trees is GI-complete, so the labels probably can’t be exploited/included for canonical labeling.

(For completeness, I will also give a small counterexample to the algorithm proposed in that paper.)

Definitions and known results for labeled tree isomorphism

For unlabeled rooted trees, the AHU algorithm (by Aho, Hopcroft and Ullman, Example 3.2 on page 84 in their book “The Design and Analysis of Computer Algorithms”) allows one to efficiently decide whether two rooted trees are isomorphic. We may regard the AHU algorithm as an efficient implementation of color refinement (also known as naive vertex classification) for rooted trees. Therefore, color refinement decides (rooted) tree isomorphism. Does it also decide labeled tree isomorphism? That depends on what we mean by isomorphism of labeled graphs:
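The core of the AHU idea can be sketched in a few lines (my own simplified version; the actual algorithm replaces the recursive tuples by small integers level by level to stay efficient):

```python
# AHU idea: the canonical form of a rooted tree is the sorted tuple of
# the canonical forms of its subtrees.  Two rooted trees are isomorphic
# iff their canonical forms are equal.  Trees are nested lists of children.
def canon(tree):
    return tuple(sorted(canon(child) for child in tree))

# Two drawings of the same rooted tree on four nodes:
t1 = [[], [[]]]    # root with a leaf child and a depth-2 path
t2 = [[[]], []]    # the same tree, children listed in the other order
assert canon(t1) == canon(t2)

t3 = [[], [], []]  # root with three leaf children: a different tree
assert canon(t1) != canon(t3)
```

This naive version already shows why the order of children is irrelevant: sorting forgets it.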

For labeled graphs, two definitions of isomorphism are in use.

Under one definition, an isomorphism is a vertex bijection which is both edge-preserving and label-preserving.[1][2]

Under another definition, an isomorphism is an edge-preserving vertex bijection which preserves equivalence classes of labels, i.e., vertices with equivalent (e.g., the same) labels are mapped onto the vertices with equivalent labels and vice versa; same with edge labels.[3]

For the first definition, where the isomorphism is required to be label-preserving, color refinement decides isomorphism. It won’t for the second definition, since we know

Theorem: Marked tree isomorphism is isomorphism complete

from section 6.4 Marked Graphs and Marked Trees in Booth, Kellogg S.; Colbourn, C. J. (1977), Problems polynomially equivalent to graph isomorphism, Technical Report CS-77-04, Computer Science Department, University of Waterloo. They used the definition that “A marked graph is a graph together with a partition of its vertices. An isomorphism of marked graphs is an isomorphism of the underlying graphs which preserves the partitioning.”

Isomorphism of labeled uniqueness trees is GI complete

I don’t remember how I figured out that isomorphism of rooted labeled trees is GI-complete when I wrote that email back in 2016. I guess I just looked it up in section 6.4 Marked Graphs and Marked Trees cited above. This time I didn’t look it up, and just came up with a rather obvious construction on the fly:

From a graph G, construct a rooted labeled tree T by putting a node below the root of the tree for every vertex of G. For every edge of G, put two nodes with the same label on the next lower level. An edge has two endpoints; connect the tree nodes corresponding to those endpoints each with one of the two tree nodes corresponding to the edge.
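The construction can be written down directly (my own sketch; the node names like 'v1' and 'e0a' are hypothetical identifiers chosen for illustration):

```python
# From a graph G build a rooted labeled tree of depth 2: level 1 has one
# node per vertex, level 2 has two equally labeled nodes per edge, one
# attached to each endpoint of that edge.
def graph_to_labeled_tree(vertices, edges):
    root = 'root'
    children = {root: [f'v{v}' for v in vertices]}  # node -> list of children
    labels = {}                                     # node -> label
    for i, (u, v) in enumerate(edges):
        children.setdefault(f'v{u}', []).append(f'e{i}a')
        children.setdefault(f'v{v}', []).append(f'e{i}b')
        labels[f'e{i}a'] = labels[f'e{i}b'] = f'edge{i}'
    return children, labels

children, labels = graph_to_labeled_tree([1, 2, 3], [(1, 2), (2, 3)])
assert labels['e0a'] == labels['e0b']   # the two edge nodes share a label
```

Under label-equivalence-preserving isomorphism, matching up the two leaves of each edge label amounts to matching up the edges of the graphs, which is why this reduction works.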

This construction does not yet rule out that the rooted labeled trees arising in the uniqueness trees algorithm could be efficiently tested for isomorphism. In those trees, no label is repeated on a given level of the tree. I initially believed that I had an efficient algorithm for this task. Then I realized that my algorithm was just an efficient implementation of color refinement for rooted labeled trees. So the question whether the algorithm worked boiled down to whether color refinement decides isomorphism in this case. In the end, it wasn’t that difficult to modify the above construction such that it also works for this case:

A counterexample to the uniqueness trees algorithm

An easy way to break the uniqueness trees algorithm is to ensure that the trees don’t grow beyond depth 3, by replacing each edge by two nodes connected to the original endpoints of the edge, but not to each other. This ensures that only duplicate nodes are present at depth 2 or 3 of the uniqueness trees. Below we apply this construction to two non-isomorphic trees to construct two non-isomorphic graphs which cannot be distinguished by the uniqueness trees algorithm:
However, this counterexample does not answer the question whether the uniqueness trees algorithm is subsumed by Weisfeiler-Leman. Somebody told me that he remembers that the uniqueness trees algorithm was indeed subsumed by Weisfeiler-Leman. I still don’t see it, but I admit that it is quite plausible. It is probably subsumed by the algorithm where each vertex is individualized separately followed by color refinement to generate a label for that individualized vertex. And that algorithm in turn is probably subsumed by 2-WL. Or more generally, if each k-tuple is individualized separately followed by k-WL to generate a label for that k-tuple, then that algorithm is probably subsumed by 2k-WL.


This was not the post I had planned to write next. It is reasonably short, which is definitely a good thing. It may also be a good thing that it reminds us that marked tree isomorphism is known to be GI complete. In fact, this cs.stackexchange question about a polynomial time algorithm for marked tree isomorphism motivated me to reformulate my supposedly efficient algorithm for labeled uniqueness trees in simple terms (i.e. as an instance of color refinement). This in turn allowed me to step back and find a reduction showing that it cannot work.

Maybe this post would have been more interesting, if I had talked more about details of the new Deep Weisfeiler Leman paper. At least I explained why I consider it to be a major achievement.

Posted in isomorphism | Tagged | Leave a comment

Defining a natural number as a finite string of digits is circular

The length of a finite string is a natural number. If a given Turing machine halts on the empty input, then the number of steps it performs before halting is a natural number. (A Turing machine halts if it reaches a halt state in a finite number of steps.) The concept of natural number is implicitly contained in the word “finite”. Is this circularity trivial?

Suppose that you have a consistent (but maybe unsound/dishonest) axiom system that claims that some given Turing machine would halt. Then this Turing machine defines a natural number in any model of that axiom system. To expose the trivial circularity, we can now use this natural number to define the length of a finite string of digits.

This idea was part of yet another attempt to explain how defining a natural number as a finite string of digits is circular. This post will explore whether this idea can be elaborated to construct a concrete non-standard model of the natural numbers, or more precisely of elementary function arithmetic, also called EFA. (This formal system is weaker than Peano arithmetic, because its induction scheme is restricted to formulas all of whose quantifiers are bounded. It is well suited as a base system for our purpose, because all our proofs still go through without modification, and it is much easier to grasp which statements are independent of this system. Basically, it can prove that a function is total iff its runtime can be bounded by a finitely iterated exponential function.)

An attempt to explain that the circularity also occurs in practice

This section can be skipped. It is not a part of the main text. It tries to further illuminate the motivation for this post by quoting from a previous attempt to explain that circularity:

  1. Some people like to imagine natural numbers encoded in binary (least significant bit first) as finite strings of 0s and 1s. Note that this encoding uses equivalence classes of finite strings, since for example 10 and 100 both encode the number 1, i.e. the number of trailing zeros is irrelevant.
  2. But the word “finite” string already presupposes that we know how to distinguish between finite and infinite strings. So let us assume instead that we only have infinite strings of 0s and 1s, and that we want to encode natural numbers as equivalence classes of such strings.
  3. … (taken from 7.2 INTEGERS in “An unsigned-integer is an N-byte (N > 0) integer value. … the remaining seven bits in each byte are concatenated to form the actual integer value itself.”)
  4. … Slightly generalising those schemes, we can say that the length is encoded as a prefix-free code, and the number is encoded in binary in a finite string of 0s and 1s of the given length.
  5. The vagueness of “finite” comes in because of the prefix-free code part of the encoding. … And using a non-prefix-free encoding for the length will only create more vagueness, not less.
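The truncated quote in point 3 describes a continuation-bit scheme; a minimal Python sketch of such an encoding (my guess at the general shape, not the exact quoted format) illustrates point 4's "length encoded as a prefix-free code":

```python
def encode_varint(n: int) -> bytes:
    """Encode n with 7 data bits per byte; the high bit of each byte
    signals whether another byte follows (a prefix-free length code)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: this is the last byte
            return bytes(out)

def decode_varint(data: bytes) -> int:
    n, shift = 0, 0
    for byte in data:
        n |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            break
    return n

assert decode_varint(encode_varint(300)) == 300
```

The high bits of the bytes form exactly the prefix-free code for the length that point 4 talks about, and the low bits carry the digits.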

The idea quoted in the introduction suggests using a Turing machine as the encoding of the length. In a certain sense, this is indeed a non-prefix-free encoding. In another sense, the Turing machine itself must also be encoded (as a finite string), so this does not refute point 5. But we will ignore this further implicit occurrence of “finite” in this post.

Interactions between partial lies, absolute truth, and partial truths

Since we want to construct a concrete non-standard model, we will use EFA + “EFA is inconsistent” as our axiom system. Let us call this axiom system nsEFA, as an acronym for non-standard EFA. Let M be a Turing machine for which nsEFA claims that it halts.

Let us assume that M has a push-only output tape, on which it will successively push the digits describing some number N. In the simplest case, there is just a single digit available, and we get a unary representation of N. More succinct representations are also possible, but this unary representation corresponds more closely to the number of steps performed by M. This correspondence is not exact, which raises an interesting issue:

Can it happen that M only outputs a finite number of digits in an absolute sense, but that nsEFA is undecided about the number of digits output by M?

There is no reason why this should not happen. But this would be bad for our purpose, since it could happen that M would represent 3 in one model of nsEFA, and 4 in another. We can prevent this from happening, by allowing only those M for which EFA can prove that M will halt or output infinitely many digits.

Are there other ways to prevent this from happening, which don’t use the trusted axiom system EFA in addition to nsEFA? We could rely on absolute truth, and only allow those M which halt or output infinitely many digits. (This is equivalent to a \Pi_2^0 sentence for given M, hence I don’t like it.) Or we could use EFA only via the class of functions which can be proved total in EFA, and require that nsEFA must prove for a concrete total function tf(n) from this class that M will not take more steps before halting or outputting the next digit than tf(“total number of steps of M when previous digit was output”). (This is fine for me, since nsEFA cannot lie about this without being inconsistent.)

Preventing the same issue for more succinct representations is a bit more tricky. For a positional numeral system, it seems to make sense to only allow those M which halt or output infinitely many non-zero digits. But in this case, we could have a valid M which outputs 99999… (in decimal) or 11111… (in binary), but if we add 1, we get 00000… as output, which would be invalid. One way to fix this issue is to allow the digit 2 (in binary) or the digit A (in decimal), then we can use 02111… (in binary) or 0A999… (in decimal) as output. (We will call this additional digit the “overflow” digit, and the digit preceding it the “maximal” digit.) It is also possible to weaken our requirements instead of having to invent new representation systems (see Appendix C: Weakening the requirements for succinct representations).
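To check the bookkeeping behind 02111… and 0A999…, here is a tiny evaluator for finite digit prefixes (least significant digit first) that allows the overflow digit with value b; the helper name is mine, and the streams are cut to finite prefixes:

```python
def value(digits, base):
    """Evaluate a least-significant-first digit list, where digits may
    range over 0..base inclusive; the extra value `base` is the overflow digit."""
    assert all(0 <= d <= base for d in digits)
    return sum(d * base**i for i, d in enumerate(digits))

# binary: 11111 + 1 = 100000, rewritten without a trailing-zero run as 02111...
assert value([1]*5, 2) + 1 == value([0, 2, 1, 1, 1], 2) == value([0]*5 + [1], 2)
# decimal: 9999 + 1 = 10000, rewritten as 0A99 (A = overflow digit, value 10)
assert value([9]*4, 10) + 1 == value([0, 10, 9, 9], 10) == value([0]*4 + [1], 10)
```

The overflow digit absorbs the carry, so the successor can still be written with infinitely many non-zero digits.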

So we fixed this issue, but there is another even more serious problem.

Here is the problem: comparing numbers for equality

Let us call a Turing machine which satisfies the conditions described in the previous section an n-machine (because it defines a possibly non-standard natural number in any model of nsEFA). Let M and M’ be n-machines (using the unary representation as output, for simplicity). So both M and M’ define a number in any model of nsEFA. If nsEFA can prove that both M and M’ produce the same output, then the number defined by M and M’ is the same in any model. If nsEFA cannot prove this, then there will be a model where M and M’ define different numbers. This is good, since it gives us an obvious way to define equality on the set of n-machines. So where is the problem?

Even though we know that M defines a number in any model of nsEFA, we don’t know whether it will define the same number in all models. But we just fixed this in the previous section, no? Well, we only fixed it for the case where M defines a standard natural number.

Comparing numbers from different models for equality is not always possible. For example, the imaginary unit i from one model of the complex numbers cannot be compared for equality with the negative imaginary unit -i from another model. But often it is possible to say that two numbers are different, even if they come from different models, just like i+1 and i-1 are definitely different.

It can happen that nsEFA can neither prove that M and M’ produce the same output, nor that they produce different output. So there will be a model where M and M’ define the same number, and a model where M and M’ define different numbers. It is unclear whether we should conclude from this that M defines different numbers in different models. An argument using standard natural numbers might be more convincing:

Can it happen that the remainder of the division of |M| by a standard natural number like 2 or 3 is different in different models of nsEFA?

Yes, it can happen, since nsEFA can be undecided about the exact result for the remainder. So we could have |M| = x+x in one model, and |M| = y+y + 1 in another model (where |M| is the number defined by M in the corresponding models). The trouble is that there is even an n-machine M’, which defines x in the first model and y in the other model. M’ gets constructed from M by outputting only every second digit output by M. One could argue that this means that M or M’ really defines different numbers in those two models.

(For example, let M be a machine which searches for a proof of a contradiction in EFA, which always outputs two digits for each step during the search. Finally, it outputs an additional digit iff the bit at a given finite position in the binary encoding of the found proof is set.)

Why not stop here? What should be achieved?

We must conclude that the idea to build a concrete non-standard model of EFA using only numbers which can be defined by some n-machine M will not work as intended. So why not end the post at this point?

First, it is important to understand what has already been achieved. The idea for this post (see Appendix A: Initial questions about circularity and the idea for this post) was just to use an axiom system like nsEFA and a Turing machine (satisfying suitable requirements based on nsEFA) to define a natural number in any model of nsEFA. This has been achieved. The idea had a more subtle part, which was to use the Turing machine mainly for defining the length of the string of digits, which forms the actual natural number. This part still needs more elaboration, which is one reason to go on with this post.

Another idea for the construction presented in this post has not yet been mentioned explicitly. There is an interesting definition on the top of page 5 of the nice, sweet, and short paper Compression Complexity by Stephen Fenner and Lance Fortnow:

Let \mathbf{BB}(m) (“Busy Beaver of m”) be the maximum time for U(p) to halt over all halting programs p of length at most m. Let p_m be the lexicographically least program of length at most m that achieves \mathbf{BB}(m) running time.

That encoding of \mathbf{BB}(m) by p_m is nice: For fixed m, it is just as efficient as the encoding by the first m bits of Chaitin’s constant, but much easier to explain (and decode, or rather it need not even be decoded). And for variable m, the overhead over Chaitin’s encoding is just quadratic, so not too bad either. But the “time to halt” is slightly too sensitive to minor details of the program to allow arithmetic. Replacing “time to halt” by “length of unary output” is a natural modification of this encoding which allows computable arithmetic (addition, multiplication, exponentiation, minimum, maximum, and possibly more).

On the one hand, it is no surprise that the concrete definition of these arithmetic operations for the concrete encoding is possible. On the other hand, the section below which actually defined those operations (for unary output) got so long and convoluted that most of its content has been removed from this post. One reason why it got so convoluted is that an n-machine defines a natural number in any model of nsEFA, so we have to prove that our concrete definition of these arithmetic operations coincides (for numbers definable by n-machines) with the arithmetic operations already available in models of nsEFA. In addition, feedback from various people indicated that it should have been made clearer how the concrete definitions of the arithmetic operations are computable and independent of nsEFA, while equality depends on nsEFA and is only semi-decidable (i.e. computably enumerable instead of decidable/computable). And there is also the confusion about why it should be relevant that the arithmetic operations are concrete and computable, if we are fine with much less for equality (see Appendix B: What are we allowed to assume?).

More relevant than the fact that certain operations are computable and that some relations are semi-decidable is that some operations are not definable at all. We have seen this in the previous section for the remainder (of the division of |M| by a standard natural number like 2 or 3), if the definition of an n-machine uses unary output. One may wonder whether we showed this fact sufficiently rigorously, and how our definitions gave rise to this fact. The important part of our definitions is that we prevented a non-standard number from being equal to a standard number. So if some operation has a standard number as result, and if this standard number is different in different models of nsEFA for two concrete n-machines, then that operation is not definable at all (with respect to the specific definition of n-machine).

What we will try to show when discussing numbers encoded as strings of digits is that the minimum and maximum operations are not definable, but that there are number representations for which the remainder (of the division of |M| by a standard natural number like 2 or 3) is definable (and computable). This is another reason why we want to explore more succinct representations, in addition to the unary representation. However, the corresponding section does not dig into the more basic details of doing arithmetic with these succinct representations (see Appendix E: More on the LCM numeral system), but only mentions the bare minimum of facts to indicate how its claims could be elaborated.

Finally, it should also be mentioned that this post failed to address certain questions (see Appendix D: Some questions this post failed to answer). The constructed structures seem to be models of Robinson arithmetic, which is not a big deal. The totality of the order relation fails. Therefore, the constructed structures cannot be models of any axiom system which would prove the totality of the order relation. But what about satisfying induction on open formulas? Probably not. Another question is how well the constructed structure can act as a substitute for the non-existent free model of nsEFA, or as a substitute for the non-existent standard model of nsEFA. Probably not very well.

Another point where this post failed is the very existence of this section. It has been added after this post was already published, because feedback indicated that this post failed to clearly communicate what it is trying to achieve. Together with introducing this section, many existing sections have been turned into appendices, and Appendix A has been newly added. Sorry for that, it is definitively bad style to alter a published post in such a way. But this post felt significantly more confusing than any other post in this blog, so not trying to improve it was also not a good option.

Translating the meaning of words into formulas

To add the numbers |M| and |M’| defined by M and M’ (using the unary representation as output), we construct a machine M” which first runs M and then runs M’. We claim that M” is an n-machine and that nsEFA proves |M|+|M’| = |M”|. So the number defined by M” is the sum of |M| and |M’| in any model of nsEFA. We denote M” by ADD(M,M’).
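The construction can be sketched with Python generators standing in for n-machines (a toy model of my own that captures only the output streams, none of the proof-theoretic requirements):

```python
from itertools import chain

# Hypothetical model: an n-machine with unary output is a function that
# returns a fresh iterator over its output digits; the stream length is |M|.
def make_machine(n):
    return lambda: iter([1] * n)

def ADD(M, Mp):
    # run M to completion, then Mp; the unary outputs simply concatenate
    return lambda: chain(M(), Mp())

assert sum(ADD(make_machine(3), make_machine(4))()) == 7   # |M''| = |M| + |M'|
```

In the stream picture, "first run M, then run M’" is literally concatenation of the output tapes.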

But what does it actually mean that nsEFA proves |M|+|M’| = |M”|? It can only prove (or even talk about) formulas in the first order language of arithmetic. The statement that M defines a number |M| means that we can mechanically translate M into a formula \varphi_M(x) such that nsEFA proves \exists x\varphi_M(x) and \forall xy (\varphi_M(x)\land \varphi_M(y)) \to x=y. To prove |M|+|M’| = |M”|, nsEFA must prove \forall xyz (\varphi_M(x)\land \varphi_{M'}(y)\land \varphi_{M''}(z)) \to x+y=z.

We use a formula \varphi_{M\to}(u,u_0,\dots,u_n;s;v,v_0,\dots,v_n) which describes the behavior of M. It says that M transitions in s steps from the state (u,u_0,\dots,u_n) to the state (v,v_0,\dots,v_n). This allows us to define \varphi_M(x):=\exists sv_1\dots v_n\varphi_{M\to}(0,0,\dots,0;s;\mathbf{k},x,v_1,\dots,v_n). Here the first component v of the state (v,v_0,\dots,v_n) is the line number of the program (or the internal state of the machine), and \mathbf{k} is the unique line of the program with a \mathtt{HALT} statement (or the unique internal halt state of the machine). We used the second component v_0 of the state (v,v_0,\dots,v_n) as the push-only (unary) output tape.

We also define the sentence HALT(M):=\exists sv_0\dots v_n\varphi_{M\to}(0,0,\dots,0;s;\mathbf{k},v_0,\dots,v_n). (A sentence is a formula with no free variables.) M halts if and only if HALT(M) is true for the standard model of the natural numbers. However, the standard model of the natural numbers is not a model of nsEFA. But HALT(M) is still meaningful for nsEFA: If M halts and nsEFA would prove that HALT(M) is false, then nsEFA would be inconsistent.

Already the definition of nsEFA as EFA + “EFA is inconsistent” uses this translation. We take some machine C which searches for a proof in EFA of a contradiction, and add the formula HALT(C) to EFA as an additional axiom. The expectation is that nsEFA can then prove HALT(C’) for any other machine C’ searching for a contradiction in EFA. This will obviously fail if C’ does unnecessary work in a way that nsEFA cannot prove that it will still find a contradiction in EFA. For example, it could evaluate the Ackermann–Péter function A(m,n) at A(m,0) for increasing values of m between the steps of the search for a contradiction. Or C’ could use a less efficient search strategy than C, like looking for a cut-free proof, while C might accept proofs even if they use the cut rule.

(What if we use EFA + “ZFC is inconsistent” or EFA + “the Riemann hypothesis is false” as our axiom system? In the first case, the remarks on the efficiency loss of cut-free proofs are still relevant. In the second case, it would be hard to come up with different translations for which EFA cannot prove that they are equivalent. And if EFA could prove “the Riemann hypothesis is true”, then that axiom system would be inconsistent anyway.)

We also define the sentence LIVE(M):=HALT(M)\lor\forall uu_0\dots u_n[(\exists s\varphi_{M\to}(0,0,\dots,0;s;u,u_0,\dots,u_n)) \to (\exists svv_1\dots v_n\varphi_{M\to}(u,u_0,\dots,u_n;s;v,u_0+1,v_1,\dots,v_n))] (M halts, or if M reaches the state (u,u_0,\dots,u_n), then it reaches a state (v,u_0+1,v_1,\dots,v_n)), which says that M will only take finitely many steps before it outputs another non-zero digit or halts.

M is an n-machine if nsEFA proves HALT(M) and EFA proves LIVE(M). (EFA proves \forall xy (\varphi_M(x)\land \varphi_M(y)) \to x=y, independent of whether M is an n-machine or not. Namely EFA proves (\varphi_{M\to}(u,u_0,\dots,u_n;s;v,v_0,\dots,v_n)\land\varphi_{M\to}(u,u_0,\dots,u_n;s;w,w_0,\dots,w_n)) \to (v=w\land v_0=w_0\land\dots\land v_n=w_n), which expresses the fact that M is deterministic.) If nsEFA proves HALT(M), then nsEFA also proves \exists x\varphi_M(x). This is trivial, since our definition of HALT(M) turns out to be equivalent to \exists x\varphi_M(x).

Back to M”:=ADD(M,M’). How do we know that nsEFA proves HALT(M”) and EFA proves LIVE(M”)? It follows because EFA proves HALT(M) \to HALT(M’) \to HALT(M”) and LIVE(M) \to LIVE(M’) \to LIVE(M”). Details of \varphi_{M\to}(u,u_0,\dots,u_n;s;v,v_0,\dots,v_n) are needed to say more. And nsEFA proves \forall xyz (\varphi_M(x)\land \varphi_{M'}(y)\land \varphi_{M''}(z)) \to x+y=z because EFA proves \varphi_{M\to}(u,0,u_1,\dots,u_n;s;v,v_0,\dots,v_n) \to \varphi_{M\to}(u,u_0,\dots,u_n;s;v,u_0+v_0,v_1,\dots,v_n) for any M which doesn’t query the register \mathcal{R}_0 and only increases it.

Using register machines for discussing EFA proofs

We defined ADD(M,M’) in the previous section. The definitions of MUL(M,M’) and POW(M’,M) will be more complicated, because |M’| might be zero (or one), hence the requirement that an n-machine does not run forever without outputting additional digits requires some care. We will also define SUB(M,x), DIV(M,x) and LOG(x,M). Here x must effectively be a standard natural number, for example a number defined by an n-machine for which EFA (instead of nsEFA) can prove that it halts.
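In the same toy stream model as before (machines as functions returning their unary output streams; names and model are mine), MUL runs a fresh copy of M’ for each digit of M, which multiplies the stream lengths. The sketch below ignores exactly the timing issue mentioned above: the real construction must interleave steps of M’ with the wait for M’s next digit, so that the composite never runs forever without output when |M’| = 0.

```python
def make_machine(n):
    return lambda: iter([1] * n)

def MUL(M, Mp):
    """For every unary digit M outputs, run a fresh copy of M' and copy
    its output; the output lengths multiply. (Step-interleaving, needed
    so the composite machine stays a valid n-machine when |M'| = 0,
    is deliberately omitted from this sketch.)"""
    def run():
        for _ in M():
            yield from Mp()
    return lambda: run()

assert sum(MUL(make_machine(3), make_machine(4))()) == 12
assert sum(MUL(make_machine(3), make_machine(0))()) == 0
```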

In any case, we had better define some machine architecture with corresponding programming constructs to facilitate the definition of those operations and the discussion of EFA proofs. A concrete machine architecture also facilitates the discussion of details of \varphi_{M\to}(u,u_0,\dots,u_n;s;v,v_0,\dots,v_n). Instead of Turing machines, we use register machines, since most introductory logic books use them instead of Turing machines. Typically, they first define register machines over an arbitrary alphabet \mathcal{A}=\{a_1,\dots,a_L\} like

\mathtt{PUSH}(r,l) If l=0 then remove the last letter of the word in \mathcal{R}_r. Otherwise append the letter a_l to the word in \mathcal{R}_r.
\mathtt{GOTO}(r,c_0,\dots,c_L) Read the word in \mathcal{R}_r. If the word is empty, go to c_0. Otherwise if the word ends with the letter a_l, go to c_l.
\mathtt{HALT} Halt the machine.

and then focus on the case of the single letter alphabet \mathcal{A}=\{|\}. For that case, we use a more explicit instruction set, which is easier to read:

\mathtt{INC\ } \mathcal{R}_r Increment \mathcal{R}_r. \mathtt{PUSH}(r,1)
\mathtt{DEC\ } \mathcal{R}_r Decrement \mathcal{R}_r. If it is zero, then it stays zero. \mathtt{PUSH}(r,0)
\mathtt{GOTO\ label} Go to the unique line where \mathtt{label{:}} is prepended. \mathtt{GOTO}(0,c_\mathtt{label},c_\mathtt{label})
\mathtt{IF\ }\mathcal{R}_r\mathtt{\ GOTO\ label} If \mathcal{R}_r\neq 0, then go to \mathtt{label}. \mathtt{GOTO}(r,p+1,c_\mathtt{label})
\mathtt{HALT} Halt the machine. \mathtt{HALT}
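For concreteness, here is a minimal Python interpreter for this explicit instruction set; the tuple encoding and the use of absolute line numbers in place of labels are my own simplification:

```python
def run(program, registers=None, max_steps=10**6):
    """Interpret a program given as a list of tuples:
    ("INC", r), ("DEC", r), ("GOTO", line), ("IF", r, line), ("HALT",).
    Registers default to 0; returns the register contents after HALT."""
    regs = dict(registers or {})
    pc, steps = 0, 0
    while steps < max_steps:
        op = program[pc]
        steps += 1
        if op[0] == "HALT":
            return regs
        if op[0] == "INC":
            regs[op[1]] = regs.get(op[1], 0) + 1
            pc += 1
        elif op[0] == "DEC":                  # decrementing 0 leaves 0
            regs[op[1]] = max(regs.get(op[1], 0) - 1, 0)
            pc += 1
        elif op[0] == "GOTO":
            pc = op[1]
        elif op[0] == "IF":                   # jump when the register is nonzero
            pc = op[2] if regs.get(op[1], 0) else pc + 1
    raise RuntimeError("step budget exhausted")

# move R_1 into the output register R_0, one unary digit per loop iteration
move = [("IF", 1, 2), ("HALT",), ("DEC", 1), ("INC", 0), ("GOTO", 0)]
assert run(move, {1: 5})[0] == 5
```

The little `move` program is exactly the "copy a register to the push-only output register \mathcal{R}_0" idiom used throughout this section.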

To avoid \mathtt{GOTO}, we define two simple control constructs:

\mathtt{WHILE\ }\mathcal{R}_r\ \{\text{body}\}
\mathtt{GOTO\ lc}
\mathtt{lb{:}\ }\text{body}
\mathtt{lc{:}\ IF\ }\mathcal{R}_r\mathtt{\ GOTO\ lb}

\mathtt{IF\ }\mathcal{R}_r\ \{\text{bodytrue}\}\mathtt{\ ELSE\ }\{\text{bodyfalse}\}
\mathtt{IF\ }\mathcal{R}_r\mathtt{\ GOTO\ lt}
\text{bodyfalse}
\mathtt{GOTO\ lc}
\mathtt{lt{:}\ }\text{bodytrue}
\mathtt{lc{:}\ }

We also define two simple initialization constructs:

\mathcal{R}_r :=0
\mathtt{WHILE\ }\mathcal{R}_r\ \{
__ \mathtt{DEC\ }\mathcal{R}_r
\}

\mathcal{R}_s :=\mathcal{R}_r
\mathcal{R}_h :=0
\mathtt{WHILE\ }\mathcal{R}_r\ \{
__ \mathtt{INC\ }\mathcal{R}_h
__ \mathtt{DEC\ }\mathcal{R}_r
\}
\mathcal{R}_s :=0
\mathtt{WHILE\ }\mathcal{R}_h\ \{
__ \mathtt{INC\ }\mathcal{R}_r
__ \mathtt{INC\ }\mathcal{R}_s
__ \mathtt{DEC\ }\mathcal{R}_h
\}

Next we need constructs that allow us to express things like “a machine which runs M, and each time M would output a digit, it runs M’ instead”. To prevent M’ from messing up the registers of M, we allow symbolic upper indices on the registers like \mathcal{R}_r^{M}, \mathcal{R}_r^{M'} or even \left(\mathcal{R}_r^{M}\right)^{MUL(M,M')}. We use this to define the program transform construction …

… and this section went on and on and on like this. It was long, the constructions were not robust, the proofs were handwavy, there were no nice new ideas, and motivations were absent. Maybe I will manage to create another blog post from the removed material. Anyway, back to more interesting stuff …

Succinct representations of natural numbers

The proposal from the introduction was to use the number of steps which a given Turing machine performs before halting to define the length of a finite string of digits. The observation that the unary representation fits into this framework too is no good excuse for not looking at more interesting strings of digits. So we have to look at least at a positional base b numeral system (for example with b=10 for decimal, or b=2 for binary). We will also look at the LCM numeral system, which is closely related to the factorial number system and to the primorial number system (see Appendix E: More on the LCM numeral system).

It is possible to define MAX(M,M’) and MIN(M,M’) for the unary representation. We did not define them in the previous section, because at least MIN(M,M’) requires alternating between running M and M’, which was not among the constructions defined there. Neither MAX(M,M’) nor MIN(M,M’) can be defined for any of our succinct representations. The reason is that already the first few digits of the result will determine whether M or M’ is bigger, but there are M and M’ where in one model of nsEFA it is M which is bigger, and in another model it is M’.
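The alternation idea can be sketched on the stream model from before (machines as functions returning their unary output streams; again my own toy model, ignoring step counts and proofs):

```python
from itertools import zip_longest

def make_machine(n):
    return lambda: iter([1] * n)

def MIN(M, Mp):
    # alternate between the two streams; emit a digit only once BOTH have
    # emitted one more, so the output ends as soon as the shorter input ends
    return lambda: (1 for _ in zip(M(), Mp()))

def MAX(M, Mp):
    # emit a digit while EITHER stream still produces one
    return lambda: (1 for _ in zip_longest(M(), Mp()))

assert sum(MIN(make_machine(3), make_machine(5))()) == 3
assert sum(MAX(make_machine(3), make_machine(5))()) == 5
```

Note that both operations only inspect how far each stream has gotten, never individual digit values, which is why they work for unary but not for the succinct representations.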

All our succinct representations allow conversion to the unary representation. Since MAX(M,M’) and MIN(M,M’) are defined for the unary representation, but not for our succinct representations, we conclude that conversion from the unary representation to any of our succinct representations is not possible. The fact that conversion to unary is possible also makes it easier to compare numbers defined by n-machines using different representations for equality: The numbers are equal if nsEFA can prove that the numbers which were converted to unary are equal.

Let REM(M,x) (or rather |REM(M,x)|) be the remainder r from the division of |M| by x such that 0 \leq r < x. For the LCM numeral system and the factorial number system, we have that REM(M,x) can be defined for any standard natural number x. For the primorial number system, it can only be defined for any square-free standard natural number x. For binary, x must be a power of two, and for decimal, x must be a product of a power of two and a power of five. We conclude from this that binary cannot be converted to decimal, neither binary nor decimal can be converted to LCM, factorial, or primorial, and primorial cannot be converted into any other of our succinct representations.
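The reason REM(M,x) works for the factorial number system can be checked concretely: since x divides i! for all i \geq x, the remainder modulo x is determined by the lowest x-1 digits alone. A small Python sketch (helper names are mine):

```python
from math import factorial

def to_factorial_digits(n):
    """Least-significant-first digits d_1, d_2, ... with n = sum d_i * i!."""
    digits, i = [], 2
    while n:
        digits.append(n % i)
        n //= i
        i += 1
    return digits

def rem_from_prefix(digits, x):
    # i! is divisible by x for all i >= x, so only d_1 .. d_{x-1} matter
    prefix = digits[: x - 1]
    return sum(d * factorial(i + 1) for i, d in enumerate(prefix)) % x

n = 123456
assert rem_from_prefix(to_factorial_digits(n), 7) == n % 7
```

For an n-machine this means: after finitely many output digits, the remainder modulo x is settled, no matter what the machine does later.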

What about the remaining conversions between our succinct representations that we have not ruled out yet? If we know the lowest i digits of the decimal representation, then we can compute the lowest i digits of the binary representation. Therefore, we conclude that decimal can be converted to binary. In theory, we would have to check that nsEFA can actually prove that directly converting a decimal number to unary gives the same number as first converting it to binary and then to unary. But who cares?

To compute the lowest i digits of the binary representation, it is sufficient to know the lowest 2i digits of the factorial representation, or the lowest 2i digits of the LCM representation. So factorial and LCM can be converted to binary. To compute the lowest i digits of the decimal representation, it is sufficient to know the lowest 5i digits of the factorial representation, or the lowest 5i digits of the LCM representation. So factorial and LCM can be converted to decimal. To compute the lowest π(i) digits of the primorial representation (where π(n) is the prime-counting function), it is sufficient to know the lowest i digits of the factorial or LCM representation. So factorial and LCM can be converted to primorial. To compute the lowest π(i) digits of the LCM representation, it is sufficient to know the lowest i digits of the factorial representation. To compute the lowest i digits of the factorial representation, it is sufficient to know the lowest i^i digits of the LCM representation. So it is possible to convert LCM to factorial and factorial to LCM.
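The decimal-to-binary direction from the previous paragraphs can be made concrete: since 2^i divides 10^i, the lowest i decimal digits determine n modulo 2^i, and hence the lowest i binary digits. A sketch (the function name is mine):

```python
def low_binary_digits(decimal_digits, i):
    """Lowest i binary digits of n from its lowest i decimal digits
    (least significant first): 2**i divides 10**i, so the decimal prefix
    determines n modulo 2**i."""
    n_mod = sum(d * 10**j for j, d in enumerate(decimal_digits[:i])) % 2**i
    return [(n_mod >> k) & 1 for k in range(i)]

n = 987654321
dec = [int(c) for c in str(n)[::-1]]          # decimal digits, LSB first
assert low_binary_digits(dec, 8) == [(n >> k) & 1 for k in range(8)]
```

The other conversions above follow the same pattern, just with different divisibility facts (e.g. 2^i divides (2i+1)! for the factorial representation).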

In the previous section, we defined ADD(M,M’), MUL(M,M’), POW(M’,M), SUB(M,x), DIV(M,x) and LOG(x,M) using the unary representation. ADD(M,M’), MUL(M,M’), and SUB(M,x) can be defined for any of our succinct representations, LOG(x,M) for none. For the LCM numeral system and the factorial number system, we have that DIV(M,x) can be defined for any standard natural number x. For binary, x must be a power of two, and for decimal, x must be a product of a power of two and a power of five. This is the same as for REM(M,x). But for primorial, DIV(M,x) cannot be defined at all, not if x is square-free (even though this is sufficient for REM(M,x) to be defined), not even if x is a prime number.

What about POW(M’,M)? We have b^a = \text{rem}(b,c)^{\text{rem}(a,\varphi(c))} \mod c when \gcd(b,c)=1, where \varphi(n) is Euler’s totient function; in the general case one can use b^a \equiv b^{\varphi(c)+\text{rem}(a,\varphi(c))} \mod c for a \geq \log_2 c. Therefore POW(M’,M) can be defined for the LCM numeral system and the factorial number system, but not for any other of our succinct representations, not even in case |M’|=2. What about other functions? There is a list of functions in the Wikipedia article on primitive recursive functions. All those functions actually seem to be from ELEMENTARY, but that does not necessarily mean that they can be defined for one of our representations (not even unary). Maybe 4. Factorial a!, 11. sg(a): signum(a): If a=0 then 0 else 1, and 18. (a)_i: the exponent of p_i in a are still interesting. The signum function can definitely be defined for any of our representations. It seems that the factorial function too can be defined for any of our representations. It is interesting, because it will just output an infinite stream of zero digits for any non-standard natural number, for any of our succinct representations. Actually, our requirements for n-machines currently (see next section) don’t allow us to output an infinite stream of zero digits. However, there is a workaround: we can output the “overflow” digit instead of the first zero digit, and the “maximal” digits instead of the consecutive zero digits. The exponent of p_i cannot be defined for unary. But we can define it as a function from the LCM or factorial representation to the unary representation.
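A quick numeric check of the modular-exponentiation identity behind POW (with a naive totient by trial division; note that the plain form needs \gcd(b,c)=1, and otherwise one keeps an extra \varphi(c) in the exponent once a \geq \log_2 c):

```python
from math import gcd

def phi(n):
    """Euler's totient by trial division (fine for small n)."""
    result, p, m = 1, 2, n
    while p * p <= m:
        if m % p == 0:
            e = 0
            while m % p == 0:
                m //= p
                e += 1
            result *= (p - 1) * p ** (e - 1)
        p += 1
    if m > 1:
        result *= m - 1
    return result

# gcd(b, c) = 1: plain Euler's theorem, only a mod phi(c) matters
b, a, c = 7, 123, 60
assert gcd(b, c) == 1
assert pow(b, a, c) == pow(b % c, a % phi(c), c)

# general case, valid once a >= log2(c): keep one extra phi(c) in the exponent
b, a, c = 6, 10, 8
assert pow(b, a, c) == pow(b % c, phi(c) + a % phi(c), c)
```

This is exactly why knowing a modulo all small standard numbers (which the LCM and factorial digits provide) suffices to produce the digits of b^a.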

So we have a nice collection of trivial facts about our succinct representations. But what are our succinct representations good for, compared to unary? After all, the representation is just an n-machine in both cases, and the n-machine for a succinct representation is not much shorter than the corresponding n-machine for the unary representation. Their runtime and output can be exponentially shorter, but EFA does not care about that either. There is at least one important advantage: the finite initial segments of output digits reveal more information than for unary, and that information is guaranteed to be present (which is not the case for unary). It turns out that for our succinct representations, this information is no more than the remainders of the division by all standard natural numbers (or a suitable infinite subset of the standard natural numbers).

But why is this advantage relevant? Why is our nice collection of trivial facts relevant? In terms of absolute truth, a given n-machine is on the one hand what nsEFA can prove about it, but on the other hand it is also a possibly infinite string of digits. The length of this string is bounded by the number of steps the n-machine performs before it stops. So we nearly succeeded in exposing the circularity of defining a natural number as a finite string of digits. The length of the string (in any of our succinct representations) is indeed a natural number, but only in unary representation. And it cannot (in general) be converted back to any of our succinct representations. Since we used EFA, it is nice that we found a succinct representation for which POW(M’,M) can be defined. Since Gödel’s β function \beta(x_1, x_2, x_3) = \text{rem}(x_1, 1 + (x_3 + 1) \cdot x_2) uses the remainder, it is nice that it can be defined. Since the exponent of p_i is one way of encoding sequences into natural numbers, it is interesting that it can only be defined as a function from LCM or factorial to unary.


One of the initial ideas was to construct a non-standard model of EFA, or more precisely a model of nsEFA. This was not achieved. What has been achieved is a very concrete description of some non-standard numbers by Turing machines which do not halt in terms of absolute truth, but for which nsEFA proves that they halt. Those non-standard numbers define unique numbers in any model of nsEFA, and we ensured that it cannot happen that those numbers define different standard natural numbers in different models. We settled on an obvious way to compare those non-standard numbers for equality, and explained why we won’t be able to achieve our initial goal. We decided to go on with the post nevertheless, and showed that addition, multiplication, and exponentiation can be defined for those numbers. We then started to dive deeply into technical details, to the point where a section got so technical that most of its content was removed again, because this is supposed to be a more or less straightforward and readable blog post after all.

Anyway, the most important thing about the natural numbers is the ability to write recursions. That’s kind of the defining property, really. You have to be able to do predecessor and comparison to zero or it’s not a good encoding, IMO.

This was a reaction to a previous attempt to explain that the encoding of natural numbers can be important. It has a valid point. Focusing on addition, multiplication, and exponentiation seemed to make sense from the perspective suggested by EFA, but recursion and induction should not have been neglected. On the other hand, Robinson arithmetic completely neglects recursion and induction, our non-standard numbers seem to form a model of it, and people still treat it as closely related to the natural numbers.

The main idea of this post was to use the natural number implicitly defined by a Turing machine which halts (or for which some axiom system claims that it halts) as the length of a string of digits. This idea turned out to work surprisingly well. The purpose was to illustrate how defining a natural number as a finite string of digits is circular. This was achieved, in a certain sense: We were naturally led to define operations like SUB(M,x), DIV(M,x), MIN(M,M’), REM(M,x), and the exponent of p_i in M. Those operations require three different kinds of natural number: standard natural numbers (those for which EFA can prove that the machine halts), numbers in succinct representation, and numbers in unary representation. SUB and DIV work for both unary and succinct, but the second argument must be standard. MIN only works for unary, and REM only works for succinct. The last one only works as a function from succinct to unary, where p_i must be standard.

Were we really led there naturally, or did some of our previous decisions lead us there, like only allowing those M which halt or output infinitely many non-zero digits? Well, yes and no. The decision to forbid non-standard numbers which are undecided about whether they are equal to a given standard number probably had some impact. At least it would have been more difficult otherwise to argue why certain functions cannot be defined. The specific way in which we enforced this decision had no impact on the definability of the functions we actually investigated, and it is unclear whether it could have had an impact.

It did have a minor impact on our succinct representations, but in the end the “overflow” digit is only a minimal modification. It does offer some parallel speedup for addition, but signed digits are necessary to get good parallel speedup for multiplication. Those signed digits can be interpreted as an (a-b) representation (of integer numbers), which lures one into trying to find something close to the (c/d) representation (of rational numbers) that might allow good parallel speedup for division or exponentiation. (Well, that hope actually comes from a paper on Division and Iterated Multiplication by W. Hesse, E. Allender, and D. Barrington, which says in section 4: “The central idea of all the TC0 algorithms for Iterated Multiplication and related problems is that of Chinese remainder representation.”) This led to including the factorial and primorial representations among the succinct representations, and to trying to “fix” the primorial representation by including the LCM representation.

Appendix A: Initial questions about circularity and the idea for this post

Of course Hume is right that justifying induction by its success in the past is circular. Of course Copenhagen is right that describing measurements in terms of unitary quantum mechanics is circular. Of course Poincaré is right that defining the natural numbers as finite strings of digits is circular. …

But this circularity is somehow trivial, it doesn’t really count. It does make sense to use induction, describing measurement in terms of unitary quantum mechanics does clarify things, and the natural numbers are really well defined. But why? Has anybody ever written a clear refutation explaining how to overcome those trivial circularities? …

So I wondered how to make this more concrete, in a similar spirit to how one shows that equality testing for computable real numbers is undecidable. The idea here is to take some Turing machine for which we would like to decide whether it halts or not, and write out the real number 0.000…000… with an additional zero for every step of the machine, and output one final 1 when (and only if) the machine finally halts. Deciding whether this real number is equal to zero is equivalent to solving the halting problem for the given Turing machine. How can one do something similarly concrete for natural numbers? Suppose that you have a consistent (but maybe unsound/dishonest) axiom system that claims that some given Turing machine halts. Then this Turing machine defines a natural number in any model of that axiom system. To expose the trivial circularity, we can now use this natural number as the length of a finite string of digits. How do we get the digits themselves? We can just use any Turing machine which outputs digits for that, independent of whether it halts or not. Well, maybe if it doesn’t halt, it would still be interesting whether it will output a finite or an infinite number of digits, but in the worst case we can ask the axiom system again about its opinion on this matter. The requirement here is that the numbers defined in this way should define a natural number in any model of that axiom system.
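To make the zero-test construction above concrete, here is a toy Python sketch. Modelling a Turing machine run as any finite or infinite iterator of steps (and the function name itself) is my simplification for illustration, not part of the construction:

```python
def digits_of_machine_real(machine_steps, budget):
    """First `budget` decimal digits (after "0.") of the real number
    attached to a machine run: one 0 per step of the machine, one final 1
    when (and only if) the machine halts, and only 0s afterwards."""
    digits = []
    steps = iter(machine_steps)
    halted = False
    for _ in range(budget):
        if halted:
            digits.append(0)   # after the final 1, only zeros follow
            continue
        try:
            next(steps)        # the machine performs one more step
            digits.append(0)   # emit one zero for this step
        except StopIteration:  # the machine halted
            digits.append(1)
            halted = True
    return digits
```

A run with 3 steps yields the digits 0, 0, 0, 1, 0, 0, …; for a machine that never halts, every finite prefix is all zeros, so deciding whether the represented real number equals zero amounts to deciding the halting problem.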

Appendix B: What are we allowed to assume?

It might seem strange to cast doubt on whether the concept of a natural number is non-circular, while at the same time taking Turing machines and axiom systems for granted. An attempt to explain why natural numbers are a tricky subject (in logic) was made in a blog comment after the line Beyond will be dragons, formatting doesn’t work… The bottom line is that it is possible to accept valid natural numbers, but not possible to reject (every possible type of) invalid natural numbers. So what should we allow ourselves to assume, and what not? Specific natural numbers, Turing machines, and axiom systems can be given, since we can accept them when they (or rather their descriptions) are valid. A Turing machine can halt, and an axiom system can prove a given statement. This also means that we can talk about an axiom system being inconsistent, since this just means that it proves a certain statement. Talking about an axiom system being consistent is more problematic. We have done it anyway, because this appendix was only added after the post was already finished (based on feedback from Timothy Chow). We did try to avoid assuming that arbitrary arithmetical sentences have definite truth values, and especially tried to avoid using oracle Turing machines.

However, in the end we are trying to construct a concrete model of an unsound/dishonest axiom system. Such an axiom system can have an opinion about the truth values of arbitrary arithmetical sentences. So if it had helped, we would have had no problem assuming that arbitrary arithmetical sentences have definite truth values. And assuming consistency definitely helps when you are trying to construct a model, so we just assumed that it was unproblematic.

Appendix C: Weakening the requirements for succinct representations

We did not use the “overflow” digit much in discussing our succinct representations. But we simply did not go into the details of addition and multiplication. The previous post on signed-digit representations focused more on details. It explained, for example, why having signed digits is even better than having an “overflow” digit for the speed of multiplication. At the point where we introduced the requirement to output infinitely many non-zero digits (which made the “overflow” digit necessary), we postponed the discussion of alternatives.

Let us try to weaken our requirements instead of having to invent new representation systems. We could forbid trailing zeros (or only allow a finite number of trailing zeros). This is meaningless in terms of absolute truth, since it is meaningless to talk about the trailing zeros if the machine M outputs infinitely many digits. What we want to express is that “if M would halt, then there would be no trailing zeros (or no more trailing zeros than a given finite number)”. So we are fine if nsEFA can prove this, since our requirements only need to ensure that non-standard natural numbers cannot mix with standard natural numbers. A slight refinement is to require that nsEFA must prove for a concrete function tf(n) which can be proved total in EFA that if M outputs more than tf(“total number of steps of M when previous non-zero digit was output”) consecutive zeros, then it will output at least one further non-zero digit before it halts. This refinement tries to ensure that a machine M valid according to our initial requirements is also valid according to our weakened requirements.

Appendix D: Some questions this post failed to answer

The section on comparing numbers for equality showed that our plan for constructing a concrete non-standard model of EFA will not work as intended. But this leaves open some questions, since we did construct a concrete structure:

  • The totality of the order relation obviously fails. But our concrete structure still seems to be a model of Robinson arithmetic. Does it also satisfy induction on open formulas? It is not obvious why induction on open formulas should fail.
  • Is there some non-concrete equivalence relation on our structure such that the resulting quotient structure would be a model of nsEFA? Can we just take any consistent negation complete extension of nsEFA, and identify those n-machines for which that extension proves that they produce the same output? At least we are not guaranteed to get a model of that extension, because it will (in general) prove that more Turing machines halt than we have included in our structure. (Even worse, the extension has an opinion about arbitrary arithmetical sentences, so we might even have to include oracle Turing machines in our structure, if we wanted to get a model of that extension.) So the resulting quotient structure might not satisfy the induction scheme of nsEFA. Still, the order relation would be total, and the remainder (for a standard natural number) would be defined.
  • In a certain sense, the free algebra is a canonical model of an equational theory. The straight-line program encoding is natural for storing the elements of the free algebra. But nsEFA does not have a canonical model, and the straight-line program encoding of natural numbers might instead feel deliberately perverse. Some people really do not understand why one would ever want to consider this as an encoding of natural numbers. And with respect to the natural numbers, the straight-line program encoding may indeed be inappropriate. So our efforts might be interpreted as an attempt to find appropriate encodings which can take over the role of straight-line program encodings in the case of nsEFA. But can we make it more precise in which way our n-machines provide appropriate substitutes for the elements of the non-existent free algebra? Who knows, maybe if we tried hard enough, we would find a reason why we cannot do that.

Appendix E: More on the LCM numeral system

As said before, we did not actually talk much about arithmetic in the LCM or factorial representation. Let me copy below a discussion about division for the factorial and LCM systems, and about relatively fast multiplication for the LCM system, to cover at least a bit of those topics:

If an arbitrarily huge number is divided by a fixed number k (assumed to be small), then the factorial number system only needs to know the lowest i+2k digits of the huge input number to be able to write out the lowest i digits of the result. The LCM number system also only needs to look ahead a finite number of digits for writing out the lowest i digits of the result. The exact look-ahead is harder to determine; it should be more or less the lowest i+ik digits of the huge input number.

However, the LCM number system also has advantages over the factorial number system. Just like the primorial number system, it allows a simple and relatively fast multiplication algorithm. The numbers can be quickly converted into an optimal Chinese remainder representation and back:

x_2 + 2 x_3 + 6 x_4 + 12 x_5 + 60 x_7 + \dots + \text{lcm}(1..(p^r-1)) x_{p^r} = x, \quad \text{with } 0 \leq x_{p_i^{r_i}} < p_i.

x_2 = x \mod 2
x_2 + 2 x_3 = x \mod 3
x_2 + 2 x_3 + 2 x_4 = x \mod 4
x_2 + 2 x_3 + x_4 + 2 x_5 = x \mod 5
x_2 + 2 x_3 + 6 x_4 + 5 x_5 + 4 x_7 = x \mod 7
x_2 + 2 x_3 + 6 x_4 + 4 x_5 + 4 x_7 + 4 x_8 = x \mod 8
x_2 + 2 x_3 + 6 x_4 + 3 x_5 + 6 x_7 + 6 x_8 + 3 x_9 = x \mod 9
x_2 + 2 x_3 + 6 x_4 + \dots + \text{lcm}(1..(p_i^{r_i}-1)) x_{p_i^{r_i}} = x \mod p_i^{r_i}

So to multiply x and y, one determines an upper bound for the number of places of z = x\cdot y, then computes the values of x and y modulo all those places, multiplies them separately for each place (modulo p_i^{r_i}), and then converts the result back to the LCM representation. The conversion back is easy: one can first determine z_2, then z_3 by subtracting z_2 from the known value z \mod 3 before converting it to z_3, then z_4 by subtracting z_2+2z_3 from the known value z \mod 4 before converting it to z_4, and so on.

This Chinese remainder representation is optimal in the sense that the individual moduli are as small as possible for representing a number of a given magnitude. The LCM number system may be even better than the primorial number system in this respect. (It should be possible to do the computations modulo p_i^{r_i} only for the biggest r_i, due to the structure of the LCM number system.)
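The place values 1, 2, 6, 12, 60, … grow by a factor of p exactly at the prime-power places p^r, so the LCM number system is an ordinary mixed-radix system with radix p at place p^r. Here is a small Python sketch of the conversion in both directions (function names are mine, just for illustration); the modular identities listed above can be checked directly against it:

```python
def prime_power_places(limit):
    """Prime powers 2, 3, 4, 5, 7, 8, 9, ... up to limit, each paired with
    its prime p (which is the radix at that place)."""
    places = []
    for q in range(2, limit + 1):
        p = next(d for d in range(2, q + 1) if q % d == 0)  # smallest prime factor
        t = q
        while t % p == 0:
            t //= p
        if t == 1:  # q is a pure power of p
            places.append((q, p))
    return places

def to_lcm_digits(x, places):
    """Digits x_q of the LCM number system, 0 <= x_q < p: since
    lcm(1..q) = p * lcm(1..(q-1)) exactly at the prime powers q = p^r,
    this is a plain mixed-radix conversion."""
    digits = []
    for q, p in places:
        digits.append(x % p)
        x //= p
    return digits

def from_lcm_digits(digits, places):
    """Reconstruct x as the sum of x_q * lcm(1..(q-1))."""
    value, place_value = 0, 1
    for d, (q, p) in zip(digits, places):
        value += d * place_value
        place_value *= p
    return value
```

For example, the digits of 7 come out as 1, 0, 1, 0, …, matching 7 = 1 + 6. To multiply as described above, one would reduce both factors modulo each place p_i^{r_i} via the listed identities, multiply the residues, and convert back digit by digit.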

Posted in Uncategorized | 3 Comments

Theory and practice of signed-digit representations

The integers are sometimes formally constructed as the equivalence classes of ordered pairs of natural numbers (a,b). The equivalence relation \sim is defined via

(a,b) \sim (c,d) iff a+d=b+c

so that (a,b) gets interpreted as a-b. This motivates the signed-digit representation. To avoid storing two numbers instead of one, the subtraction gets performed on the level of digits. This leads to signed digits. The overhead can be kept small by using unsigned 7-bit or 31-bit digits as starting point (instead of decimal or binary digits).

The overhead may be small, but the signed-digit representation remains redundant and non-unique. Comparing two integers in this representation is a non-trivial operation. But addition (and subtraction) becomes easy from a hardware perspective, since carry-bits no longer propagate. Multiplication also gets easier, since a part of multiplication consists of adding together intermediary results.

This post started as a subsection of another yet to be finished post. That subsection tried to use the trivial redundancy of representations of the computable real numbers as motivation to use similar redundant representations also for natural numbers. However, the constant references to simpler precursor ideas, to the well known practical applications, and to further complications and generalizations required in practical applications made that subsection grow totally out of proportion. It is still awfully long, even as a standalone post. The subsections try to cut it into more digestible pieces, but all the references to external material remain. The reader is not expected to read much of the linked material. It only serves as proof that this material has practical applications and is well known. This post may be seen as describing one specific alternative number format in detail, for which I dreamt about sufficient hardware support in a previous post.

Fast parallel arithmetic

In the 6502 Assembly language (C64, Apple II, …), addition is performed by the ADC instruction, which is the mnemonic for Add Memory to Accumulator with Carry. It adds the value in memory plus the carry-bit to the value in the accumulator, stores the result in the accumulator, and sets the carry-bit according to whether the addition overflowed or not. This operation could be represented as A + M + C -> A, or more precisely as
(A + M + C) % 256 -> A
(A + M + C) / 256 -> C
Since A and M can be at most 255 and C at most 1, the resulting C will again be either 0 or 1. This allows adding two arbitrarily long numbers: first one sets the carry-bit to zero, then adds the lowest bytes, and then successively the higher bytes. Nice, but not very parallel.
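The byte-wise loop just described can be modelled in a few lines of Python (a sketch of the instruction semantics, not actual 6502 code; the function name is mine):

```python
def add_with_carry_chain(a_bytes, m_bytes):
    """Add two little-endian byte arrays of equal length, modelling a chain
    of ADC instructions: (A + M + C) % 256 -> A, (A + M + C) / 256 -> C."""
    c = 0  # clear the carry before the first addition
    result = []
    for a, m in zip(a_bytes, m_bytes):
        s = a + m + c
        result.append(s % 256)
        c = s // 256  # 0 or 1, since a, m <= 255 and c <= 1
    return result, c
```

For example, 0x01FF + 0x0001 = 0x0200 comes out as `add_with_carry_chain([0xFF, 0x01], [0x01, 0x00])`, giving the bytes `[0x00, 0x02]` and a final carry of 0.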

Since the 6502 Assembly language has no instruction for multiplication, calling it fast would be an exaggeration. Let us instead imagine a machine which can perform 8 add-with-carry additions in parallel, where the carry-bits are stored in an additional byte. Since the meanings of the carry-bits before and after the parallel addition are different, we probably want to left-shift the carry-bits after the addition. Let us imagine that this machine can also perform 4 multiply-with-carry multiplications in parallel. How should it handle the carry-bits? Since (255+1)*(255+1)-1 can be represented in two bytes, it seems reasonable to treat them as a simple addition of 1 or 0 on input. The carry-bit on output might be interpreted as an addition of 256 or 0, which would be consistent with the interpretation on output after addition.

Outputting a second carry-bit to be interpreted as an addition of 256^2 or 0 would also be consistent, even though it is not required here. (Since (255+16+1)*(255+16+1)-16^3-256-16-1 = 16^4+16^3-16-1 cannot be represented in two bytes, allowing a second carry-bit may be useful to simplify a recursive implementation of multiplication.) It is slightly inconsistent that the add-with-carry instruction only consumes one carry-bit per addition. After all, the idea is to represent intermediate results as M+C, in order to enable fast parallel addition of intermediate results. But consuming two carry-bits would be problematic, since the result of (255+1)+(255+1) is not nicely representable.

What we just described is not new (of course). It is basically just the Carry Save Adder technique used to parallelize multiplication. However, since arithmetic with very long numbers is often used for modular arithmetic, this is not yet the end of the story. The Drawbacks section on Wikipedia says that it has to be combined with the Montgomery modular multiplication technique to be useful for public-key cryptography.

It is cool that our imagined computer can multiply 4 bytes and add 8 bytes in parallel, but how would we actually multiply two 4-byte long numbers? Each byte of the first number must be multiplied by each byte of the second number, and those results must be arranged appropriately and added together. The required 16 multiplications can be done by 4 subsequent parallel multiply-with-carry instructions, where the second 4 bytes (and the corresponding carry-bits) are rotated in between. We have to add 1, 3, 5, 7, 7, 5, 3, 1 bytes in the different places of the result, which requires at least 24 additions. There are at most 6 additions in the last round, so we need at least 4 rounds of (8-byte) parallel additions. And we might need even more rounds, since we get slightly too many carry-bits.

Faster multiplication and signed integers

Subtraction and signed integers are useful even for multiplication of positive integers, since subtraction is a basic operation required by fast multiplication algorithms like Karatsuba multiplication or its generalization, Toom-Cook multiplication. So what we describe next is not new either. See for example section “3.2 Multiplication” of The Complexity of Boolean Functions by Ingo Wegener, 1987, where the signed-digit representation is used for fast parallel multiplication algorithms.

The 6502 Assembly language also has an SBC instruction, which is the mnemonic for Subtract Memory from Accumulator with Borrow. This operation could be represented as A – M – ~C -> A; apparently the 6502 performs an ADC instruction with the bitwise-not of M. That makes sense, as it is the cheapest way to reduce subtraction to addition. For us, an SBB instruction where ~C got replaced by B fits better into our interpretation scheme
(A – M – B) % 256 -> A
(A – M – B) / 256 -> -B
The integer division here truncates towards -\infty; we could also write
(A – M – B + 256) % 256 -> A
(A – M – B + 256) / 256 -> 1-B
We can read A – M – B both as A – (M+B) and as (A-B) – M. The second reading is preferable if we imagine a machine which can perform 8 SBB instructions in parallel, where the borrow-bits are stored in an additional byte. Again, we should left-shift the borrow-bits after the subtraction, since their meaning before and after the parallel subtraction is different. The parallel SBB instruction (+ left-shift) returns a signed integer in the a-b signed-digit representation which we mentioned in the introduction.
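The SBB chain can be sketched in the same style as before; Python's floor division conveniently matches the truncation towards -\infty mentioned above (the function name is again mine):

```python
def sub_with_borrow_chain(a_bytes, m_bytes):
    """Subtract little-endian byte arrays, modelling a chain of SBB
    instructions: (A - M - B) % 256 -> A, (A - M - B) / 256 -> -B."""
    b = 0
    result = []
    for a, m in zip(a_bytes, m_bytes):
        s = a - m - b
        result.append(s % 256)  # Python's % already reduces into 0..255
        b = -(s // 256)         # floor division: s // 256 is -1 or 0
    return result, b
```

For example, 0x0200 - 0x0001 = 0x01FF comes out as `sub_with_borrow_chain([0x00, 0x02], [0x01, 0x00])`, giving `[0xFF, 0x01]` with no final borrow.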

Even though bit-twiddling may be nice, it somehow makes the trivial idea behind representing signed integers as a-b using signed digits feel unnecessarily complicated. For D >= 3, we may just assume that we have digits between -D+1 and D-1. After addition or subtraction we will have a result between -2*D+2 and 2*D-2, and if we keep the retained digit in the range -D+2 to D-2, then we can just emit an overflow of -D, 0, or D, and the next digit can consume that overflow without overflowing itself.
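This digit-level scheme can be written out directly. The following sketch (names are mine) follows exactly the ranges just stated: digit sums lie between -2*D+2 and 2*D-2, each position emits an overflow of -D, 0, or D, and the next position consumes it without overflowing:

```python
def signed_digit_add(x, y, D=10):
    """Add two signed-digit numbers (digits in -(D-1)..D-1, D >= 3,
    least significant digit first). Overflows propagate only one place."""
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    interim, overflow = [], []
    for a, b in zip(x, y):
        s = a + b                # s lies in -2*D+2 .. 2*D-2
        if s > D - 2:
            o, w = 1, s - D      # emit overflow +D
        elif s < -(D - 2):
            o, w = -1, s + D     # emit overflow -D
        else:
            o, w = 0, s
        interim.append(w)        # w lies in -(D-2) .. D-2 in every branch
        overflow.append(o)
    # each position consumes only the overflow of the position below it
    result = [interim[i] + (overflow[i - 1] if i > 0 else 0) for i in range(n)]
    result.append(overflow[-1])  # the topmost overflow becomes a new digit
    return result

def signed_digit_value(digits, D=10):
    return sum(d * D**i for i, d in enumerate(digits))
```

Since w + overflow stays in -(D-1)..D-1, the result is again a valid signed-digit number, with no carry chains; this is the carry-free addition the introduction promised.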

More bit-twiddling with borrow-bits

Can we define a straightforward (parallel) ADB (add with borrow) instruction, which adds an unsigned integer to a signed integer, and returns a signed integer? We get
(A-B + M + 1) % 256 -> A
(A-B + M + 1) / 256 -> 1-B
if we just use ADC and replace C by (1-B). This represents the result as -1 + A + (1-B)*256. But how is this compatible with our signed-digit representation? When we left-shift the borrow-bits after the addition, we fill the rightmost place with 1 (instead of 0). In the other places, the -1 cancels with the 1 from (1-B)*256 of the previous byte. The leftmost place is also interesting: If B is set, then we can ignore it. But if it is not set, then A needs an additional byte set to 1. So signed-digit representation works with nearly no overhead.

We basically just replaced C by (1-B). But how could this trivial change enable us to represent signed integers? A-B = A – (1-C) = A+C – 1, so A and C now describe a digit in -1..255, while before it was a digit in 0..256. Even though we prefer the borrow-bits variant here, this way of looking at it (offset binary) can sometimes be helpful, for example when trying to interpret the SBC instruction of the 6502 Assembly language.

In my previous thoughts about (big) integers, there was the question whether the two’s complement representation is well suited for a variable width integer data type. One argument against it was that the addition or subtraction of a small number from a big number would take an unnecessarily long time. In the borrow-bits variant, for addition all bytes where the borrow-bit is not set must be touched, and for subtraction all bytes where the borrow-bit is set must be touched.

Before we come to parallel multiplication, what about multiplication by -1 and by a power of two? -1*(A-B)=B-A, so SBB of B (as bytes without borrow) and A gives the correct result. It is the sum of the bitwise-not of A, the byte-wise B and a carry-bit set to 1. The resulting (to be left-shifted) borrow-bit B is given by (1-C). The result of multiplication by a power of two is given by a left-shift followed by SBB. The same holds for multiplication by minus a power of two. In general, two successive multiplications by -1 will not yield the identity on the binary level, and identities about multiplication by powers of two (like associativity) will not hold on the binary level.

How to perform 4 multiply-with-borrow multiplications in parallel? If the borrow-bits are interpreted as a simple addition of -1 or 0 on input, then the smallest result is -255, and the biggest 255*255, so the borrow-bit on output might be interpreted as an addition of -256 or 0. However, what about the problem that addition cannot consume two borrow-bits? ADC outputs a digit in 0..2*255+1, SBB outputs a digit in -256..255, and ADB outputs a digit in -1..2*255. If we let ADB instead output a digit in 0..2*255-1 and an overflow of -256, 0, or 256, then the next digit can consume that overflow without overflowing itself. This extends the output range of ADB to -256..3*255, so we can now consume two borrow-bits of -1 and three bytes. (Basically the same as if we had just performed two subsequent ADB instructions, but the borrow propagation behaves more nicely.)

The output range can also be extended one-sided (by 255). Extending ADB to the negative side or extending SBB to the positive side both yield a range -256..2*255, which can consume two borrow-bits of -1 and two bytes (or more precisely two positive bytes, one negative byte, and one borrow-bit of -1). ADC can be extended to the positive side to a range 0..3*255+1, so that it can consume two bytes and two carry-bits of +1.

Redundant representations of the computable real numbers

The idea of using a redundant representation for computable real numbers somehow feels trivial. There is no way to get a completely non-redundant representation anyway. The non-trivial idea is to exploit the redundancy to improve the computational properties of the computable real numbers. This idea is not new (of course, I keep repeating myself), and I tried my luck with it in an email to Norman Wildberger (17 Dec 2018, “TK: A signed-digit representation solves some issues”), in response to the issues he raised in “Difficulties with real numbers as infinite decimals (II) | Real numbers + limits Math Foundations 92” on slide 8 at 31:24, before claiming: No such theory exists (2012)!! But there are redundant representations of the computable real numbers for which the operations of addition, subtraction, and multiplication are computable. At the beginning of “Problems with limits and Cauchy sequences | Real numbers and limits Math Foundations 94” on slide 1, we read

Attempts at ‘defining real numbers’:
– as infinite decimals
– as Cauchy sequences of rationals
– as Dedekind cuts
– as continued fractions
– others (Axiomatics, …)
None of these work!

In subsection “2.2. Non-termination” of Luther Tychonievich’s post on Continued Fractions, it is explained why computability (of addition and multiplication) fails for a representation by classical continued fractions. But they would be computable if signed integers (instead of natural numbers) were allowed in the continued fraction expansion. One still needs to keep some restrictions on the numbers in the expansion, but it can be done, see section “3.3 Redundant Euclidean continued fractions” of Exact Real Computer Arithmetic with Continued Fractions by Jean Vuillemin, 1987. The signed-digit representation is also described there, in section “3.1 Finite representations of infinite sequences of numbers” at the bottom of page 13.

An infinite signed-digit representation of the computable real numbers

Signed-digit representations are not limited to integers. We will use them as a representation of computable real numbers. The apparent drawback that the signed-digit representation of a number is not unique becomes a feature when it is used to represent computable real numbers.

We could try to define a computable real number as a Turing machine outputting the digits of its infinite binary expansion. An objection is that addition of two such computable real numbers would not be computable. But what if we define it instead as the difference of two non-negative series of digits? It turns out that addition, subtraction, and multiplication then become computable. (The reason why a signed-digit representation helps here is that it can eliminate chains of dependent carries.) However, testing whether such a number is zero turns out to be undecidable. This is fine, since

… equality testing for computable real numbers is undecidable. The idea here is to take some Turing machine for which we would like to decide whether it halts or not, and write out the real number 0.000…000… with an additional zero for every step of the machine, and output one final 1 when (and only if) the machine finally halts. Deciding whether this real number is equal to zero is equivalent to solving the halting problem for the given Turing machine.

Even though equality testing is undecidable, it would still be nice to have a representation for which addition and multiplication (and many other operations) are computable. And the signed-digit representation achieves exactly this, more or less. Division is still not computable, since it has a discontinuity at zero, and the signed-digit representation cannot easily avoid that issue. Using a representation via (a-b)/(c-d) can work around that issue, at the cost of also allowing 1/0 and 0/0. Jean Vuillemin tries to explain why this is important, but at least initially his definitions just confused me.


Even though the above text is awfully long and trivial, at least it is a proper blog post again. Maybe I should have talked more about the less trivial stuff, but the trivial stuff already filled the post. And this bit-twiddling is a real part of my thinking and acting, at least in my distant past. And I hope that having finally finished this post will allow me to also finish the other yet to be finished post soon, the one whose subsection grew totally out of proportion and turned into this post.

Posted in Uncategorized | 4 Comments

A list of books for understanding the non-relativistic QM — Ajit R. Jadhav’s Weblog

I really like that this post points out that QM is vast, and that it takes a prolonged, sustained effort to learn it. I also like that it explicitly recommends chemistry books. My own first exposure to the photoelectric effect (before university and before my physics teacher at school had even mentioned quantum) was from an introductory chemistry book by Linus Pauling.

TL;DR: NFY (Not for you). In this post, I will list those books which have been actually helpful to me during my self-studies of QM. But before coming to the list, let me first note down a few points which would be important for engineers who wish to study QM on their own. After all, […]

via A list of books for understanding the non-relativistic QM — Ajit R. Jadhav’s Weblog

Posted in Uncategorized | 1 Comment

I’m not a physicist


At the end of 2016, I decided to focus on working through an introductory textbook in quantum mechanics, instead of trying to make progress on my paper(s) to be published. I finished that textbook, which taught me things like Clebsch-Gordan coefficients, the Aharonov-Bohm effect, perturbation theory, Fermi’s golden rule, path integrals, and the Dirac equation. Another physics book that I worked through in 2017 taught me about thermodynamics and its importance for chemistry and solid state physics.

I spent much time with other physics textbooks in 2017, but didn’t read them cover to cover. In general, I spent much time in 2017 on physics, and I was happy about that. My day job (in semiconductor manufacturing) is concerned with physics to a large part. I previously had a bad conscience, because I was spending so little time on physics, or at least making so little progress.

But spending so much time on physics meant that little time was left for this blog. And I submitted (with coauthors) an abstract for a conference, which got accepted as an oral presentation. The conference was at the end of February 2018, and I spent the last two months before it working day and night to still finish the paper (and presentation) in time. But even after the conference, I continued to spend most of my time on physics. So a post about physics seems like the only reasonable way to get any blog post done at all.

My love-hate relationship with physics

When I had the choice for my first job between one closely related to my thesis and one closely related to physics, I took the one related to physics. In my experience, establishing friendly relationships with physicists was easy. It did work out as expected, but I was surprised how much physics ended up on my plate. But it was fine for me, those were things like electrodynamics and optics, which were easy for me. When quantum mechanical tasks started to land on my plate, I protested that I didn’t manage to finish that course at university. But after switching jobs (Feb 2013), I sort of had to accept that the quantum mechanical tasks were part of my responsibilities.

On the one hand, this was a nice excuse for me to try to understand that part of physics which I was once unable to understand. On the other hand, the question “Why me? There are real physicists who already understood this stuff while at university!” hit me with all its practical implications. I had to spend time reading books and articles about the practical and theoretical aspects of my specific tasks. And in parallel, I also had to fill in the gaps in my understanding of the basics. But I never fully devoted my time to physics, at least not before 2017.

Given that it is my job, am I even allowed to admit that I am still filling in the basics? Or am I trying too hard to pretend to be a physicist, for whatever reasons? I did wonder whether I should do a master in physics, given that it is my job. But what for, just to prove that I am able to fulfil some formal criteria for being a physicist? And I doubt that I would become a physicist in my heart, so it could turn out to be very hard to finish a master degree. But I still like to discuss physics, and I would like to be able to judge certain fashions in physics.

The code of silence

What could I have said about the Lightwave Research Laboratory at Columbia University in New York? Nothing, it would have just been name dropping. What could I have said about Information ist Energie by Lienhard Pagel? Something, it might have given the impression of name dropping, and it is the reason for this silent section:

This book uses QM in normal handwaving mode. By information, the author means his definition of dynamic information, which is something like information(flow) per time. Higher energy is normally associated with shorter times. What is good about this book is that it really discusses the time dynamics, which is otherwise often neglected.

What could I have said about quantum field theory? I could have talked about my motivations to learn more about it. But most of those were pointless, because I expect silence, no matter how much I have learned. But Sean Carroll gave me a good reason to learn more about QFT with his lecture Extracting the Universe from the Wave Function. I started with Quantum Field Theory for the Gifted Amateur and the many comments in Quantum Field Theory I, Lecture Notes (HS14+) and Quantum Field Theory II, Lecture Notes (FS17). However, then I read An Interpretive Introduction to Quantum Field Theory cover to cover. It was easy to read, with a good mix of technical details, explanations, interpretations, and philosophical clarification.

… much of the interpretive work Teller undertakes is to understand the relationship and possible differences between quantum field-theory — i.e., QFT as quantization of classical fields — and quantum-field theory — i.e., a field theory of ‘quanta’ which lack radical individuation, or as Teller says, “primitive thisness.”

Not sure when I will continue to work through the more technical material. At the moment, I am reading How is Quantum Field Theory Possible?, just to see how far I get before it becomes too boring or too difficult.

A conversation about the Copenhagen interpretation

I had an email conversation with BT in March 2017, where he praised Deutsch’s version of MWI, and I defended Heisenberg’s version of Copenhagen:
28 Feb 2017 BT: purpose of science is explanation
9 Mar 2017 TK: what is Copenhagen
13 Mar 2017 BT: Copenhagen is Bohr
20 Mar 2017 TK: historical defence of Copenhagen
28 Mar 2017 TK: QM as general probability

During that conversation, I did some historical investigations:

When David Bohm proposed his new interpretation in 1952, the term “Copenhagen interpretation” didn’t exist yet, he had to talk of the “usual physical interpretation of quantum theory” instead. The idea that the “Copenhagen interpretation” should be the common core of the beliefs by Bohr and Heisenberg probably goes back to Henry P. Stapp in 1972. Before, it was just another way to refer to the orthodox interpretation(s) taught by (good) text books. Heisenberg is the unfortunate soul who coined the term Copenhagen interpretation (in the 50s).

Here is how I see Heisenberg’s position, independent of whether Bohr agrees:

Max Born and Werner Heisenberg claimed in 1927 that quantum mechanics is a closed theory (abgeschlossene Theorie), just like Newton’s mechanics is a closed theory. And by this they meant that its description of reality is complete and accurate, whenever it is applicable. The limits of applicability are not part of the closed theories themselves, according to Heisenberg.

The conversation started, when I quoted Heisenberg’s own explanation:

… what one may call metaphysical realism. The world, i.e., the extended things, ‘exist’. This is to be distinguished from practical realism, and the different forms of realism may be described as follows: We ‘objectivate’ a statement if we claim that its content does not depend on the conditions under which it can be verified. Practical realism assumes that there are statements that can be objectivated and that in fact the largest part of our experience in daily life consists of such statements. Dogmatic realism claims that there are no statements concerning the material world that cannot be objectivated. Practical realism has always been and will always be an essential part of natural science. Dogmatic realism, however, …

and BT replied:

That’s all a bit technical for me! I can’t tell whether he is in favour of “explanations” in the simple(-minded?) Deutsch sense or not.

After setting the passage in context, I got very concrete:

But I disagree that the quoted passage is technical. If he adheres to this passage, then Heisenberg cannot claim that Schrödinger’s cat would be both alive and dead, or that the moon would not be there if nobody watches.

Others, like Christopher A. Fuchs and Asher Peres in “Quantum Theory Needs No ‘Interpretation’”, are apparently less sure whether (neo-Copenhagen) quantum theory is so clear about that fact. Hence they try to weasel out by claiming: “If Erwin has performed no observation, then there is no reason he cannot reverse Cathy’s digestion and memories. Of course, for that he would need complete control of all the microscopic degrees of freedom of Cathy and her laboratory, but that is a practical problem, not a fundamental one.”

This is nonsense, because the description of the experiment given previously was complete enough to rule out any possibility for Erwin to reverse the situation. Note the relevance of “… a consistent interpretation of QM as applied to what we do in a physical laboratory and how practitioners experience QM in that context.” If Erwin had access to a time machine enabling him to realistically reverse the situation, then it might turn out that Cathy and Erwin indeed lived multiple times through both situations (and experienced real macroscopic superpositions), as depicted in movies like “Back to the Future”.

According to Jan Faye in the SEP article on the Copenhagen Interpretation, even Bohr does not disagree with that conclusion: “Thus, Schrödinger’s Cat did not pose any riddle to Bohr. The cat would be dead or alive long before we open the box to find out.” However, I doubt that Bohr really said this, or replied to the content of Schrödinger’s article in any other direct way. (I read that he complained to Schrödinger about that article for indirectly supporting Einstein in his crusade against QM.)

Questions about Schrödinger’s famous article

At the end of this article on Erwin Schrödinger there are links to interesting papers by Schrödinger. After reading “Die gegenwärtige Situation in der Quantenmechanik” (“The Present Status of Quantum Mechanics (“Cat” Paper)”), I had several questions:

  1. Did Schrödinger know the density matrix formalism? He could have formulated some points he was making more explicitly by using it.
  2. Did Bohr ever admit that the example of the cat was spot on, and that of course the cat was either dead or alive, even if nobody cared to check?
  3. Did Schrödinger know about Fermions and Bosons? The last paragraph talks about how the relativistic theory would make the foregoing discussion invalid.

When I first read Schrödinger’s article in September 2014, I didn’t know much about quantum mechanics or its history. But I had the hope that my instrumentalist ideas related to the density matrix were not too far off from the orthodox interpretation. Today, I know that Carl Friedrich von Weizsäcker denied that the Copenhagen interpretation asserted: “What cannot be observed does not exist.” Instead, it follows the principle:

What is observed certainly exists; about what is not observed we are still free to make suitable assumptions. We use that freedom to avoid paradoxes.

The interesting thing is that Schrödinger’s article is not really about the cat, and not really about attacking quantum mechanics. Instead, Schrödinger tries to explain his own understanding of it. But Bohr interpreted the article as an attack on his interpretation.

  1. Schrödinger at least knew the consequences that follow from the density matrix formalism, so I guess he knew it. Not sure why he did not use it; maybe it was not yet popular among physicists back then.
  2. I doubt Bohr ever admitted that Schrödinger was right.
  3. Schrödinger knew about Fermions and Bosons. The basic connection between Fermi–Dirac statistics, Bose–Einstein statistics and the spin was already known since 1926. But the first derivations of the spin-statistics theorem from relativistic quantum field theory by Markus Fierz and later by Wolfgang Pauli are from 1939.

However, I guess Schrödinger wrote that last paragraph not just because Fermions and Bosons were challenging to fit into his scheme of explanation. He also wanted to stress that relativistic quantum mechanics was not yet fully worked out, so that he could conclude: “Perhaps, the simple procedure of the nonrelativistic theory is only a convenient calculational trick, but it has at the moment attained a tremendous influence over our basic view of Nature.”

While talking about famous articles by Schrödinger, I want to quote a passage from the beginning of “Are There Quantum Jumps? Part I”:

Along with this disregard for historical linkage there is a tendency to forget that all science is bound up with human culture in general, and that scientific findings, even those which at the moment appear the most advanced and esoteric and difficult to grasp, are meaningless outside their cultural context. A theoretical science, unaware that those of its constructs considered relevant and momentous are destined eventually to be framed in concepts and words that have a grip on the educated community and become part and parcel of the general world picture – a theoretical science, I say, where this is forgotten, and where the initiated continue musing to each other in terms that are, at best, understood by a small group of close fellow travellers, will necessarily be cut off from the rest of cultural mankind; in the long run it is bound to atrophy and ossify, however virulently esoteric chat may continue within its joyfully isolated groups of experts.

I want to quote a passage from the beginning of “Are There Quantum Jumps? Part II”:

Even if this shorthand were admissible for the microevent, we have to keep in mind that physicists are not the only people interested in physics and in the outcome of theoretical research in physics. Those results that are tenable and will survive are destined, eventually, to be absorbed into our world picture and to become part and parcel thereof. A great many of our educated contemporaries, not equipped with the mathematical apparatus to follow our more technical deliveries, are yet deeply concerned with many general questions; one of the most stirring among them certainly is whether actually natura facit saltus or no. Whatever abbreviated language we physicists may find convenient to use among ourselves, we ought to be aware of the dilemmas that justly and duly interest others; we must be careful not to veil or distort them by indulging in loose speech


Why did I write this post? Probably because I wanted to share my email conversation about the Copenhagen interpretation. And because I finally managed to read a technical book about quantum field theory cover to cover. However, that part already touches the code of silence, and I really wrote two of the emails (the ones where silence would not hurt me too badly) for which I expected silence (and really got silence as expected). But it was a start, and I am confident that I will finish more of the small stuff in the near future, even if the silence will hurt me more. Writing about Schrödinger was just a bonus, or maybe I wanted to avoid having to write about physics again in the near future. But I could do a post about my thoughts on probability; at least that would be related to logic.

Posted in physics | 1 Comment

ALogTime, LogCFL, and threshold circuits: dreams of fast solutions

Our protagonists are the following (DLogTime-)uniform circuit classes: NC1 (i.e. ALogTime), SAC1 (i.e. LogCFL), and TC0 (threshold circuits).

Interesting things can be said about those classes. Things like Barrington’s theorem (NC1), closure under complementation by inductive counting (SAC1), or circuits for division and iterated multiplication (TC0). Things like TC0 recognises the Dyck language with two types of parentheses, NC1 recognises visibly pushdown languages, or SAC1 recognises context free languages (and also the languages recognisable in nondeterministic logarithmic space).
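As a concrete reminder of what the Dyck language with two types of parentheses is, here is a naive sequential membership test (a minimal Python sketch with names of my own choosing; the TC0 circuit recognising this language works very differently, essentially by counting):

```python
def is_dyck2(word: str) -> bool:
    """Membership test for the Dyck language over ()[]:
    every closing parenthesis must match the most recent
    unmatched opening one, and nothing may remain open."""
    stack = []
    pairs = {')': '(', ']': '['}
    for c in word:
        if c in '([':
            stack.append(c)
        elif c in pairs:
            if not stack or stack.pop() != pairs[c]:
                return False
        else:
            return False  # not part of the alphabet
    return not stack

print(is_dyck2("([()])"))  # True
print(is_dyck2("([)]"))    # False: crossing pairs
```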

But how are those classes related to dreams about speed? There are situations where both fixed-width integers and floating-point numbers are problematic, but they remain much faster than better-suited alternatives because of explicit hardware support. The dream is that arbitrary alternative number formats could get sufficient hardware support to remain reasonable alternatives to the number formats supported explicitly by current hardware.

ALogTime – fast solutions for predictable stack height

Considering how parameter passing to subroutines and a huge part of memory management in mainstream computer languages is stack based, an obvious and natural variation is to restrict the unbounded memory of a Turing machine to be a stack. With those words, I justified copying a part of my answer to my own question yet another time. During my research for that answer, I also learned about Nested Words and Visibly Pushdown Languages and that they can be recognized in LOGSPACE. Recently, I read

The Boolean formula value problem is in ALOGTIME (25 pages)
by S Buss – Jan 1987

I immediately suspected that Visibly Pushdown Languages can also be recognized in ALogTime, and that recognition in LOGSPACE is just a simple corollary. Indeed it is:

Complexity of Input-Driven Pushdown Automata (21 pages)
by A Okhotin, K Salomaa – 2014

The remarks following the proof of theorem 8 (with the beautiful Figure 3 shown above) say: “Finally, Dymond [12] applied the method of Buss [7] to implement the Mehlhorn–Rytter algorithm [36] in NC1“. Dymond’s paper is short and sweet:

Input-driven languages are in log n depth (4 pages)
by P Dymond – 1988

Polynomial size log depth circuits: Between NC1 and AC1 by Meena Mahajan gives a different summary of Dymond’s paper:

Dymond gave a nice construction [14] showing that they are in fact in NC1. His approach is generic and works not just for VPAs but for any PDA satisfying the following:

  1. no ϵ moves,
  2. an accepting run should end with an empty stack
  3. the height of the pushdown, after processing i letters of the input, should be computable in NC1. If there is more than one run (nondeterministic PDA), and if the height profiles across different runs are different, then the heights computed should be consistent with a single run. Furthermore, if there is an accepting run, then the heights computed should be consistent with some accepting run.

For such PDA, Dymond transforms the problem of recognition to an instance of formula value problem, and then invokes Buss’s ALogTime algorithm [8] for it.
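Condition 3 is the easy part for input-driven automata: whether a letter pushes or pops is determined by the letter alone, so the stack height after i letters is just a prefix sum over +1/-1 increments, and prefix sums are computable in NC1. A minimal Python sketch (the letter classification here is my own illustrative choice, not from Dymond's paper):

```python
from itertools import accumulate

def height_profile(word, push='([', pop=')]'):
    """Stack height after each letter of an input-driven PDA:
    +1 on a push letter, -1 on a pop letter, 0 otherwise.
    Being a prefix sum, this is parallelizable (NC1)."""
    deltas = [1 if c in push else -1 if c in pop else 0 for c in word]
    return list(accumulate(deltas))

print(height_profile("(()[])"))  # [1, 2, 1, 2, 1, 0]
```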

This post links extensively to external material. It is clear that the reader of this post will not read most of it. The more interesting question is whether I have read that material, or at least plan to read it. The initial plan of this post was to indicate which material I have already read and understood, and which material I still plan to read. But this is impractical and boring for the reader. What I want to say is that I have read Buss's paper in principle, but didn’t yet work thoroughly enough through section 7, and I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive? In some of the other papers quoted above, I only read the parts which interested me. And here is another paper by Buss et al. which I still plan to read:

An optimal parallel algorithm for formula evaluation (42 pages)
by S Buss, S Cook, A Gupta, V Ramachandran – Nov 20, 1990
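The Boolean formula value problem at the heart of these papers is easy to state: given a fully parenthesized Boolean formula with constants, compute its truth value. The naive recursive evaluation below (a Python sketch; the nested-tuple representation is my own ad-hoc choice) merely defines the problem, while Buss's ALogTime algorithm achieves the same result with a far more intricate parallel strategy:

```python
def eval_formula(f):
    """Evaluate a Boolean formula given as a nested tuple,
    e.g. ('and', ('or', True, False), ('not', False)).
    Leaves are the constants True/False."""
    if isinstance(f, bool):
        return f
    op = f[0]
    if op == 'not':
        return not eval_formula(f[1])
    if op == 'and':
        return eval_formula(f[1]) and eval_formula(f[2])
    if op == 'or':
        return eval_formula(f[1]) or eval_formula(f[2])
    raise ValueError(f"unknown connective: {op}")

print(eval_formula(('and', ('or', True, False), ('not', False))))  # True
```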

Parallel and distributed computing

The main conclusion of the last section is that ALogTime is quite powerful, because it can recognize languages accepted by input driven stack automata. I wondered whether this result is taught in typical introductions to parallel computing. Here is my sample:

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (104 pages)
by Uzi Vishkin – October 12, 2010

Parallel Algorithms (64 pages)
by Guy E. Blelloch and Bruce M. Maggs – 1996

Limits to Parallel Computation: P-Completeness theory (327 pages)
by R Greenlaw, H Hoover, W Ruzzo – 1995

Parallel Algorithms (347 pages)
by H Casanova, A Legrand, Y Robert – 2008

My conclusion is that it is not taught, at least not explicitly. And I had already taken into account that they would refer to ALogTime as (uniform) NC1. Chapter 10, where Uzi Vishkin talks about tree contraction, is related, and NC1 is discussed in some of the texts. (It should be clear that I didn’t really try to read those texts, and just browsed through them.)

Some words on LogCFL and threshold circuits

I looked for hardware support for LogCFL, as seems possible for NC1:

A more promising approach could be to look for problems complete under appropriate many-one reductions for these classes. The word problem for \bar{A}_5 (or \bar{S}_5) would be such a problem for NC1. The solution of that word problem is nicely scalable, since one can subdivide the word into smaller parts, and thereby generate a smaller word on which the same procedure can be applied again.

The many-one reduction to that word problem is Barrington’s theorem, proved in

Bounded-width polynomial-size branching programs recognize exactly those languages in NC1 (15 pages)
by David A. Barrington – 1987
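The scalability mentioned above comes from associativity: a word of permutations can be split into halves, each half evaluated independently, and the two results composed. A minimal Python sketch of that divide-and-conquer evaluation (this only illustrates the word problem itself, not Barrington's actual construction with commutators of 5-cycles):

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p∘q)(i) = p[q[i]],
    i.e. q is applied first."""
    return tuple(p[q[i]] for i in range(len(q)))

def word_value(word):
    """Evaluate a nonempty word of permutations by divide and conquer.
    Composition is associative, so both halves can be computed in
    parallel and then combined."""
    if len(word) == 1:
        return word[0]
    mid = len(word) // 2
    return compose(word_value(word[:mid]), word_value(word[mid:]))

a = (1, 2, 3, 4, 0)  # a 5-cycle in S5
b = (0, 2, 4, 1, 3)  # another permutation, chosen for illustration
print(word_value([a, b, a]))
```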

One thing which could still go wrong when trying to use TC1, NC1, and SAC1 for actual efficient computer architectures is that those fast and nice many-one reductions are studied for decision problems, but in the end we are much more interested in computing solutions to function problems. This came up when I described my plans to a friend. But I am not the first to dream about hardware support for those theoretical fast solutions:

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision
A Desktop Supercomputer (4 pages)
by Uzi Vishkin – 2009

Back to LogCFL: for “closure under complementation by inductive counting,” see:

Two applications of inductive counting for complementation problems (20 pages)
by A Borodin, S Cook, P Dymond, W Ruzzo, M Tompa – 1989

and for: “SAC1 recognises context free languages (and also the languages recognisable in nondeterministic logarithmic-space),” note that LogCFL = AuxPDA(log n, n^O(1)) = SAC1, see here. This is the class of problems solvable by auxiliary pushdown automata with a log n work tape running in non-deterministic polynomial time. (Probably “non-deterministic” can also be replaced by “deterministic”. The direct meaning of LogCFL is the class of problems which can be many-one reduced to CFL in deterministic logspace.) I also guess that LogCFL can be solved by a dynamic programming algorithm, since both NL and CFL allow such a solution. (Probably an easy exercise. I just wanted to have LogCFL in this post, since it is part of my dream, and it allows solving interesting problems. And it seems close to ALogTime from a hardware perspective.)
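For the CFL part, such a dynamic programming algorithm indeed exists: the classic CYK algorithm for grammars in Chomsky normal form. A minimal Python sketch (the grammar encoding is my own choice; the example grammar generates balanced parentheses):

```python
def cyk(word, unary, binary, start):
    """CYK dynamic programming for CFL membership.
    unary:  set of (A, terminal) for rules A -> terminal.
    binary: set of (A, B, C) for rules A -> BC.
    table[i][l] holds the nonterminals deriving word[i:i+l]."""
    n = len(word)
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, c in enumerate(word):
        table[i][1] = {A for (A, t) in unary if t == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for k in range(1, length):
                for (A, B, C) in binary:
                    if B in table[i][k] and C in table[i + k][length - k]:
                        table[i][length].add(A)
    return start in table[0][n]

# Balanced parentheses in CNF: S -> SS | LR | LP, P -> SR, L -> '(', R -> ')'
unary = {('L', '('), ('R', ')')}
binary = {('S', 'S', 'S'), ('S', 'L', 'R'), ('S', 'L', 'P'), ('P', 'S', 'R')}
print(cyk("(())()", unary, binary, 'S'))  # True
print(cyk("(()", unary, binary, 'S'))     # False
```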

Now to threshold circuits: “TC0 recognises the Dyck language with two types of parentheses”. I don’t have a good primary reference at the moment. But this shows that threshold circuits are surprisingly close to ALogTime with respect to their capabilities. And “circuits for division and iterated multiplication” has some references in the Wikipedia article, but I liked the following lecture notes:

Lecture 8: Multiplication ∈ TC0 (4 pages)
by Kristoffer Arnsfelt Hansen – 2008

The paper by P Dymond – 1988 ends with the following speculation:

This result shows that a large class of cfl’s is in log depths, and it is natural to inquire about the possibility that all deterministic cfl’s, which are known to be in log space, might also be in log depth. One dcfl in log space which is not currently known to be recognizable in log depth is the two-sided Dyck language on four letters [5].

Word Problems Solvable in Logspace
by RJ Lipton, Y Zalcstein – 1977

It is an interesting question whether ALogTime and LOGSPACE are different for naturally occurring problems. Including LogCFL in the hardware acceleration tries to evade this question. But there are programming approaches specifically focusing on LOGSPACE, since it may model an important aspect of practical algorithms for really big data:

Computation-by-Interaction for Structuring Low-Level Computation
by U Schöpp – 2014

Understanding whether ALogTime != PH is hard to prove

I was wondering about how to define function classes for ALogTime: How to transform words over an input alphabet to words over an output alphabet. I wondered whether I should write a mail to Emil Jeřábek. Maybe I only wanted to have a reason to write him, but I didn’t. Instead, I googled my way until I understood even much weaker function classes. Then a naive idea for proving ALogTime != PH crossed my mind: Prove some problem complete for ALogTime under some extremely weak many-one reduction. If that would work, it would be sort of a “fool’s mate”, since no genuinely new idea is needed, just a careful application of well known techniques. So when Scott Aaronson said

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind. Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

in HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP, I couldn’t help but ask him about whether my naive idea has any chance of proving ALogTime != PH, even if just for testing whether he really keeps such an open mind. His answer was: “I don’t know enough to give you detailed feedback about your specific approach”. So he keeps an open mind, even if Stephen A. Cook is slightly more open with respect to the possibility that separating P from weaker classes like NL or ALogTime could be within reach of current techniques:

The Tree Evaluation Problem: Towards Separating P from NL (8 pages)
by Stephen A. Cook – 2014

This post has many links to papers like the one above. As if I could justify myself for believing that ALogTime can be separated from P by current techniques. But the links show at least that investing time into this topic is an interesting learning experience. The following sections Weak reductions and isomorphism theorems, and DAG automata link to material which I read (or still plan to read) in order to make progress on ALogTime != P.

Weak reductions and isomorphism theorems

DLogTime many-one reductions already look quite weak. (I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?) One could go weaker by looking at DLogTime-uniform NC0 reductions or even DLogTime-uniform projections (each output symbol/bit depends on at most a single input symbol/bit). But recently, people proposed even weaker uniformity notions than DLogTime:

A circuit uniformity sharper than DLogTime (19 pages)
by G Bonfante, V Mogbil – May 25, 2012

This proposed rational-uniformity can be summarized as writing n in binary and applying a finite state transducer to that input for computing the circuit description. I haven’t read the full paper yet. Another recently proposed approach uses descriptive complexity for defining very weak uniformity notions:

The lower reaches of circuit uniformity (16 pages)
by C Behle, A Krebs, K-J Lange, P McKenzie – May 12, 2012

I didn’t understand too much of that paper yet. I tried reading revision 0 (didn’t notice that it wasn’t the latest revision), but revision 1 might turn out to be easier to read: “Results are stated in more clear way, and overall readability was improved.”

Another line of attack for making the many-one reductions weaker is to only allow isomorphisms. It may seem counter-intuitive at first that this should be possible, but the Schröder-Bernstein theorem can sometimes be adapted. This is described in two classic papers:

Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem (17 pages)
by M Agrawala, E Allender, S Rudich – 1998

Reducing the Complexity of Reductions (9 pages or 22 pages)
by M Agrawala, E Allender, R Impagliazzo, T Pitassi, S Rudich – 2001

I didn’t read those papers yet, but Algebra, Logic and Complexity in Celebration of Eric Allender and Mike Saks by Neil Immerman gives a nice survey of those results, which I read for a start.

DAG automata

One important property of input driven stack automata is that the dependency graphs will be planar DAGs. For general polynomial time algorithms, the dependency graphs will be general non-planar DAGs (due to variable binding with an unbounded number of variables). In order to learn what is already known about this, I googled “DAG automata”. The first three hits are awesome:

DAG Automata – Variants, Languages and Properties (48 pages)
by J Blum – May 25, 2015

How Useful are Dag Automata? (26 pages)
by S Anantharaman, P Narendran, M Rusinowitch – ‎2004

DAG Automata for Meaning Representation (12 pages)
by F Drewes – London, UK, July 13–14, 2017

The first and third hit are related, Frank Drewes was the adviser of Johannes Blum. I read both, and they were really worth it. (I only browsed the paper from 2004, just to see that the newer papers really achieved a breakthrough.) They reduce determining which rules are useless to a question about Petri nets, and then provide a solution for that problem: “A Petri net is structurally cyclic if every configuration is reachable from itself in one or more steps. We show that structural cyclicity is decidable in deterministic polynomial time.”

Structurally Cyclic Petri Nets (9 pages)
by F Drewes, J Leroux – ‎2015

On understanding a problem, without solving it

One can understand a problem without solving it. So one can understand why something is a problem and why a solution would be challenging, without being able to come up with a solution. I don’t know whether the best experts in computer science understand P != NP in this sense. I don’t understand it yet, but I have the deeply rooted belief that it can be understood.

And understanding for me also means to be able to read work that others have done, and to see and feel the connection to the questions which interest me.


This post is too long, but I didn’t really care this time. Since the reader is expected to read at least some of the linked material, it is implicitly even much longer than its nominal length (which is too long) indicates. The initial goal was to write it in order to be able to focus again on more urgent topics. But of course, this didn’t really work out that way. Still, this post was always intended to be mainly a collection of links to papers I have already read or still plan to read in order to make progress on understanding ALogTime, LogCFL, and threshold circuits. And I also knew that I would try to justify myself for believing that this topic is worth studying.

Do I believe that the dreams of fast solutions can come true? I don’t know. Uzi Vishkin seems to have made a much more serious effort than I ever will, and even argued his case from an economic perspective:

Alas, the software spiral is now broken: (a) nobody is building hardware that provides improved performance on the old serial software base; (b) there is no broad parallel computing application software base for which hardware vendors are committed to improve performance; and (c) no agreed-upon architecture currently allows application programmers to build such software base for the future

Could proof-of-work problems for cryptocurrency mining provide the incentive to make dreams come true? Who knows, money is important, after all. But personally, I am rather a believer in “constant dripping wears away the stone,” so I think in the long run, it is quite possible that things progress. Probably in ways I cannot even imagine today.

Posted in automata | 5 Comments

A subset interpretation (with context morphisms) of the sequent calculus for predicate logic

The previous two posts used the sequent calculus with a subset interpretation:

We work with sequents A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}, and interpret the propositions A_i (and B_j) as subsets of some universe set X. We interpret the sequent itself as [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}].
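For finite examples, this subset interpretation can be checked mechanically. A minimal Python sketch of the semantics (all names are mine): a sequent holds iff the intersection of the left-hand subsets is contained in the union of the right-hand subsets of the universe X.

```python
def sequent_holds(left, right, universe):
    """Subset interpretation of A1,...,Ar ⊢ B1,...,Bs:
    [A1] ∩ ... ∩ [Ar]  ⊆  [B1] ∪ ... ∪ [Bs].
    An empty left side yields the whole universe; an empty
    right side yields the empty set."""
    inter = set(universe)
    for A in left:
        inter &= A
    union = set()
    for B in right:
        union |= B
    return inter <= union

X = set(range(10))
evens = {x for x in X if x % 2 == 0}
small = {x for x in X if x < 5}
# evens, small ⊢ {0,2,4}: intersection is {0,2,4}, contained in {0,2,4}
print(sequent_holds([evens, small], [{0, 2, 4}], X))  # True
```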

While writing the previous post, there was the temptation to introduce the sequent calculus rules for existential and universal quantification. But the subset interpretation with a fixed universe set X didn’t seem helpful for free variables and quantification. But what if we allow different universe sets for different sequents? One way would be to augment the sequent rules to change the universe sets in a well defined way. However, the cut rule and two of the four quantification rules (those that replace an arbitrary term by a quantified variable) are a bit problematic. For them, the universe can sometimes be shrunk down, since a formula or a term was eliminated. Therefore, we just add two separate rules for changing the universe, and apply the sequent rules themselves only with respect to a fixed universe set.

This is the last planned post on the sequent calculus for the moment. (There might be an additional post in this series, extending the subset interpretation to natural deduction.) Since this post will remain vague and unsatisfactory in many ways, a list of related online materials on the sequent calculus seems appropriate: I often consulted the Wikipedia article while writing these posts. (I never consulted Sequent Calculus Primer, but it looks nice and has exercises.) This MathOverflow question on introductions to sequent calculus suggests the chapter on Sequent Calculus from Frank Pfenning’s manuscript on Automated Theorem Proving, which I really liked. It also suggests Proofs and Types by Jean-Yves Girard, which was my first introduction to the sequent calculus. It left me unsatisfied, and I agree with Charles Stewart:

A warning: Girard’s style is a little slippery, and it is common for students to say they have read it, who turn out to have absorbed the opinions but little of the results.

I am still slightly ashamed that one of my first questions – when I met a group of academic logicians (with a focus on proof assistants) – was whether they found Girard easy to understand. Girard claims that the sequent calculus is generally ignored by computer scientists, an exception being Logic For Computer Science Foundations of Automatic Theorem Proving by Jean H. Gallier. A nice text, but who has time to read those 500+ pages? This is one reason why I like An Introduction to Proof Theory by Samuel R. Buss.

The sequent calculus rules for predicate logic

We are interested in normal first order predicate logic with predicates and functions. Here a signature with a finite number of predicate and function symbols is given, including the arity of each symbol. The sequent calculus rules for quantification are:

Left quantification rules Right quantification rules
\begin{array}{c} \Gamma,A[t/x]\vdash\Delta\\ \hline \Gamma,\forall xA\vdash \Delta \end{array}(\forall L) \begin{array}{c} \Gamma\vdash A[t/x],\Delta\\ \hline \Gamma\vdash \exists xA,\Delta \end{array}(\exists R)
\begin{array}{c} \Gamma,A[y/x]\vdash\Delta\\ \hline \Gamma,\exists xA\vdash \Delta \end{array}(\exists L) \begin{array}{c} \Gamma\vdash A[y/x],\Delta\\ \hline \Gamma\vdash \forall xA,\Delta \end{array}(\forall R)

Those rules look simple, but what do they mean? What is the difference between t and y? And how do we interpret a sequent with formulas containing free variables? First, y is a free variable not occurring in \Gamma, \Delta, or A. And A[y/x] means that each free occurrence of x in A is replaced by y. Next, t is an arbitrary term, with no restrictions at all on the variables occurring in it. A term is just any reasonable combination of function symbols (from the signature) and variables, like f(x,g(f(y,x),y)), where both f and g are binary function symbols.

There are many possible interpretations of predicate logic, but that answer just avoids the question. The interpretation of a formula or sequent with free variables can be controversial. My previous effort at a subset-based interpretation of predicate logic compatible with non-classical logic left out equality, functions, and constants. Even though we are talking only about classical logic here, the goal of the following interpretation is to also be applicable more generally. That interpretation will follow naturally once we clarify which models we want to allow.

Models and interpretations of predicate calculus

For the typical interpretations, a model consists of a set U (the universe where the variables live), a subset of U^k for each k-ary predicate symbol, and a function f:U^k \to U for each k-ary function symbol. A straightforward interpretation in this context is that a formula with k free variables denotes a subset of U^k, and a formula without free variables (i.e. a sentence) is either true or false. More generally, for any set of k variables, we can interpret any formula whose free variables are included in that set as a subset of U^k. A sequent in this interpretation is true iff, when the formulas are interpreted relative to the set of all free variables in the sequent, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.
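The truth condition for sequents in this straightforward interpretation can be tested by brute force over a finite model. The helper name and the example predicates below are invented here:

```python
from itertools import product

# sequent_true tests a sequent Gamma |- Delta in a finite model: each formula
# is given as a subset of U^k (a set of k-tuples), and the sequent is true iff
# the intersection of the left-hand subsets is contained in the union of the
# right-hand subsets.

def sequent_true(U, k, left, right):
    for point in product(U, repeat=k):
        if all(point in A for A in left) and not any(point in B for B in right):
            return False
    return True

U = {0, 1}
P = {(0,), (1,)}                         # P(x) holds everywhere
Q = {(1,)}                               # Q(x) holds only at 1
print(sequent_true(U, 1, [P, Q], [Q]))   # True:  P(x), Q(x) |- Q(x)
print(sequent_true(U, 1, [P], [Q]))      # False: P(x) |- Q(x) fails at 0
```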

This straightforward interpretation above is already a subset interpretation, but we want a slightly more general subset interpretation. For propositional logic, the possible models were given by a set X and a subset of X for each propositional variable A_i. In predicate logic, the 0-ary predicates take the role of the propositional variables, so we want them to be subsets of X. For our generalised interpretation, a model consists of a set X (whose subsets are the truth values) and a set U (which is the universe where the variables live), a subset of X \times U^k for each k-ary predicate symbol, and a function f:X \times U^k \to U for each k-ary function symbol.

The way X enters the model is that there is an implicit context \xi\in X, so that formulas like P(x, f(y, z), z) in the formal syntax become P(\xi, x, f(\xi, y, z), z) in the model. So for any set of k free variables, we can interpret any formula whose free variables are included in that set as a subset of X\times U^k. For a given set of variables, a sequent whose formulas have their free variables contained in that set is true iff, when the formulas are interpreted relative to that set of variables, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.

(If we have a formula like x+y = y+x, we would like to interpret it as a subset of U^2, so we normally use X=\{()\} and then just omit it. Here the element () is called unit, the set \{()\} represents truth, and the empty set \emptyset represents falsehood.)

Context morphisms give life to the subset interpretation

If we have (a sequent of) subsets A_i over X, then a (partial) function p : Y \to X from Y to X gives us (a sequent of) subsets over Y by taking inverse images p^{-1}(A_i) of the subsets. We have

  • A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B)
  • p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)
  • p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)

and hence

\begin{array}{c} [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]\\ \hline p^{-1}([A_{1}])\cap\ldots\cap p^{-1}([A_{r}])\ \subseteq\ p^{-1}([B_{1}])\cup\ldots\cup p^{-1}([B_{s}]) \end{array}

The direct translation into the language of sequents reads

\begin{array}{c} A_{1}(\xi),\ldots,A_{r}(\xi)\vdash B_{1}(\xi),\ldots,B_{s}(\xi)\\ \hline A_{1}(p(\nu)),\ldots,A_{r}(p(\nu))\vdash B_{1}(p(\nu)),\ldots,B_{s}(p(\nu)) \end{array}
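The three facts about inverse images are easy to check by brute force on finite sets. The concrete sets and the function p below are invented for illustration:

```python
# Finite sanity check that inverse images preserve inclusion and commute
# with intersection and union, which is what makes the rule above sound.

def preimage(p, A):
    """Inverse image of A under p, with p encoded as a dict from Y to X."""
    return {y for y in p if p[y] in A}

p = {"a": 0, "b": 0, "c": 1, "d": 2}    # a total function Y -> X
A, B = {0, 1}, {1, 2}

print(preimage(p, A & B) == preimage(p, A) & preimage(p, B))  # True
print(preimage(p, A | B) == preimage(p, A) | preimage(p, B))  # True
print(preimage(p, {1}) <= preimage(p, A))                     # True, since {1} is a subset of A
```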

This just tells us that global substitution is unproblematic. (But r\geq 1 is required in case p is a partial function, which is easily missed.) However, the more typical function p for our purpose is similar to p(u,v,w)=(u,v), i.e. a function from Y=U^3 to X=U^2. It is typical, since the expressions grow in complexity for most rules of sequent calculus. Here is an example:

\begin{array}{c} \begin{array}{l}A\vdash A:\{a\}\\ \hline A\vdash A:\{(a,b)\} \end{array} \qquad \begin{array}{l}B\vdash B:\{b\}\\ \hline B\vdash B:\{(a,b)\}\end{array}\\ \hline A,B\vdash A\land B:\{(a,b)\}\end{array}

This example also shows how we can separately change the universe set and apply the sequent rules only with respect to a fixed universe set. After applying the (\exists L) or (\forall R) rule, we can shrink the universe set. To illustrate, let us first apply the (\forall L) and (\exists R) rules, where we cannot always shrink the universe set:

\begin{array}{r} P(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash P(x) : X\times U \end{array}

Here we cannot shrink the universe set.

\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \exists x P(x) : X\times U \end{array}

Now we can shrink the universe set, but only after checking that the sequent no longer depends on x. But when we apply (\forall R), we can always shrink the universe set:

\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X \end{array}

How can we justify this shrinking in terms of images or inverse images? The inverse image under p:X \to X\times U, \xi \to (\xi, c(\xi)) for an arbitrary function (constant) c(\xi) could be a justification. But we don’t really write [\forall xP(x)](c) \vdash [\forall x P(x)](c), and c is arbitrary and irrelevant. So the image under F:X\times U \to X, (\xi, u) \to \xi might be a nicer justification. But we only have

  • A \subseteq B \Rightarrow F(A) \subseteq F(B)
  • F(A\cap B)\subseteq F(A)\cap F(B)
  • F(A\cup B)=F(A)\cup F(B)

However, if we assume F^{-1}(F([A_i]))=[A_i] for each i, then F([A_{1}]\cap\dots\cap [A_{r}])=F([A_{1}])\cap\dots\cap F([A_{r}]) and hence

\begin{array}{c} [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]\\ \hline F([A_{1}])\cap\ldots\cap F([A_{r}])\ \subseteq\ F([B_{1}])\cup\ldots\cup F([B_{s}]) \end{array}
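A small finite example (with invented sets) shows both the failure of F to preserve intersections in general and the repair under the saturation assumption F^{-1}(F([A_i]))=[A_i]:

```python
# The projection F(xi, u) = xi only laxly preserves intersections.
# A minimal counterexample, plus a check that saturation (being a union
# of full fibers {xi} x U) restores the equality.

U = {0, 1}

def F(S):
    """Direct image of S under the projection (xi, u) -> xi."""
    return {xi for (xi, u) in S}

A, B = {(0, 0)}, {(0, 1)}                    # not saturated: partial fibers over 0
print(F(A & B))                              # empty: A and B are disjoint
print(F(A) & F(B))                           # {0}: strictly larger

A2 = {(0, u) for u in U}                     # the full fiber over 0: saturated
B2 = {(xi, u) for xi in {0, 1} for u in U}   # everything: saturated
print(F(A2 & B2) == F(A2) & F(B2))           # True for saturated sets
```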

Doubts about context morphisms and substitutions

Let us apply the above to propositional logic. Surely, from A,B\vdash A\land B we can deduce C\lor D,C\to D\vdash (C\lor D)\land (C\to D). But we cannot justify it from A,B\vdash A\land B:\{(a,b)\} in terms of inverse images. Can we justify it from A,B\vdash A\land B:X_a\times X_b? Yes, take the map p:X_c\times X_d\to X_a\times X_b with (c,d) \to ([C\lor D](c,d), [C\to D](c,d)). Here X_c and X_d can be arbitrary and X_a=X_b=\{0,1\}. Note that we only need |X_a|, |X_b| \geq 2.

How justified are those substitutions into the expressions? Here, we used

[A\land B]([C\lor D](c,d), [C\to D](c,d))=[(C\lor D)\land (C\to D)](c,d)
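Over the two-element truth-value set, this identity can be checked by brute force. The helper names below are invented; they just tabulate the boolean connectives.

```python
from itertools import product

# Brute-force check of the displayed identity over the truth values {0, 1}:
# substituting the truth values of C v D and C -> D into [A ^ B] agrees with
# evaluating the compound formula (C v D) ^ (C -> D) directly.

def lor(c, d):  return int(c or d)          # [C \lor D]
def limp(c, d): return int((not c) or d)    # [C \to D]
def land(a, b): return int(a and b)         # [A \land B]

for c, d in product([0, 1], repeat=2):
    lhs = land(lor(c, d), limp(c, d))       # substitute first, then conjoin
    rhs = int((c or d) and ((not c) or d))  # evaluate the compound formula
    assert lhs == rhs
print("identity holds at all four points (c, d)")
```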

Can we justify this? Can such substitutions go wrong? Yes, if p is a partial function:

\begin{array}{c} \top\vdash B(x)\\ \hline [\top](p(y))\vdash [B(\cdot)](p(y))\\ \hline [\top](y)\vdash [B(p(\cdot))](y)\\ \hline \top\vdash B(p(y)) \end{array}

Here [\top](p(y)) = [\top](y) fails, but it would work if p were a total function. We could have omitted \top, but then the deduction would fail because of r=0. That was actually the point where I noticed that the restriction r\geq 1 is required for partial functions.
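This failure is easy to reproduce concretely (the sets and the partial function below are invented for illustration): pulling the full truth set on X back along a partial p only recovers the domain of p, not all of Y.

```python
# A partial p : Y -> X is encoded as a dict defined on only part of Y.

X = {0, 1}
Y = {"a", "b", "c"}
p = {"a": 0, "b": 1}                        # partial: undefined on "c"

def preimage(p, A):
    return {y for y in p if p[y] in A}

top_pullback = preimage(p, X)               # the interpretation of top, pulled back along p
print(top_pullback)                         # only the domain of p, missing "c"
print(top_pullback == Y)                    # False: the pullback of top is not top
```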

The substitutions are justified, because they work for each operation individually:

  • [\bot](p(y)) = [\bot](y)
  • [A\land B](p(y)) = [A](p(y))\land [B](p(y))
  • [A\lor B](p(y)) = [A](p(y))\lor [B](p(y))
  • [A-B](p(y)) = [A](p(y))-[B](p(y))
  • [A\oplus B](p(y)) = [A](p(y))\oplus [B](p(y))
  • [\forall z A](p(y)) = \forall z [A](p(y))
  • [\exists z A](p(y)) = \exists z [A](p(y))
  • [A(\cdot)](p(y)) = [A(p(\cdot))](y)

Treating equality without special rules

There are different ways to treat equality, and it is important for most interesting applications of predicate logic. Equality can be given by a binary predicate which is a congruence relation for the functions and compatible with the other predicates. A canonical approach is to write the axioms on the left side of the sequent and use normal deductions. Concrete examples of the required axioms were already given in the last post, in the section “Axiomatic theories in predicate logic without falsehood”, for the theory of groups and for Robinson arithmetic.

Part of the reason for presenting a subset interpretation of predicate logic was to better understand how predicate logic could work in the case of non-classical logics. There are theories like Heyting arithmetic (and Gentzen’s own work) which seem to fit nicely with the interpretation above. But there are also intuitionistic dependent type theories with special rules and equality types, whose possible interpretations remain unclear to me, despite the nice subset interpretation explained above. Maybe Per Martin-Löf is right: in the end, everybody must understand for himself (p. 166).


This post was hard for me to write. That is the classical problem when explaining something seemingly obvious (the predicate calculus): you need a lot of words for something that seems obvious, while the language to properly write down what is going on behind the scenes is missing. I didn’t solve that language problem, but I came up with a natural example of how the obvious thing can fail to be true. That might help the reader to understand for himself.

The content above had already become clear to me while writing the previous post. But only after this current post was nearly finished did I notice the restriction r\geq 1 in case p is a partial function. Which is funny: after I clarified the notion of context morphism that makes sense of the logic without truth from the first post (in this series), I found out that there is an additional restriction, namely that the antecedent should never be empty. My first reaction was that I don’t like this. Maybe I should look for a transformation which turns this logic about partial functions into the logic without negation or falsehood. But my second reaction was that it could make sense; perhaps any logical deduction needs assumptions.

Still, the question remains whether there is a reasonable transformation from the logic without truth (and partial functions) to the logic without negation and falsehood. Probably that question will be easy to answer. Somehow one has to take complements and hope that the structure of the predicate calculus remains intact. It seems to work, but I still have trouble understanding it intuitively. And I want to finish this post now.

Posted in logic, partial functions