Logic without truth

Pi is wrong! But so what? It is neither new nor complicated enough to count as real math! And suggestions that 2\pi i or \pi/2 might be even better show that it is not clear-cut either.

I recently invested sufficient energy into some logical questions to make real progress. But while telling friends and fellow logicians about it, I realized how irrelevant all my results and conclusions will be. I will have to publish them in an appropriate journal nevertheless, since they continue David Ellerman’s work on partition logic. Not publishing them would be dismissive of the value of his work; he really invested lots of effort into it, and believes in its value. I won’t talk about those results here, since I don’t know how that would impact my ability to publish them.

Let’s have some fun instead, stay extremely close to classical logic, and still demonstrate a logic without truth. And let’s get back to Gerhard Gentzen.

Partial truths

I am fond of partial functions. For a partial function p:X\to Y, we have

  • p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)
  • p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)
  • p^{-1}(A {}\setminus{} B)=p^{-1}(A) {}\setminus{} p^{-1}(B)
  • p^{-1}(A \Delta B)=p^{-1}(A) \Delta p^{-1}(B) where A \Delta B := (A {}\setminus{} B) \cup (B {}\setminus{} A)

But p^{-1}(Y)=X is only true if p is a total function. In particular, p^{-1}(A^c)=p^{-1}(A)^c is only true (even for a specific A) if p is total, since otherwise we would get the contradiction
p^{-1}(Y)=p^{-1}(A\cup A^c)=p^{-1}(A)\cup p^{-1}(A^c) \quad = \quad p^{-1}(A)\cup p^{-1}(A)^c=X
Let’s also prove one of the other claims:
x {}\in{} p^{-1}(A {}\setminus{} B)  \Leftrightarrow p(x) {}\in{} A {}\setminus{} B \Leftrightarrow p(x) {}\in{} A \land  p(x) {}\notin{} B \Leftrightarrow x {}\in{} p^{-1}(A) {}\setminus{} p^{-1}(B)
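These identities are easy to check mechanically. Here is a small illustration in Python (my own sketch, not from the post), modeling a partial function as a dictionary whose keys form the domain of definition:

    # A partial function p : X -> Y as a dict; keys = domain of definition.
    def preimage(p, A):
        """Preimage of the set A under the partial function p."""
        return {x for x, y in p.items() if y in A}

    X = {1, 2, 3, 4}
    Y = {'a', 'b', 'c'}
    p = {1: 'a', 2: 'b', 3: 'a'}   # undefined at 4, so p is not total

    A, B = {'a', 'b'}, {'b', 'c'}
    assert preimage(p, A & B) == preimage(p, A) & preimage(p, B)   # intersection
    assert preimage(p, A | B) == preimage(p, A) | preimage(p, B)   # union
    assert preimage(p, A - B) == preimage(p, A) - preimage(p, B)   # difference
    assert preimage(p, A ^ B) == preimage(p, A) ^ preimage(p, B)   # symmetric difference

    # Complement is not preserved: the point 4 where p is undefined lands
    # in the complement of every preimage, but in no preimage.
    assert preimage(p, Y) != X                       # p is not total
    assert preimage(p, Y - A) != X - preimage(p, A)  # {} vs {4}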

Note that A {}\setminus{} B = A \triangle (A \cap B). So the preserved operations are “and”, “or”, and “xor” if interpreted from a logical perspective. It would be nice if implication were preserved too. This seems hopeless, since A \to A is always true, and truth is not preserved. But we do have A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B), which means that the external implication given by the order is preserved. So we should be able to turn this into a logic where internal truth is not preserved under context morphisms.

Gerhard Gentzen’s sequent calculus

The sequent calculus is a proof calculus with significant practical and theoretical advantages over more obvious proof calculi. It works with sequents A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}. The propositions A_i (and B_j) could be logical formulas like S(x)=S(y) \to x = y (the fourth Peano axiom). They can also be interpreted as subsets of some universe set X, which is sufficient for understanding the basics of the sequent calculus. The sequent itself is then interpreted as [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}].

Left structural rules Right structural rules
\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(WL) \begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(WR)
\begin{array}{c} \Gamma,A,A\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(CL) \begin{array}{c} \Gamma\vdash A,A,\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(CR)
\begin{array}{c} \Gamma_{1},A,B,\Gamma_{2}\vdash\Delta\\ \hline \Gamma_{1},B,A,\Gamma_{2}\vdash\Delta \end{array}(PL) \begin{array}{c} \Gamma\vdash\Delta_{1},A,B,\Delta_{2}\\ \hline \Gamma\vdash\Delta_{1},B,A,\Delta_{2} \end{array}(PR)

Here \Gamma,\Delta, \ldots stand for arbitrary finite sequences of propositions. The structural rules may be relatively boring. The following global rules are slightly more interesting.

Axiom Cut
\begin{array}{c} \ \\ \hline A\vdash A \end{array}(I) \begin{array}{c} \Gamma\vdash\Delta,A\quad A,\Sigma\vdash\Pi\\ \hline \Gamma,\Sigma\vdash\Delta,\Pi \end{array}(Cut)

None of the rules up to now has used any logical constant or connective. They can be verified directly for the subset interpretation. The following logical rules can only be verified after the (intended) interpretation of the logical connectives has been fixed.

Left logical rules Right logical rules
\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma,A\land B\vdash\Delta \end{array}(\land L) \begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma\vdash A\lor B,\Delta \end{array}(\lor R)
\begin{array}{c} \ \\ \hline \bot\vdash\Delta \end{array}(\bot L) \begin{array}{c} \ \\ \hline \Gamma\vdash\top \end{array}(\top R)
\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\lor B\vdash\Delta,\Pi \end{array}(\lor L) \begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma\vdash A\land B,\Delta,\Pi \end{array}(\land R)
\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\to B\vdash\Delta,\Pi \end{array}(\to L) \begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)
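As a small worked example (my own, not from the original post), the rules (I), (\land R), (PL), and (\land L) combine to derive the commutativity sequent A\land B\vdash B\land A, reading from the axioms downwards:

B\vdash B \quad A\vdash A \quad (I)
B,A\vdash B\land A \quad (\land R)
A,B\vdash B\land A \quad (PL)
A\land B\vdash B\land A \quad (\land L)

Under the subset interpretation given below, this derivation just confirms [A]\cap [B]\subseteq [B]\cap [A].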

One possible interpretation for these connectives in terms of subsets would be [\bot]:=\emptyset, [\top]:=X, [A\land B]:=[A]\cap [B], [A\lor B]:=[A]\cup [B], and [A\to B]:=[A]^c\cup [B].

But it may be more instructive to see an interpretation where one of the classical logical rules is violated. So let us use [A\to B]:=\mathrm{int}([A]^c\cup [B]) instead, where \mathrm{int} is the interior operator of some topological space. The propositions A_i (and B_j) are interpreted as open subsets in this case. The rule \begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R) is violated now, and has to be replaced by the rule \begin{array}{c} \Gamma,A\vdash B\\ \hline \Gamma\vdash A\to B \end{array}(\to R_J). This gives us the intuitionistic sequent calculus, which exactly characterizes the valid conclusions of intuitionistic logic.

To see that (\to R) is violated, let \Gamma=\top, A correspond to [A]=(0,\infty), B=\bot, and \Delta = A. Above the line we have \mathbb R \cap (0,\infty) \subseteq \emptyset \cup (0,\infty), which is true. Below the line we have \mathbb R \subseteq \mathrm{int}((-\infty,0])\cup (0,\infty), which is false.

An evil twin of sequent calculus

Note that implication satisfies C\land A \vdash B \Leftrightarrow C \vdash (A\to B) or rather [C]\cap [A] \subseteq [B] \Leftrightarrow [C] \subseteq [A\to B]. Let us replace implication by minus. Note that minus satisfies [A] \subseteq [B]\cup [C] \Leftrightarrow [A-B] \subseteq [C] with [A-B]:=[A]{}\setminus{}[B]. Then we get the following two rules instead of (\to L) and (\to R).

\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma,A- B\vdash \Delta \end{array}(- L) \begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash \Pi\\ \hline \Gamma,\Sigma\vdash A-B,\Delta,\Pi \end{array}(- R)

Of course, we also remove \to and \top from the language, together with the rule (\top R). This sequent calculus is still as sound and complete as the original sequent calculus. But we no longer reason about implication, only about minus. Some sort of implication is still present in \vdash, but it is no longer mirrored internally in the language of the logic itself. So this is our logic without truth.
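Both adjunctions, [C]\cap [A] \subseteq [B] \Leftrightarrow [C] \subseteq [A\to B] for implication and [A] \subseteq [B]\cup [C] \Leftrightarrow [A]{}\setminus{}[B] \subseteq [C] for minus, can be brute-force checked over a small universe. A sketch in Python (my own illustration):

    from itertools import combinations

    U = frozenset(range(4))

    def powerset(s):
        s = sorted(s)
        return [frozenset(c) for r in range(len(s) + 1)
                for c in combinations(s, r)]

    subsets = powerset(U)
    for A in subsets:
        for B in subsets:
            for C in subsets:
                # implication:  C & A <= B  iff  C <= (U - A) | B
                assert (C & A <= B) == (C <= (U - A) | B)
                # minus:        A <= B | C  iff  A - B <= C
                assert (A <= B | C) == (A - B <= C)
    print("both adjunctions hold for all", len(subsets) ** 3, "triples")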

I don’t really know (or understand) whether this sort of context morphism has any relevance, and whether that logic without truth occurs anywhere in the real world. Is there any relation to the fact that it is easier to deny the relevance or truth of a given conclusion than to prove that it is important and true? What I like about that logic is the asymmetry between implication and falsehood, because I wanted to find naturally occurring asymmetries in mathematical hierarchies and logic. Even for the results that I do want to publish, I have the same problem: I don’t really understand the relevance of the corresponding context morphisms, whether there even should be context morphisms, and whether my proposed context morphisms are the correct ones.


That post initially also contained a logic without falsehood, or rather a logic where falsehood is not used. But it started to get long, and this post is already long enough. I am not sure whether this was really a good idea, since the explanation of the sequent calculus was also intended to better explain how such a logic with a reduced set of logical constants and connectives still maintains its main features. Maybe I will manage to create another blog post from the removed material. Or maybe nobody, including myself, cares anyway, as already indicated at the beginning. Or maybe I should better use my time to finish the paper about the results I wrote about at the beginning, and submit them to…

Posted in category theory, logic, partial functions | Tagged , , | 4 Comments

Learning category theory: a necessary evil?

The end of my last blog post (about isomorphism testing of reversible deterministic finite automata) explained how category theory gave me the idea that the simplified variant of my question about permutation group isomorphism should be easy to solve:

The idea to consider the case where the group is fixed came from reading an introduction to category theory, because those morphisms of left M-sets felt like cheating (or oversimplifying) to me. So I guessed that this case should be easy to solve, if I would only…

I hinted that this felt like cheating (or oversimplifying) to me, but I realized three months later that my original question could be reduced to the simplified variant by a brute-force technique exploiting the fact that minimal generating sets of groups are very small. So it turned out that category theory was actually helpful and not cheating at all.

Why does a mathematician interested in logic, (semi-)lattice theory, and universal algebra read an introduction to category theory? Because it is a readable (medium-level) German book on a mathematical subject I’m currently interested in.


And it is written by Martin Brandenburg, who has always helped people like me when they got stuck on questions related to category theory. If the explanations and examples in the book are as good as his contributions on MathOverflow and the Mathematics Stack Exchange site, then I should be fluent in both theory and practice after finishing that book.

But why am I interested in category theory?

  • There are all those forgetful functors with left adjoints in universal algebra. And when doing semigroup theory, switching freely between monoids and semigroups by adjoining an identity is very common. Seems like a good idea to understand this.
  • I like inverse semigroups, which are closely related to partial bijections, and hence to partial functions. Mathematicians don’t like partial functions, but category theory at least admits that each function has a domain and a codomain. Some existing work on partial functions has been phrased in the language of category theory.
  • I have read basic introductions to category theory, but kept forgetting about adjoint functors. Then I worked twice through a reasonable introduction to the adjoint functor theorem (by Chris Henderson), and hopefully settled this problem.
  • I would like to better understand the Ext and Tor functors in group cohomology, and diagram chasing like the snake lemma.
  • I like non-classical logic, including intuitionistic logic, linear logic and partition logic. My understanding of non-classical logic is mostly based on lattice theory. Sadly, lattice theory is a very confusing name, and the field has only a few followers. So if I translate my ideas about non-classical logic and lattice theory into the language of category theory, then I should be able to reach a more relevant audience.
  • I liked assembly language when I started to program, and set theory feels similar to assembly language. Category theory is one alternative to get around all the ingrained prejudices about logic and model theory embedded into mainstream ZFC set theory.

Maybe I could find more reasons, but let’s talk about the content now:

  • Category theorists seem to have a strange relation to calculations. The introduction to the adjoint functor theorem by Chris Henderson often has unbalanced parentheses in its formulas (my printout has them marked on pages 6, 7, 11, and 12), indicating that the formulas contain at least some errors. Sometimes (like the =\phi_{C,D}(\phi_{C,D}^{-1}(f) near the bottom of page 7) I couldn’t figure out what the correct formula was supposed to be.
  • The book by Martin Brandenburg has fewer of those unnecessary errors, but it is still annoying if f\in \text{Hom}(A,C) is written instead of f\in \text{Hom}(A,B) when explaining the Yoneda construction on page 109, which is difficult for me even without such errors.
  • I actually had to return the book while reading chapter 5 on the Yoneda construction a second time. I bought my own copy of the book in the meantime, but haven’t resumed reading yet.
  • Category theorists are similar to physicists in not noticing the places where the computational complexity can explode. So they claim that only the universal property is important, and the concrete representation an unimportant detail. But identity testing for the free modular lattice on 4 generators is undecidable, it is coNP-complete for the free Boolean algebra, and trivial for free abelian groups. The universal algebra people are much more aware of such niceties, something which can easily get lost when doing universal algebra through the lens of category theory.

My overall conclusion is that I learned quite a bit from working through the category theory book, but it was the concrete examples from which I benefited most. Still, I would not really recommend learning category theory to somebody who doesn’t yet know why they want to learn it.

Posted in category theory, isomorphism | Tagged , | Leave a comment

A canonical labeling technique by Brendan McKay and isomorphism testing of deterministic finite automata

A deterministic finite automaton (DFA) M is a 5-tuple, (Q, \Sigma, \delta, q_0, F), consisting of

  • a finite set of states Q
  • a finite set of input symbols \Sigma
  • a (partial) transition function \delta:Q\times \Sigma \to Q
  • an initial state q_0\in Q
  • a set of accept states F \subset Q

An isomorphism between two DFAs M = (Q, \Sigma, \delta, q_0, F) and M' = (Q', \Sigma, \delta', q'_0, F') is a bijection p:Q\to Q' such that \delta'(\cdot,s) = p \circ \delta(\cdot,s) \circ p^{-1} for all s\in\Sigma, q'_0 = p(q_0), and F' = p(F).

Isomorphism of DFA is GI complete

Without further restrictions, the problem of testing whether two DFAs M and M' are isomorphic is as difficult as testing whether two graphs G and G' are isomorphic, i.e. the problem is GI complete. A simple construction shows that the problem of testing whether two digraphs G and G' are isomorphic reduces to DFA isomorphism testing.

For a digraph G=(V,E), construct the DFA M_G=(Q_G, \Sigma, \delta_G, q_0, F) with Q_G:=V \sqcup E \sqcup \{*\}, \Sigma := \{0,1\}, q_0:=*, F:=\emptyset, and \delta_G defined for e\in E by \delta_G(e,0)=\text{tail}(e), \delta_G(e,1)=\text{head}(e) and for v\in V\sqcup \{*\} by \delta_G(v,0)=\delta_G(v,1)=*.
Digraph and corresponding DFA
Obviously, the digraphs G and G' are isomorphic iff the DFAs M_G and M_{G'} are isomorphic.
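Here is a sketch of this reduction in Python (my own encoding, not from the post; edges carry explicit identifiers so that parallel edges stay distinct):

    STAR = '*'   # the extra state, called * in the text

    def dfa_from_digraph(vertices, edges):
        """edges: list of triples (edge_id, tail, head).
        Returns the DFA (Q, delta, q0, F) over the alphabet {0, 1}."""
        Q = set(vertices) | {e for e, _, _ in edges} | {STAR}
        delta = {}
        for e, tail, head in edges:
            delta[(e, 0)] = tail    # symbol 0 leads to the tail of the edge
            delta[(e, 1)] = head    # symbol 1 leads to the head of the edge
        for v in set(vertices) | {STAR}:
            delta[(v, 0)] = delta[(v, 1)] = STAR
        return Q, delta, STAR, set()   # q0 = *, no accept states

    # Example: a directed 2-cycle on the vertices 'a' and 'b'.
    Q, delta, q0, F = dfa_from_digraph(['a', 'b'],
                                       [('e1', 'a', 'b'), ('e2', 'b', 'a')])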

A canonical labeling technique

A reasonable restriction is to require that every state in Q is reachable from q_0. Let \Sigma:=\{s_1,\ldots,s_d\}. Then a unique labeling \ell(q_0) of Q can be produced by a breadth first search of M starting at q_0 as follows: You have a queue (first-in, first-out store), initially containing only q_0. Repeatedly do this: remove the state at the head of the queue, say x, then push into the queue (at the tail) those of \delta(x,s_1),\ldots,\delta(x,s_d) (in that order) which have never previously been put into the queue. Since every state is reachable from q_0, every state is put into the queue eventually. Stop when that has happened and define \ell(q_0) to be the order in which states were put into the queue. Since this labeling is independent of any ordering or labeling of Q, it fixes the only bijection between Q and Q' that can possibly lead to an isomorphism between M and M'. It is easy then to check whether they are actually isomorphic.
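A hedged sketch of this procedure in Python (my own function names; it assumes a total transition table keyed by (state, symbol), and that every state is reachable from q_0):

    from collections import deque

    def bfs_order(delta, q0, alphabet):
        """Order in which states first enter the BFS queue; this is l(q0)."""
        seen, queue, order = {q0}, deque([q0]), []
        while queue:
            x = queue.popleft()
            order.append(x)
            for s in alphabet:            # the alphabet in its fixed order s_1..s_d
                y = delta[(x, s)]
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return order

    def dfa_isomorphic(M1, M2, alphabet):
        """M = (delta, q0, F). Build the only candidate bijection and check it."""
        (d1, q1, F1), (d2, q2, F2) = M1, M2
        o1, o2 = bfs_order(d1, q1, alphabet), bfs_order(d2, q2, alphabet)
        if len(o1) != len(o2):
            return False
        p = dict(zip(o1, o2))             # the unique candidate bijection
        return ({p[q] for q in F1} == set(F2) and
                all(p[d1[(x, s)]] == d2[(p[x], s)]
                    for x in o1 for s in alphabet))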

Attributing that algorithm to Brendan McKay would be misleading, because it is so obvious. For example, J.-E. Pin described the same algorithm (using depth first search instead of breadth first search) earlier, as an answer to a question about finding an isomorphism between finite automata.

Brendan McKay’s canonical labeling technique

The context in which Brendan McKay described his canonical labeling technique was actually slightly more complicated, because there was no distinguished state q_0, and the state space of the automaton wasn’t required to be weakly connected. But instead, \delta(\cdot,s) was injective for all s\in\Sigma. If we drop the requirement that \delta(\cdot,s) is a total function and allow it to be partial, then we end up exactly with the reversible deterministic finite automata described earlier on this blog. Brendan McKay’s technique generalizes effortlessly to this setting, as does the original context (where group actions are replaced by inverse semigroup actions).

Above are some colored digraphs corresponding to reversible deterministic finite automata (the initial state q_0 and the final states F are ignored). We continue to describe the canonical labeling technique in terms of deterministic finite automata. For each state q\in Q, we can produce a unique labeling \ell(q) (of the weakly connected component containing q) by a breadth first search of M starting at q as above, except that when we remove the state x from the head of the queue, we push into the queue (at the tail) those of \delta(x,s_1),\delta^{-1}(x,s_1),\ldots,\delta(x,s_d),\delta^{-1}(x,s_d) (in that order, instead of just \delta(x,s_1),\ldots,\delta(x,s_d)) which are defined and have never previously been put into the queue.
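In Python, only the inner loop of the earlier sketch changes; a hedged version (my own names; delta and its inverse are partial tables, so lookups may fail):

    from collections import deque

    def component_order(delta, delta_inv, q, alphabet):
        """BFS labeling l(q) of the weakly connected component containing q."""
        seen, queue, order = {q}, deque([q]), []
        while queue:
            x = queue.popleft()
            order.append(x)
            for s in alphabet:
                # push delta(x, s), then delta^{-1}(x, s), where defined
                for table in (delta, delta_inv):
                    y = table.get((x, s))
                    if y is not None and y not in seen:
                        seen.add(y)
                        queue.append(y)
        return order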

But how can we create a canonical labeling of M from the unique labelings \ell(q)? Just take (one of) the lexicographically smallest labelings of each block, and sort the blocks lexicographically based on those labelings. The lexicographic order in this context is defined by encoding a labeled block in some reasonable (unique) way as a string, and comparing those strings lexicographically. It’s a good idea to modify this lexicographic order to first compare the lengths of the strings, but this is already an optimization, and there are more possible optimizations, as described in Brendan’s answer.
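Concretely, the assembly step might look like the following sketch (my own names; each block is represented by the list of its encoded labelings):

    def canonical_form(encoded_blocks):
        """encoded_blocks: one list of strings per weakly connected block,
        each string encoding the labeling l(q) for one starting state q."""
        def shortlex(s):
            return (len(s), s)                  # compare lengths first
        best = [min(block, key=shortlex) for block in encoded_blocks]
        return sorted(best, key=shortlex)       # then sort the blocks themselves

These optimizations are the context of the following remark at the end of his answer: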

The graph is actually a deterministic finite automaton (at most one edge of each colour leaves each vertex). There could be faster isomorphism algorithms out there for DFAs, though I’m dubious that anything would work faster in practice than a well-tuned implementation of the method I described.

This remark was actually the main motivation of this blog post, i.e. to investigate the relation of Brendan’s technique to isomorphism testing for DFAs. If there were really faster isomorphism algorithms out there for DFAs, then either they would not be applicable to the problem which Brendan had solved there (if they rely on reachability from q_0), or they would also solve GI itself in polynomial time (which is highly unlikely).


The truth is I have been yak shaving again, even though I explicitly told vzn that I didn’t want to get too involved with GI. How did it happen? Fahad Mortuza (aka Jim) asked me for help refuting his attempts to solve GI in quasipolynomial time. (This is completely unrelated to the recent breakthrough by László Babai, except maybe that he triggered some people to spend time on GI again.) While working with him, I realized that permutation group isomorphism might be more difficult than group isomorphism, and asked some questions to get a better feeling for the relation of permutation group isomorphism to GI. The idea to consider the case where the group is fixed came from reading an introduction to category theory, because those morphisms of left M-sets felt like cheating (or oversimplifying) to me. So I guessed that this case should be easy to solve, if I would only understand how Brendan McKay handles automorphisms of graphs efficiently. Then Brendan McKay himself explained his technique (for this case), and I discovered the interesting connection to inverse semigroups and finite automata, so I decided I had to write a blog post on this. Now I have written this post, just to avoid having another item on my ToDo list which never gets done.

Posted in inverse semigroups, isomorphism, partial functions | Tagged , | 4 Comments

On Zeros of a Polynomial in a Finite Grid: the Alon-Furedi bound

This paper is a great achievement. Not only does it formulate and prove a very appropriate common generalization of Alon-Füredi, Schwartz-Zippel, and other theorems, it is also well organized, easy to read, and very inspiring.

Anurag's Math Blog

My joint paper with Aditya Potukuchi, Pete L. Clark and John R. Schmitt is now up on arXiv: arXiv:1508.06020.

This work started a few months back when I emailed Pete and John, pointing out an easy generalization of Chevalley-Warning theorem using something known as the punctured combinatorial nullstellensatz, after reading their paper on Warning’s second theorem: arXiv:1404.7793. They got pretty excited about it and we started discussing some related things. Finally, it’s not the generalization of Chevalley-Warning that this paper is about but the theorem of Alon-Füredi itself, which is the main tool they used in their paper to generalize Warning’s second theorem. In our discussions we found several unexplored connections between this elementary result on polynomials and results from different areas of maths. My friend Aditya joined us in between with his amazingly simple proof of Alon-Füredi which, along with the annoying realization that a result of DeMillo-Lipton-Zippel doesn’t…

View original post 124 more words

Posted in Uncategorized | Leave a comment


Annoying Precision

My current top candidate for a mathematical concept that should be and is not (as far as I can tell) consistently taught at the advanced undergraduate / beginning graduate level is the notion of a groupoid. Today’s post is a very brief introduction to groupoids together with some suggestions for further reading.

View original post 1,848 more words

Posted in Uncategorized | Leave a comment

Reversibility of binary relations, substochastic matrices, and partial functions

After the last post, I decided that the next post should contain images. Then I decided that the time to publish another post had come. Here is an image of an acceptor finite-state machine, parsing the string “nice”.

finite-state-machine, parsing the string 'nice'

How can we reverse this machine so that it parses the string “ecin”? An easy way is to remove the state “6 Error”, reverse all remaining arrows, swap the states “1 Start” and “7 Success”, then add the state “6 Error” again, and add arrows to it for all still undefined cases. This post elaborates the observation that dropping the state “6 Error” altogether, and instead allowing undefined cases, makes everything simpler and more symmetric. This is not just relevant for the deterministic case, but also for the nondeterministic and the probabilistic case. Time reversal is significantly different for (non-)deterministic and probabilistic machines. Using partial functions, binary relations, and substochastic matrices as transitions gives deceptively simple representations of this fact.

Volker Diekert, Manfred Kufleitner, Gerhard Rosenberger: “Diskrete algebraische Methoden: Arithmetik, Kryptographie, Automaten und Gruppen” is a very nice book. Its chapter 7 Automatentheorie gave me the idea that non-deterministic machines arise naturally if (arbitrary) binary relations are used instead of functions. If you read that chapter, you will realize that this post is a significant step back in terms of depth and generality. So the worst is yet to come…

What is the difference between non-determinism and randomness? This question asked for clarification of the following quote:

A non-deterministic machine is not the same as a probabilistic machine. In crude terms, a non-deterministic machine is a probabilistic machine in which probabilities for transitions are not known

I gave a short answer trying to highlight the fact that the probabilities on the transitions make time reversal for a probabilistic machine significantly less trivial than time reversal for a non-deterministic machine. Niel de Beaudrap commented that even non-deterministic machines are not reversible without further constraints. I agree with him in a certain complicated technical sense, but I couldn’t really judge whether he had this complicated technical sense in mind. Maybe he just doubts the basic heuristic principle according to which non-deterministic machines are much more symmetric with respect to time than deterministic machines. So I took some time to elaborate my answer, and to highlight in more detail the achievable time symmetry for non-deterministic and probabilistic machines. I want to try to reblog my answer here:

Stepping backwards during debugging as a motivation for non-determinism

The notion of a non-deterministic machine suggests itself when you wish to step backward (in time) through a program while debugging. In a typical computer, each step modifies only a finite amount of memory. If you always save this information for the previous 10000 steps, then you can nicely step both forward and backward in the program, and this possibility is not limited to toy programs. If you try to remove the asymmetry between forward steps and backward steps, then you end up with the notion of a non-deterministic machine.

Similarities and differences between non-determinism and randomness

While probabilistic machines share some characteristics with non-deterministic machines, this symmetry between forward steps and backward steps is not shared. To see this, let’s model the steps or transitions of a deterministic machine by (total or partial) functions, the transitions of a non-deterministic machine by (finite) relations, and the transitions of a probabilistic machine by (sub)stochastic matrices. For example, here are corresponding definitions for finite automata

  • a finite set of states Q
  • a finite set of input symbols \Sigma
  • deterministic: a transition function \delta:Q\times \Sigma \to Q
  • non-deterministic: a transition function \Delta:Q\times \Sigma \to P(Q)
  • non-deterministic: a transition relation \Delta\subset Q\times \Sigma \times Q
  • non-deterministic: a function \Delta: \Sigma \to P(Q \times Q)
  • probabilistic: a function \delta: \Sigma \to ssM(Q)

Here P(Q) is the power set of Q and ssM(Q) is the space of substochastic matrices on Q. A right substochastic matrix is a nonnegative real matrix with each row summing to at most 1. The sketch below shows the three kinds of transitions side by side.
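For concreteness, here is the same two-state machine over a one-letter alphabet in all three styles (my own encodings, not from the original answer):

    import numpy as np

    # deterministic: a (partial) transition function, as a dict
    delta = {(0, 'a'): 1, (1, 'a'): 0}

    # non-deterministic: a transition relation, as one set of pairs per symbol
    Delta = {'a': {(0, 0), (0, 1), (1, 0)}}

    # probabilistic: one right substochastic matrix per symbol
    # (row 1 sums to 0.7, i.e. the machine halts there with probability 0.3)
    delta_prob = {'a': np.array([[0.5, 0.5],
                                 [0.7, 0.0]])}

    assert (delta_prob['a'].sum(axis=1) <= 1.0).all()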

There are many different reasonable acceptance conditions

The transitions are only one part of a machine; initial and final states, possible output, and acceptance conditions are also important. However, there are only very few non-equivalent acceptance conditions for deterministic machines, a number of reasonable acceptance conditions for non-deterministic machines (NP, coNP, #P, …), and many possible acceptance conditions for probabilistic machines. Hence this answer focuses primarily on the transitions.

Reversibility is non-trivial for probabilistic machines

A partial function is reversible iff it is injective. A relation is always reversible in a certain sense, by taking the opposite relation (i.e. reversing the direction of the arrows). For a substochastic matrix, taking the transposed matrix is analogous to taking the opposite relation. In general, the transposed matrix is not a substochastic matrix. If it is, then the matrix is said to be doubly substochastic. In general P P^T P\neq P, even for a doubly substochastic matrix P, so one can wonder whether this is a reasonable notion of reversibility at all. It is reasonable, because the probability to reach state B from state A in k forward steps is identical to the probability to reach state A from state B in k backward steps. Each path from A to B has the same probability forward and backward. If suitable acceptance conditions (and other boundary conditions) are selected, then doubly substochastic matrices are an appropriate notion of reversibility for probabilistic machines.
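The path-probability claim is a one-line consequence of (P^k)^T = (P^T)^k; here is a quick numerical check (my own example matrix):

    import numpy as np

    P = np.array([[0.2, 0.5],     # doubly substochastic:
                  [0.6, 0.3]])    # all rows and all columns sum to <= 1

    k = 5
    forward = np.linalg.matrix_power(P, k)      # k forward steps
    backward = np.linalg.matrix_power(P.T, k)   # k backward steps
    assert np.allclose(forward.T, backward)     # same path probabilities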

Reversibility is tricky even for non-deterministic machines

Just like P P^T P\neq P in general, R\circ R^{op}\circ R \neq R in general for a binary relation R. If R describes a partial function, then R\circ R^{op}\circ R = R and R^{op}\circ R\circ R^{op} = R^{op}. Even if the relations P and Q are strictly reversible in this sense, this doesn’t imply that P\circ Q will be strictly reversible too. So let’s ignore strict reversibility for now (even though it feels interesting), and focus on reversal by taking the opposite relation. A similar argument as in the probabilistic case shows that this reversal works fine if suitable acceptance conditions are used.
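A small check of both claims (my own sketch; relations as sets of pairs, composed left to right):

    def compose(R, S):
        return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

    def op(R):
        return {(b, a) for (a, b) in R}

    f = {(1, 'x'), (2, 'y')}            # a partial function
    assert compose(compose(f, op(f)), f) == f
    assert compose(compose(op(f), f), op(f)) == op(f)

    R = {(1, 'x'), (1, 'y'), (2, 'x')}  # not a partial function
    assert compose(compose(R, op(R)), R) != R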

These considerations also make sense for pushdown automata

This post suggests that one motivation for non-determinism is to remove that asymmetry between forward steps and backward steps. Is this symmetry of non-determinism limited to finite automata? Here are corresponding symmetric definitions for pushdown automata

  • a finite set of states Q
  • a finite set of input symbols \Sigma
  • a finite set of stack symbols \Gamma
  • deterministic: a partial transition function \delta:Q\times\Gamma\times (\Sigma\cup\{\epsilon\}) \to Q\times\Gamma^{\{0,2\}} such that \delta(q,\gamma,\epsilon)\neq\epsilon only if \delta(q,\gamma,\sigma)=\epsilon for all \sigma\in\Sigma
  • non-deterministic: a transition function \Delta:Q\times\Gamma^{\{0,1\}}\times (\Sigma\cup\{\epsilon\}) \to P(Q\times\Gamma^{\{0,1\}})
  • non-deterministic: a transition relation \Delta\subset Q\times\Gamma^{\{0,1\}}\times (\Sigma\cup\{\epsilon\}) \times Q\times\Gamma^{\{0,1\}}
  • non-deterministic: a function \Delta: \Sigma\cup\{\epsilon\} \to P(Q\times\Gamma^{\{0,1\}}\ \times\ Q\times\Gamma^{\{0,1\}})
  • probabilistic: a function \delta: \Sigma\cup\{\epsilon\} \to ssM(Q\times\Gamma^{\{0,1\}}) such that \delta(\epsilon)+\delta(\sigma)\in ssM(Q\times\Gamma^{\{0,1\}}) for all \sigma\in\Sigma

Here \epsilon is the empty string, \Gamma^{\{0,2\}}=\{\epsilon\}\cup\Gamma\cup(\Gamma\times\Gamma) and \Gamma^{\{0,1\}}=\{\epsilon\}\cup\Gamma. This notation is used because it is similar to \Gamma^*, which is used in many definitions for pushdown automata.

Diagrammed verification of reversal for (non-)advancing input and stack operations

An advancing input operation with b\in\Sigma\subset\Sigma\cup\{\epsilon\} gets reversed as follows

\begin{matrix}  a|bc \to a|\boxed{b}c \to ab|c\\  a|bc \leftarrow a\boxed{b}|c \leftarrow ab|c\\  c|ba \to c|\boxed{b}a \to cb|a  \end{matrix}

A non-advancing input operation with \epsilon\in\Sigma\cup\{\epsilon\} that doesn’t read any input can be reversed as follows

\begin{matrix}  a|bc \to a|bc \to a|bc\\  a|bc \leftarrow a|bc \leftarrow a|bc\\  cb|a \to cb|a \to cb|a  \end{matrix}

Here is a diagram of an advancing input operation whose reversal would look bad


For a stack operation (s,t) \in \Gamma^{\{0,1\}}\times\Gamma^{\{0,1\}}, there are the three cases (s,t)=(a,\epsilon), (s,t)=(\epsilon,a), and (s,t)=(a,b). The stack operation (a,\epsilon) gets reversed to (\epsilon,a) as follows

\begin{matrix}  ab\ldots \to \boxed{a}b\ldots \to |b\ldots\\  \boxed{a}b\ldots \leftarrow |b\ldots \leftarrow b\ldots\\  b\ldots \to |b\ldots \to \boxed{a}b\ldots  \end{matrix}

The stack operation (a,b) gets reversed to (b,a) as follows

\begin{matrix}  ac\ldots \to \boxed{a}c\ldots \to \boxed{b}c\ldots\\  \boxed{a}c\ldots \leftarrow \boxed{b}c\ldots \leftarrow bc\ldots\\  bc\ldots \to \boxed{b}c\ldots \to \boxed{a}c\ldots  \end{matrix}

A generalized stack operation (ab,cde)\in\Gamma^*\times\Gamma^* would be reversed to (cde,ab)

\begin{matrix}  abf\ldots \to \boxed{ab}f\ldots \to \boxed{cde}f\ldots\\  \boxed{ab}f\ldots \leftarrow \boxed{cde}f\ldots \leftarrow cdef\ldots\\  cdef\ldots \to \boxed{cde}f\ldots \to \boxed{ab}f\ldots  \end{matrix}

Reversibility for Turing machines

A machine with more than one stack is equivalent to a Turing machine, and stack operations can easily be reversed. The motivation at the beginning also suggests that reversal (of a Turing machine) should not be difficult. A Turing machine with a typical instruction set is not so great for reversal, because the symbol under the head can influence whether the tape will move left or right. But if the instruction set is modified appropriately (without reducing the computational power of the machine), then reversal is nearly trivial again.

A reversal can also be constructed without modifying the instruction set, but it is not canonical and a bit ugly. It might seem that the existence of a reversal is just as difficult to decide as many other questions pertaining to Turing machines, but a reversal is a local construction, and the difficult questions often have a global flavor, so pessimism would probably be unjustified here.

The urge to switch to equivalent instruction sets (easier to reverse) shows that these questions are less obvious than they first appear. A more subtle switch happened earlier in this post, when total functions and stochastic matrices were replaced by partial functions and substochastic matrices. This switch is not strictly necessary, but the reversal is ugly otherwise. The switch to substochastic matrices was actually the point where it became obvious that reversibility is not so trivial after all, and that one should write down details (as done above) instead of taking just a high-level perspective (as presented in the motivation at the beginning). The questions raised by Niel de Beaudrap also contributed to the realization that the high-level perspective is slightly shaky.


Non-deterministic machines allow a finite number of deterministic transitions at each step. For probabilistic machines, these transitions additionally have a probability. This post conveys a different perspective on non-determinism and randomness. Ignoring global acceptance conditions, it focuses on local reversibility (as a local symmetry) instead. Because randomness preserves some local symmetries which are not preserved by determinism, this perspective reveals non-trivial differences between non-deterministic and probabilistic machines.

Posted in partial functions | Tagged , | 1 Comment

Algebraic characterizations of inverse semigroups and strongly regular rings

A good place to learn about inverse semigroups is Tero Harju’s Lecture Notes on Semigroups from 1996. (J.M. Howie’s “Fundamentals of Semigroup Theory” from 1995 is claimed to be even better, but I can’t comment on that.) Learning about strongly regular rings is more difficult. The information is quite scattered, but the important elementary facts seem to be known, even if difficult to find. A simple request to give a link to the “well-known” signature and equations (for a commutative regular ring) made me realize how this scattering can make it difficult to refer to these “well-known” facts. The plan was to write a blog post which contains these elementary facts, together with proofs. But producing good-looking math in WordPress is challenging if you’ve never done it.

The real text is in a short pdf

The nice-looking math is now presented in a short pdf produced using LaTeX (LyX): Algebraic characterizations of inverse semigroups and strongly regular rings. The original proofs were mostly found via Google, but after a conversation with a real guru, I realized that Google may not be the best way for cheating in this case: this Google Drive folder contains prover9 and the-E-theorem-prover input files. The companion program mace4 to prover9 finds counterexamples, which really helped me to finish the text at all and to streamline it nicely, even though none of the counterexamples is mentioned anywhere in the text.

Here is a condensed-down excerpt from the short pdf

A semigroup S is a set S together with a binary operation \cdot:S\times S \to S which is associative: \forall x,y,z\in S\quad x\cdot(y\cdot z)=(x\cdot y)\cdot z. To simplify notation, concatenation is used instead of \cdot and parentheses are omitted.
We say that y\in S is a pseudoinverse of x\in S, if x=xyx. We call y an inverse element of x, if x=xyx and y=yxy. If y is a pseudoinverse of x, then yxy is an inverse element of x, because x=x(yxy)x and yxy=(yxy)x(yxy).
A regular semigroup S is a semigroup S where each element has a pseudoinverse: \forall x\in S\exists y\in S\quad x=xyx. An inverse semigroup is a regular semigroup where the inverse elements are unique, such that an inverse operation ^{-1} can be defined implicitly via x=xyx\land y=yxy\leftrightarrow y=x^{-1}.

Lemma 2 The idempotents E_S of an inverse semigroup S form a subsemigroup.

Theorem 1 The following characterizations are equivalent:

  1. S is an inverse semigroup, i.e. a semigroup with an operation ^{-1} satisfying the following quasi-equations

    x=xyx\land y=yxy \ \leftrightarrow \ y=x^{-1}

  2. S is a regular semigroup where idempotents commute, i.e. the following formulas hold

    \forall x\in S\exists y\in S \ x=xyx
    e^{2}=e \land f^{2}=f \rightarrow ef=fe

  3. S is a semigroup with an operation ^{-1} satisfying the following equations

    x=xx^{-1}x \ \land \ x^{-1}=x^{-1}xx^{-1}
    xx^{-1}\cdot y^{-1}y \ = \ y^{-1}y\cdot xx^{-1}

Case (1) implies Case (2): We have to show ef=fe for idempotent e,f\in S. From lemma 2 we have that ef and fe are idempotent. So ef is its own inverse, but fe is also an inverse element of ef: ef\cdot fe\cdot ef=efef=ef and fe\cdot ef\cdot fe=fefe=fe. By uniqueness of inverse elements, we have ef=fe.

Case (2) implies Case (1): Let x',x''\in S be two inverse elements of x. Then x=xx'x, x'=x'xx', x=xx''x, and x''=x''xx''. We have

x'=x'xx'=x'xx''xx' \ = \ x'xx''xx''xx'
x''=x''xx''=x''xx'xx'' \ = \ x''xx'xx'xx''

and x'xx''xx''xx'=x''xx'xx''xx'=x''xx'xx'xx'', because x'x, x''x, xx', and xx'' are idempotent and commute.

Case (1) and Case (2) imply Case (3): The existence of an operation ^{-1} satisfying x=xx^{-1}x and x^{-1}=x^{-1}xx^{-1} follows from (1). The remaining equation follows from (2), because xx^{-1} and y^{-1}y are idempotent and commute.

Case (3) implies Case (2): Let e,f\in S be idempotent. Then e^{-1}=e^{-1}ee^{-1}=e^{-1}e\cdot ee^{-1}=ee^{-1}\cdot e^{-1}e, so e=ee^{-1}e=e^{2}e^{-1}\cdot e^{-1}e^{2}=ee^{-1}\cdot e^{-1}e=e^{-1} and similarly f=f^{-1}. So ef=ee^{-1}\cdot f^{-1}f=f^{-1}f\cdot ee^{-1}=fe.
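The equations of characterization (3) can also be tested on a concrete inverse semigroup. Here is a sketch (my own encoding, not from the pdf) using the symmetric inverse monoid I_2, the seven partial injections on a two-element set:

    from itertools import product

    points = (0, 1)
    # a partial injection f is a tuple (f(0), f(1)) with entries in {0, 1, None}
    elements = [f for f in product((0, 1, None), repeat=2)
                if f[0] is None or f[0] != f[1]]

    def mul(f, g):   # composition: first f, then g
        return tuple(g[f[i]] if f[i] is not None else None for i in points)

    def inv(f):      # the unique inverse element f^{-1}
        return tuple(f.index(i) if i in f else None for i in points)

    for x, y in product(elements, repeat=2):
        assert mul(mul(x, inv(x)), x) == x               # x = x x^{-1} x
        assert mul(mul(inv(x), x), inv(x)) == inv(x)     # x^{-1} = x^{-1} x x^{-1}
        e, f2 = mul(x, inv(x)), mul(inv(y), y)           # two idempotents
        assert mul(e, f2) == mul(f2, e)                  # they commute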

A regular ring R is a ring R where each element has a multiplicative pseudoinverse: \forall x\in R\exists y\in R\quad x=xyx. A strongly regular ring R is a ring R where each element has a strong multiplicative pseudoinverse: \forall x\in R\exists y\in R\quad x=x^{2}y. An inverse ring R is a ring R where the multiplicative semigroup is an inverse semigroup.

Theorem 3 The following characterizations are equivalent:

  1. R is a strongly regular ring, i.e. the following formula holds

    \forall x\in R\exists y\in R\quad x=x^{2}y

  2. R is a regular ring without nonzero nilpotent elements, i.e. the following formulas hold

    \forall x\in R\exists y\in R \ x=xyx
    x^{2}=0 \rightarrow x=0

  3. R is a regular ring where idempotents are central, i.e. the following formulas hold

    \forall x\in R\exists y\in R \ x=xyx
    e^{2}=e \rightarrow ex=xe

  4. R is a regular ring where idempotents commute, i.e. the following formulas hold

    \forall x\in R\exists y\in R \ x=xyx
    e^{2}=e\land f^{2}=f \rightarrow ef=fe

  5. R is an inverse ring, i.e. a ring with an operation ^{-1} satisfying the following quasi-equations

    x=xyx\land y=yxy\leftrightarrow y=x^{-1}

Case (1) implies Case (2): Let y be a strong pseudoinverse of x, then x=x^{2}y. If x^{2}=0, then x=x^{2}y=0y=0. Now (x-xyx)^{2}=x^{2}-x^{2}yx-xyx^{2}+xyx^{2}yx=x^{2}-x^{2}-xyx^{2}+xyx^{2}=0, hence x=xyx.

Case (2) implies Case (1): Let y be a pseudoinverse of x, then x=xyx. We have (x-x^{2}y)^{2}=x^{2}-x^{3}y-x^{2}yx+x^{2}yx^{2}y=x^{2}-x^{3}y-x^{2}+x^{3}y=0, hence x=x^{2}y.

Case (2) implies Case (3): Let e^{2}=e. We have (ex(1-e))^{2}=0, because (1-e)e=e-e^{2}=0. The same argument without the neutral element 1 reads (ex-exe)^{2}=(ex-exe)e(x-xe)=(exe-exe)(x-xe)=0. Similarly (xe-exe)^{2}=0. Hence ex=exe=xe.

Case (3) implies Case (4): This is trivial.

Case (4) implies Case (2): If x^{2}=0, then (x+xy)^{2}=x^{2}+x^{2}y+xyx+xyxy=0+0y+x+xy=x+xy for any pseudoinverse y of x. Hence x+xy is idempotent and commutes with xy, i.e. xy(x+xy)=(x+xy)xy. This simplifies to xyx=x^{2}y, and the conclusion is x=xyx=x^{2}y=0y=0.

Case (4) and Case (5) are equivalent according to theorem 1.
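These equivalences can be checked numerically on a small example; \mathbb{Z}/6 is commutative, hence strongly regular, and the construction yxy of an inverse element from the semigroup section applies (my own sketch):

    n = 6   # Z/6 is a commutative regular ring (6 is squarefree)
    for x in range(n):
        # find a strong pseudoinverse y with x = x^2 y
        y = next(y for y in range(n) if (x * x * y) % n == x)
        xinv = (y * x * y) % n      # yxy is the inverse element of x
        assert (x * xinv * x) % n == x
        assert (xinv * x * xinv) % n == xinv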


Why write about inverse semigroups and strongly regular rings, when good lecture notes and textbooks already exist? And why reformat a condensed-down excerpt from the short pdf into a WordPress blog post? After reading a preprint on skew meadows, I became unsure how many of the elementary facts about strongly regular rings are really “well-known”. And I found it difficult to explain background material on such things, when I really want to talk about more distantly related questions. And publishing it as a blog post was a consequence of the time I had already sunk into producing good-looking math in WordPress.
Here is a disclaimer: I can’t judge how useful inverse semigroups (and their categorification as inverse categories) really are, and I first have to read more about partial functions (and their categorification as restriction categories) myself. Also, I don’t really care about non-commutative strongly regular rings. I find commutative regular rings useful, because having a total multiplicative inverse operation seems convenient, and I find it canonical to define it in the way implied by the relation to inverse semigroups. This turns Boolean algebras into a special case of commutative regular rings; the idempotents of a strongly regular ring form a Boolean algebra, and x\to xx^{-1} is the canonical projection onto this Boolean algebra. I tried to stay non-committal as to whether the inverse operation is explicitly or implicitly defined, because nobody ever complains about whether the inverse operation of a group is explicitly defined or not. The fact that the free commutative regular ring (for a finite set of generators) exists indicates to me that they might sometimes be useful. Jan Bergstra told me that Komori and Ono proved that their equational theory is decidable.
But the point of this post is also to just finish the work on this for the moment, since I have already spent a significant amount of time on it over the last three weeks. If you find this stuff interesting, do take a look at the short pdf. It is nicer than this blog post in many ways, and doesn’t try to dive into advanced material either. (Characterizations in terms of Green’s relations or ideals would be more advanced, for example, but the advanced material only starts there.) I am not sure how useful the input files for the theorem provers are, but part of the appeal of equational reasoning is that it lends itself to experiments with automated reasoning tools.

Posted in equational theory, inverse semigroups | Tagged , | Leave a comment