## A subset interpretation (with context morphisms) of the sequent calculus for predicate logic

The previous two posts used the sequent calculus with a subset interpretation:

We work with sequents $A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}$, and interpret the propositions $A_i$ (and $B_j$) as subsets of some universe set $X$. We interpret the sequent itself as $[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]$.

While writing the previous post, there was the temptation to introduce the sequent calculus rules for existential and universal quantification. But the subset interpretation with a fixed universe set $X$ didn’t seem helpful for free variables and quantification. But what if we allow different universe sets for different sequents? One way would be to augment the sequent rules to change the universe sets in a well defined way. However, the cut rule and two of the four quantification rules (those that replace an arbitrary term by a quantified variable) are a bit problematic. For them, the universe can sometimes be shrunk down, since a formula or a term was eliminated. Therefore, we just add two separate rules for changing the universe, and apply the sequent rules themselves only with respect to a fixed universe set.

This is the last planed post on the sequent calculus for the moment. (There might be an additional post in this series, extending the subset interpretation to natural deduction.) Since this post will remain vague and unsatisfactory in many ways, a list of related online materials on the sequent calculus seems appropriate: I often consulted the wikipedia article while writing these posts. (I never consulted Sequent Calculus Primer, but it looks nice and has exercises.) This MathOverflow question on introductions to sequent calculus suggests the chapter on Sequent Calculus from Frank Pfenning’s manuscript on Automated Theorem Proving, which I really liked. It also suggests Proofs and Types by Jean-Yves Girard, which was my first introduction to the sequent calculus. It left me unsatisfied, and I agree with Charles Stewart:

A warning: Girard’s style is a little slippery, and it is common for students to say they have read it, who turn out to have absorbed the opinions but little of the results.

I am still slightly ashamed that one of my first questions – when I met a group of academic logicians (with a focus on proof assistants) – was whether they found Girard easy to understand. Girard claims that the sequent calculus is generally ignored by computer scientists, an exception being Logic For Computer Science Foundations of Automatic Theorem Proving by Jean H. Gallier. A nice text, but who has time to read those 500+ pages? This is one reason why I like An Introduction to Proof Theory by Samuel R. Buss.

### The sequent calculus rules for predicate logic

We are interested in normal first order predicate logic with predicates and functions. Here a signature with a finite number of predicate and function symbols is given, which include the arity for each of the symbols. The sequent calculus rules for quantification are:

Left quantification rules Right quantification rules
$\begin{array}{c} \Gamma,A[t/x]\vdash\Delta\\ \hline \Gamma,\forall xA\vdash \Delta \end{array}(\forall L)$ $\begin{array}{c} \Gamma\vdash A[t/x],\Delta\\ \hline \Gamma\vdash \exists xA,\Delta \end{array}(\exists R)$
$\begin{array}{c} \Gamma,A[y/x]\vdash\Delta\\ \hline \Gamma,\exists xA\vdash \Delta \end{array}(\exists L)$ $\begin{array}{c} \Gamma\vdash A[y/x],\Delta\\ \hline \Gamma\vdash \forall xA,\Delta \end{array}(\forall R)$

Those rules look simple, but what do they mean? What is the difference between $t$ and $y$? And how to interpret a sequent with formulas containing free variables? First, $y$ is a free variable not occurring in $\Gamma$, $\Delta$, or $A$. And $A[y/x]$ means that each free occurrence of $x$ in $A$ is replaced by $y$. Next, $t$ is an arbitrary term, with no restrictions at all on the variables occurring in it. A term is just any reasonable combination of function symbols (from the signature) and variables, like $f(x,g(f(y,x),y))$ where both $f$ and $g$ are binary function symbols.

There are many possible interpretations of predicate logic, but that answer just avoids answering the question. The interpretation of a formula or sequent with free variables can be controversial. My previous effort at a subset based interpretations of predicate logic compatible with non-classical logic left out equality, functions and constants. Even so we are talking only about classical logic here, the goal of the following interpretation is to be also applicable more generally. That interpretation will follow naturally once we clarify which models we want to allow.

### Models and interpretations of predicate calculus

For the typical interpretations, a model consists of a set $U$ (which is the universe where the variables live), a subset of $U^k$ for each $k$-ary predicate symbol, and a function $f:U^k \to U$ for each $k$-ary function symbol. A straightforward interpretation in that context is that a formula with $k$ free variables is a subset of $U^k$, and a formula without free variables (i.e. a sentence) is either true or false. And for any set of $k$ variables, we can interpret any formula whose free variables are included in that set as a subset of $U^k$. A sequent in this interpretation is true, iff for the subsets formed by the formulas interpreted relative to the set of all free variables in the sequent, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.

This straightforward interpretation above is already a subset interpretation, but we want a slightly more general subset interpretation. For propositional logic, the possible models were given by a set $X$ and a subset of $X$ for each propositional variable $A_i$. In predicate logic, the 0-ary predicates take the role of the propositional variables, so we want them to be subsets of $X$. For our generalised interpretation, a model consists of a set $X$ (whose subsets are the truth values) and a set $U$ (which is the universe where the variables live), a subset of $X \times U^k$ for each $k$-ary predicate symbol, and a function $f:X \times U^k \to U$ for each $k$-ary function symbol.

The way $X$ will enter into the model is to have some implicit context $\xi\in X$ such that formulas like $P(x, f(y, z), z)$ in the formal syntax become $P(\xi, x, f(\xi, y, z), z)$ in the model. So for any set of $k$ free variables, we can interpret any formula whose free variables are included in that set as a subset of $X\times U^k$. For a given set of variables and a sequent for which the free variables of the formulas are contained in that subset, the a sequent is true, iff for the subsets formed by the formulas interpreted relative that set of variables, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.

(If we have a formula like $x+y = y+x$, we would like to interpret it as a subset of $U^2$, so we normally use $X=\{()\}$ and then just omit it. Here the element $()$ is called unit, the set $\{()\}$ represents truth, and the empty set $\emptyset$ represents falsehood.)

### Context morphisms give life to the subset interpretation

If we have (a sequent of) subsets $A_i$ over $X$, then a (partial) function $p : Y \to X$ from $Y$ to $X$ gives us (a sequent of) subsets over $Y$ by taking inverse images $p^{-1}(A_i)$ of the subsets. We have

• $A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B)$
• $p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)$
• $p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)$

and hence

$[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]$
$\Downarrow$
$p^{-1}([A_{1}])\cap\ldots\cap p^{-1}([A_{r}])\ \subseteq\ p^{-1}([B_{1}])\cup\ldots\cup p^{-1}([B_{s}])$

The direct translation into the language of sequences reads

$\begin{array}{c} A_{1}(\xi),\ldots,A_{r}(\xi)\vdash B_{1}(\xi),\ldots,B_{s}(\xi)\\ \hline A_{1}(p(\nu)),\ldots,A_{r}(p(\nu))\vdash B_{1}(p(\nu)),\ldots,B_{s}(p(\nu)) \end{array}$

This just tells us that global substitution is unproblematic. (But $r\geq 1$ is required in case $p$ is a partial function, which is easily missed.) However, the more typical function $p$ for our purpose is similar to $p(u,v,w)=(u,v)$, i.e. a function from $Y=U^3$ to $X=U^2$. It is typical, since the expressions grow in complexity for most rules of sequent calculus. Here is an example:

$\begin{array}{c} \begin{array}{l}A\vdash A:\{a\}\\ \hline A\vdash A:\{(a,b)\} \end{array} \qquad \begin{array}{l}B\vdash B:\{b\}\\ \hline B\vdash B:\{(a,b)\}\end{array}\\ \hline A,B\vdash A\land B:\{(a,b)\}\end{array}$

This example also shows, how we can separately change the universe set, and apply the sequent rules only with respect to a fixed universe set. After applying the $(\exists L)$ or $(\forall R)$ rule, we can shrink the universe set. To illustrate, let us first apply the $(\forall L)$ and $(\exists R)$ rule, where we cannot always shrink the universe set:

$\begin{array}{r} P(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash P(x) : X\times U \end{array}$

Here we cannot shrink the universe set.

$\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \exists x P(x) : X\times U \end{array}$

Now we can shrink the universe set, but only after checking that the sequent no longer depends on $x$. But when we apply $(\forall R)$, we can always shrink the universe set:

$\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X \end{array}$

How can we justify this shrinking in terms of images or inverse images? The inverse image under $p:X \to X\times U, \xi \to (\xi, c(\xi))$ for an arbitrary function (constant) $c(\xi)$ could be a justification. But we don’t really write $[\forall xP(x)](c) \vdash [\forall x P(x)](c)$, and $c$ is arbitrary and irrelevant. So the image under $F:X\times U \to X, (\xi, u) \to \xi$ might be a nicer justification. But we only have

• $A \subseteq B \Rightarrow F(A) \subseteq F(B)$
• $F(A\cap B)\subseteq F(A)\cap F(B)$
• $F(A\cup B)=F(A)\cup F(B)$

However, if we assume $F^{-1}(F([A_i]))=[A_i]$, then $F([A_i]\cap\dots\cap [A_r])=F([A_i])\cap\dots\cap F([A_r])$ and hence

$[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]$
$\Downarrow$
$F([A_{1}])\cap\ldots\cap F([A_{r}])\ \subseteq\ F([B_{1}])\cup\ldots\cup F([B_{s}])$

### Doubts about context morphisms and substitutions

Let us apply the above to propositional logic. Surely, from $A,B\vdash A\land B$ we can deduce $C\lor D,C\to D\vdash (C\lor D)\land (C\to D)$. But we cannot justify it from $A,B\vdash A\land B:\{(a,b)\}$ in terms of inverse images. Can we justify it from $A,B\vdash A\land B:X_a\times X_b$? Yes, take the map $p:X_c\times X_d\to X_a\times X_b$ with $(c,d) \to ([C\lor D](c,d), [C\to D](c,d))$. Here $X_c$ and $X_d$ can be arbitrary and $X_a=X_b=\{0,1\}$. Note that we only need $|X_a|, |X_b| \geq 2$.

How justified are those substitutions into the expressions? Here, we used

$[A\land B]([C\lor D](c,d), [C\to D](c,d))=[(C\lor D)\land (C\to D)](c,d)$

Can we justify this? Can such substitutions go wrong? Yes, if $p$ is a partial function:

$\begin{array}{c} \top\vdash B(x)\\ \hline [\top](p(y))\vdash [B(\cdot)](p(y))\\ \hline [\top](y)\vdash [B(p(\cdot))](y)\\ \hline \top\vdash B(p(y)) \end{array}$

Here $[\top](p(y)) = [\top](y)$ fails, but it would work if $p$ were a total function. We could have omitted $\top$, but then it would fail because of $r=0$. That was actually the point were I noticed that the restriction $r\geq 1$ is required for partial functions.

The substitutions are justified, because they work for each operation individually:

• $[\bot](p(y)) = [\bot](y)$
• $[A\land B](p(y)) = [A](p(y))\land [B](p(y))$
• $[A\lor B](p(y)) = [A](p(y))\lor [B](p(y))$
• $[A-B](p(y)) = [A](p(y))-[B](p(y))$
• $[A\oplus B](p(y)) = [A](p(y))\oplus [B](p(y))$
• $[\forall z A](p(y)) = \forall z [A](p(y))$
• $[\exists z A](p(y)) = \exists z [A](p(y))$
• $[A(\cdot)](p(y)) = [A(p(\cdot))](y)$

### Treating equality without special rules

Equality can be given by a binary predicate which is a congruence relation for the functions, and compatible with the other predicates. There are different ways to treat equality, and it is important for most interesting applications of predicate logic. A canonical approach is to write the axioms on the left side of the sequent, and use normal deductions. Concrete examples for the required axioms were already given in the last post in the section “Axiomatic theories in predicate logic without falsehood” for the theory of groups and in the section “Axiomatic theories in predicate logic without falsehood” for Robinson arithmetic.

Part of the reason for presenting a subset interpretation for predicate logic was to better understand how predicate logic could work in case of non-classical logics. There are theories like Heyting arithmetic (and Gentzen’s own work), which seem to fit nicely with the interpretation above. But there are also intuitionistic dependent type theories with special rules and equality types, whose possible interpretations remain unclear to me, despite the nice subset interpretation explained above. Maybe Per Martin Löf is right: In the end, everybody must understand for himself (p.166)

### Conclusions?

This post was hard for me to write. This is the classical problem when explaining something obvious (predicate calculus). You have to make a lot of words for something which seems obvious, but where the language is missing to properly write down what is going on behind the scenes. I didn’t solve that language problem, but I came up with a natural example of how the obvious thing can fail to be true. That might help the reader to understand for himself.

The content above already became clear to me while writing the previous post. But only after this current post was nearly finished did I notice the restriction $r\geq 1$ in case $p$ is a partial function. Which is funny, after I clarified the notion of context morphism which makes sense of that logic without truth from the first post (in this series), I find out that there is an additional restriction that the antecedent should never be empty. My first reaction was that I don’t like this. Maybe I should look for a transformation which turns this logic about partial functions into the logic without negation or falsehood. But my second reaction was that it could make sense, perhaps any logical deduction needs assumptions.

Still the question remains whether there is a reasonable transformation from the logic without truth (and partial functions) to the logic without negation and falsehood. Probably that question will be easy to answer. Somehow one has to take complements, and hope that the structure of the predicate calculus remains intact. It seems to work, but I still have trouble to intuitively understand it. And I want to finish this post now.

## Logic without negation and falsehood

In the last post, consideration related to partial functions lead us to present a logic without truth and implication, using the binary minus operation as a dual of implication and substitute for unary negation. But logic without implication and equivalence is strange. So in this post we want to omit negation and falsehood instead.

(This post got surprisingly long. Sometimes I got slightly sidetracked, for example the discussion of the parity function and its sequent rules could have been omitted. But I already tried to remove most of those sidetracks. The real reason for the size explosion are the examples of axiomatic theories in predicate logic. But those had to be discussed, since they make it clear that one can really omit falsehood without loosing anything essential.)

### The classical sequent calculus

Before starting to omit logical operations, let us first recall the classical sequent calculus including as many logical operations as barely reasonable. If we limit ourself to constants, unary operations, and binary operations, then we get:

• Constants for truth ($\top$) and falsehood ($\bot$)
• An unary operation for negation ($\lnot$)
• Binary operations for: or ($\lor$), and ($\land$), implication ($\to$), minus ($-$), nand ($|$), nor ($\overline{\lor}$), equivalence ($\equiv$), and xor ($\oplus$). The remaining binary operations add nothing new: reverse implication, reverse minus, first, second, not first, not second, true, and false

We work with sequents $A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}$, and interpret the propositions $A_i$ (and $B_j$) as subsets of some universe set $X$. We interpret the sequent itself as $[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]$. Let $\Gamma,\Delta, ...$ stand for arbitrary finite sequences of propositions.

Left structural rules Right structural rules
$\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(WL)$ $\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(WR)$
$\begin{array}{c} \Gamma,A,A\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(CL)$ $\begin{array}{c} \Gamma\vdash A,A,\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(CR)$
$\begin{array}{c} \Gamma_{1},A,B,\Gamma_{2}\vdash\Delta\\ \hline \Gamma_{1},B,A,\Gamma_{2}\vdash\Delta \end{array}(PL)$ $\begin{array}{c} \Gamma\vdash\Delta_{1},A,B,\Delta_{2}\\ \hline \Gamma\vdash\Delta_{1},B,A,\Delta_{2} \end{array}(PR)$
Axiom Cut
$\begin{array}{c} \ \\ \hline A\vdash A \end{array}(I)$ $\begin{array}{c} \Gamma\vdash\Delta,A\quad A,\Sigma\vdash\Pi\\ \hline \Gamma,\Sigma\vdash\Delta,\Pi \end{array}(Cut)$
Left logical rules Right logical rules
$\begin{array}{c} \ \\ \hline \bot\vdash\Delta \end{array}(\bot L)$ $\begin{array}{c} \ \\ \hline \Gamma\vdash\top \end{array}(\top R)$
$\begin{array}{c} \Gamma\vdash A,\Delta\\ \hline \Gamma,\lnot A\vdash\Delta \end{array}(\lnot L)$ $\begin{array}{c} \Gamma,A\vdash \Delta\\ \hline \Gamma\vdash \lnot A,\Delta \end{array}(\lnot R)$
$\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma,A\land B\vdash\Delta \end{array}(\land L)$ $\begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma\vdash A\lor B,\Delta \end{array}(\lor R)$
$\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\lor B\vdash\Delta,\Pi \end{array}(\lor L)$ $\begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma\vdash A\land B,\Delta,\Pi \end{array}(\land R)$
$\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma,A- B\vdash \Delta \end{array}(- L)$ $\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)$
$\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\to B\vdash\Delta,\Pi \end{array}(\to L)$ $\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash \Pi\\ \hline \Gamma,\Sigma\vdash A-B,\Delta,\Pi \end{array}(- R)$
$\begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma,A\overline{\lor} B\vdash \Delta \end{array}(\overline{\lor} L)$ $\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma\vdash A|B,\Delta \end{array}(| R)$
$\begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma,A|B\vdash \Delta,\Pi \end{array}(| L)$ $\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma\vdash A\overline{\lor} B,\Delta,\Pi \end{array}(\overline{\lor} R)$
$\begin{array}{c} \Gamma,A\vdash B,\Delta\quad\Sigma,B\vdash A,\Pi\\ \hline \Gamma,\Sigma,A\oplus B\vdash \Delta,\Pi \end{array}(\oplus L)$ $\begin{array}{c} \Gamma,A\vdash B,\Delta\quad\Sigma,B\vdash A,\Pi\\ \hline \Gamma,\Sigma\vdash A\equiv B,\Delta,\Pi \end{array}(\equiv R)$
$\begin{array}{c} \Gamma\vdash A,B,\Delta\quad\Sigma,A,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\equiv B\vdash\Delta,\Pi \end{array}(\equiv L)$ $\begin{array}{c} \Gamma\vdash A,B,\Delta\quad\Sigma,A,B\vdash\Pi\\ \hline \Gamma,\Sigma\vdash A\oplus B,\Delta,\Pi \end{array}(\oplus R)$

The rules for equivalence were taken from here, where they are also proved. Observe that $(A\equiv B)\equiv C=(A\oplus B)\oplus C = A\oplus(B\oplus C)$. Clearly $A_1 \oplus \dots \oplus A_r$ is the parity function, which is famous for its computational complexity. It is easy to derive the rules for it (from the rules given above), but those rules have $2^r$ sequents above the line.

The rules for $A_1 \lor \dots \lor A_r$ and $A_1 \land \dots \land A_r$ are obvious. The rules for $A_1 \to \dots \to A_r \to B$ and $A - B_1 - \dots - B_r$ are also obvious, if we agree that this should be read as $A_1 \to (\dots \to (A_r \to B).)$ and $(.(A - B_1) - \dots) - B_r$. Then $A|B=A\to B\to\bot$ and $A\overline{\lor}B=\top-A-B$, so there is no need for multi-argument versions of nand and nor.

### Omitting falsehood from classical logic

If we want to omit falsehood, it is not sufficient to just omit falsehood ($\bot$). The operations minus ($-$) and xor ($\oplus$) must be omitted too, since $\bot = A-A = A\oplus A$. Also nand ($|$) and nor ($\overline{\lor}$) must be omitted, since $\bot = A \overline{\lor}(A\overline{\lor}A)$ and $\bot = (A|(A|A))|(A|(A|A))$. (They are functionally complete, so the only surprise is the length of the formulas). And if we want to keep truth ($\top$) or any of the remaining binary operations (or ($\lor$), and ($\land$), implication ($\to$), or equivalence ($\equiv$)), then also negation ($\lnot$) must be omitted.

But without negation, how can we express the law of excluded middle or double negation? Both $A\lor \lnot A$ and $\begin{array}{c} \lnot\lnot A \\ \hline A \end{array}$ use negation. If we look at proof by cases, then $\begin{array}{c} \lnot A \to A \\ \hline A \end{array}$ or rather its generalization $\begin{array}{c} (A\to B) \to A \\ \hline A \end{array}$ suggests itself, which is called Peirce’s law. And the sequent calculus suggests $\begin{array}{c} A\to(B\lor C) \\ \hline (A\to B)\lor C\end{array}$, which expresses a nice algebraic property of classical logic. But how do we prove that we get classical logic, and what does that mean?

At least classical tautologies like $((A\to B)\to B) \to (A\lor B)$ not using any of the omitted operations should still be tautologies. If “still be tautologies” is interpreted as being provable in the classical sequent calculus, then this follows from cut elimination. The cut elimination theorem says that the rule (Cut) can be avoided or removed from proofs in the sequent calculus. Since for all other rules, the formulas above the line are subformulas of the formulas below the line, only operations occurring in the sequent to be proved can occur in a cut-free proof.

However, my intended interpretation of “still be tautologies” is being provable in the natural deduction calculus. But let us not dive into the natural deduction calculus here, since this post will get long anyway. We still need to clarify what it means that we get classical logic. The basic idea is that logical expressions involving falsehood can be translated into expressions avoiding falsehood, which still mean essentially the same thing. But what exactly can be achieved by such a translation? Time for a step back.

### Boolean functions of arbitrary arity

A Boolean function is a function of the form $f : B^k \to B$, where $B = \{0, 1\}$ is a Boolean domain and $k$ is a non-negative integer called the arity of the function. Any Boolean function can be represented by a formula in classical logic. If we omit falsehood, a Boolean function with $f(1,\dots,1)=0$ certainly cannot be represented, for otherwise our goal to remove falsehood would have failed. The question is whether we can represent all Boolean functions with $f(1,\dots,1)=1$, and whether the complexity of this representation is still comparable to the complexity of the representation where falsehood is available.

A two step approach allows to convert a given representation to one where falsehood is avoided. The first step is to replace $A-B$ by $(A\to B)\to\bot$, $A\oplus B$ by $(A\equiv B)\to\bot$, $A|B$ by $(A\land B)\to\bot$, $A\overline{\lor}B$ by $(A\lor B)\to\bot$, and $\lnot A$ by $A\to\bot$. This step increases the size of the representation only by a small constant factor. The second step is to replace $\bot$ by the conjunction of all $k$ input variables. This step increases the size of the representation by a factor $k$ in the worst case. It works, since at least one of the input variables is $0$, if $f$ is evaluated at an argument different from $(1,\dots,1)$.

If one can define a new variable, then the factor $k$ can be avoided by defining $p:=\bigwedge_{i=1}^kA_i$. However, the underlying idea is have a logical constant $p$ together with axioms $p\to A_i$. This is equivalent to the single axiom $\bigwedge_{i=1}^k(p\to A_i)$ which simplifies to $p\to\bigwedge_{i=1}^k A_i$.

### Interpretation of logic without falsehood

When I initially wondered about omitting negation and falsehood from classical logic, I didn’t worry too much about “what it means that we get classical logic” and said

For a specific formula, falsehood gets replaced by the conjunction of all relevant logical expressions.

Noah Schweber disagreed with my opinion that “you can omit negation and falsehood, and still get essentially the same classical logic” and said

First, I would like to strongly disagree with the third sentence of your recent comment – just because (basically) the same proof system is complete for a restricted logical language when restricted appropriately, doesn’t mean that that restricted logic is in any way similar to what you started with.

So I replaced my harmless initial words by

One approach might be to translate $\lnot A$ as $A \to p$, where $p$ is a free propositional variable. It should be possible to show that this works, but if not then we just keep falsehood in the language.

However, these new words are problematic, since $\lnot\lnot A \to A$ is a classical tautology, but $((A\to p)\to p)\to A$ is not. My initial words were not ideal too, since the meaning of “the conjunction of all relevant logical expressions” remained unclear.

The intuition behind my initial words was that falsehood is just the bottom element of the partial order from the Boolean algebra. Removing the explicit symbol (name) for the bottom element from the language should be harmless, since the bottom element can still be described as the greatest lower bound of “some relevant elements”. (However, removing the symbol does affect the allowed homomorphisms, since the bottom element is no longer forced to be mapped to another bottom element.)

The intuition behind my new words was an unlucky mix up of ideas from Curry’s paradox and the idea to use a free variable as a substitute for an arbitrary symbol acting as a potentially context dependent replacement for falsehood. We will come back to those ideas in the section on axiomatic theories in predicate logic without falsehood.

What are the “relevant logical expressions” for the conjunction mentioned above? Are $A\lor B$, $A\to B$, or $\forall x\forall y\ x=y$ relevant logical expressions? The first two are not, since $(A\lor B)\land B = B =(A\to B)\land B$, i.e. the propositional variable $B$ itself is already sufficient. The third one is relevant, at least for predicate logic with equality. For propositional logic, it is sufficient to replace falsehood by the conjunction of all proposition variables occurring in the formula (and the assumptions of its proof). For predicate logic, it is sufficient to replace falsehood by the conjunction of the universally quantified predicate symbols, including the equality predicate as seen above (if present).

### Axiomatic theories not using negation or falsehood

If the axioms or axiom schemes of an axiomatic theory (in classical first order predicate logic) don’t use negation or falsehood or any of the other operations we had to remove together with falsehood, then we don’t need to explicitly remove falsehood.

Let us look at the theory of groups as one example. If we denote the group multiplication by $\circ$ and the unit by $e$, then the axioms are

• $\forall{} x \forall{} y \forall{} z\ (x\circ y)\circ z=x\circ (y\circ z)$
• $\forall x\ x\circ e = x$
• $\forall x\exists y\ x\circ y=e$

We see that neither falsehood nor negation occurs in those axioms. This is actually a bit cheated, since it would seem that implication also doesn’t occur. It actually does occurs in the axioms governing equality. Here are some axioms for equality:

• $\forall x\ x=x$
• $\forall x\forall y\ (x=y \to y=x)$
• $\forall x\forall y\forall z\ (x=y \to (y=z \to x=z))$

They only use implication, but something like the axiom schema

• $x=y \to (\phi(x,x)\to\phi(x,y))$

is still missing. Here $\phi(x,y)$ is any formula, and may contain more free variables in addition to $x$ and $y$. This doesn’t work for us, since this could also include formulas $\phi$ which use negation or falsehood! But at least for the theory of groups, the following two axioms (derived from instances of the axiom scheme) should be sufficient.

• $\forall x\forall y\forall z\ (x=y \to x\circ z=y\circ z)$
• $\forall x\forall y\forall z\ (x=y \to z\circ x=z\circ y)$

Here is a typical theorem of the theory of groups that should now be provable.

$\forall x\forall y\forall z\ (x\circ y = e \to (z\circ x = e \to y=z))$

Here is an informal proof: Since $x\circ y = e$ and $\forall x\forall y\ (x=y \to z\circ x=z\circ y)$, we have $z\circ(x\circ y) = z\circ e$. Since $z\circ(x\circ y)=(z\circ x)\circ y$ and $z\circ e = z$, we have $(z\circ x)\circ y = z$. Since $z\circ x = e$ and $\forall x\forall z\ (x=z \to x\circ y=z\circ y)$, we have $e\circ y = z$. If we could show $e\circ y = y$ (this was omitted in axioms given above), then the proof would be complete.

### Axiomatic theories in predicate logic without falsehood

Let us next look at Robinson arithmetic, which is essentially Peano arithmetic without the induction axiom schema.

1. $(Sx=0)\to\bot$
2. $(Sx=Sy)\to x=y$
3. $y=0\lor\exists x\ Sx=y$
4. $x+0 = x$
5. $x + Sy = S(x+y)$
6. $x\cdot 0 = 0$
7. $x\cdot Sy = (x\cdot y) + x$

Here are the additional axioms for equality.

• $x=y \to S(x)=S(y)$
• $x=y \to x+z=y+z$
• $x=y \to z+x=z+y$
• $x=y \to x\cdot z=y\cdot z$
• $x=y \to z\cdot x=z\cdot y$

Universal quantification has been omitted, but that should not worry us here. We see that the first axiom is the only one using falsehood. We could replace it by

1. $(Sx=0)\to\forall x\forall y\ x=y$

Or we could also introduce a constant $F$ and replace by the axioms

1. $F\to\forall x\forall y\ x=y$
2. $(Sx=0)\to F$

The second approach has the advantage that if we add the induction axiom schema to get Peano arithmetic, we can just replace falsehood by the constant $F$. The first approach has the advantage that is shows that falsehood was never really required for the induction axiom schema.

The second approach comes closer to my motivations to remove falsehood from the language. I wanted to avoid negation and falsehood, because if there is no (bottom) element connecting everything together, then the damage caused by the principle of explosion can be restricted to specific domains of discourse. I also had the idea to use more specific propositional constants instead of falsehood. For ZFC set theory, one could use one propositional constant $F_=$ for the axiom asserting the existence of the empty set, and another propositional constant $F_\in$ for the axiom asserting regularity. However, as soon as one adds the axioms $F_= \to \forall x\forall y\ x=y$ and $F_\in \to \forall x\forall y\ x\in y$, both constants will probably turn out to be equivalent.

### Conclusions

Such a long post, for such a trivial thing as removing falsehood from the language. At least it should be clear now that one can really remove falsehood if desired, and what it means that one doesn’t loose anything essential. But is there anything to be gained by removing falsehood? The initial idea to remove falsehood came from the rule $\begin{array}{c} A\to(B\lor C) \\ \hline (A\to B)\lor C\end{array}$ (suggested by the sequent calculus), which characterises classical logic by a nice algebraic property (not involving negation or falsehood). The motivation to finish this post came from the realisation that implication is the central part of logic. I just read this recent post by Peter Smith on the material conditional, and realised that one really gains something by removing falsehood: the counterintuitive material conditional $P\to Q = \lnot P \lor Q$ goes away, and implication becomes the first class citizen it should have always been.

The counterintuitive part didn’t really disappear. The material conditional decomposes into the two sequents $P\to Q \vdash (P\to F)\lor Q$ and $(P\to F)\lor Q, F\to Q\vdash P\to Q$. The proof sketch for those is $\begin{array}{c} P\to Q \\ \hline P\to(F\lor Q) \\ \hline (P\to F)\lor Q\end{array}$ and $\begin{array}{c}\begin{array}{c} (P\to F)\lor Q \\ \hline P\to(F\lor Q)\end{array} \quad \begin{array}{c} \ \\ F\to Q\end{array} \\ \hline P\to Q\end{array}$. The proof of the second doesn’t even use the special properties of classical logic, but this part is the counterintuitive one: Why should $P\to Q$ be true, if $P\to F$? Well, the intuitive everyday logic is more a sort of modal logic than a propositional or predicate logic.

Posted in logic | Tagged , | 3 Comments

## Logic without truth

Pi is wrong! But so what? It is neither new, nor complicated enough to count as real math! And suggestions that $2\pi i$ or $\pi/2$ might be even better show that it not clear-cut either.

I recently invested sufficient energy into some logical questions to make real progress. But while telling friends and fellow logicians about it, I realized how irrelevant all my results and conclusions will be. I will have to publish them in an appropriate journal nevertheless, since they continue David Ellerman’s work on partition logic. Not publishing them would be dismissive towards the value of David Ellerman’s work, and he really invested lots of effort into it, and believes in its value. I won’t talk about those results here, since I don’t know how it would impact my ability to publish them.

Let’s have some fun instead, stay extremely close to classical logic, and still demonstrate a logic without truth. And let’s get back to Gerhard Gentzen.

### Partial truths

I am fond of partial functions. For a partial function $p:X\to Y$, we have

• $p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)$
• $p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)$
• $p^{-1}(A {}\setminus{} B)=p^{-1}(A) {}\setminus{} p^{-1}(B)$
• $p^{-1}(A \Delta B)=p^{-1}(A) \Delta p^{-1}(B)$ where $A \Delta B := (A {}\setminus{} B) \cup (B {}\setminus{} A)$

But $p^{-1}(Y)=X$ is only true if $p$ is a total function. Especially $p^{-1}(A^c)=p^{-1}(A)^c$ is only true (even for specific $A$) if $p$ is total, since otherwise
$p^{-1}(Y)=p^{-1}(A\cup A^c)=p^{-1}(A)\cup p^{-1}(A^c) \quad = \quad p^{-1}(A)\cup p^{-1}(A)^c=X$
Let’s also prove one of the other claims:
$x {}\in{} p^{-1}(A {}\setminus{} B) \Leftrightarrow p(x) {}\in{} A {}\setminus{} B \Leftrightarrow p(x) {}\in{} A \land p(x) {}\notin{} B \Leftrightarrow x {}\in{} p^{-1}(A) {}\setminus{} p^{-1}(B)$

Note that $A {}\setminus{} B = A \triangle (A \cap B)$. So the preserved operations are “and”, “or”, and “xor” if interpreted from a logical perspective. It would be nice if implication were preserved too. This seems hopeless, since $A \to A$ is always true, and truth is not preserved. But we do have $A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B)$, which means that the external implication given by the order is preserved. So we should be able to turn this into a logic, where internal truth is not preserved under context morphisms.

### Gerhard Gentzen’s sequent calculus

The sequent calculus is a proof calculus with significant practical and theoretical advantages compared to more obvious proof calculus systems. It works with sequents $A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}$. The propositions $A_i$ (and $B_j$) could be logical formulas like $S(x)=S(y) \to x = y$ (the 4-th Peano axiom). They can also be interpreted as subsets of some universe set $X$, which is sufficient for understanding the basics of the sequent calculus. Then the sequent itself is interpreted as $[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]$.

Left structural rules Right structural rules
$\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(WL)$ $\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(WR)$
$\begin{array}{c} \Gamma,A,A\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(CL)$ $\begin{array}{c} \Gamma\vdash A,A,\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(CR)$
$\begin{array}{c} \Gamma_{1},A,B,\Gamma_{2}\vdash\Delta\\ \hline \Gamma_{1},B,A,\Gamma_{2}\vdash\Delta \end{array}(PL)$ $\begin{array}{c} \Gamma\vdash\Delta_{1},A,B,\Delta_{2}\\ \hline \Gamma\vdash\Delta_{1},B,A,\Delta_{2} \end{array}(PR)$

Here $\Gamma,\Delta, ...$ stand for arbitrary finite sequences of propositions. The structural rules may be relatively boring. The following global rules are slightly more interesting

Axiom Cut
$\begin{array}{c} \ \\ \hline A\vdash A \end{array}(I)$ $\begin{array}{c} \Gamma\vdash\Delta,A\quad A,\Sigma\vdash\Pi\\ \hline \Gamma,\Sigma\vdash\Delta,\Pi \end{array}(Cut)$

None of the rules up to now has used any logical constant or connective. They can be verified directly for the subset interpretation. The following logical rules can only be verified after the (intended) interpretation of the logical connectives has been fixed.

Left logical rules Right logical rules
$\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma,A\land B\vdash\Delta \end{array}(\land L)$ $\begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma\vdash A\lor B,\Delta \end{array}(\lor R)$
$\begin{array}{c} \ \\ \hline \bot\vdash\Delta \end{array}(\bot L)$ $\begin{array}{c} \ \\ \hline \Gamma\vdash\top \end{array}(\top R)$
$\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\lor B\vdash\Delta,\Pi \end{array}(\lor L)$ $\begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma\vdash A\land B,\Delta,\Pi \end{array}(\land R)$
$\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\to B\vdash\Delta,\Pi \end{array}(\to L)$ $\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)$

One possible interpretation for these connectives in terms of subsets would be $[\bot]:=\emptyset$, $[\top]:=X$, $[A\land B]:=[A]\cap [B]$, $[A\lor B]:=[A]\cup [B]$, and $[A\to B]:=[A]^c\cup [B]$.

But it may be more instructive to see an interpretation where one of the classical logical rules is violated. So let us use $[A\to B]:=\mathrm{int}([A]^c\cup [B])$ instead, where $\mathrm{int}$ is the interior operator of some topological space. The propositions $A_i$ (and $B_j$) are interpreted as open subsets in this case. The rule $\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)$ is violated now, and has to be replaced by the rule $\begin{array}{c} \Gamma,A\vdash B\\ \hline \Gamma\vdash A\to B \end{array}(\to R_J)$. This gives us the intuitionistic sequent calculus, which exactly characterizes the valid conclusions of intuitionistic logic.

To see that $(\to R)$ is violated, let $\Gamma=\top$, $A$ correspond to $[A]=(0,\infty)$, $B=\bot$, and $\Delta = A$. Above the line we have $\mathbb R \cap (0,\infty) \subseteq \emptyset \cup (0,\infty)$, which is true. Below the line we have $\mathbb R \subseteq \mathrm{int}((-\infty,0])\cup (0,\infty)$, which is false.

### An evil twin of sequent calculus

Note that implication satisfies $C\land A \vdash B \Leftrightarrow C \vdash (A\to B)$ or rather $[C]\cap [A] \subseteq [B] \Leftrightarrow [C] \subseteq [A\to B]$. Let us replace implication by minus. Note that minus satisfies $[A] \subseteq [B]\cup [C] \Leftrightarrow [A-B] \subseteq [C]$ with $[A-B]:=[A]{}\setminus{}[B]$. Then we get the following two rules instead of $(\to L)$ and $(\to R)$.

 $\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma,A- B\vdash \Delta \end{array}(- L)$ $\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash \Pi\\ \hline \Gamma,\Sigma\vdash A-B,\Delta,\Pi \end{array}(- R)$

Of course, we also remove $\to$ and $\top$ from the language together with the rule $(\top R)$. This sequent calculus is still as sound and complete as the original sequent calculus. But we no longer reason about implication, but only about minus. Some sort of implication is still present in $\vdash$, but it is no longer mirrored internal in the language of the logic itself. So this is our logic without truth.

I don’t really know (or understand) whether this sort of context morphism has any sort of relevance, and whether that logic without truth occurs anywhere in the real world. Is there any relation to the fact that it is easier to deny the relevance or truth of a given conclusion than to prove that it is important and true? What I like about that logic is the asymmetry between implication and falsehood, because I wanted to find naturally occurring asymmetries in mathematical hierarchies and logic. Even for the results that I do want to publish, I have the same problem that I don’t really understand the relevance of the corresponding context morphisms, whether there even should be context morphisms, and whether my proposed context morphisms are the correct ones.

### Conclusions?

That post initially also contained a logic without falsehood, or rather a logic where falsehood is not used. But it started to get long, and this post is already long enough. Not sure whether this was really a good idea, since the explanation of the sequence calculus was also intended to better understand how such a logic with a reduced set of logical constants and connectives still maintains its main features. Maybe I will manage to create another blog post from the removed material. Or maybe nobody including myself cares anyway, as already indicated at the beginning. Or maybe I should better use my time to finish the paper about the results I wrote about at the beginning, and submit them to…

Posted in category theory, logic, partial functions | Tagged , , | 6 Comments

## Learning category theory: a necessary evil?

The end of my last blog post (about isomorphism testing of reversible deterministic finite automata) explained how category theory gave me the idea that the simplified variant of my question about permutation group isomorphism should be easy to solve:

The idea to consider the case where the group is fixed came from reading an introduction to category theory, because those morphisms of left $M$-sets felt like cheating (or oversimplifying) to me. So I guessed that this case should be easy to solve, if I would only…

I hinted that this felt like cheating (or oversimplifying) to me, but I realized three month later that my original question could be reduced to the simplified variant by a brute force technique exploiting the fact that minimal generating sets of groups are very small. So it turned out that category theory was actually helpful and not cheating at all.

Why does a mathematician interested in logic, (semi-)lattice theory and universal algebra reads an introduction to category theory? Because it is a good readable (medium level) German book on a mathematical subject I’m currently interested in.

And it is written by Martin Brandenburg, who always helped people like me, when they got stuck on questions related to category theory. If the explanations and examples in the book are as good as his contributions on MathOverflow and the MathStack Exchange site, then I would be fluent in both theory and practice after finishing that book.

But why am I interested in category theory?

• There are all those forgetful functors with left adjoints in universal algebra. And when doing semigroup theory, switching freely between monoids and semigroups by adjoining an identity is very common. Seems like a good idea to understand this.
• I like inverse semigroups, which are closely related to partial bijections, and hence to partial functions. Mathematicians don’t like partial functions, but category theory at least admits that each function has a domain and a codomain. Some existing work on partial functions has been phrased in the language of category theory.
• I have read basic introductions to category theory, but kept forgetting about adjoint functors. Then I worked two times thru a reasonable introduction to the adjoint functor theorem (by Chris Henderson), and hopefully settled this problem.
• I would like to better understand the Ext and Tor functors in group cohomology, and diagram chasing like the snake lemma.
• I like non-classical logic, including intuitionistic logic, linear logic and partition logic. My understanding of non-classical logic is mostly based on lattice theory. Sadly, lattice theory is a very confusing name, and it has only few followers. So if I translate my ideas about non-classical logic and lattice theory into the language of category theory, then I should be able to reach a more relevant audience.
• I liked assembly language when I started to program, and set theory feels similar to assembly language. Category theory is one alternative to get around all the engrained prejudices about logic and model theory embedded into mainstream ZFC set theory.

Maybe I could find more reasons, but let’s talk about the content now

• Category theorists seem to have a strange relation to calculations. The introduction to the adjoint functor theorem by Chris Henderson often has imbalanced parenthesis in the formulas (my printout has those marked on page 6, 7, 11, and 12), indicating that the formulas have at least some errors. Sometimes (like the $=\phi_{C,D}(\phi_{C,D}^{-1}(f)$ near the bottom of page 7) I couldn’t figure out what the correct formula was supposed to be.
• The book by Martin Brandenburg has fewer of those unnecessary errors, but it is still annoying if $f\in \text{Hom}(A,C)$ is written instead of $f\in \text{Hom}(A,B)$ when explaining the Yoneda construction on page 109, which is difficult for me even without such errors.
• I actually had to return the book while reading chapter 5 on the Yoneda construction a second time. I bought my own copy of the book in the meantime, but haven’t resumed reading yet.
• Category theorists are similar to physicists in not noticing the places where the computational complexity can explode. So they claim only the universal property would be important, and the concrete representation an unimportant detail. But identity testing for the free modular lattice on 4 generators is undecidable, it is NP-complete for the free Boolean algebra, and trivial for free abelian groups. The universal algebra people are much more aware of such niceties, something which can easily get lost when doing universal algebra through the lens of category theory.

My overall conclusion is that I learned quite a bit from working through the category theory book, but that it were the concrete examples from which I benefitted most. But I still would not really recommend learning category theory to somebody who doesn’t know yet why he wants to learn it.

## A canonical labeling technique by Brendan McKay and isomorphism testing of deterministic finite automata

A deterministic finite automaton (DFA) $M$ is a 5-tuple, $(Q, \Sigma, \delta, q_0, F)$, consisting of

• a finite set of states $Q$
• a finite set of input symbols $\Sigma$
• a (partial) transition function $\delta:Q\times \Sigma \to Q$
• an initial state $q_0\in Q$
• a set of accept states $F \subset Q$

An isomorphism between two DFAs $M = (Q, \Sigma, \delta, q_0, F)$ and $M' = (Q', \Sigma, \delta', q'_0, F')$ is a bijection $p:Q\to Q'$ such that $\delta'(\cdot,s) = p \circ \delta(\cdot,s) \circ p^{-1} \forall s\in\Sigma$, $q'_0 = p(q_0)$ and $F' = p(F)$.

### Isomorphism of DFA is GI complete

Without further restrictions, the problem of testing whether two DFAs $M$ and $M'$ are isomorphic is as difficult as testing whether two graphs $G$ and $G'$ are isomorphic, i.e. the problem is GI complete. A simple construction shows that the problem of testing whether two digraphs $G$ and $G'$ are isomorphic reduces to DFA isomorphism testing.

For a digraph $G=(V,E)$, construct the DFA $M_G=(Q_G, \Sigma, \delta_G, q_0, F)$ with $Q_G:=V \sqcup E \sqcup \{*\}$, $\Sigma := \{0,1\}$, $q_0:=*$, $F:=\{ \}$ and $\delta_G$ defined for $e\in E$ by $\delta_G(e,0)=\text{tail}(e)$, $\delta_G(e,1)=\text{head}(e)$ and for $v\in V\sqcup \{*\}$ by $\delta_G(v,0)=\delta_G(v,1)=*$.

Obviously, the digraphs $G$ and $G'$ are isomorphic iff the DFAs $M_G$ and $M_{G'}$ are isomorphic.

### A canonical labeling technique

A reasonable restriction is to require that every state in $Q$ is reachable from $q_0$. Let $\Sigma:=\{s_1,\ldots,s_d\}$. Then a unique labeling $\ell(q_0)$ of $Q$ can be produced by a breadth first search of $M$ starting at $q_0$ as follows: You have a queue (first-in, first-out store), initially containing only $q_0$. Repeatedly do this: remove the state at the head of the queue, say $x$, then push into the queue (at the tail) those of $\delta(x,s_1),\ldots,\delta(x,s_d)$ (in that order) which have never previously been put into the queue. Since every state is reachable from $q_0$, every state is put into the queue eventually. Stop when that has happened and define $\ell(q_0)$ to be the order in which states were put into the queue. Since this labeling is independent of any ordering or labeling of $Q$, it fixes the only bijection between $Q$ and $Q'$ that can possibly lead to an isomorphism between $M$ and $M'$. It is easy then to check whether they are actually isomorphic.

Attributing that algorithm to Brendan McKay would be misleading, because it is so obvious. For example J.-E. Pin described the same algorithm (using depth first search instead of breadth first search) earlier, as an answer to a question about finding an isomorphism between finite automata.

### Brendan McKay’s canonical labeling technique

The context in which Brendan McKay described his canonical labeling technique was actually slightly more complicated, because there was no distinguished state $q_0$, and the state space of the automaton wasn’t required to be weakly connected. But instead, $\delta(\cdot,s)$ was injective for all $s\in S$. If we drop the requirement that $\delta(\cdot,s)$ is a total function and allow it to be partial, then we end up exactly with the reversible deterministic finite automata, described earlier on this blog. Brendan McKay’s technique generalizes effortlessly to this setting, as does the original context (where group actions are replaced by inverse semigroup actions).

Above are some colored digraphs corresponding to reversible deterministic finite automata (the initial state $q_0$ and the final states $F$ are ignored). We continue to describe the canonical labeling technique in terms of deterministic finite automata. For each state $q\in Q$, we can produce a unique labeling $\ell(q)$ (of the weakly connected component containing $q$) by a breadth first search of $M$ starting at $q$ as above, expect when we remove the state $x$ from the head of the queue, (instead of $\delta(x,s_1),\ldots,\delta(x,s_d)$) we have to push into the queue (at the tail) those of $\delta(x,s_1),\delta^{-1}(x,s_1),\ldots,\delta(x,s_d),\delta^{-1}(x,s_d)$ (in that order) which are defined and have never previously been put into the queue.

But how can we create a canonical labeling of $M$ from the unique labelings $\ell(q)$? Just take (one of) the lexicographically smallest labelings of each block, and sort the block lexicographically based on those labelings. The lexicographical order in this context is defined by encoding a labeled block in some reasonable (unique) way as a string, and comparing those strings lexicographically. It’s a good idea to modify this lexicographical order to first compare the lengths of the strings, but this is already an optimization, and there are more possible optimizations, as described in Brendan’s answer. These optimizations are the context of the following remark at the end of his answer:

The graph is actually a deterministic finite automaton (at most one edge of each colour leaves each vertex). There could be faster isomorphism algorithms out there for DFAs, though I’m dubious that anything would work faster in practice than a well-tuned implementation of the method I described.

This remark was actually the main motivation of this blog post, i.e. to investigate the relation of Brendan’s technique to isomorphism testing for DFAs. If there were really faster isomorphism algorithms out there for DFAs, then either they would not be applicable to the problem which Brendan had solved there (if they rely on reachability from $q_0$), or they would also solve GI itself in polynomial time (which is highly unlikely).

### Conclusions?

The truth is I have been yak shaving again, even so I explicitly told vzn that I didn’t want to get too involved with GI. How did it happen? Fahad Mortuza (aka Jim) asked me for help refuting his attempts to solve GI in quasipolynomial time. (This is completely unrelated to the recent breakthrough by László Babai, except maybe that he triggered some people to spend time on GI again.) While working with him, I realized that permutation group isomorphism might be more difficult than group isomorphism, and asked some questions to get a better feeling for the relation of permutation group isomorphism to GI. The idea to consider the case where the group is fixed came from reading an introduction to category theory, because those morphisms of left $M$-sets felt like cheating (or oversimplifying) to me. So I guessed that this case should be easy to solve, if I would only understand how Brendan McKay handles automorphisms of graphs efficiently. Then Brendan McKay himself explained his technique (for this case) and I discovered the interesting connection to inverse semigroups and finite automata, so I decided I had to write a blog post on this. Now I have written this post, just to avoid having another item on my ToDo list which never gets done.

## On Zeros of a Polynomial in a Finite Grid: the Alon-Furedi bound

This paper is a great achievement. Not just that it formulates and proves a very appropriate common generalization of Alon-Furedi, Schwartz-Zippel and other theorems, it is well organized, easy to read, and very inspiring.

My joint paper with Aditya Potukuchi, Pete L. Clark and John R. Schmitt is now up on arXiv: arXiv:1508.06020.

This work started a few months back when I emailed Pete and John, pointing out an easy generalization of Chevalley-Warning theorem using something known as the punctured combinatorial nullstellensatz, after reading their paper on Warning’s second theorem: arXiv:1404.7793. They got pretty excited about it and we started discussing some related things. Finally, it’s not the generalization of Chevalley-Warning that this paper is about but the theorem of Alon-Füredi itself, which is the main tool they used in their paper to generalize Warning’s second theorem. In our discussions we found several unexplored connections between this elementary result on polynomials and results from different areas of maths. My friend Aditya joined us in between with his amazingly simple proof of Alon-Füredi which, along with the annoying realization that a result of DeMillo-Lipton-Zippel doesn’t…

View original post 124 more words