Theory and practice of signed-digit representations

The integers are sometimes formally constructed as the equivalence classes of ordered pairs of natural numbers (a,b). The equivalence relation \sim is defined via

(a,b) \sim (c,d) iff a+d=b+c

so that (a,b) gets interpreted as a-b. This motivates the signed-digit representation. To avoid storing two numbers instead of one, the subtraction is performed at the level of digits, which leads to signed digits. The overhead can be kept small by using unsigned 7-bit or 31-bit digits as the starting point (instead of decimal or binary digits).

The overhead may be small, but the signed-digit representation remains redundant and non-unique. Comparing two integers in this representation is a non-trivial operation. But addition (and subtraction) becomes easy from a hardware perspective, since carry-bits no longer propagate. Multiplication also gets easier, since a part of multiplication consists of adding together intermediary results.

This post started as a subsection of another, yet to be finished post. That subsection tried to use the trivial redundancy of representations of the computable real numbers as motivation to use similarly redundant representations also for natural numbers. However, the constant references to simpler precursor ideas, to the well known practical applications, and to further complications and generalizations required in practice made that subsection grow totally out of proportion. It is still awfully long, even as a standalone post. The subsections try to cut it into more digestible pieces, but all the references to external material remain. The reader is not expected to read much of the linked material; it only serves as proof that this material has practical applications and is well known. This post may be seen as describing in detail one specific alternative number format, for which I dreamt about sufficient hardware support in a previous post.

Fast parallel arithmetic

In the 6502 Assembly language (C64, Apple II, …), addition is performed by the ADC instruction, which is the mnemonic for Add Memory to Accumulator with Carry. It adds the value in memory plus the carry-bit to the value in the accumulator, stores the result in the accumulator, and sets the carry-bit according to whether the addition overflowed or not. This operation could be represented as A + M + C -> A, or more precisely as
(A + M + C) % 256 -> A
(A + M + C) / 256 -> C
Since A and M can be at most 255 and C at most 1, the resulting C will again be either 0 or 1. This makes it possible to add two arbitrarily long numbers: first one sets the carry bit to zero, then adds the lowest bytes, and then successively the higher bytes. Nice, but not very parallel.
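
To make the byte-by-byte procedure concrete, here is a minimal Python sketch (not 6502 code) of adding two little-endian byte arrays of equal length with a single propagating carry bit:

def add_with_carry(a_bytes, m_bytes):
    # Serial ADC loop: one byte at a time, the carry bit propagates upwards.
    carry = 0
    result = []
    for a, m in zip(a_bytes, m_bytes):
        s = a + m + carry
        result.append(s % 256)
        carry = s // 256
    return result, carry

print(add_with_carry([0xFF, 0x01], [0x01, 0x00]))  # ([0, 2], 0), i.e. 0x01FF + 0x0001 = 0x0200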

Since the 6502 Assembly language has no instruction for multiplication, calling it fast would be an exaggeration. Let us instead imagine a machine which can perform 8 add with carry additions in parallel, where the carry-bits are stored in an additional byte. Since the meanings of the carry-bits before and after the parallel addition are different, we probably want to left-shift the carry-bits after the addition. Let us imagine that this machine can also perform 4 multiply with carry multiplications in parallel. How should it handle the carry-bits? Since (255+1)*(255+1)-1 can be represented in two bytes, it seems reasonable to treat them as a simple addition of 1 or 0 on input. The carry-bit on output might be interpreted as addition of 256 or 0, which would be consistent with the interpretation on output after addition.
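
Here is a minimal Python sketch of the addition half of this imagined machine (the multiply with carry half is omitted). The left-shift of the carries is made explicit; in this sketch the topmost carry simply spills into a ninth bit:

import random

def parallel_adc(a_bytes, m_bytes, carry_byte=0):
    # 8 independent add-with-carry additions, no propagation between them.
    # Bit i of carry_byte is consumed by position i; the carries produced
    # are returned left-shifted, so a carry from position i sits at bit i+1
    # and means "add 1 at position i+1".
    out_bytes = []
    out_carries = 0
    for i in range(8):
        s = a_bytes[i] + m_bytes[i] + ((carry_byte >> i) & 1)
        out_bytes.append(s % 256)
        out_carries |= (s // 256) << (i + 1)
    return out_bytes, out_carries

def value(byte_list, carries=0):
    # Interpret bytes plus (already shifted) carry bits as one number.
    return sum(b << (8 * i) for i, b in enumerate(byte_list)) + \
           sum(((carries >> j) & 1) << (8 * j) for j in range(9))

a = [random.randrange(256) for _ in range(8)]
m = [random.randrange(256) for _ in range(8)]
out, carries = parallel_adc(a, m)
assert value(out, carries) == value(a) + value(m)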

Outputting a second carry-bit to be interpreted as addition of 256^2 or 0 would also be consistent, even though it is not required here. (Since (255+16+1)*(255+16+1)-16^3-256-16-1 = 16^4+16^3-16-1 cannot be represented in two bytes, allowing a second carry-bit may be useful to simplify recursive implementation of multiplication.) It is slightly inconsistent that the add with carry instruction only consumes one carry-bit per addition. After all, the idea is to represent intermediate results as M+C, in order to enable fast parallel addition of intermediate results. But consuming two carry-bits would be problematic, since the result of (255+1)+(255+1) is not nicely representable.

What we just described is not new (of course). It is basically just the Carry Save Adder technique used to parallelize multiplication. However, since arithmetic with very long numbers is often used for modular arithmetic, this is not yet the end of the story. The Drawbacks section on Wikipedia says that it has to be combined with the Montgomery modular multiplication technique to be useful for public-key cryptography.

It is cool that our imagined computer can multiply 4 bytes and add 8 bytes in parallel, but how would we actually multiply two 4-byte long numbers? Each byte of the first number must be multiplied by each byte of the second number, and those results must be arranged appropriately and added together. The required 16 multiplications can be done by 4 subsequent parallel multiply with carry instructions, where the second 4 bytes (and the corresponding carry bits) are rotated in between. We have to add 1, 3, 5, 7, 7, 5, 3, 1 bytes in the different places of the result, which requires at least 24 additions. There are at most 6 additions in the last round, so we need at least 4 rounds of (8 bytes) parallel additions. And we might need even more rounds, since we get slightly too many carry-bits.
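
The following rough Python sketch only counts where the byte products land (it is not the parallel schedule itself), but it shows where the 1, 3, 5, 7, 7, 5, 3, 1 summands come from: each of the 16 byte products contributes a low byte at position i+j and a high byte at position i+j+1.

def partial_products(x_bytes, y_bytes):
    # Bucket the low and high bytes of all byte products of two
    # little-endian 4-byte numbers by their byte position in the result.
    buckets = [[] for _ in range(8)]
    for i, x in enumerate(x_bytes):
        for j, y in enumerate(y_bytes):
            p = x * y
            buckets[i + j].append(p % 256)       # low byte
            buckets[i + j + 1].append(p // 256)  # high byte
    return buckets

buckets = partial_products([1, 2, 3, 4], [5, 6, 7, 8])
print([len(b) for b in buckets])  # [1, 3, 5, 7, 7, 5, 3, 1]
assert sum(sum(b) << (8 * k) for k, b in enumerate(buckets)) == 0x04030201 * 0x08070605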

Faster multiplication and signed integers

Subtraction and signed integers are useful even for multiplication of positive integers, since subtraction is a basic operation required by fast multiplication algorithms like Karatsuba multiplication or its generalization, Toom-Cook multiplication. So what we describe next is not new either. See for example section “3.2 Multiplication” of The Complexity of Boolean Functions by Ingo Wegener, 1987, where the signed-digit representation is used for fast parallel multiplication algorithms.

The 6502 Assembly language also has an SBC instruction, which is the mnemonic for Subtract Memory from Accumulator with Borrow. This operation could be represented as A – M – ~C -> A; apparently the 6502 performs an ADC instruction with the bitwise-not of M. That makes sense, as it is the cheapest way to reduce subtraction to addition. For us, an SBB instruction where ~C got replaced by B fits better into our interpretation scheme
(A – M – B) % 256 -> A
(A – M – B) / 256 -> -B
The integer division here truncates towards -\infty, we could also write
(A – M – B + 256) % 256 -> A
(A – M – B + 256) / 256 -> 1-B
We can read A – M – B both as A – (M+B) and as (A-B) – M. The second reading is preferable if we imagine a machine which can perform 8 SBB instructions in parallel, where the borrow-bits are stored in an additional byte. Again, we should left-shift the borrow-bits after the subtraction, since their meaning before and after the parallel subtraction is different. The parallel SBB instruction (+ left-shift) returns a signed integer in the a-b signed-digit representation, which we mentioned in the introduction.
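
Here is a minimal Python sketch (plain code, not machine instructions) of this parallel SBB including the left-shift of the borrow bits; the incoming borrow bits are taken to be zero for brevity. The result is the byte/borrow form of the a-b signed-digit representation, where digit i has the value out_bytes[i] - borrow_bits[i].

def parallel_sbb(a_bytes, m_bytes):
    # Subtract two little-endian byte arrays position by position.
    # A borrow produced at position i is worth -256^(i+1), so after the
    # left-shift it is recorded as borrow_bits[i+1].
    n = len(a_bytes)
    out_bytes, borrow_bits = [], [0] * (n + 1)
    for i in range(n):
        d = a_bytes[i] - m_bytes[i]
        out_bytes.append(d % 256)         # Python's % and // floor, matching the truncation above
        borrow_bits[i + 1] = -(d // 256)  # 1 if we had to borrow, else 0
    return out_bytes, borrow_bits

def signed_value(out_bytes, borrow_bits):
    # Digit i is out_bytes[i] - borrow_bits[i]; the highest digit has no byte.
    return sum((b - r) << (8 * i) for i, (b, r) in enumerate(zip(out_bytes + [0], borrow_bits)))

x, y = [0x10, 0x00, 0x02], [0x20, 0x00, 0x01]
print(signed_value(*parallel_sbb(x, y)))  # 65520, i.e. 0x020010 - 0x010020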

Even though bit-twiddling may be nice, it somehow makes the trivial idea behind representing signed integers as a-b using signed digits feel unnecessarily complicated. For D >= 3, we may just assume that we have digits between -D+1 and D-1. After addition or subtraction, we get a result between -2*D+2 and 2*D-2, and if we reduce it to the range -D+2 to D-2 by emitting an overflow of -D, 0, or D, then the next digit can consume that overflow without overflowing itself.
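
A minimal Python sketch of this digit-level rule for an arbitrary D >= 3 (little-endian digit lists, digits in -(D-1)..D-1): each digit sum is first reduced independently while emitting an overflow of -1, 0, or +1 (meaning -D, 0, or D), then each position absorbs the overflow from its right neighbour, and no overflow can travel further.

def signed_digit_add(x, y, D):
    # The carries depend only on the digit sums at single positions,
    # so they never propagate over more than one position.
    n = max(len(x), len(y))
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    carries, reduced = [0] * (n + 1), [0] * n
    for i in range(n):              # all positions are independent
        s = x[i] + y[i]             # in the range -(2D-2)..(2D-2)
        if s > D - 2:
            carries[i + 1], reduced[i] = 1, s - D
        elif s < -(D - 2):
            carries[i + 1], reduced[i] = -1, s + D
        else:
            reduced[i] = s
    # |reduced[i]| <= D-2 and |carry| <= 1, so absorbing cannot overflow.
    return [r + c for r, c in zip(reduced + [0], carries)]

def digit_value(digits, D):
    return sum(d * D**i for i, d in enumerate(digits))

a, b = [2, -2, 1], [-1, 2, 2]       # digits in -2..2, i.e. D = 3
assert digit_value(signed_digit_add(a, b, 3), 3) == digit_value(a, 3) + digit_value(b, 3)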

More bit-twiddling with borrow-bits

Can we define a straightforward (parallel) ADB (add with borrow) instruction, which adds an unsigned integer to a signed integer, and returns a signed integer? We get
(A-B + M + 1) % 256 -> A
(A-B + M + 1) / 256 -> 1-B
if we just use ADC and replace C by (1-B). This represents the result as -1 + A + (1-B)*256. But how is this compatible with our signed-digit representation? When we left-shift the borrow-bits after the addition, we fill the rightmost place with 1 (instead of 0). In the other places, the -1 cancels with the 1 from (1-B)*256 of the previous byte. The leftmost place is also interesting: if B is set, then we can ignore it. But if it is not set, then A needs an additional byte set to 1. So the signed-digit representation works with nearly no overhead.

We basically just replaced C by (1-B). But how could this trivial change enable us to represent signed integers? A-B = A – (1-C) = A+C – 1, so A and C now describe a digit in -1..255, while before it was a digit in 0..256. Even though we prefer the borrow-bits variant here, this way of looking at it (offset binary) can sometimes be helpful, for example when trying to interpret the SBC instruction of the 6502 Assembly language.

In my previous thoughts about (big) integers, there was the question whether the two’s complement representation is well suited for a variable width integer data type. One argument against it was that adding a small number to, or subtracting it from, a big number would take an unnecessarily long time. In the borrow-bits variant, for addition all bytes where the borrow-bit is not set must be touched, and for subtraction all bytes where the borrow-bit is set must be touched.

Before we come to parallel multiplication, what about multiplication by -1 and by a power of two? -1*(A-B)=B-A, so SBB of B (as bytes without borrow) and A gives the correct result. It is the sum of the bitwise-not of A, the byte-wise B and a carry-bit set to 1. The resulting (to be left-shifted) borrow-bit B is given by (1-C). The result of multiplication by a power of two is given by a left-shift followed by SBB. The same holds for multiplication by minus a power of two. In general, two successive multiplications by -1 will not yield the identity on the binary level, and identities about multiplication by powers of two (like associativity) will not hold on the binary level.

How to perform 4 multiply with borrow multiplications in parallel? If the borrow-bits are interpreted as simple addition of -1 or 0 on input, then the smallest result is -255, and the biggest 255*255, so the borrow-bit on output might be interpreted as addition of -256 or 0. However, what about the problem that addition cannot consume two borrow-bits? ADC outputs a digit in 0..2*255+1, SBB outputs a digit in -256..255, and ADB outputs a digit in -1..2*255. If we let ADB instead output a digit in 0..2*255-1 and an overflow of -256, 0, or 256, then the next digit can consume that overflow without overflowing itself. This extends the output range of ADB to -256..3*255, so we can now consume two borrow-bits of -1 and three bytes. (Basically the same as if we had just performed two subsequent ADB instructions, but the borrow propagation behaves more nicely.)

The output range can also be extended one-sided (by 255). Extending ADB to the negative side or extending SBB to the positive side both yield a range -256..2*255, which can consume two borrow-bits of -1 and two bytes (or more precisely two positive bytes, one negative byte, and one borrow-bit of -1). ADC can be extended to the positive side to a range 0..3*255+1, so that it can consume two bytes and two carry-bits of +1.

Redundant representations of the computable real numbers

The idea of using a redundant representation for computable real numbers somehow feels trivial. There is no way to get a completely non-redundant representation anyway. The non-trivial idea is to exploit the redundancy to improve the computational properties of the computable real numbers. This idea is not new (of course, I keep repeating myself), and I tried my luck with it in an email (17 Dec 2018 TK: A signed-digit representation solves some issues) to Norman Wildberger, in response to the issues he raised in “Difficulties with real numbers as infinite decimals (II) | Real numbers + limits Math Foundations 92” on slide 8 at 31:24, before claiming: No such theory exists (2012)!! There are redundant representations of the computable real numbers, for which the operations of addition, subtraction and multiplication are computable. At the beginning of Problems with limits and Cauchy sequences | Real numbers and limits Math Foundations 94 on slide 1, we read

Attempts at ‘defining real numbers’:
– as infinite decimals
– as Cauchy sequences of rationals
– as Dedekind cuts
– as continued fractions
– others (Axiomatics, …)
None of these work!

In subsection “2.2. Non-termination” of Luther Tychonievich’s post on Continued Fractions, it is explained why computability (of addition and multiplication) fails for a representation by classical continued fractions. But they would be computable if signed integers (instead of natural numbers) were allowed in the continued fraction expansion. One still needs to keep some restrictions on the numbers in the expansion, but it can be done, see section “3.3 Redundant Euclidean continued fractions” of Exact Real Computer Arithmetic with Continued Fractions by Jean Vuillemin, 1987. The signed-digit representation is also described there, in section “3.1 Finite representations of infinite sequences of numbers” at the bottom of page 13.

An infinite signed-digit representation of the computable real numbers

Signed-digit representations are not limited to integers. We will use them as representation of computable real numbers. The apparent drawback that the signed-digit representation of a number is not unique becomes a feature when it is used to represent computable real numbers.

We could try to define a computable real number as a Turing machine outputting the digits of its infinite binary expansion. An objection is that addition of two such computable real numbers would not be computable. But what if we define it instead as the difference of two non-negative series of digits? It turns out that addition, subtraction and multiplication then become computable. (The reason why a signed-digit representation helps there is that it can eliminate chains of dependent carries.) However, testing whether such a number is zero turns out to be undecidable. This is fine, since

… equality testing for computable real numbers is undecidable. The idea here is to take some Turing machine for which we would like to decide whether it halts or not, and write out the real number 0.000…000… with an additional zero for every step of the machine, and output one final 1 when (and only if) the machine finally halts. Deciding whether this real number is equal to zero is equivalent to solving the halting problem for the given Turing machine.
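
Here is a minimal Python sketch of that construction; the machine_step callback, which runs one step of the given machine and reports whether it has halted, is an assumed placeholder:

def digits_of_halting_real(machine_step):
    # Yield binary digits after the point: a zero for every step of the
    # machine, and one final 1 if (and only if) the machine ever halts.
    # The number is zero exactly when the machine never halts, so deciding
    # "is it zero?" would solve the halting problem.
    while True:
        if machine_step():
            yield 1
            break
        yield 0
    while True:
        yield 0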

Even though equality testing is undecidable, it would still be nice to have a representation for which addition and multiplication (and many other operations) are computable. And the signed-digit representation achieves exactly this, more or less. Division is still not computable, since it has a discontinuity at zero, and the signed-digit representation cannot easily avoid that issue. Using a representation via (a-b)/(c-d) can work around that issue, at the cost of also allowing 1/0 and 0/0. Jean Vuillemin tries to explain why this is important, but at least initially his definitions just confused me.

Conclusions?

Even though the above text is awfully long and trivial, at least it is a proper blog post again. Maybe I should have talked more about the less trivial stuff, but the trivial stuff already filled the post. And this bit-twiddling is a real part of my thinking and acting, at least in my distant past. And I hope that having finally finished this post will allow me to also finish the other yet to be finished post soon, the one whose subsection grew totally out of proportion and turned into this post.


A list of books for understanding the non-relativistic QM — Ajit R. Jadhav’s Weblog

I really like that this post points out that QM is vast, and that it takes a prolonged, sustained effort to learn it. I also like that it explicitly recommends chemistry books. My own first exposure to the photoelectric effect (before university and before my physics teacher at school had even mentioned quantum) was from an introductory chemistry book by Linus Pauling.

TL;DR: NFY (Not for you). In this post, I will list those books which have been actually helpful to me during my self-studies of QM. But before coming to the list, let me first note down a few points which would be important for engineers who wish to study QM on their own. After all, […]

via A list of books for understanding the non-relativistic QM — Ajit R. Jadhav’s Weblog


I’m not a physicist

Background

At the end of 2016, I decided to focus on working through an introductory textbook in quantum mechanics, instead of trying to make progress on my paper(s) to be published. I finished that textbook, which taught me things like Clebsch-Gordan coefficients, the Aharonov-Bohm effect, perturbation theory, Fermi’s golden rule, path integrals, and the Dirac equation. Another physics book that I worked through in 2017 taught me about thermodynamics and its importance for chemistry and solid state physics.

I spent much time with other physics textbooks in 2017, but didn’t read them cover to cover. In general, I spent much time in 2017 on physics, and I was happy about that. My day job (in semiconductor manufacturing) is concerned with physics to a large part. I previously had a bad conscience, because I was spending so little time on physics, or at least I was making so little progress.

But spending so much time on physics meant that little time was left for this blog. And I submitted (with coauthors) an abstract for a conference, which got accepted as oral presentation. The conference was at the end of February 2018, and I spent the last two months before it working day and night to still finish the paper (and presentation) in time. But even after the conference, I continued to spend most of my time on physics. So a post about physics seems like the only reasonable way to get any blog post done at all.

My love-hate relationship with physics

When I had the choice for my first job between one closely related to my thesis and one closely related to physics, I took the one related to physics. In my experience, establishing friendly relationships with physicists was easy. It did work out as expected, but I was surprised how much physics ended up on my plate. But it was fine for me, those were things like electrodynamics and optics, which were easy for me. When quantum mechanical tasks started to land on my plate, I protested that I didn’t manage to finish that course at university. But after switching jobs (Feb 2013), I sort of had to accept that the quantum mechanical tasks were part of my responsibilities.

On the one hand, this was a nice excuse for me to try to understand that part of physics which I was once unable to understand. On the other hand, the question “Why me? There are real physicists that already understood this stuff while at university!” hit me with all its practical implications. I had to spend time to read books and articles about practical and theoretical aspects of my specific tasks. And in parallel, I also had to fill-in the gaps in my understanding of the basics. But I never fully devoted my time to physics, at least not before 2017.

Given that it is my job, am I even allowed to admit that I still fill-in the basics? Or am I trying too hard to pretend to be a physicist, for whatever reasons? I did wonder whether I should do a master’s in physics, given that it is my job. But what for, just to prove that I am able to fulfil some formal criteria for being a physicist? And I doubt that I would become a physicist in my heart, so it could turn out to be very hard to finish a master’s degree. But I still like to discuss physics, and I would like to be able to judge certain physical fashions.

The code of silence

What could I have said about the Lightwave Research Laboratory at Columbia University in New York? Nothing, it would have just been name dropping. What could I have said about Information ist Energie by Lienhard Pagel? Something, it might have given the impression of name dropping, and it is the reason for this silent section:

This book uses QM in normal handwaving mode. By information, the author means his definition of dynamic information, which is something like information(flow) per time. Higher energy is normally associated with shorter times. What is good about this book is that it really discusses the time dynamics, which is otherwise often neglected.

What could I have said about quantum field theory? I could have talked about my motivations to learn more about it. But most of those were pointless, because I expect silence, no matter how much I have learned. But Sean Carroll gave me a good reason to learn more about QFT with his lecture Extracting the Universe from the Wave Function. I started with Quantum Field Theory for the Gifted Amateur and the many comments in Quantum Field Theory I, Lecture Notes (HS14+) and Quantum Field Theory II, Lecture Notes (FS17). However, then I read An Interpretive Introduction to Quantum Field Theory cover to cover. It was easy to read, with a good mix of technical details, explanations, interpretations, and philosophical clarification.

… much of the interpretive work Teller undertakes is to understand the relationship and possible differences between quantum field-theory — i.e., QFT as quantization of classical fields — and quantum-field theory — i.e., a field theory of ‘quanta’ which lack radical individuation, or as Teller says, “primitive thisness.”

Not sure when I will continue to work through the more technical material. At the moment, I am reading How is Quantum Field Theory Possible?, just to see how far I get before it becomes too boring or too difficult.

A conversation about the Copenhagen interpretation

I had an email conversation with BT in March 2017, where he praised Deutsch’s version of MWI, and I defended Heisenberg’s version of Copenhagen:
28 Feb 2017 BT: purpose of science is explanation
9 Mar 2017 TK: what is Copenhagen
13 Mar 2017 BT: Copenhagen is Bohr
20 Mar 2017 TK: historical defence of Copenhagen
28 Mar 2017 TK: QM as general probability

During that conversation, I did some historical investigations:

When David Bohm proposed his new interpretation in 1952, the term “Copenhagen interpretation” didn’t exist yet, he had to talk of the “usual physical interpretation of quantum theory” instead. The idea that the “Copenhagen interpretation” should be the common core of the beliefs by Bohr and Heisenberg probably goes back to Henry P. Stapp in 1972. Before, it was just another way to refer to the orthodox interpretation(s) taught by (good) text books. Heisenberg is the unfortunate soul who coined the term Copenhagen interpretation (in the 50s).

Here is how I see Heisenberg’s position, independent of whether Bohr agrees:

Max Born and Werner Heisenberg claimed in 1927 that quantum mechanics is a closed theory (abgeschlossene Theorie), just like Newton’s mechanics is a closed theory. And by this they meant that its description of reality is complete and accurate, whenever it is applicable. The limits of applicability are not part of the closed theories themselves, according to Heisenberg.

The conversation started, when I quoted Heisenberg’s own explanation:

… what one may call metaphysical realism. The world, i.e., the extended things, ‘exist’. This is to be distinguished from practical realism, and the different forms of realism may be described as follows: We ‘objectivate’ a statement if we claim that its content does not depend on the conditions under which it can be verified. Practical realism assumes that there are statements that can be objectivated and that in fact the largest part of our experience in daily life consists of such statements. Dogmatic realism claims that there are no statements concerning the material world that cannot be objectivated. Practical realism has always been and will always be an essential part of natural science. Dogmatic realism, however, …

and BT replied:

That’s all a bit technical for me! I can’t tell whether he is in favour of “explanations” in the simple(-minded?) Deutsch sense or not.

After setting the passage in context, I got very concrete:

But I disagree that the quoted passage is technical. If he adheres to this passage, then Heisenberg cannot claim that Schrödinger’s cat would be both alive and dead, or that the moon would not be there if nobody watches.

Others, like Christopher A. Fuchs and Asher Peres in “Quantum Theory Needs No Interpretation”, are apparently less sure whether (neo-Copenhagen) quantum theory is so clear about that fact. Hence they try to weasel out by claiming: “If Erwin has performed no observation, then there is no reason he cannot reverse Cathy’s digestion and memories. Of course, for that he would need complete control of all the microscopic degrees of freedom of Cathy and her laboratory, but that is a practical problem, not a fundamental one.”

This is nonsense, because the description of the experiment given previously was complete enough to rule out any possibility for Erwin to reverse the situation. Note the relevance of “… a consistent interpretation of QM as applied to what we do in a physical laboratory and how practitioners experience QM in that context.” If Erwin had access to a time machine enabling him to realistically reverse the situation, then it might turn out that Cathy and Erwin indeed lived multiple times through both situations (and experienced real macroscopic superpositions), as depicted in movies like “Back to the Future”.

According to Jan Faye in the SEP article on the Copenhagen Interpretation, even Bohr does not disagree with that conclusion: “Thus, Schrödinger’s Cat did not pose any riddle to Bohr. The cat would be dead or alive long before we open the box to find out.” However, I doubt that Bohr really said this, or replied to the content of Schrödinger’s article in any other direct way. (I read that he complained to Schrödinger about that article for indirectly supporting Einstein in his crusade against QM.)

Questions about Schrödinger’s famous article

At the end of this article on Erwin Schrödinger there are links to interesting papers by Schrödinger. After reading “Die gegenwärtige Situation in der Quantenmechanik” (“The Present Status of Quantum Mechanics (“Cat” Paper)”), I had several questions:

  1. Did Schrödinger know the density matrix formalism? He could have formulated some points he was making more explicit by using it.
  2. Did Bohr ever admit that the example of the cat was spot on, and that of course the cat was either dead or alive, even if nobody cared to check?
  3. Did Schrödinger know about Fermions and Bosons? The last paragraph talks about how the relativistic theory would make the foregoing discussion invalid.

When I first read Schrödinger’s article in September 2014, I didn’t know much about quantum mechanics or its history. But I had the hope that my instrumentalist ideas related to the density matrix were not too far off from the orthodox interpretation. Today, I know that Carl Friedrich von Weizsäcker denied that the Copenhagen interpretation asserted: “What cannot be observed does not exist.” Instead, it follows the principle:

What is observed certainly exists; about what is not observed we are still free to make suitable assumptions. We use that freedom to avoid paradoxes.

The interesting thing is that Schrödinger’s article is not really about the cat, and not really about attacking quantum mechanics. Instead, Schrödinger tries to explain his own understanding of it. But Bohr interpreted the article as an attack on his interpretation.

  1. Schrödinger at least knew the consequences that follow from the density matrix formalism, so I guess he knew it. Not sure why he did not use it; maybe it was not yet popular among physicists back then.
  2. I doubt Bohr ever admitted that Schrödinger was right.
  3. Schrödinger knew about Fermions and Bosons. The basic connection between Fermi–Dirac statistics, Bose–Einstein statistics and the spin was already known since 1926. But the first derivations of the spin-statistics theorem from relativistic quantum field theory by Markus Fierz and later by Wolfgang Pauli are from 1939.

However, I guess Schrödinger wrote that last paragraph not just because Fermions and Bosons were challenging to fit into his scheme of explanation. He also wanted to stress that relativistic quantum mechanics was not yet fully worked out, so that he could conclude: “Perhaps, the simple procedure of the nonrelativistic theory is only a convenient calculational trick, but it has at the moment attained a tremendous influence over our basic view of Nature.”

While talking about famous articles by Schrödinger, I want to quote a passage from the beginning of “Are There Quantum Jumps? Part I”:

Along with this disregard for historical linkage there is a tendency to forget that all science is bound up with human culture in general, and that scientific findings, even those which at the moment appear the most advanced and esoteric and difficult to grasp, are meaningless outside their cultural context. A theoretical science, unaware that those of its constructs considered relevant and momentous are destined eventually to be framed in concepts and words that have a grip on the educated community and become part and parcel of the general world picture – a theoretical science, I say, where this is forgotten, and where the initiated continue musing to each other in terms that are, at best, understood by a small group of close fellow travellers, will necessarily be cut off from the rest of cultural mankind; in the long run it is bound to atrophy and ossify, however virulently esoteric chat may continue within its joyfully isolated groups of experts.

I want to quote a passage from the beginning of “Are There Quantum Jumps? Part II”:

Even if this shorthand were admissible for the microevent, we have to keep in mind that physicists are not the only people interested in physics and in the outcome of theoretical research in physics. Those results that are tenable and will survive are destined, eventually, to be absorbed into our world picture and to become part and parcel thereof. A great many of our educated contemporaries, not equipped with the mathematical apparatus to follow our more technical deliveries, are yet deeply concerned with many general questions; one of the most stirring among them certainly is whether actually natura facit saltus or no. Whatever abbreviated language we physicists may find convenient to use among ourselves, we ought to be aware of the dilemmas that justly and duly interest others; we must be careful not to veil or distort them by indulging in loose speech

Conclusions

Why did I write this post? Probably because I wanted to share my email conversation about the Copenhagen interpretation. And because I finally managed to read a technical book about quantum field theory cover to cover. However, that part already touches the code of silence, and I really wrote two of the emails (the ones where silence would not hurt me too badly) for which I expected silence (and really got silence as expected). But it was a start, and I am confident that I will finish more of the small stuff in the near future, even if the silence will hurt me more. Writing about Schrödinger was just a bonus, or maybe I wanted to avoid having to write about physics again in the near future. But I could do a post about my thoughts on probability, at least that would be related to logic.


ALogTime, LogCFL, and threshold circuits: dreams of fast solutions

Our protagonists are the following (DLogTime-)uniform circuit classes: TC0, NC1 (which corresponds to ALogTime), and SAC1 (which corresponds to LogCFL).

Interesting things can be said about those classes. Things like Barrington’s theorem (NC1), closure under complementation by inductive counting (SAC1), or circuits for division and iterated multiplication (TC0). Things like TC0 recognises the Dyck language with two types of parentheses, NC1 recognises visibly pushdown languages, or SAC1 recognises context free languages (and also the languages recognisable in nondeterministic logarithmic space).

But how are those classes related to dreams about speed? There are sometimes situations where both fixed width integers and floating point numbers are problematic, but much faster than better alternatives because of explicit hardware support. The dream is to allow arbitrary alternative number formats getting sufficient hardware support to remain reasonable alternatives to the number formats supported explicitly by current hardware.

ALogTime – fast solutions for predictable stack height

Considering how parameter passing to subroutines and a huge part of memory management in mainstream computer languages is stack based, an obvious and natural variation is to restrict the unbounded memory of a Turing machine to be a stack. With those words, I justified copying a part of my answer to my own question yet another time. During my research for that answer, I also learned about Nested Words and Visibly Pushdown Languages and that they can be recognized in LOGSPACE. Recently, I read

The Boolean formula value problem is in ALOGTIME (25 pages)
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean/paper.pdf
by S Buss – Jan 1987

I immediately suspected that Visibly Pushdown Languages can also be recognized in ALogTime, and that recognition in LOGSPACE is just a simple corollary. Indeed it is:

Complexity of Input-Driven Pushdown Automata (21 pages)
http://research.cs.queensu.ca/~ksalomaa/julk/p47-okhotin.pdf
by A Okhotin, K Salomaa – 2014

[Figure 3 of Okhotin and Salomaa, illustrating the Mehlhorn–Rytter algorithm]
The remarks following the proof of theorem 8 (with the beautiful Figure 3 shown above) say: “Finally, Dymond [12] applied the method of Buss [7] to implement the Mehlhorn–Rytter algorithm [36] in NC1”. Dymond’s paper is short and sweet:

Input-driven languages are in log n depth (4 pages)
https://www.sciencedirect.com/science/article/pii/0020019088901482?via%3Dihub
by P Dymond – 1988

Polynomial size log depth circuits: Between NC1 and AC1 by Meena Mahajan gives a different summary of Dymond’s paper:

Dymond gave a nice construction [14] showing that they are in fact in NC1. His approach is generic and works not just for VPAs but for any PDA satisfying the following:

  1. no ϵ moves,
  2. an accepting run should end with an empty stack
  3. the height of the pushdown, after processing i letters of the input, should be computable in NC1. If there is more than one run (nondeterministic PDA), and if the height profiles across different runs are different, then the heights computed should be consistent with a single run. Furthermore, if there is an accepting run, then the heights computed should be consistent with some accepting run.

For such PDA, Dymond transforms the problem of recognition to an instance of formula value problem, and then invokes Buss’s ALogTime algorithm [8] for it.

This post links extensively to external material. It is clear that the reader of this post will not read most of it. The more interesting question is whether I have read that material, or at least plan to read it. The initial plan of this post was to indicate which material I have already read and understood, and which material I still plan to read. But this is impractical and boring for the reader. What I want to say is that I have read Buss’ paper in principle, but didn’t yet work thoroughly enough through section 7, and I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive? In some of the other papers quoted above, I only read the parts which interested me. And here is another paper by Buss et al which I still plan to read:

An optimal parallel algorithm for formula evaluation (42 pages)
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean2/finalversion.pdf
by S Buss, S Cook, A Gupta, V Ramachandran – Nov 20, 1990

Parallel and distributed computing

The main conclusion of the last section is that ALogTime is quite powerful, because it can recognize languages accepted by input driven stack automata. I wondered whether this result is taught in typical introductions to parallel computing. Here is my sample:

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (104 pages)
http://www.umiacs.umd.edu/users/vishkin/PUBLICATIONS/classnotes.pdf
by Uzi Vishkin – October 12, 2010

Parallel Algorithms (64 pages)
https://www.cs.cmu.edu/~guyb/papers/BM04.pdf
by Guy E. Blelloch and Bruce M. Maggs – 1996

Limits to Parallel Computation: P-Completeness theory (327 pages)
https://homes.cs.washington.edu/~ruzzo/papers/limits.pdf
by R Greenlaw, H Hoover, W Ruzzo – 1995

Parallel Algorithms (347 pages)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.8142&rep=rep1&type=pdf
by H Casanova, A Legrand, Y Robert – 2008

My conclusion is that it is not taught, at least not explicitly. And I already took into account that they would refer to ALogTime as (uniform) NC1. Chapter 10, where Uzi Vishkin talks about tree contraction, is related, and NC1 is discussed in some of the texts. (It should be clear that I didn’t really try to read those texts, and just browsed through them.)

Some words on LogCFL and threshold circuits

I looked for hardware support for LogCFL, as seems possible for NC1:

A more promising approach could be to look for problems complete under appropriate many-one reductions for these classes. The word problem for \bar{A}_5 (or \bar{S}_5) would be such a problem for NC1. The solution of that word problem is nicely scalable, since one can subdivide the word into smaller parts, and thereby generate a smaller word on which the same procedure can be applied again.

The many-one reduction to that word problem is Barrington’s theorem, proved in

Bounded-width polynomial-size branching programs recognize exactly those languages in NC1 (15 pages)
https://people.cs.umass.edu/~barring/publications/bwbp.pdf
https://www.sciencedirect.com/science/article/pii/0022000089900378
by David A. Barrington – 1987
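
This is not Barrington’s construction itself, but a minimal Python sketch of the scalability mentioned in the quote above: the word problem for S_5 (is a product of permutations the identity?) can be evaluated by splitting the word into halves, evaluating the halves independently, and composing the two results.

def compose(p, q):
    # Composition of permutations given as tuples: (p ∘ q)(i) = p[q[i]].
    return tuple(p[q[i]] for i in range(len(q)))

def product_of_word(word):
    # Divide and conquer: both halves can be evaluated in parallel.
    if not word:
        return tuple(range(5))
    if len(word) == 1:
        return word[0]
    mid = len(word) // 2
    return compose(product_of_word(word[:mid]), product_of_word(word[mid:]))

def is_identity_word(word):
    return product_of_word(word) == tuple(range(5))

c = (1, 2, 0, 3, 4)        # a 3-cycle
c_inv = (2, 0, 1, 3, 4)    # its inverse
print(is_identity_word([c, c_inv, c, c_inv]))  # True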

One thing which could still go wrong when trying to use TC1, NC1, and SAC1 for actual efficient computer architectures is that those fast and nice many-one reductions are studied for decision problems, but in the end we are much more interested in computing solutions to function problems. This came up when I described my plans to a friend. But I am not the first to dream about hardware support for those theoretical fast solutions:

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision
A Desktop Supercomputer (4 pages)
http://www.umiacs.umd.edu/users/vishkin/XMT/index.shtml
http://www.umiacs.umd.edu/users/vishkin/XMT/ICCD09proc-version.pdf
by Uzi Vishkin – 2009

Back to LogCFL: for “closure under complementation by inductive counting,” see:

Two applications of inductive counting for complementation problems (20 pages)
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.394.1662&rep=rep1&type=pdf
by A Borodin, S Cook, P Dymond, W Ruzzo, M Tompa – 1989

and for: “SAC1 recognises context free languages (and also the languages recognisable in nondeterministic logarithmic-space),” note that LogCFL = AuxPDA(log n, n^{O(1)}) = SAC1, see here. This means the class of languages accepted by log n auxiliary pushdown automata which run in non-deterministic polynomial time. (Probably “non-deterministic” can also be replaced by “deterministic”. The direct meaning of LogCFL is the class of problems which can be many-one reduced to CFL in deterministic logspace.) I also guess that LogCFL can be solved by a dynamic programming algorithm, since both NL and CFL allow such a solution. (Probably an easy exercise. I just wanted to have LogCFL in this post, since it is part of my dream, and allows one to solve interesting problems. And it seems close to ALogTime from a hardware perspective.)

Now to threshold circuits: “TC0 recognises the Dyck language with two types of parentheses”. I don’t have a good primary reference at the moment. But this shows that threshold circuits are surprisingly close to ALogTime with respect to their capabilities. And “circuits for division and iterated multiplication” has some references in the Wikipedia article, but I liked the following lecture notes:

Lecture 8: Multiplication ∈ TC0 (4 pages)
https://users-cs.au.dk/arnsfelt/CT08/scribenotes/lecture8.pdf
by Kristoffer Arnsfelt Hansen – 2008

The paper by P Dymond – 1988 ends with the following speculation:

This result shows that a large class of cfl’s is in log depths, and it is natural to inquire about the possibility that all deterministic cfl’s, which are known to be in log space, might also be in log depth. One dcfl in log space which is not currently known to be recognizable in log depth is the two-sided Dyck language on four letters [5].

Word Problems Solvable in Logspace
https://rjlipton.wordpress.com/2009/04/16/the-word-problem-for-free-groups/
http://cpsc.yale.edu/sites/default/files/files/tr60.pdf
by RJ Lipton, Y Zalcstein – 1977

It is an interesting question whether ALogTime and LOGSPACE are different for naturally occurring problems. Including LogCFL in the hardware acceleration tries to evade this question. But there are programming approaches specifically focusing on LOGSPACE, since it may model an important aspect of practical algorithms for really big data:

Computation-by-Interaction for Structuring Low-Level Computation
http://www2.tcs.ifi.lmu.de/~schoepp/habil.pdf
by U Schöpp – 2014

Understanding whether ALogTime != PH is hard to prove

I was wondering about how to define function classes for ALogTime: How to transform words over an input alphabet to words over an output alphabet. I wondered whether I should write a mail to Emil Jeřábek. Maybe I only wanted to have a reason to write him, but I didn’t. Instead, I googled my way until I understood even much weaker function classes. Then a naive idea for proving ALogTime != PH crossed my mind: Prove some problem complete for ALogTime under some extremely weak many-one reduction. If that would work, it would be sort of a “fool’s mate”, since no genuinely new idea is needed, just a careful application of well known techniques. So when Scott Aaronson said

On reflection, though, Lubos has a point: all of us, including me, should keep an open mind. Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

in HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP, I couldn’t help but ask him about whether my naive idea has any chance of proving ALogTime != PH, even if just for testing whether he really keeps such an open mind. His answer was: “I don’t know enough to give you detailed feedback about your specific approach”. So he keeps an open mind, even if Stephen A. Cook is slightly more open with respect to the possibility that separating P from weaker classes like NL or ALogTime could be within reach of current techniques:

The Tree Evaluation Problem: Towards Separating P from NL (8 pages)
https://www.cs.toronto.edu/~toni/Courses/PvsNP/CS2429.html
https://www.cs.toronto.edu/~toni/Courses/PvsNP/Lectures/lecture6.pdf
by Stephen A. Cook – 2014

This post has many links to papers like the one above. As if I could justify myself for believing that ALogTime can be separated from P by current techniques. But the links show at least that investing time into this topic is an interesting learning experience. The following sections Weak reductions and isomorphism theorems, and DAG automata link to material which I read (or still plan to read) in order to make progress on ALogTime != P.

Weak reductions and isomorphism theorems

DLogTime many-one reductions already look quite weak. (I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?) One could go weaker by looking at DLogTime-uniform NC0 reductions or even DLogTime-uniform projections (each output symbol/bit depends at most on a single input symbol/bit). But recently, people proposed even weaker uniformity notions than DLogTime:

A circuit uniformity sharper than DLogTime (19 pages)
https://hal.archives-ouvertes.fr/hal-00701420/document
by G Bonfante, V Mogbil – May 25, 2012

This proposed rational-uniformity can be summarized as writing n in binary and applying a finite state transducer to that input for computing the circuit description. I haven’t read the full paper yet. Another recently proposed approach uses descriptive complexity for defining very weak uniformity notions:

The lower reaches of circuit uniformity (16 pages)
https://eccc.weizmann.ac.il/report/2011/095/
by C Behle, A Krebs, K-J Lange, P McKenzie – May 12, 2012

I didn’t understand too much of that paper yet. I tried reading revision 0 (didn’t notice that it wasn’t the latest revision), but revision 1 might turn out to be easier to read: “Results are stated in more clear way, and overall readability was improved.”

Another line of attack for making the many-one reductions weaker is to only allow isomorphisms. It may seem counter-intuitive first that this should be possible, but the Schröder-Bernstein theorem can sometimes be adapted. This is described in two classic papers:

Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem (17 pages)
https://www.sciencedirect.com/science/article/pii/S0022000098915835
http://ftp.cs.rutgers.edu/pub/allender/switch3.pdf
http://www.cs.cmu.edu/~rudich/papers/ac0reductions.ps
by M Agrawal, E Allender, S Rudich – 1998

Reducing the Complexity of Reductions (9 pages or 22 pages)
https://cseweb.ucsd.edu/~russell/reductions.ps
http://ftp.cs.rutgers.edu/pub/allender/stoc97.pdf
by M Agrawal, E Allender, R Impagliazzo, T Pitassi, S Rudich – 2001

I didn’t read those papers yet, but Algebra, Logic and Complexity in Celebration of Eric Allender and Mike Saks by Neil Immerman gives a nice survey of those results, which I read for a start.

DAG automata

One important property of input driven stack automata is that the dependency graphs will be planar DAGs. For general polynomial time algorithms, the dependency graphs will be general non-planar DAGs (due to variable binding with an unbounded number of variables). In order to learn what is already known about this, I googled “DAG automata”. The first three hits are awesome:

DAG Automata – Variants, Languages and Properties (48 pages)
http://www8.cs.umu.se/education/examina/Rapporter/JohannesBlum_submitted.pdf
by J Blum – May 25, 2015

How Useful are Dag Automata? (26 pages)
https://hal.archives-ouvertes.fr/hal-00077510/document
by S Anantharaman, P Narendran, M Rusinowitch – 2004

DAG Automata for Meaning Representation (12 pages)
http://aclweb.org/anthology/W17-3409
by F Drewes – London, UK, July 13–14, 2017

The first and third hit are related, Frank Drewes was the adviser of Johannes Blum. I read both, and they were really worth it. (I only browsed the paper from 2004, just to see that the newer papers really achieved a breakthrough.) They reduce determining which rules are useless to a question about Petri nets, and then provide a solution for that problem: “A Petri net is structurally cyclic if every configuration is reachable from itself in one or more steps. We show that structural cyclicity is decidable in deterministic polynomial time.”

Structurally Cyclic Petri Nets (9 pages)
https://arxiv.org/abs/1510.08331
by F Drewes, J Leroux – 2015

On understanding a problem, without solving it

One can understand a problem without solving it. So one can understand why something is a problem and why a solution would be challenging, without being able to come up with a solution. I don’t know whether the best experts in computer science understand P != NP in this sense. I don’t understand it yet, but I have the deeply rooted belief that it can be understood.

And understanding for me also means to be able to read work that others have done, and to see and feel the connection to the questions which interest me.

Conclusions?

This post is too long, but I didn’t really care this time. Since the reader is expected to read at least some of the linked material, it is implicitly even much longer than its nominal length (which is too long) indicates. The initial goal was to write it in order to be able to focus again on more urgent topics. But of course, this didn’t really work out that way. Still, this post was always intended to be mainly a collection of links to papers I already read or still plan to read in order to make progress on understanding ALogTime, LogCFL, and threshold circuits. And I also knew that I would try to justify myself for believing that this topic is worth studying.

Do I believe that the dreams of fast solutions can come true? I don’t know. Uzi Vishkin seems to have made a much more serious effort than I ever will, and even argued his case from an economic perspective:

Alas, the software spiral is now broken: (a) nobody is building hardware that provides improved performance on the old serial software base; (b) there is no broad parallel computing application software base for which hardware vendors are committed to improve performance; and (c) no agreed-upon architecture currently allows application programmers to build such software base for the future

Could proof of work problems for cryptocurrency mining provide the incentive to make dreams come true? Who knows, money is important, after all. But personally, I am rather a believer in “constant dripping wears away the stone,” so I think in the long run, it is quite possible that things progress. Probably in ways I cannot even imagine today.


A subset interpretation (with context morphisms) of the sequent calculus for predicate logic

The previous two posts used the sequent calculus with a subset interpretation:

We work with sequents A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}, and interpret the propositions A_i (and B_j) as subsets of some universe set X. We interpret the sequent itself as [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}].

While writing the previous post, there was the temptation to introduce the sequent calculus rules for existential and universal quantification. But the subset interpretation with a fixed universe set X didn’t seem helpful for free variables and quantification. But what if we allow different universe sets for different sequents? One way would be to augment the sequent rules to change the universe sets in a well defined way. However, the cut rule and two of the four quantification rules (those that replace an arbitrary term by a quantified variable) are a bit problematic. For them, the universe can sometimes be shrunk down, since a formula or a term was eliminated. Therefore, we just add two separate rules for changing the universe, and apply the sequent rules themselves only with respect to a fixed universe set.

This is the last planned post on the sequent calculus for the moment. (There might be an additional post in this series, extending the subset interpretation to natural deduction.) Since this post will remain vague and unsatisfactory in many ways, a list of related online materials on the sequent calculus seems appropriate: I often consulted the wikipedia article while writing these posts. (I never consulted Sequent Calculus Primer, but it looks nice and has exercises.) This MathOverflow question on introductions to sequent calculus suggests the chapter on Sequent Calculus from Frank Pfenning’s manuscript on Automated Theorem Proving, which I really liked. It also suggests Proofs and Types by Jean-Yves Girard, which was my first introduction to the sequent calculus. It left me unsatisfied, and I agree with Charles Stewart:

A warning: Girard’s style is a little slippery, and it is common for students to say they have read it, who turn out to have absorbed the opinions but little of the results.

I am still slightly ashamed that one of my first questions – when I met a group of academic logicians (with a focus on proof assistants) – was whether they found Girard easy to understand. Girard claims that the sequent calculus is generally ignored by computer scientists, an exception being Logic For Computer Science Foundations of Automatic Theorem Proving by Jean H. Gallier. A nice text, but who has time to read those 500+ pages? This is one reason why I like An Introduction to Proof Theory by Samuel R. Buss.

The sequent calculus rules for predicate logic

We are interested in normal first order predicate logic with predicates and functions. Here a signature with a finite number of predicate and function symbols is given, which include the arity for each of the symbols. The sequent calculus rules for quantification are:

Left quantification rules:
\begin{array}{c} \Gamma,A[t/x]\vdash\Delta\\ \hline \Gamma,\forall xA\vdash \Delta \end{array}(\forall L) \qquad \begin{array}{c} \Gamma,A[y/x]\vdash\Delta\\ \hline \Gamma,\exists xA\vdash \Delta \end{array}(\exists L)

Right quantification rules:
\begin{array}{c} \Gamma\vdash A[t/x],\Delta\\ \hline \Gamma\vdash \exists xA,\Delta \end{array}(\exists R) \qquad \begin{array}{c} \Gamma\vdash A[y/x],\Delta\\ \hline \Gamma\vdash \forall xA,\Delta \end{array}(\forall R)

Those rules look simple, but what do they mean? What is the difference between t and y? And how to interpret a sequent with formulas containing free variables? First, y is a free variable not occurring in \Gamma, \Delta, or A. And A[y/x] means that each free occurrence of x in A is replaced by y. Next, t is an arbitrary term, with no restrictions at all on the variables occurring in it. A term is just any reasonable combination of function symbols (from the signature) and variables, like f(x,g(f(y,x),y)) where both f and g are binary function symbols.

There are many possible interpretations of predicate logic, but that answer just avoids answering the question. The interpretation of a formula or sequent with free variables can be controversial. My previous effort at a subset based interpretation of predicate logic compatible with non-classical logic left out equality, functions and constants. Even though we are talking only about classical logic here, the goal of the following interpretation is to be also applicable more generally. That interpretation will follow naturally once we clarify which models we want to allow.

Models and interpretations of predicate calculus

For the typical interpretations, a model consists of a set U (which is the universe where the variables live), a subset of U^k for each k-ary predicate symbol, and a function f:U^k \to U for each k-ary function symbol. A straightforward interpretation in that context is that a formula with k free variables is a subset of U^k, and a formula without free variables (i.e. a sentence) is either true or false. And for any set of k variables, we can interpret any formula whose free variables are included in that set as a subset of U^k. A sequent in this interpretation is true, iff for the subsets formed by the formulas interpreted relative to the set of all free variables in the sequent, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.
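
As a minimal sketch (in Python, with names invented here) of this straightforward interpretation: each formula becomes a predicate on k-tuples over the universe, and a sequent holds iff the intersection of the left-hand subsets is contained in the union of the right-hand subsets.

from itertools import product

def sequent_holds(universe, k, left, right):
    # [A_1] ∩ ... ∩ [A_r] ⊆ [B_1] ∪ ... ∪ [B_s], with formulas given as
    # predicates on k-tuples over the universe.
    for xs in product(universe, repeat=k):
        if all(A(xs) for A in left) and not any(B(xs) for B in right):
            return False
    return True

P = lambda xs: xs[0] != 0          # P(x)
Q = lambda xs: xs[1] == 2          # Q(y)
print(sequent_holds(range(3), 2, [P, Q], [lambda xs: P(xs) and Q(xs)]))  # True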

This straightforward interpretation above is already a subset interpretation, but we want a slightly more general subset interpretation. For propositional logic, the possible models were given by a set X and a subset of X for each propositional variable A_i. In predicate logic, the 0-ary predicates take the role of the propositional variables, so we want them to be subsets of X. For our generalised interpretation, a model consists of a set X (whose subsets are the truth values) and a set U (which is the universe where the variables live), a subset of X \times U^k for each k-ary predicate symbol, and a function f:X \times U^k \to U for each k-ary function symbol.

The way X will enter into the model is to have some implicit context \xi\in X such that formulas like P(x, f(y, z), z) in the formal syntax become P(\xi, x, f(\xi, y, z), z) in the model. So for any set of k free variables, we can interpret any formula whose free variables are included in that set as a subset of X\times U^k. For a given set of variables and a sequent for which the free variables of the formulas are contained in that set, the sequent is true iff for the subsets formed by the formulas interpreted relative to that set of variables, the intersection of the subsets on the left side is contained in the union of the subsets on the right side.

(If we have a formula like x+y = y+x, we would like to interpret it as a subset of U^2, so we normally use X=\{()\} and then just omit it. Here the element () is called unit, the set \{()\} represents truth, and the empty set \emptyset represents falsehood.)
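To make this concrete, here is a small Python sketch (not from the original text; the universe, context set, and predicate are made up for illustration) of interpreting quantified formulas as subsets of X\times U and checking sequents by inclusion:

```python
from itertools import product

# Finite toy model: trivial context X = {()} (written '*' here) and universe U.
X = {'*'}
U = {0, 1, 2}
P = {('*', 0), ('*', 1)}   # interpretation of a unary predicate P as a subset of X x U

# [P(x)] relative to the variable set {x} is just P itself.
# [forall x P(x)] and [exists x P(x)] do not depend on x, but relative to {x}
# they are still subsets of X x U:
forall_P = {(xi, u) for xi, u in product(X, U)
            if all((xi, v) in P for v in U)}
exists_P = {(xi, u) for xi, u in product(X, U)
            if any((xi, v) in P for v in U)}

# A sequent holds iff the intersection of the left side is contained in the
# union of the right side; with one formula per side this is just inclusion.
assert forall_P <= P            # forall x P(x) |- P(x) holds
assert not (P <= forall_P)      # P(x) |- forall x P(x) fails for this P
assert P <= exists_P            # P(x) |- exists x P(x) holds
```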

Context morphisms give life to the subset interpretation

If we have (a sequent of) subsets A_i over X, then a (partial) function p : Y \to X gives us (a sequent of) subsets over Y by taking inverse images p^{-1}(A_i) of the subsets. We have

  • A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B)
  • p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)
  • p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)

and hence

[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]
\Downarrow
p^{-1}([A_{1}])\cap\ldots\cap p^{-1}([A_{r}])\ \subseteq\ p^{-1}([B_{1}])\cup\ldots\cup p^{-1}([B_{s}])

The direct translation into the language of sequents reads

\begin{array}{c} A_{1}(\xi),\ldots,A_{r}(\xi)\vdash B_{1}(\xi),\ldots,B_{s}(\xi)\\ \hline A_{1}(p(\nu)),\ldots,A_{r}(p(\nu))\vdash B_{1}(p(\nu)),\ldots,B_{s}(p(\nu)) \end{array}

This just tells us that global substitution is unproblematic. (But r\geq 1 is required in case p is a partial function, which is easily missed.) However, the more typical function p for our purpose is similar to p(u,v,w)=(u,v), i.e. a function from Y=U^3 to X=U^2. It is typical, since the expressions grow in complexity for most rules of sequent calculus. Here is an example:

\begin{array}{c} \begin{array}{l}A\vdash A:\{a\}\\ \hline A\vdash A:\{(a,b)\} \end{array} \qquad \begin{array}{l}B\vdash B:\{b\}\\ \hline B\vdash B:\{(a,b)\}\end{array}\\ \hline A,B\vdash A\land B:\{(a,b)\}\end{array}

This example also shows how we can change the universe set separately, and apply the sequent rules only with respect to a fixed universe set. After applying the (\exists L) or (\forall R) rule, we can shrink the universe set. To illustrate, let us first apply the (\forall L) and (\exists R) rules, where we cannot always shrink the universe set:

\begin{array}{r} P(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash P(x) : X\times U \end{array}

Here we cannot shrink the universe set.

\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \exists x P(x) : X\times U \end{array}

Now we can shrink the universe set, but only after checking that the sequent no longer depends on x. But when we apply (\forall R), we can always shrink the universe set:

\begin{array}{l} \forall xP(x) \vdash P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X\times U\\ \hline \forall xP(x) \vdash \forall x P(x) : X \end{array}

How can we justify this shrinking in terms of images or inverse images? The inverse image under p:X \to X\times U, \xi \to (\xi, c(\xi)) for an arbitrary function (constant) c(\xi) could be a justification. But we don’t really write [\forall xP(x)](c) \vdash [\forall x P(x)](c), and c is arbitrary and irrelevant. So the image under F:X\times U \to X, (\xi, u) \to \xi might be a nicer justification. But we only have

  • A \subseteq B \Rightarrow F(A) \subseteq F(B)
  • F(A\cap B)\subseteq F(A)\cap F(B)
  • F(A\cup B)=F(A)\cup F(B)

However, if we assume F^{-1}(F([A_i]))=[A_i], then F([A_{1}]\cap\dots\cap [A_{r}])=F([A_{1}])\cap\dots\cap F([A_{r}]) and hence

[A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]
\Downarrow
F([A_{1}])\cap\ldots\cap F([A_{r}])\ \subseteq\ F([B_{1}])\cup\ldots\cup F([B_{s}])

Doubts about context morphisms and substitutions

Let us apply the above to propositional logic. Surely, from A,B\vdash A\land B we can deduce C\lor D,C\to D\vdash (C\lor D)\land (C\to D). But we cannot justify it from A,B\vdash A\land B:\{(a,b)\} in terms of inverse images. Can we justify it from A,B\vdash A\land B:X_a\times X_b? Yes, take the map p:X_c\times X_d\to X_a\times X_b with (c,d) \to ([C\lor D](c,d), [C\to D](c,d)). Here X_c and X_d can be arbitrary and X_a=X_b=\{0,1\}. Note that we only need |X_a|, |X_b| \geq 2.

How justified are those substitutions into the expressions? Here, we used

[A\land B]([C\lor D](c,d), [C\to D](c,d))=[(C\lor D)\land (C\to D)](c,d)

Can we justify this? Can such substitutions go wrong? Yes, if p is a partial function:

\begin{array}{c} \top\vdash B(x)\\ \hline [\top](p(y))\vdash [B(\cdot)](p(y))\\ \hline [\top](y)\vdash [B(p(\cdot))](y)\\ \hline \top\vdash B(p(y)) \end{array}

Here [\top](p(y)) = [\top](y) fails, but it would work if p were a total function. We could have omitted \top, but then it would fail because of r=0. That was actually the point where I noticed that the restriction r\geq 1 is required for partial functions.

The substitutions are justified, because they work for each operation individually:

  • [\bot](p(y)) = [\bot](y)
  • [A\land B](p(y)) = [A](p(y))\land [B](p(y))
  • [A\lor B](p(y)) = [A](p(y))\lor [B](p(y))
  • [A-B](p(y)) = [A](p(y))-[B](p(y))
  • [A\oplus B](p(y)) = [A](p(y))\oplus [B](p(y))
  • [\forall z A](p(y)) = \forall z [A](p(y))
  • [\exists z A](p(y)) = \exists z [A](p(y))
  • [A(\cdot)](p(y)) = [A(p(\cdot))](y)

Treating equality without special rules

Equality can be given by a binary predicate which is a congruence relation for the functions, and compatible with the other predicates. There are different ways to treat equality, and it is important for most interesting applications of predicate logic. A canonical approach is to write the axioms on the left side of the sequent, and use normal deductions. Concrete examples for the required axioms were already given in the last post, in the section “Axiomatic theories not using negation or falsehood” for the theory of groups and in the section “Axiomatic theories in predicate logic without falsehood” for Robinson arithmetic.

Part of the reason for presenting a subset interpretation for predicate logic was to better understand how predicate logic could work in case of non-classical logics. There are theories like Heyting arithmetic (and Gentzen’s own work), which seem to fit nicely with the interpretation above. But there are also intuitionistic dependent type theories with special rules and equality types, whose possible interpretations remain unclear to me, despite the nice subset interpretation explained above. Maybe Per Martin-Löf is right: in the end, everybody must understand for himself (p. 166).

Conclusions?

This post was hard for me to write. This is the classical problem when explaining something obvious (predicate calculus): you have to use a lot of words for something which seems obvious, but the language to properly write down what is going on behind the scenes is missing. I didn’t solve that language problem, but I came up with a natural example of how the obvious thing can fail to be true. That might help the reader to understand for himself.

The content above already became clear to me while writing the previous post. But only after this current post was nearly finished did I notice the restriction r\geq 1 in case p is a partial function. Which is funny: after clarifying the notion of context morphism that makes sense of the logic without truth from the first post (in this series), I found out that there is an additional restriction, namely that the antecedent should never be empty. My first reaction was that I don’t like this. Maybe I should look for a transformation which turns this logic about partial functions into the logic without negation or falsehood. But my second reaction was that it could make sense; perhaps any logical deduction needs assumptions.

Still the question remains whether there is a reasonable transformation from the logic without truth (and partial functions) to the logic without negation and falsehood. Probably that question will be easy to answer. Somehow one has to take complements, and hope that the structure of the predicate calculus remains intact. It seems to work, but I still have trouble understanding it intuitively. And I want to finish this post now.


Logic without negation and falsehood

In the last post, considerations related to partial functions led us to present a logic without truth and implication, using the binary minus operation as a dual of implication and as a substitute for unary negation. But a logic without implication and equivalence is strange. So in this post we want to omit negation and falsehood instead.

(This post got surprisingly long. Sometimes I got slightly sidetracked; for example, the discussion of the parity function and its sequent rules could have been omitted. But I already tried to remove most of those sidetracks. The real reason for the size explosion is the examples of axiomatic theories in predicate logic. But those had to be discussed, since they make it clear that one can really omit falsehood without losing anything essential.)

The classical sequent calculus

Before starting to omit logical operations, let us first recall the classical sequent calculus, including as many logical operations as barely reasonable. If we limit ourselves to constants, unary operations, and binary operations, then we get:

  • Constants for truth (\top) and falsehood (\bot)
  • An unary operation for negation (\lnot)
  • Binary operations for: or (\lor), and (\land), implication (\to), minus (-), nand (|), nor (\overline{\lor}), equivalence (\equiv), and xor (\oplus). The remaining binary operations add nothing new: reverse implication, reverse minus, first, second, not first, not second, true, and false

We work with sequents A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}, and interpret the propositions A_i (and B_j) as subsets of some universe set X. We interpret the sequent itself as [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}]. Let \Gamma,\Delta, ... stand for arbitrary finite sequences of propositions.

Left structural rules Right structural rules
\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(WL) \begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(WR)
\begin{array}{c} \Gamma,A,A\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(CL) \begin{array}{c} \Gamma\vdash A,A,\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(CR)
\begin{array}{c} \Gamma_{1},A,B,\Gamma_{2}\vdash\Delta\\ \hline \Gamma_{1},B,A,\Gamma_{2}\vdash\Delta \end{array}(PL) \begin{array}{c} \Gamma\vdash\Delta_{1},A,B,\Delta_{2}\\ \hline \Gamma\vdash\Delta_{1},B,A,\Delta_{2} \end{array}(PR)
Axiom Cut
\begin{array}{c} \ \\ \hline A\vdash A \end{array}(I) \begin{array}{c} \Gamma\vdash\Delta,A\quad A,\Sigma\vdash\Pi\\ \hline \Gamma,\Sigma\vdash\Delta,\Pi \end{array}(Cut)
Left logical rules Right logical rules
\begin{array}{c} \ \\ \hline \bot\vdash\Delta \end{array}(\bot L) \begin{array}{c} \ \\ \hline \Gamma\vdash\top \end{array}(\top R)
\begin{array}{c} \Gamma\vdash A,\Delta\\ \hline \Gamma,\lnot A\vdash\Delta \end{array}(\lnot L) \begin{array}{c} \Gamma,A\vdash \Delta\\ \hline \Gamma\vdash \lnot A,\Delta \end{array}(\lnot R)
\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma,A\land B\vdash\Delta \end{array}(\land L) \begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma\vdash A\lor B,\Delta \end{array}(\lor R)
\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\lor B\vdash\Delta,\Pi \end{array}(\lor L) \begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma\vdash A\land B,\Delta,\Pi \end{array}(\land R)
\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma,A- B\vdash \Delta \end{array}(- L) \begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)
\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\to B\vdash\Delta,\Pi \end{array}(\to L) \begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash \Pi\\ \hline \Gamma,\Sigma\vdash A-B,\Delta,\Pi \end{array}(- R)
\begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma,A\overline{\lor} B\vdash \Delta \end{array}(\overline{\lor} L) \begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma\vdash A|B,\Delta \end{array}(| R)
\begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma,A|B\vdash \Delta,\Pi \end{array}(| L) \begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma\vdash A\overline{\lor} B,\Delta,\Pi \end{array}(\overline{\lor} R)
\begin{array}{c} \Gamma,A\vdash B,\Delta\quad\Sigma,B\vdash A,\Pi\\ \hline \Gamma,\Sigma,A\oplus B\vdash \Delta,\Pi \end{array}(\oplus L) \begin{array}{c} \Gamma,A\vdash B,\Delta\quad\Sigma,B\vdash A,\Pi\\ \hline \Gamma,\Sigma\vdash A\equiv B,\Delta,\Pi \end{array}(\equiv R)
\begin{array}{c} \Gamma\vdash A,B,\Delta\quad\Sigma,A,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\equiv B\vdash\Delta,\Pi \end{array}(\equiv L) \begin{array}{c} \Gamma\vdash A,B,\Delta\quad\Sigma,A,B\vdash\Pi\\ \hline \Gamma,\Sigma\vdash A\oplus B,\Delta,\Pi \end{array}(\oplus R)

The rules for equivalence were taken from here, where they are also proved. Observe that (A\equiv B)\equiv C=(A\oplus B)\oplus C = A\oplus(B\oplus C). Clearly A_1 \oplus \dots \oplus A_r is the parity function, which is famous for its computational complexity. It is easy to derive the rules for it (from the rules given above), but those rules have 2^r sequents above the line.
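As a quick sanity check (not part of the original argument), the associativity observation can be verified by a small truth table:

```python
from itertools import product

def xnor(x, y):
    return 1 - (x ^ y)   # equivalence on {0, 1}

# (A == B) == C  =  (A xor B) xor C  =  A xor (B xor C)
for a, b, c in product([0, 1], repeat=3):
    assert xnor(xnor(a, b), c) == (a ^ b) ^ c == a ^ (b ^ c)
```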

The rules for A_1 \lor \dots \lor A_r and A_1 \land \dots \land A_r are obvious. The rules for A_1 \to \dots \to A_r \to B and A - B_1 - \dots - B_r are also obvious, if we agree that this should be read as A_1 \to (\dots \to (A_r \to B)\dots) and (\dots(A - B_1) - \dots) - B_r. Then A|B=A\to B\to\bot and A\overline{\lor}B=\top-A-B, so there is no need for multi-argument versions of nand and nor.

Omitting falsehood from classical logic

If we want to omit falsehood, it is not sufficient to just omit falsehood (\bot). The operations minus (-) and xor (\oplus) must be omitted too, since \bot = A-A = A\oplus A. Also nand (|) and nor (\overline{\lor}) must be omitted, since \bot = A \overline{\lor}(A\overline{\lor}A) and \bot = (A|(A|A))|(A|(A|A)). (They are functionally complete, so the only surprise is the length of the formulas). And if we want to keep truth (\top) or any of the remaining binary operations (or (\lor), and (\land), implication (\to), or equivalence (\equiv)), then also negation (\lnot) must be omitted.
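A quick truth-table check (not from the original text) of the definability claims used in this paragraph:

```python
def nand(x, y): return 1 - (x & y)
def nor(x, y): return 1 - (x | y)
def minus(x, y): return x & (1 - y)

for a in (0, 1):
    assert minus(a, a) == 0                 # A - A = bottom
    assert (a ^ a) == 0                     # A xor A = bottom
    assert nor(a, nor(a, a)) == 0           # A nor (A nor A) = bottom
    t = nand(a, nand(a, a))                 # A nand (A nand A) = top
    assert nand(t, t) == 0                  # (A|(A|A)) | (A|(A|A)) = bottom
```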

But without negation, how can we express the law of excluded middle or double negation? Both A\lor \lnot A and \begin{array}{c} \lnot\lnot A \\ \hline A \end{array} use negation. If we look at proof by cases, then \begin{array}{c} \lnot A \to A \\ \hline A \end{array} or rather its generalization \begin{array}{c} (A\to B) \to A \\ \hline A \end{array} suggests itself, which is called Peirce’s law. And the sequent calculus suggests \begin{array}{c} A\to(B\lor C) \\ \hline (A\to B)\lor C\end{array}, which expresses a nice algebraic property of classical logic. But how do we prove that we get classical logic, and what does that mean?
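Before turning to that question, here is a quick truth-table check (my illustration, not part of the original argument) that both negation-free principles above are classical tautologies:

```python
from itertools import product

def imp(x, y):
    return (1 - x) | y   # material implication on {0, 1}

for a, b, c in product([0, 1], repeat=3):
    assert imp(imp(imp(a, b), a), a) == 1               # Peirce's law
    assert imp(imp(a, b | c), imp(a, b) | c) == 1       # A->(B or C) entails (A->B) or C
```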

At least classical tautologies like ((A\to B)\to B) \to (A\lor B) not using any of the omitted operations should still be tautologies. If “still be tautologies” is interpreted as being provable in the classical sequent calculus, then this follows from cut elimination. The cut elimination theorem says that the rule (Cut) can be avoided or removed from proofs in the sequent calculus. Since for all other rules, the formulas above the line are subformulas of the formulas below the line, only operations occurring in the sequent to be proved can occur in a cut-free proof.

However, my intended interpretation of “still be tautologies” is being provable in the natural deduction calculus. But let us not dive into the natural deduction calculus here, since this post will get long anyway. We still need to clarify what it means that we get classical logic. The basic idea is that logical expressions involving falsehood can be translated into expressions avoiding falsehood, which still mean essentially the same thing. But what exactly can be achieved by such a translation? Time for a step back.

Boolean functions of arbitrary arity

A Boolean function is a function of the form f : B^k \to B, where B = \{0, 1\} is a Boolean domain and k is a non-negative integer called the arity of the function. Any Boolean function can be represented by a formula in classical logic. If we omit falsehood, a Boolean function with f(1,\dots,1)=0 certainly cannot be represented, for otherwise our goal to remove falsehood would have failed. The question is whether we can represent all Boolean functions with f(1,\dots,1)=1, and whether the complexity of this representation is still comparable to the complexity of the representation where falsehood is available.

A two-step approach allows us to convert a given representation into one where falsehood is avoided. The first step is to replace A-B by (A\to B)\to\bot, A\oplus B by (A\equiv B)\to\bot, A|B by (A\land B)\to\bot, A\overline{\lor}B by (A\lor B)\to\bot, and \lnot A by A\to\bot. This step increases the size of the representation only by a small constant factor. The second step is to replace \bot by the conjunction of all k input variables. This step increases the size of the representation by a factor of k in the worst case. It works, since at least one of the input variables is 0 whenever f is evaluated at an argument different from (1,\dots,1).
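To illustrate the second step, here is a small Python sketch (the formula representation and names are made up for illustration, not taken from the text). It replaces \bot by the conjunction of all input variables in a formula that only uses the remaining operations, and checks that the result agrees with the original formula at every input, given that the formula evaluates to 1 at (1,\dots,1):

```python
from itertools import product

# Formulas as nested tuples: ('var', name), ('bot',), ('top',),
# ('and', f, g), ('or', f, g), ('imp', f, g), ('iff', f, g).

def ev(f, env):
    tag = f[0]
    if tag == 'var': return env[f[1]]
    if tag == 'bot': return 0
    if tag == 'top': return 1
    a, b = ev(f[1], env), ev(f[2], env)
    return {'and': a & b, 'or': a | b,
            'imp': (1 - a) | b, 'iff': 1 - (a ^ b)}[tag]

def conj(names):
    f = ('var', names[0])
    for n in names[1:]:
        f = ('and', f, ('var', n))
    return f

def remove_bot(f, names):
    if f[0] == 'bot': return conj(names)          # step 2: bottom -> conjunction of all variables
    if f[0] in ('var', 'top'): return f
    return (f[0], remove_bot(f[1], names), remove_bot(f[2], names))

# Example after step 1: (A -> bot) or B, i.e. "not A or B" with negation rewritten.
names = ['A', 'B']
f = ('or', ('imp', ('var', 'A'), ('bot',)), ('var', 'B'))
assert ev(f, {'A': 1, 'B': 1}) == 1               # f(1,...,1) = 1 is required
g = remove_bot(f, names)
for bits in product([0, 1], repeat=len(names)):
    env = dict(zip(names, bits))
    assert ev(f, env) == ev(g, env)               # the translation agrees everywhere
```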

If one can define a new variable, then the factor k can be avoided by defining p:=\bigwedge_{i=1}^kA_i. However, the underlying idea is to have a logical constant p together with axioms p\to A_i. This is equivalent to the single axiom \bigwedge_{i=1}^k(p\to A_i), which simplifies to p\to\bigwedge_{i=1}^k A_i.

Interpretation of logic without falsehood

When I initially wondered about omitting negation and falsehood from classical logic, I didn’t worry too much about “what it means that we get classical logic” and said

For a specific formula, falsehood gets replaced by the conjunction of all relevant logical expressions.

Noah Schweber disagreed with my opinion that “you can omit negation and falsehood, and still get essentially the same classical logic” and said

First, I would like to strongly disagree with the third sentence of your recent comment – just because (basically) the same proof system is complete for a restricted logical language when restricted appropriately, doesn’t mean that that restricted logic is in any way similar to what you started with.

So I replaced my harmless initial words by

One approach might be to translate \lnot A as A \to p, where p is a free propositional variable. It should be possible to show that this works, but if not then we just keep falsehood in the language.

However, these new words are problematic, since \lnot\lnot A \to A is a classical tautology, but ((A\to p)\to p)\to A is not. My initial words were not ideal either, since the meaning of “the conjunction of all relevant logical expressions” remained unclear.

The intuition behind my initial words was that falsehood is just the bottom element of the partial order from the Boolean algebra. Removing the explicit symbol (name) for the bottom element from the language should be harmless, since the bottom element can still be described as the greatest lower bound of “some relevant elements”. (However, removing the symbol does affect the allowed homomorphisms, since the bottom element is no longer forced to be mapped to another bottom element.)

The intuition behind my new words was an unlucky mix up of ideas from Curry’s paradox and the idea to use a free variable as a substitute for an arbitrary symbol acting as a potentially context dependent replacement for falsehood. We will come back to those ideas in the section on axiomatic theories in predicate logic without falsehood.

What are the “relevant logical expressions” for the conjunction mentioned above? Are A\lor B, A\to B, or \forall x\forall y\ x=y relevant logical expressions? The first two are not, since (A\lor B)\land B = B =(A\to B)\land B, i.e. the propositional variable B itself is already sufficient. The third one is relevant, at least for predicate logic with equality. For propositional logic, it is sufficient to replace falsehood by the conjunction of all proposition variables occurring in the formula (and the assumptions of its proof). For predicate logic, it is sufficient to replace falsehood by the conjunction of the universally quantified predicate symbols, including the equality predicate as seen above (if present).

Axiomatic theories not using negation or falsehood

If the axioms or axiom schemes of an axiomatic theory (in classical first order predicate logic) don’t use negation or falsehood or any of the other operations we had to remove together with falsehood, then we don’t need to explicitly remove falsehood.

Let us look at the theory of groups as one example. If we denote the group multiplication by \circ and the unit by e, then the axioms are

  • \forall{} x \forall{} y \forall{} z\ (x\circ y)\circ z=x\circ (y\circ z)
  • \forall x\ x\circ e = x
  • \forall x\exists y\ x\circ y=e

We see that neither falsehood nor negation occurs in those axioms. This is actually a bit of a cheat, since it would seem that implication also doesn’t occur. It actually does occur in the axioms governing equality. Here are some axioms for equality:

  • \forall x\ x=x
  • \forall x\forall y\ (x=y \to y=x)
  • \forall x\forall y\forall z\ (x=y \to (y=z \to x=z))

They only use implication, but something like the axiom schema

  • x=y \to (\phi(x,x)\to\phi(x,y))

is still missing. Here \phi(x,y) is any formula, and may contain more free variables in addition to x and y. This doesn’t work for us, since this could also include formulas \phi which use negation or falsehood! But at least for the theory of groups, the following two axioms (derived from instances of the axiom scheme) should be sufficient.

  • \forall x\forall y\forall z\ (x=y \to x\circ z=y\circ z)
  • \forall x\forall y\forall z\ (x=y \to z\circ x=z\circ y)

Here is a typical theorem of the theory of groups that should now be provable.

\forall x\forall y\forall z\ (x\circ y = e \to (z\circ x = e \to y=z))

Here is an informal proof: Since x\circ y = e and \forall x\forall y\ (x=y \to z\circ x=z\circ y), we have z\circ(x\circ y) = z\circ e. Since z\circ(x\circ y)=(z\circ x)\circ y and z\circ e = z, we have (z\circ x)\circ y = z. Since z\circ x = e and \forall x\forall z\ (x=z \to x\circ y=z\circ y), we have e\circ y = z. If we could show e\circ y = y (this was omitted in axioms given above), then the proof would be complete.
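For completeness, here is a sketch (not in the original proof) of how e\circ y = y can be derived from the axioms above, so that the gap can actually be closed. By the inverse axiom, choose z with y\circ z = e and w with z\circ w = e; every step below uses only associativity, the right identity, and the two congruence axioms:

\begin{array}{rcl} z\circ y &=& (z\circ y)\circ e = (z\circ y)\circ (z\circ w) = ((z\circ y)\circ z)\circ w \\ &=& (z\circ (y\circ z))\circ w = (z\circ e)\circ w = z\circ w = e \\ e\circ y &=& (y\circ z)\circ y = y\circ (z\circ y) = y\circ e = y \end{array}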

Axiomatic theories in predicate logic without falsehood

Let us next look at Robinson arithmetic, which is essentially Peano arithmetic without the induction axiom schema.

  1. (Sx=0)\to\bot
  2. (Sx=Sy)\to x=y
  3. y=0\lor\exists x\ Sx=y
  4. x+0 = x
  5. x + Sy = S(x+y)
  6. x\cdot 0 = 0
  7. x\cdot Sy = (x\cdot y) + x

Here are the additional axioms for equality.

  • x=y \to S(x)=S(y)
  • x=y \to x+z=y+z
  • x=y \to z+x=z+y
  • x=y \to x\cdot z=y\cdot z
  • x=y \to z\cdot x=z\cdot y

Universal quantification has been omitted, but that should not worry us here. We see that the first axiom is the only one using falsehood. We could replace it by

  1. (Sx=0)\to\forall x\forall y\ x=y

Or we could also introduce a constant F and replace by the axioms

  1. F\to\forall x\forall y\ x=y
  2. (Sx=0)\to F

The second approach has the advantage that if we add the induction axiom schema to get Peano arithmetic, we can just replace falsehood by the constant F. The first approach has the advantage that it shows that falsehood was never really required for the induction axiom schema.

The second approach comes closer to my motivations to remove falsehood from the language. I wanted to avoid negation and falsehood, because if there is no (bottom) element connecting everything together, then the damage caused by the principle of explosion can be restricted to specific domains of discourse. I also had the idea to use more specific propositional constants instead of falsehood. For ZFC set theory, one could use one propositional constant F_= for the axiom asserting the existence of the empty set, and another propositional constant F_\in for the axiom asserting regularity. However, as soon as one adds the axioms F_= \to \forall x\forall y\ x=y and F_\in \to \forall x\forall y\ x\in y, both constants will probably turn out to be equivalent.

Conclusions

Such a long post, for such a trivial thing as removing falsehood from the language. At least it should be clear now that one can really remove falsehood if desired, and what it means that one doesn’t loose anything essential. But is there anything to be gained by removing falsehood? The initial idea to remove falsehood came from the rule \begin{array}{c} A\to(B\lor C) \\ \hline (A\to B)\lor C\end{array} (suggested by the sequent calculus), which characterises classical logic by a nice algebraic property (not involving negation or falsehood). The motivation to finish this post came from the realisation that implication is the central part of logic. I just read this recent post by Peter Smith on the material conditional, and realised that one really gains something by removing falsehood: the counterintuitive material conditional P\to Q = \lnot P \lor Q goes away, and implication becomes the first class citizen it should have always been.

The counterintuitive part didn’t really disappear. The material conditional decomposes into the two sequents P\to Q \vdash (P\to F)\lor Q and (P\to F)\lor Q, F\to Q\vdash P\to Q. The proof sketch for those is \begin{array}{c} P\to Q \\ \hline P\to(F\lor Q) \\ \hline (P\to F)\lor Q\end{array} and \begin{array}{c}\begin{array}{c} (P\to F)\lor Q \\ \hline P\to(F\lor Q)\end{array} \quad \begin{array}{c} \ \\ F\to Q\end{array} \\ \hline P\to Q\end{array}. The proof of the second doesn’t even use the special properties of classical logic, but this part is the counterintuitive one: Why should P\to Q be true, if P\to F? Well, the intuitive everyday logic is more a sort of modal logic than a propositional or predicate logic.


Logic without truth

Pi is wrong! But so what? It is neither new, nor complicated enough to count as real math! And suggestions that 2\pi i or \pi/2 might be even better show that it is not clear-cut either.

I recently invested sufficient energy into some logical questions to make real progress. But while telling friends and fellow logicians about it, I realized how irrelevant all my results and conclusions will be. I will have to publish them in an appropriate journal nevertheless, since they continue David Ellerman’s work on partition logic. Not publishing them would be dismissive towards the value of David Ellerman’s work, and he really invested lots of effort into it, and believes in its value. I won’t talk about those results here, since I don’t know how it would impact my ability to publish them.

Let’s have some fun instead, stay extremely close to classical logic, and still demonstrate a logic without truth. And let’s get back to Gerhard Gentzen.

Partial truths

I am fond of partial functions. For a partial function p:X\to Y, we have

  • p^{-1}(A\cap B)=p^{-1}(A)\cap p^{-1}(B)
  • p^{-1}(A\cup B)=p^{-1}(A)\cup p^{-1}(B)
  • p^{-1}(A {}\setminus{} B)=p^{-1}(A) {}\setminus{} p^{-1}(B)
  • p^{-1}(A \Delta B)=p^{-1}(A) \Delta p^{-1}(B) where A \Delta B := (A {}\setminus{} B) \cup (B {}\setminus{} A)

But p^{-1}(Y)=X is only true if p is a total function. In particular, p^{-1}(A^c)=p^{-1}(A)^c can hold (even for a specific A) only if p is total, since it would give
p^{-1}(Y)=p^{-1}(A\cup A^c)=p^{-1}(A)\cup p^{-1}(A^c) \quad = \quad p^{-1}(A)\cup p^{-1}(A)^c=X
Let’s also prove one of the other claims:
x {}\in{} p^{-1}(A {}\setminus{} B)  \Leftrightarrow p(x) {}\in{} A {}\setminus{} B \Leftrightarrow p(x) {}\in{} A \land  p(x) {}\notin{} B \Leftrightarrow x {}\in{} p^{-1}(A) {}\setminus{} p^{-1}(B)

Note that A {}\setminus{} B = A \triangle (A \cap B). So the preserved operations are “and”, “or”, and “xor” if interpreted from a logical perspective. It would be nice if implication were preserved too. This seems hopeless, since A \to A is always true, and truth is not preserved. But we do have A \subseteq B \Rightarrow p^{-1}(A) \subseteq p^{-1}(B), which means that the external implication given by the order is preserved. So we should be able to turn this into a logic, where internal truth is not preserved under context morphisms.
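Here is a small finite sanity check of these claims (not from the original text; the sets and the partial function are made up), with a partial function modelled as a Python dict:

```python
def preimage(p, A):
    """p^{-1}(A) for a partial function p: X -> Y given as a dict."""
    return {x for x, y in p.items() if y in A}

X = {'a', 'b', 'c'}
Y = {0, 1, 2}
p = {'a': 0, 'b': 2}            # partial: p is undefined at 'c'
A, B = {0, 1}, {1, 2}

assert preimage(p, A & B) == preimage(p, A) & preimage(p, B)
assert preimage(p, A | B) == preimage(p, A) | preimage(p, B)
assert preimage(p, A - B) == preimage(p, A) - preimage(p, B)
assert preimage(p, A ^ B) == preimage(p, A) ^ preimage(p, B)   # symmetric difference

# But p^{-1}(Y) = X fails, and so does the complement rule: the point 'c' where
# p is undefined lies in the complement of p^{-1}(A) but not in p^{-1}(A^c).
assert preimage(p, Y) != X
assert preimage(p, Y - A) != X - preimage(p, A)
```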

Gerhard Gentzen’s sequent calculus

The sequent calculus is a proof calculus with significant practical and theoretical advantages compared to more obvious proof calculus systems. It works with sequents A_{1},\ldots,A_{r}\vdash B_{1},\ldots,B_{s}. The propositions A_i (and B_j) could be logical formulas like S(x)=S(y) \to x = y (the 4-th Peano axiom). They can also be interpreted as subsets of some universe set X, which is sufficient for understanding the basics of the sequent calculus. Then the sequent itself is interpreted as [A_{1}]\cap\ldots\cap [A_{r}]\ \subseteq\ [B_{1}]\cup\ldots\cup [B_{s}].

Left structural rules Right structural rules
\begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(WL) \begin{array}{c} \Gamma\vdash\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(WR)
\begin{array}{c} \Gamma,A,A\vdash\Delta\\ \hline \Gamma,A\vdash\Delta \end{array}(CL) \begin{array}{c} \Gamma\vdash A,A,\Delta\\ \hline \Gamma\vdash A,\Delta \end{array}(CR)
\begin{array}{c} \Gamma_{1},A,B,\Gamma_{2}\vdash\Delta\\ \hline \Gamma_{1},B,A,\Gamma_{2}\vdash\Delta \end{array}(PL) \begin{array}{c} \Gamma\vdash\Delta_{1},A,B,\Delta_{2}\\ \hline \Gamma\vdash\Delta_{1},B,A,\Delta_{2} \end{array}(PR)

Here \Gamma,\Delta, ... stand for arbitrary finite sequences of propositions. The structural rules may be relatively boring. The following global rules are slightly more interesting

Axiom Cut
\begin{array}{c} \ \\ \hline A\vdash A \end{array}(I) \begin{array}{c} \Gamma\vdash\Delta,A\quad A,\Sigma\vdash\Pi\\ \hline \Gamma,\Sigma\vdash\Delta,\Pi \end{array}(Cut)

None of the rules up to now has used any logical constant or connective. They can be verified directly for the subset interpretation. The following logical rules can only be verified after the (intended) interpretation of the logical connectives has been fixed.

Left logical rules Right logical rules
\begin{array}{c} \Gamma,A,B\vdash\Delta\\ \hline \Gamma,A\land B\vdash\Delta \end{array}(\land L) \begin{array}{c} \Gamma\vdash A,B,\Delta\\ \hline \Gamma\vdash A\lor B,\Delta \end{array}(\lor R)
\begin{array}{c} \ \\ \hline \bot\vdash\Delta \end{array}(\bot L) \begin{array}{c} \ \\ \hline \Gamma\vdash\top \end{array}(\top R)
\begin{array}{c} \Gamma,A\vdash\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\lor B\vdash\Delta,\Pi \end{array}(\lor L) \begin{array}{c} \Gamma\vdash A,\Delta\quad \Sigma\vdash B,\Pi\\ \hline \Gamma,\Sigma\vdash A\land B,\Delta,\Pi \end{array}(\land R)
\begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash\Pi\\ \hline \Gamma,\Sigma,A\to B\vdash\Delta,\Pi \end{array}(\to L) \begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R)

One possible interpretation for these connectives in terms of subsets would be [\bot]:=\emptyset, [\top]:=X, [A\land B]:=[A]\cap [B], [A\lor B]:=[A]\cup [B], and [A\to B]:=[A]^c\cup [B].

But it may be more instructive to see an interpretation where one of the classical logical rules is violated. So let us use [A\to B]:=\mathrm{int}([A]^c\cup [B]) instead, where \mathrm{int} is the interior operator of some topological space. The propositions A_i (and B_j) are interpreted as open subsets in this case. The rule \begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma\vdash A\to B,\Delta \end{array}(\to R) is violated now, and has to be replaced by the rule \begin{array}{c} \Gamma,A\vdash B\\ \hline \Gamma\vdash A\to B \end{array}(\to R_J). This gives us the intuitionistic sequent calculus, which exactly characterizes the valid conclusions of intuitionistic logic.

To see that (\to R) is violated, let \Gamma=\top, A correspond to [A]=(0,\infty), B=\bot, and \Delta = A. Above the line we have \mathbb R \cap (0,\infty) \subseteq \emptyset \cup (0,\infty), which is true. Below the line we have \mathbb R \subseteq \mathrm{int}((-\infty,0])\cup (0,\infty), which is false.

An evil twin of sequent calculus

Note that implication satisfies C\land A \vdash B \Leftrightarrow C \vdash (A\to B) or rather [C]\cap [A] \subseteq [B] \Leftrightarrow [C] \subseteq [A\to B]. Let us replace implication by minus. Note that minus satisfies [A] \subseteq [B]\cup [C] \Leftrightarrow [A-B] \subseteq [C] with [A-B]:=[A]{}\setminus{}[B]. Then we get the following two rules instead of (\to L) and (\to R).

\begin{array}{c} \Gamma,A\vdash B,\Delta\\ \hline \Gamma,A- B\vdash \Delta \end{array}(- L) \begin{array}{c} \Gamma\vdash A,\Delta\quad\Sigma,B\vdash \Pi\\ \hline \Gamma,\Sigma\vdash A-B,\Delta,\Pi \end{array}(- R)
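A brute-force check (not in the original post) of the adjunction for minus over all subsets of a small universe:

```python
from itertools import combinations

U = [0, 1, 2]
subsets = [set(c) for r in range(len(U) + 1) for c in combinations(U, r)]

# [A] <= [B] u [C]  iff  [A - B] <= [C], with [A - B] = [A] \ [B]
for A in subsets:
    for B in subsets:
        for C in subsets:
            assert (A <= B | C) == (A - B <= C)
```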

Of course, we also remove \to and \top from the language, together with the rule (\top R). This sequent calculus is still as sound and complete as the original sequent calculus. But we no longer reason about implication, only about minus. Some sort of implication is still present in \vdash, but it is no longer mirrored internally in the language of the logic itself. So this is our logic without truth.

I don’t really know (or understand) whether this sort of context morphism has any sort of relevance, and whether that logic without truth occurs anywhere in the real world. Is there any relation to the fact that it is easier to deny the relevance or truth of a given conclusion than to prove that it is important and true? What I like about that logic is the asymmetry between implication and falsehood, because I wanted to find naturally occurring asymmetries in mathematical hierarchies and logic. Even for the results that I do want to publish, I have the same problem that I don’t really understand the relevance of the corresponding context morphisms, whether there even should be context morphisms, and whether my proposed context morphisms are the correct ones.

Conclusions?

That post initially also contained a logic without falsehood, or rather a logic where falsehood is not used. But it started to get long, and this post is already long enough. Not sure whether this was really a good idea, since the explanation of the sequent calculus was also intended to better understand how such a logic with a reduced set of logical constants and connectives still maintains its main features. Maybe I will manage to create another blog post from the removed material. Or maybe nobody including myself cares anyway, as already indicated at the beginning. Or maybe I should better use my time to finish the paper about the results I wrote about at the beginning, and submit them to…
