Our protagonists are the following (DLogTime-) uniform circuit classes

- NC
^{1}(ALogTime) - SAC
^{1}(LogCFL) - TC
^{0}(threshold circuits)

Interesting things can be said about those classes. Things like Barrington’s theorem (NC^{1}), closure under complementation by inductive counting (SAC^{1}), or circuits for division and iterated multiplication (TC^{0}). Things like TC^{0} recognises the Dyck language with two types of parentheses, NC^{1} recognises visibly pushdown languages, or SAC^{1} recognises context free languages. (and also the languages recognisable in nondeterministic logarithmic-space).

But how are those classes related to dreams about speed? There are sometimes situations where both fixed width integers and floating point numbers are problematic, but much faster than better alternatives because of explicit hardware support. The dream is to allow arbitrary alternative number formats getting sufficient hardware support to remain reasonable alternatives to the number formats supported explicitly by current hardware.

### ALogTime – fast solutions for predictable stack height

Considering how parameter passing to subroutines and a huge part of memory management in mainstream computer languages is stack based, an obvious and natural variation is to restrict the unbounded memory of a Turing machine to be a stack. With those words, I justified copying a part of my answer to my own question yet another time. During my research for that answer, I also learned about Nested Words and Visibly Pushdown Languages and that they can be recognized in LOGSPACE. Recently, I read

The Boolean formula value problem is in ALOGTIME (25 pages)

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean/paper.pdf

by S Buss – Jan 1987

I immediately suspected that Visibly Pushdown Languages can also be recognized in ALogTime, and that recognition in LOGSPACE is just a simple corollary. Indeed it is:

Complexity of Input-Driven Pushdown Automata (21 pages)

http://research.cs.queensu.ca/~ksalomaa/julk/p47-okhotin.pdf

by A Okhotin, K Salomaa – 2014

The remarks following the proof of theorem 8 (with the beautiful Figure 3 shown above) say: “Finally, Dymond [12] applied the method of Buss [7] to implement the Mehlhorn–Rytter algorithm [36] in NC^{1}“. Dymond’s paper is short and sweet:

Input-driven languages are in log n depth (4 pages)

https://www.sciencedirect.com/science/article/pii/0020019088901482?via%3Dihub

by P Dymond – 1988

Polynomial size log depth circuits: Between NC^{1} and AC^{1} by Meena Mahajan gives a different summary of Dymond’s paper:

Dymond gave a nice construction [14] showing that they are in fact in NC^{1}. His approach is generic and works not just for VPAs but for any PDA satisfying the following:

- no ϵ moves,
- an accepting run should end with an empty stack
- the height of the pushdown, after processing i letters of the input, should be computable in NC
^{1}. If there is more than one run (nondeterministic PDA), and if the height profiles across different runs are different, then the heights computed should be consistent with a single run. Furthermore, if there is an accepting run, then the heights computed should be consistent with some accepting run.

For such PDA, Dymond transforms the problem of recognition to an instance of formula value problem, and then invokes Buss’s ALogTime algorithm [8] for it.

This post links extensively to external material. It is clear that the reader of this post will not read most of it. The more interesting question is whether I have read that material, or at least plan to read it. The initial plan of this post was to indicate which material I have already read and understood, and which material I still plan to read. But this is unpractical and boring for the reader. What I want to say is that I have read Buss’ paper in principle, but didn’t yet work thoroughly enough through section 7, and I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive? In some of the other papers quoted above, I only read the parts which interested me. And here is another paper by Buss et al which I still plan to read:

An optimal parallel algorithm for formula evaluation (42 pages)

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean2/finalversion.pdf

by S Buss, S Cook, A Gupta, V Ramachandran – Nov 20, 1990

### Parallel and distributed computing

The main conclusion of the last section is that ALogTime is quite powerful, because it can recognize languages accepted by input driven stack automata. I wondered whether this result is taught in typical introductions to parallel computing. Here is my sample:

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (104 pages)

http://www.umiacs.umd.edu/users/vishkin/PUBLICATIONS/classnotes.pdf

by Uzi Vishkin – October 12, 2010

Parallel Algorithms (64 pages)

https://www.cs.cmu.edu/~guyb/papers/BM04.pdf

by Guy E. Blelloch and Bruce M. Maggs – 1996

Limits to Parallel Computation: P-Completeness theory (327 pages)

https://homes.cs.washington.edu/~ruzzo/papers/limits.pdf

by R Greenlaw, H Hoover, W Ruzzo – 1995

Parallel Algorithms (347 pages)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.466.8142&rep=rep1&type=pdf

by H Casanova, A Legrand, Y Robert – 2008

My conclusion is that it is not taught, at least not explicitly. And I already took care that they would talk about ALogTime by saying (uniform) NC^{1}. Chapter 10 where Uzi Vishkin talks about tree contraction is related, and NC^{1} is discussed in some of the texts. (It should be clear that I didn’t really try to read those texts, and just browsed through them.)

### Some words on LogCFL and threshold circuits

I looked for hardware support for LogCFL, as seems possible for NC^{1}

A more promising approach could be to look for problems complete under appropriate many-one reductions for these classes. The word problem for (or ) would be such a problem for NC

^{1}. The solution of that word problem is nicely scalable, since one can subdivide the word into smaller parts, and thereby generate a smaller word on which the same procedure can be applied again.

That many-one reduction to that word problem is Barrington’s theorem proved in

Bounded-width polynomial-size branching programs recognize exactly those languages in NC^{1} (15 pages)

https://people.cs.umass.edu/~barring/publications/bwbp.pdf

https://www.sciencedirect.com/science/article/pii/0022000089900378

by David A.Barrington – 1987

One thing which could still go wrong when trying to use TC^{1}, NC^{1}, and SAC^{1} for actual efficient computer architectures is that those fast and nice many-one reductions are studied to decision problems, but in the end we are much more interested in computing solutions to function problems. This came up when I described my plans to a friend. But I am not the first to dream about hardware support for those theoretical fast solutions:

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision

A Desktop Supercomputer (4 pages)

http://www.umiacs.umd.edu/users/vishkin/XMT/index.shtml

http://www.umiacs.umd.edu/users/vishkin/XMT/ICCD09proc-version.pdf

by Uzi Vishkin – 2009

Back to LogCFL: for “closure under complementation by inductive counting,” see:

Two applications of inductive counting for complementation problems (20 pages)

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.394.1662&rep=rep1&type=pdf

by A Borodin, S Cook, P Dymond, W Ruzzo, M Tompa – 1989

and for: “SAC^{1} recognises context free languages. (and also the languages recognisable in nondeterministic logarithmic-space),” note that LogCFL = AuxPDA(log n, n^{O(1)}) = SAC^{1} see here. This means the class of log n auxiliary pushdown automaton which run in non-deterministic polynomial time. (Probably “non-deterministic” can also be replaced by “deterministic”. The direct meaning of LogCFL is the class of problems which can be many-one reduced to CFL in deterministic logspace.) I also guess that LogCFL can be solved by a dynamic programming algorithm, since both NL and CFL allow such a solution. (Probably an easy exercise. I just wanted to have LogCFL in this post, since it is part of my dream, and allows to solve interesting problems. And it seems close to ALogTime from a hardware perspective.)

Now to threshold circuits: “TC^{0} recognises the Dyck language with two types of parentheses”. I don’t have a good primary reference at the moment. But this shows that threshold circuits are surprisingly close to ALogTime with respect to their capabilities. And “circuits for division and iterated multiplication,” has some references in the wikipedia article, but I liked the following lecture notes:

Lecture 8: Multiplication ∈ TC^{0} (4 pages)

https://users-cs.au.dk/arnsfelt/CT08/scribenotes/lecture8.pdf

by Kristoffer Arnsfelt Hansen – 2008

The paper by P Dymond – 1988 ends with the following speculation:

This result shows that a large class of cfl’s is in log depths, and it is natural to inquire about the possibility that all deterministic cfl’s, which are known to be in log space, might also be in log depth. One dcfl in log space which is not currently known to be recognizable in log depth is the two-sided Dyck language on four letters [5].

Word Problems Solvable in Logspace

https://rjlipton.wordpress.com/2009/04/16/the-word-problem-for-free-groups/

http://cpsc.yale.edu/sites/default/files/files/tr60.pdf

by RJ Lipton, Y Zalcstein – 1977

It is an interesting question whether ALogTime and LOGSPACE are different for naturally occurring problems. Including LogCFL in the hardware acceleration tries to evade this question. But there are programming approaches specifically focusing on LOGSPACE, since it may model an important aspect of practical algorithms for really big data:

Computation-by-Interaction for Structuring Low-Level Computation

http://www2.tcs.ifi.lmu.de/~schoepp/habil.pdf

by U Schöpp – 2014

### Understanding whether ALogTime != PH is hard to prove

I was wondering about how to define function classes for ALogTime: How to transform words over an input alphabet to words over an output alphabet. I wondered whether I should write a mail to Emil Jeřábek. Maybe I only wanted to have a reason to write him, but I didn’t. Instead, I googled my way until I understood even much weaker function classes. Then a naive idea for proving ALogTime != PH crossed my mind: Prove some problem complete for ALogTime under some extremely weak many-one reduction. If that would work, it would be sort of a “fool’s mate”, since no genuinely new idea is needed, just a careful application of well known techniques. So when Scott Aaronson said

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind. Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

in HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP, I couldn’t help but ask him about whether my naive idea has any chance of proving ALogTime != PH, even if just for testing whether he really keeps such an open mind. His answer was: “I don’t know enough to give you detailed feedback about your specific approach”. So he keeps an open mind, even if Stephen A. Cook is slightly more open with respect to the possibility that separating P from weaker classes like NL or ALogTime could be within reach of current techniques:

The Tree Evaluation Problem: Towards Separating P from NL (8 pages)

https://www.cs.toronto.edu/~toni/Courses/PvsNP/CS2429.html

https://www.cs.toronto.edu/~toni/Courses/PvsNP/Lectures/lecture6.pdf

by Stephen A. Cook – 2014

This post has many links to papers like the one above. As if I could justify myself for believing that ALogTime can be separated from P by current techniques. But the links show at least that investing time into this topic is an interesting learning experience. The following sections **Weak reductions and isomorphism theorems**, and **DAG automata** link to material which I read (or still plan to read) in order to make progress on ALogTime != P.

### Weak reductions and isomorphism theorems

DLogTime many-one reduction already look quite weak. (I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?) One could go weaker by looking at DLogTime-uniform NC^{0} reduction or even DLogTime-uniform projections (each output symbol/bit depends at most on a single input symbol/bit). But recently, people proposed even weaker uniformity notions than DLogTime:

A circuit uniformity sharper than DLogTime (19 pages)

https://hal.archives-ouvertes.fr/hal-00701420/document

by G Bonfante, V Mogbil – May 25, 2012

This proposed rational-uniformity can be summarized as as writing in binary and applying a finite state transducer to that input for computing the circuit description. I haven’t read the full paper yet. Another recently proposed approach uses descriptive complexity for defining very weak uniformity notions:

The lower reaches of circuit uniformity (16 pages)

https://eccc.weizmann.ac.il/report/2011/095/

by C Behle, A Krebs, K-J Lange, P McKenzie – May 12, 2012

I didn’t understand too much of that paper yet. I tried reading revision 0 (didn’t notice that it wasn’t the latest revision), but revision 1 might turn out to be easier to read: “Results are stated in more clear way, and overall readability was improved.”

Another line of attack for making the many-one reductions weaker is to only allow isomorphisms. It may seem counter-intuitive first that this should be possible, but the Schröder-Bernstein theorem can sometimes be adapted. This is described in two classic papers:

Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem (17 pages)

https://www.sciencedirect.com/science/article/pii/S0022000098915835

http://ftp.cs.rutgers.edu/pub/allender/switch3.pdf

www.cs.cmu.edu/~rudich/papers/ac0reductions.ps

by M Agrawala, E Allender, S Rudich – 1998

Reducing the Complexity of Reductions (9 pages or 22 pages)

https://cseweb.ucsd.edu/~russell/reductions.ps

http://ftp.cs.rutgers.edu/pub/allender/stoc97.pdf

by M Agrawala, E Allender, R Impagliazzo, T Pitassi, S Rudich – 2001

I didn’t read those papers yet, but Algebra, Logic and Complexity in Celebration of Eric Allender and Mike Saks by Neil Immerman gives a nice survey of those results, which I read for a start.

### DAG automata

One important property of input driven stack automata is that the dependency graphs will be planar DAGs. For general polynomial time algorithms, the dependency graphs will be general non-planar DAGs (due to variable binding with an unbounded number of variables). In order to learn what is already known about this, I googled “DAG automata”. The first three hits are awesome:

DAG Automata – Variants, Languages and Properties (48 pages)

http://www8.cs.umu.se/education/examina/Rapporter/JohannesBlum_submitted.pdf

by J Blum – May 25, 2015

How Useful are Dag Automata? (26 pages)

https://hal.archives-ouvertes.fr/hal-00077510/document

by S Anantharaman, P Narendran, M Rusinowitch – 2004

DAG Automata for Meaning Representation (12 pages)

http://aclweb.org/anthology/W17-3409

by F Drewes – London, UK, July 13–14, 2017

The first and third hit are related, Frank Drewes was the adviser of Johannes Blum. I read both, and they were really worth it. (I only browsed the paper from 2004, just to see that the newer papers really achieved a breakthrough.) They reduce determining which rules are useless to a question about Petri nets, and then provide a solution for that problem: “A Petri net is structurally cyclic if every configuration is reachable from itself in one or more steps. We show that structural cyclicity is decidable in deterministic polynomial time.”

Structurally Cyclic Petri Nets (9 pages)

https://arxiv.org/abs/1510.08331

by F Drewes, J Leroux – 2015

### On understanding a problem, without solving it

One can understand a problem without solving it. So one can understand why something is a problem and why a solution would be challenging, without being able to come up with a solution. I don’t know whether the best experts in computer science understand P != NP in this sense. I don’t understand it yet, but I have the deeply rooted belief that it can be understood.

And understanding for me also means to be able to read work that others have done, and to see and feel the connection to the questions which interest me.

### Conclusions?

This post is too long, but I didn’t really care this time. Since the reader is expected to read at least some of the linked material, it is implicitly even much longer than its nominal length (which is too long) indicates. The initial goal was to write it in order to be able to focus again on more urgent topics. But of course, this didn’t really worked out that way. Still, this post was always intended to be mainly a collection of links to papers I already read or still plan to read in order make progress on understanding ALogTime, LogCFL, and threshold circuits. And I also knew that I would try to justify myself for believing that this topic is worth studying.

Do I believe that the dreams of fast solutions can come true? I don’t know. Uzi Vishkin seems to have made a much more serious effort than I ever will, and even argued his case from an economic perspective:

Alas, the software spiral is now broken: (a) nobody is building hardware that provides improved performance on the old serial software base; (b) there is no broad parallel computing application software base for which hardware vendors are committed to improve performance; and (c) no agreed-upon architecture currently allows application programmers to build such software base for the future

Could proof of work problems for crytocurrency mining provide the incentive to make dreams come true? Who knows, money is important, after all. But personally, I am rather a believer in “constant dripping wears away the stone,” so I think in the long run, it is quite possible that things progress. Probably in ways I cannot even imagine today.

Surprisingly, this post has been a continuous joy for me to read, since I wrote it one month ago. Of course, I don’t read the post itself, but the papers it mentions. Let me begin by an update on the following open question:

I worked thoroughly through section 7 in the meantime, and I understand now why DLogTime is not transitive. I also learned to appreciate Emil Jeřábek’s detailed answer on the fine grained relation between LH and AC0 in this context. (Especially the part: “Additionally, we require that the ATM makes only

onesuch query, at the very end of the computation, and returns the answer of the oracle as the result of the computation.”) And I found a later paper by S. Buss on the same subject:Algorithms for Boolean Formula Evaluation and for Tree Contraction (19 pages)

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean3/paper.pdf

by S Buss – Oct 1991

Those papers would have been easy to find, if…

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean2

https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean3

At the end of the section “Parallel and distributed computing”, I wrote:

Well, I worked through the survey “Parallel Algorithms” by G. Blelloch and B. Maggs in the meantime, and finished the first quarter of “Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques” by U. Vishkin. The terms EREW, CREW, CRCW (E=exclusive, C=concurent, R=read, W=write) are explained, but their detailed relation to NC1, L, NL, SAC1, and AC1 is not discussed. I found such a discussion (including the interesting fact that L is in EREW1 by viewing L as directed tree reachability) in section 3 “Models of Parallel Computation” of

A Survey of Parallel Algorithms for Shared-Memory Machines (70 pages)

https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5865.html

https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/CSD-88-408.pdf

by Richard M. Karp and Vijaya Ramachandran – March 1988

At the end of section “Weak reductions and isomorphism theorems”, I wrote:

Immerman’s survey presentation was from Jan 26, 2017. But the subject is actually much wider than I gathered from that survey presentation and the two papers I linked to. There are good recent survey papers which give a better impression how wide that subject really is:

The Isomorphism Conjecture for NP (23 pages)

https://www.cse.iitk.ac.in/users/manindra/survey/Isomorphism-Conjecture.pdf

by Manindra Agrawal – Dec 19, 2009

Investigations Concerning the Structure of Complete Sets (14 pages)

http://ftp.cs.rutgers.edu/pub/allender/isomorphism.pdf

by Eric Allender – 2014

Pingback: Theory and practice of signed-digit representations | Gentzen translated

In section “3.9 Parallel Computation Thesis” in “A Survey of Parallel Algorithms for Shared-Memory Machines” by R. Karp and V. Ramachandran linked to in one of the comments above, there is the following paragraph:

The paragraph confused me, because both expression evaluation and tree contraction can be done in NC^1. However, NC^1 = Logspace would be a major result. And I didn’t see how those arguments would suffice to show that Logspace is in EREW^1, if they should be insufficient to show that NC^1 = Logspace. Now I finally took the time to investigate this question in detail. My conclusion is that Logspace is really in EREW^1, but that the proof is missing a sort step at the beginning, which makes crucial use of the fact that EREW^1 can compare two numbers in constant time, which explains why that proof doesn’t go through for NC^1.

A smaller issue with that paragraph was that the tree contraction algorithm of Section 2.2 only worked for binary trees, even so there was a remark that it is easy to generalize for arbitrary trees. OK, but would that generalized algorithm still be in EREW^1? A newer paper made the generalization precise and also highlighted the points where things can go wrong:

A Practical Tree Contraction Algorithm for Parallel Skeletons on Trees of Unbounded Degree

Akimasa Morihata, Kiminori Matsuzaki – 2011

The conclusion is that it can indeed be done in EREW^1. Can this algorithm also be implemented in NC^1? I guess so. However, I will first work through the three papers by Sam Buss (et al.) on that problem again, before I form my own opinion and conclusion. Basically, my guess is that EREW^1 like pointer jumping/manipulation can be emulated in NC^1 while CREW^1 like pointer jumping cannot.

Unrelated, already in 2. Jan. 2018, the following paragraph from section “3.4 Relation between circuits and PRAMS” caught my interest:

Then I read the references paper. At the moment, that paper is freely available:

Bounding Fan-out in Logical Networks

Howard James Hoover, Maria M Klawe, Nicholas J Pippenger – 1984

It is funny. It uses a sort of Huffman compression to determine the optimal ordering that minimizes the depth.