Our protagonists are the following (DLogTime-)uniform circuit classes: NC1, SAC1, and TC0.
Interesting things can be said about those classes: things like Barrington’s theorem (NC1), closure under complementation by inductive counting (SAC1), or circuits for division and iterated multiplication (TC0). Things like TC0 recognises the Dyck language with two types of parentheses, NC1 recognises visibly pushdown languages, or SAC1 recognises context-free languages (and also the languages recognisable in nondeterministic logarithmic space).
But how are those classes related to dreams about speed? There are situations where both fixed-width integers and floating-point numbers are problematic, yet much faster than more suitable alternatives because of explicit hardware support. The dream is that arbitrary alternative number formats could get sufficient hardware support to remain reasonable alternatives to the number formats explicitly supported by current hardware.
ALogTime – fast solutions for predictable stack height
Considering how parameter passing to subroutines and a huge part of memory management in mainstream programming languages is stack based, an obvious and natural variation is to restrict the unbounded memory of a Turing machine to be a stack. With those words, I justified copying a part of my answer to my own question yet another time. During my research for that answer, I also learned about Nested Words and Visibly Pushdown Languages, and that they can be recognized in LOGSPACE. Recently, I read
The Boolean formula value problem is in ALOGTIME (25 pages)
by S Buss – Jan 1987
I immediately suspected that Visibly Pushdown Languages can also be recognized in ALogTime, and that recognition in LOGSPACE is just a simple corollary. Indeed it is:
Complexity of Input-Driven Pushdown Automata (21 pages)
by A Okhotin, K Salomaa – 2014
The remarks following the proof of Theorem 8 (with the beautiful Figure 3 shown above) say: “Finally, Dymond applied the method of Buss to implement the Mehlhorn–Rytter algorithm in NC1”. Dymond’s paper is short and sweet:
Input-driven languages are in log n depth (4 pages)
by P Dymond – 1988
Polynomial size log depth circuits: Between NC1 and AC1 by Meena Mahajan gives a different summary of Dymond’s paper:
Dymond gave a nice construction showing that they are in fact in NC1. His approach is generic and works not just for VPAs but for any PDA satisfying the following:
- no ϵ moves,
- an accepting run should end with an empty stack,
- the height of the pushdown, after processing i letters of the input, should be computable in NC1. If there is more than one run (nondeterministic PDA), and if the height profiles across different runs are different, then the heights computed should be consistent with a single run. Furthermore, if there is an accepting run, then the heights computed should be consistent with some accepting run.
For such PDA, Dymond transforms the problem of recognition to an instance of formula value problem, and then invokes Buss’s ALogTime algorithm for it.
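For the visibly pushdown case, the third condition is particularly transparent: each letter is a call, a return, or an internal action, so the height after i letters is just a prefix sum of +1, −1, and 0 contributions, and prefix sums are computable by a log-depth balanced tree, hence in NC1. A tiny sequential sketch in Python (the concrete alphabet split into calls and returns is my own illustration):

```python
# Stack-height profile of an input-driven word.  Each letter's effect on
# the stack height (+1 for a call, -1 for a return, 0 for an internal
# letter) depends only on the letter itself, so the height after i
# letters is a prefix sum -- computable by a log-depth parallel scan.

CALLS, RETURNS = set("(<"), set(")>")  # internal letters: everything else

def height_profile(word):
    """Return the stack height after each of the letters of word."""
    heights, h = [], 0
    for c in word:
        h += 1 if c in CALLS else -1 if c in RETURNS else 0
        heights.append(h)
    return heights

print(height_profile("(a<b>)"))  # [1, 1, 2, 2, 1, 0]
```

The point is that this profile can be computed without simulating the stack at all, which is what makes Dymond’s reduction to the formula value problem possible.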
This post links extensively to external material. It is clear that the reader of this post will not read most of it. The more interesting question is whether I have read that material, or at least plan to read it. The initial plan of this post was to indicate which material I have already read and understood, and which material I still plan to read. But this is impractical and boring for the reader. What I want to say is that I have read Buss’s paper in principle, but didn’t yet work thoroughly enough through section 7, and I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive? In some of the other papers quoted above, I only read the parts which interested me. And here is another paper by Buss et al which I still plan to read:
An optimal parallel algorithm for formula evaluation (42 pages)
by S Buss, S Cook, A Gupta, V Ramachandran – Nov 20, 1990
Parallel and distributed computing
The main conclusion of the last section is that ALogTime is quite powerful, because it can recognize languages accepted by input driven stack automata. I wondered whether this result is taught in typical introductions to parallel computing. Here is my sample:
Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (104 pages)
by Uzi Vishkin – October 12, 2010
Parallel Algorithms (64 pages)
by Guy E. Blelloch and Bruce M. Maggs – 1996
Limits to Parallel Computation: P-Completeness theory (327 pages)
by R Greenlaw, H Hoover, W Ruzzo – 1995
Parallel Algorithms (347 pages)
by H Casanova, A Legrand, Y Robert – 2008
My conclusion is that it is not taught, at least not explicitly. And I did take into account that they might refer to ALogTime as (uniform) NC1. Chapter 10, where Uzi Vishkin talks about tree contraction, is related, and NC1 is discussed in some of the texts. (It should be clear that I didn’t really try to read those texts, and just browsed through them.)
Some words on LogCFL and threshold circuits
I looked for hardware support for LogCFL, similar to what seems possible for NC1.
A more promising approach could be to look for problems complete under appropriate many-one reductions for these classes. The word problem for the group S5 (or A5) would be such a problem for NC1. The solution of that word problem is nicely scalable, since one can subdivide the word into smaller parts, and thereby generate a smaller word to which the same procedure can be applied again.
That many-one reduction to that word problem is Barrington’s theorem proved in
Bounded-width polynomial-size branching programs recognize exactly those languages in NC1 (15 pages)
by David A. Barrington – 1987
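The scalable evaluation procedure mentioned above can be made concrete: the word problem asks whether a product of group elements equals the identity, and the product of a word is the product of the products of its two halves, giving a log-depth evaluation tree. A Python sketch for S5, with permutations encoded as tuples (the encoding is my own choice for illustration):

```python
# Word problem for the permutation group S5: is a product of permutations
# the identity?  The product of a word equals the product of the products
# of its two halves, so evaluation proceeds along a log-depth tree --
# the scalability exploited in Barrington-style constructions.

def compose(p, q):
    """Composition "p after q" of permutations encoded as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

IDENTITY = (0, 1, 2, 3, 4)

def word_product(word):
    """Product of a sequence of permutations, evaluated tree-fashion."""
    if len(word) == 0:
        return IDENTITY
    if len(word) == 1:
        return word[0]
    mid = len(word) // 2
    return compose(word_product(word[:mid]), word_product(word[mid:]))

cycle = (1, 2, 3, 4, 0)                       # the 5-cycle (0 1 2 3 4)
print(word_product([cycle] * 5) == IDENTITY)  # True: a 5-cycle has order 5
```

Each half can be handed to a separate processing unit, which is exactly the kind of subdivision the text above alludes to.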
One thing which could still go wrong when trying to use TC0, NC1, and SAC1 for actual efficient computer architectures is that those fast and nice many-one reductions are studied for decision problems, but in the end we are much more interested in computing solutions to function problems. This came up when I described my plans to a friend. But I am not the first to dream about hardware support for those theoretically fast solutions:
Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision
A Desktop Supercomputer (4 pages)
by Uzi Vishkin – 2009
Back to LogCFL: for “closure under complementation by inductive counting,” see:
Two applications of inductive counting for complementation problems (20 pages)
by A Borodin, S Cook, P Dymond, W Ruzzo, M Tompa – 1989
and for “SAC1 recognises context-free languages (and also the languages recognisable in nondeterministic logarithmic space)”, note that LogCFL = AuxPDA(log n, n^O(1)) = SAC1, see here. Here AuxPDA(log n, n^O(1)) is the class of problems decidable by nondeterministic auxiliary pushdown automata with a logarithmic work tape and polynomial running time. (Without the polynomial time bound, “non-deterministic” can indeed be replaced by “deterministic”, by a result of Cook, but then one gets all of P; with the time bound, the deterministic variant yields LogDCFL, which is not known to equal LogCFL. The direct meaning of LogCFL is the class of problems which can be many-one reduced to a CFL in deterministic logspace.) I also guess that LogCFL can be solved by a dynamic programming algorithm, since both NL and CFL allow such a solution. (Probably an easy exercise. I just wanted to have LogCFL in this post, since it is part of my dream, and allows solving interesting problems. And it seems close to ALogTime from a hardware perspective.)
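The dynamic programming guess can at least be made concrete for the CFL part: membership in a context-free language, for a grammar in Chomsky normal form, is decided by the classical CYK algorithm. A compact sketch in Python (the encoding of the grammar as unit and binary rules is my own choice):

```python
# CYK dynamic programming for membership in a context-free language.
# The grammar must be in Chomsky normal form: rules A -> B C or A -> a.
# table[(i, j)] = set of nonterminals deriving word[i:j].

def cyk(word, unit_rules, binary_rules, start="S"):
    n = len(word)
    table = {}
    for i, c in enumerate(word):
        table[(i, i + 1)] = {A for (A, a) in unit_rules if a == c}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):
                for (A, B, C) in binary_rules:
                    if B in table[(i, k)] and C in table[(k, j)]:
                        cell.add(A)
            table[(i, j)] = cell
    return start in table.get((0, n), set())

# Nonempty balanced parentheses, in Chomsky normal form:
# S -> L R | L T | S S ;  T -> S R ;  L -> ( ;  R -> )
units = [("L", "("), ("R", ")")]
binaries = [("S", "L", "R"), ("S", "L", "T"), ("S", "S", "S"), ("T", "S", "R")]
print(cyk("(())()", units, binaries))  # True
print(cyk("(()", units, binaries))     # False
```

The table entries for all subwords are independent within one round, which is what makes this style of dynamic programming amenable to parallel evaluation.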
Now to threshold circuits: “TC0 recognises the Dyck language with two types of parentheses”. I don’t have a good primary reference at the moment, but this shows that threshold circuits are surprisingly close to ALogTime with respect to their capabilities. For “circuits for division and iterated multiplication,” there are some references in the Wikipedia article, but I liked the following lecture notes:
Lecture 8: Multiplication ∈ TC0 (4 pages)
by Kristoffer Arnsfelt Hansen – 2008
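The counting flavour of TC0 can at least be made concrete for the Dyck language with a single type of parentheses: a word is balanced iff no prefix contains more closing than opening parentheses and the totals agree. Each check compares two counts, which is exactly what a threshold (majority) gate computes; handling two types of parentheses additionally requires checking that matched parentheses have the same type, which is the harder part the missing primary reference would cover. A sketch:

```python
# Dyck language with one type of parentheses, decided by counting alone:
# a word is balanced iff no prefix has more ')' than '(' and the totals
# agree.  Each check compares two counts, i.e. a threshold test, which
# is the operation a TC0 majority gate provides in constant depth.

def balanced(word):
    opens = closes = 0
    for c in word:
        opens += c == "("
        closes += c == ")"
        if closes > opens:   # some prefix fails the threshold test
            return False
    return opens == closes

print(balanced("(()())"))  # True
print(balanced("())("))    # False
```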
The paper by P Dymond – 1988 ends with the following speculation:
This result shows that a large class of cfl’s is in log depth, and it is natural to inquire about the possibility that all deterministic cfl’s, which are known to be in log space, might also be in log depth. One dcfl in log space which is not currently known to be recognizable in log depth is the two-sided Dyck language on four letters.
Word Problems Solvable in Logspace
by RJ Lipton, Y Zalcstein – 1977
It is an interesting question whether ALogTime and LOGSPACE are different for naturally occurring problems. Including LogCFL in the hardware-acceleration dream is an attempt to evade this question. But there are programming approaches specifically focusing on LOGSPACE, since it may model an important aspect of practical algorithms for really big data:
Computation-by-Interaction for Structuring Low-Level Computation
by U Schöpp – 2014
Understanding whether ALogTime != PH is hard to prove
I was wondering about how to define function classes for ALogTime: How to transform words over an input alphabet to words over an output alphabet. I wondered whether I should write a mail to Emil Jeřábek. Maybe I only wanted to have a reason to write him, but I didn’t. Instead, I googled my way until I understood even much weaker function classes. Then a naive idea for proving ALogTime != PH crossed my mind: Prove some problem complete for ALogTime under some extremely weak many-one reduction. If that would work, it would be sort of a “fool’s mate”, since no genuinely new idea is needed, just a careful application of well known techniques. So when Scott Aaronson said
On reflection, though, Lubos has a point: all of us, including me, should keep an open mind. Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”
in HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP, I couldn’t help but ask him about whether my naive idea has any chance of proving ALogTime != PH, even if just for testing whether he really keeps such an open mind. His answer was: “I don’t know enough to give you detailed feedback about your specific approach”. So he keeps an open mind, even if Stephen A. Cook is slightly more open with respect to the possibility that separating P from weaker classes like NL or ALogTime could be within reach of current techniques:
The Tree Evaluation Problem: Towards Separating P from NL (8 pages)
by Stephen A. Cook – 2014
This post has many links to papers like the one above. As if I could justify myself for believing that ALogTime can be separated from P by current techniques. But the links show at least that investing time into this topic is an interesting learning experience. The following sections Weak reductions and isomorphism theorems, and DAG automata link to material which I read (or still plan to read) in order to make progress on ALogTime != P.
Weak reductions and isomorphism theorems
DLogTime many-one reductions already look quite weak. (I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?) One could go weaker by looking at DLogTime-uniform NC0 reductions or even DLogTime-uniform projections (each output symbol/bit depends on at most a single input symbol/bit). But recently, people proposed even weaker uniformity notions than DLogTime:
A circuit uniformity sharper than DLogTime (19 pages)
by G Bonfante, V Mogbil – May 25, 2012
This proposed rational uniformity can be summarized as writing n in binary and applying a finite state transducer to that input to compute the circuit description. I haven’t read the full paper yet. Another recently proposed approach uses descriptive complexity to define very weak uniformity notions:
The lower reaches of circuit uniformity (16 pages)
by C Behle, A Krebs, K-J Lange, P McKenzie – May 12, 2012
I didn’t understand too much of that paper yet. I tried reading revision 0 (didn’t notice that it wasn’t the latest revision), but revision 1 might turn out to be easier to read: “Results are stated in more clear way, and overall readability was improved.”
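The projections mentioned a few paragraphs above can be illustrated with a toy example: mapping a word over {a, b} to the bit string that records the positions of b is a projection, since each output bit depends on a single input symbol, and it reduces “even number of b’s” to PARITY. (The example and names are mine, purely for illustration.)

```python
# A projection: each output bit depends on at most one input symbol.
# Here output bit i is simply [word[i] == 'b'], reducing the question
# "does the word contain an even number of b's?" to PARITY of bits.

def projection(word):
    return [int(c == "b") for c in word]

def parity(bits):
    return sum(bits) % 2

word = "abbab"
print(projection(word))               # [0, 1, 1, 0, 1]
print(parity(projection(word)) == 0)  # False: odd number of b's
```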
Another line of attack for making the many-one reductions weaker is to only allow isomorphisms. It may seem counter-intuitive at first that this should be possible, but the Schröder–Bernstein theorem can sometimes be adapted. This is described in two classic papers:
Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem (17 pages)
by M Agrawal, E Allender, S Rudich – 1998
Reducing the Complexity of Reductions (9 pages or 22 pages)
by M Agrawal, E Allender, R Impagliazzo, T Pitassi, S Rudich – 2001
I didn’t read those papers yet, but Algebra, Logic and Complexity in Celebration of Eric Allender and Mike Saks by Neil Immerman gives a nice survey of those results, which I read for a start.
DAG automata
One important property of input-driven stack automata is that the dependency graphs will be planar DAGs. For general polynomial time algorithms, the dependency graphs will be general non-planar DAGs (due to variable binding with an unbounded number of variables). In order to learn what is already known about this, I googled “DAG automata”. The first three hits are awesome:
DAG Automata – Variants, Languages and Properties (48 pages)
by J Blum – May 25, 2015
How Useful are Dag Automata? (26 pages)
by S Anantharaman, P Narendran, M Rusinowitch – 2004
DAG Automata for Meaning Representation (12 pages)
by F Drewes – London, UK, July 13–14, 2017
The first and third hit are related: Frank Drewes was the adviser of Johannes Blum. I read both, and they were really worth it. (I only browsed the paper from 2004, just to see that the newer papers really achieved a breakthrough.) They reduce determining which rules are useless to a question about Petri nets, and then provide a solution for that problem: “A Petri net is structurally cyclic if every configuration is reachable from itself in one or more steps. We show that structural cyclicity is decidable in deterministic polynomial time.”
Structurally Cyclic Petri Nets (9 pages)
by F Drewes, J Leroux – 2015
On understanding a problem, without solving it
One can understand a problem without solving it. So one can understand why something is a problem and why a solution would be challenging, without being able to come up with a solution. I don’t know whether the best experts in computer science understand P != NP in this sense. I don’t understand it yet, but I have the deeply rooted belief that it can be understood.
And understanding for me also means to be able to read work that others have done, and to see and feel the connection to the questions which interest me.
This post is too long, but I didn’t really care this time. Since the reader is expected to read at least some of the linked material, it is implicitly even much longer than its nominal length (which is too long) indicates. The initial goal was to write it in order to be able to focus again on more urgent topics. But of course, this didn’t really work out that way. Still, this post was always intended to be mainly a collection of links to papers I already read or still plan to read in order to make progress on understanding ALogTime, LogCFL, and threshold circuits. And I also knew that I would try to justify myself for believing that this topic is worth studying.
Do I believe that the dreams of fast solutions can come true? I don’t know. Uzi Vishkin seems to have made a much more serious effort than I ever will, and even argued his case from an economic perspective:
Alas, the software spiral is now broken: (a) nobody is building hardware that provides improved performance on the old serial software base; (b) there is no broad parallel computing application software base for which hardware vendors are committed to improve performance; and (c) no agreed-upon architecture currently allows application programmers to build such software base for the future
Could proof-of-work problems for cryptocurrency mining provide the incentive to make dreams come true? Who knows; money is important, after all. But personally, I am rather a believer in “constant dripping wears away the stone,” so I think in the long run it is quite possible that things progress. Probably in ways I cannot even imagine today.