## ALogTime, LogCFL, and threshold circuits: dreams of fast solutions

Our protagonists are the following (DLogTime-) uniform circuit classes

Interesting things can be said about those classes. Things like Barrington’s theorem (NC1), closure under complementation by inductive counting (SAC1), or circuits for division and iterated multiplication (TC0). Things like TC0 recognises the Dyck language with two types of parentheses, NC1 recognises visibly pushdown languages, or SAC1 recognises context free languages. (and also the languages recognisable in nondeterministic logarithmic-space).

But how are those classes related to dreams about speed? There are sometimes situations where both fixed width integers and floating point numbers are problematic, but much faster than better alternatives because of explicit hardware support. The dream is to allow arbitrary alternative number formats getting sufficient hardware support to remain reasonable alternatives to the number formats supported explicitly by current hardware.

### ALogTime – fast solutions for predictable stack height

Considering how parameter passing to subroutines and a huge part of memory management in mainstream computer languages is stack based, an obvious and natural variation is to restrict the unbounded memory of a Turing machine to be a stack. With those words, I justified copying a part of my answer to my own question yet another time. During my research for that answer, I also learned about Nested Words and Visibly Pushdown Languages and that they can be recognized in LOGSPACE. Recently, I read

The Boolean formula value problem is in ALOGTIME (25 pages)
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean/paper.pdf
by S Buss – Jan 1987

I immediately suspected that Visibly Pushdown Languages can also be recognized in ALogTime, and that recognition in LOGSPACE is just a simple corollary. Indeed it is:

Complexity of Input-Driven Pushdown Automata (21 pages)
http://research.cs.queensu.ca/~ksalomaa/julk/p47-okhotin.pdf
by A Okhotin, K Salomaa – 2014

The remarks following the proof of theorem 8 (with the beautiful Figure 3 shown above) say: “Finally, Dymond [12] applied the method of Buss [7] to implement the Mehlhorn–Rytter algorithm [36] in NC1“. Dymond’s paper is short and sweet:

Polynomial size log depth circuits: Between NC1 and AC1 by Meena Mahajan gives a different summary of Dymond’s paper:

Dymond gave a nice construction [14] showing that they are in fact in NC1. His approach is generic and works not just for VPAs but for any PDA satisfying the following:

1. no ϵ moves,
2. an accepting run should end with an empty stack
3. the height of the pushdown, after processing i letters of the input, should be computable in NC1. If there is more than one run (nondeterministic PDA), and if the height profiles across different runs are different, then the heights computed should be consistent with a single run. Furthermore, if there is an accepting run, then the heights computed should be consistent with some accepting run.

For such PDA, Dymond transforms the problem of recognition to an instance of formula value problem, and then invokes Buss’s ALogTime algorithm [8] for it.

An optimal parallel algorithm for formula evaluation (42 pages)
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean2/finalversion.pdf
by S Buss, S Cook, A Gupta, V Ramachandran – Nov 20, 1990

### Parallel and distributed computing

The main conclusion of the last section is that ALogTime is quite powerful, because it can recognize languages accepted by input driven stack automata. I wondered whether this result is taught in typical introductions to parallel computing. Here is my sample:

Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques (104 pages)
http://www.umiacs.umd.edu/users/vishkin/PUBLICATIONS/classnotes.pdf
by Uzi Vishkin – October 12, 2010

Parallel Algorithms (64 pages)
https://www.cs.cmu.edu/~guyb/papers/BM04.pdf
by Guy E. Blelloch and Bruce M. Maggs – 1996

Limits to Parallel Computation: P-Completeness theory (327 pages)
https://homes.cs.washington.edu/~ruzzo/papers/limits.pdf
by R Greenlaw, H Hoover, W Ruzzo – 1995

Parallel Algorithms (347 pages)
by H Casanova, A Legrand, Y Robert – 2008

My conclusion is that it is not taught, at least not explicitly. And I already took care that they would talk about ALogTime by saying (uniform) NC1. Chapter 10 where Uzi Vishkin talks about tree contraction is related, and NC1 is discussed in some of the texts. (It should be clear that I didn’t really try to read those texts, and just browsed through them.)

### Some words on LogCFL and threshold circuits

I looked for hardware support for LogCFL, as seems possible for NC1

A more promising approach could be to look for problems complete under appropriate many-one reductions for these classes. The word problem for $\bar{A}_5$ (or $\bar{S}_5$) would be such a problem for NC1. The solution of that word problem is nicely scalable, since one can subdivide the word into smaller parts, and thereby generate a smaller word on which the same procedure can be applied again.

That many-one reduction to that word problem is Barrington’s theorem proved in

Bounded-width polynomial-size branching programs recognize exactly those languages in NC1 (15 pages)
https://people.cs.umass.edu/~barring/publications/bwbp.pdf
https://www.sciencedirect.com/science/article/pii/0022000089900378
by David A.Barrington – 1987

One thing which could still go wrong when trying to use TC1, NC1, and SAC1 for actual efficient computer architectures is that those fast and nice many-one reductions are studied to decision problems, but in the end we are much more interested in computing solutions to function problems. This came up when I described my plans to a friend. But I am not the first to dream about hardware support for those theoretical fast solutions:

Explicit Multi-Threading (XMT): A PRAM-On-Chip Vision
A Desktop Supercomputer (4 pages)
http://www.umiacs.umd.edu/users/vishkin/XMT/index.shtml
http://www.umiacs.umd.edu/users/vishkin/XMT/ICCD09proc-version.pdf
by Uzi Vishkin – 2009

Back to LogCFL: for “closure under complementation by inductive counting,” see:

Two applications of inductive counting for complementation problems (20 pages)
by A Borodin, S Cook, P Dymond, W Ruzzo, M Tompa – 1989

and for: “SAC1 recognises context free languages. (and also the languages recognisable in nondeterministic logarithmic-space),” note that LogCFL = AuxPDA(log n, nO(1)) = SAC1 see here. This means the class of log n auxiliary pushdown automaton which run in non-deterministic polynomial time. (Probably “non-deterministic” can also be replaced by “deterministic”. The direct meaning of LogCFL is the class of problems which can be many-one reduced to CFL in deterministic logspace.) I also guess that LogCFL can be solved by a dynamic programming algorithm, since both NL and CFL allow such a solution. (Probably an easy exercise. I just wanted to have LogCFL in this post, since it is part of my dream, and allows to solve interesting problems. And it seems close to ALogTime from a hardware perspective.)

Now to threshold circuits: “TC0 recognises the Dyck language with two types of parentheses”. I don’t have a good primary reference at the moment. But this shows that threshold circuits are surprisingly close to ALogTime with respect to their capabilities. And “circuits for division and iterated multiplication,” has some references in the wikipedia article, but I liked the following lecture notes:

Lecture 8: Multiplication ∈ TC0 (4 pages)
https://users-cs.au.dk/arnsfelt/CT08/scribenotes/lecture8.pdf
by Kristoffer Arnsfelt Hansen – 2008

The paper by P Dymond – 1988 ends with the following speculation:

This result shows that a large class of cfl’s is in log depths, and it is natural to inquire about the possibility that all deterministic cfl’s, which are known to be in log space, might also be in log depth. One dcfl in log space which is not currently known to be recognizable in log depth is the two-sided Dyck language on four letters [5].

Word Problems Solvable in Logspace
https://rjlipton.wordpress.com/2009/04/16/the-word-problem-for-free-groups/
http://cpsc.yale.edu/sites/default/files/files/tr60.pdf
by RJ Lipton, Y Zalcstein – 1977

It is an interesting question whether ALogTime and LOGSPACE are different for naturally occurring problems. Including LogCFL in the hardware acceleration tries to evade this question. But there are programming approaches specifically focusing on LOGSPACE, since it may model an important aspect of practical algorithms for really big data:

Computation-by-Interaction for Structuring Low-Level Computation
http://www2.tcs.ifi.lmu.de/~schoepp/habil.pdf
by U Schöpp – 2014

### Understanding whether ALogTime != PH is hard to prove

I was wondering about how to define function classes for ALogTime: How to transform words over an input alphabet to words over an output alphabet. I wondered whether I should write a mail to Emil Jeřábek. Maybe I only wanted to have a reason to write him, but I didn’t. Instead, I googled my way until I understood even much weaker function classes. Then a naive idea for proving ALogTime != PH crossed my mind: Prove some problem complete for ALogTime under some extremely weak many-one reduction. If that would work, it would be sort of a “fool’s mate”, since no genuinely new idea is needed, just a careful application of well known techniques. So when Scott Aaronson said

On reflection, though, Mubos has a point: all of us, including me, should keep an open mind. Maybe P≠NP (or P=NP!) is vastly easier to prove than most experts think, and is susceptible to a “fool’s mate.”

in HTTPS / Kurtz / eclipse / Charlottesville / Blum / P vs. NP, I couldn’t help but ask him about whether my naive idea has any chance of proving ALogTime != PH, even if just for testing whether he really keeps such an open mind. His answer was: “I don’t know enough to give you detailed feedback about your specific approach”. So he keeps an open mind, even if Stephen A. Cook is slightly more open with respect to the possibility that separating P from weaker classes like NL or ALogTime could be within reach of current techniques:

The Tree Evaluation Problem: Towards Separating P from NL (8 pages)
https://www.cs.toronto.edu/~toni/Courses/PvsNP/CS2429.html
https://www.cs.toronto.edu/~toni/Courses/PvsNP/Lectures/lecture6.pdf
by Stephen A. Cook – 2014

This post has many links to papers like the one above. As if I could justify myself for believing that ALogTime can be separated from P by current techniques. But the links show at least that investing time into this topic is an interesting learning experience. The following sections Weak reductions and isomorphism theorems, and DAG automata link to material which I read (or still plan to read) in order to make progress on ALogTime != P.

### Weak reductions and isomorphism theorems

DLogTime many-one reduction already look quite weak. (I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?) One could go weaker by looking at DLogTime-uniform NC0 reduction or even DLogTime-uniform projections (each output symbol/bit depends at most on a single input symbol/bit). But recently, people proposed even weaker uniformity notions than DLogTime:

A circuit uniformity sharper than DLogTime (19 pages)
https://hal.archives-ouvertes.fr/hal-00701420/document
by G Bonfante, V Mogbil – May 25, 2012

This proposed rational-uniformity can be summarized as as writing $n$ in binary and applying a finite state transducer to that input for computing the circuit description. I haven’t read the full paper yet. Another recently proposed approach uses descriptive complexity for defining very weak uniformity notions:

The lower reaches of circuit uniformity (16 pages)
https://eccc.weizmann.ac.il/report/2011/095/
by C Behle, A Krebs, K-J Lange, P McKenzie – May 12, 2012

I didn’t understand too much of that paper yet. I tried reading revision 0 (didn’t notice that it wasn’t the latest revision), but revision 1 might turn out to be easier to read: “Results are stated in more clear way, and overall readability was improved.”

Another line of attack for making the many-one reductions weaker is to only allow isomorphisms. It may seem counter-intuitive first that this should be possible, but the Schröder-Bernstein theorem can sometimes be adapted. This is described in two classic papers:

Reductions in Circuit Complexity: An Isomorphism Theorem and a Gap Theorem (17 pages)
https://www.sciencedirect.com/science/article/pii/S0022000098915835
http://ftp.cs.rutgers.edu/pub/allender/switch3.pdf
www.cs.cmu.edu/~rudich/papers/ac0reductions.ps
by M Agrawala, E Allender, S Rudich – 1998

Reducing the Complexity of Reductions (9 pages or 22 pages)
https://cseweb.ucsd.edu/~russell/reductions.ps
http://ftp.cs.rutgers.edu/pub/allender/stoc97.pdf
by M Agrawala, E Allender, R Impagliazzo, T Pitassi, S Rudich – 2001

I didn’t read those papers yet, but Algebra, Logic and Complexity in Celebration of Eric Allender and Mike Saks by Neil Immerman gives a nice survey of those results, which I read for a start.

### DAG automata

One important property of input driven stack automata is that the dependency graphs will be planar DAGs. For general polynomial time algorithms, the dependency graphs will be general non-planar DAGs (due to variable binding with an unbounded number of variables). In order to learn what is already known about this, I googled “DAG automata”. The first three hits are awesome:

DAG Automata – Variants, Languages and Properties (48 pages)
http://www8.cs.umu.se/education/examina/Rapporter/JohannesBlum_submitted.pdf
by J Blum – May 25, 2015

How Useful are Dag Automata? (26 pages)
https://hal.archives-ouvertes.fr/hal-00077510/document
by S Anantharaman, P Narendran, M Rusinowitch – ‎2004

DAG Automata for Meaning Representation (12 pages)
http://aclweb.org/anthology/W17-3409
by F Drewes – London, UK, July 13–14, 2017

The first and third hit are related, Frank Drewes was the adviser of Johannes Blum. I read both, and they were really worth it. (I only browsed the paper from 2004, just to see that the newer papers really achieved a breakthrough.) They reduce determining which rules are useless to a question about Petri nets, and then provide a solution for that problem: “A Petri net is structurally cyclic if every configuration is reachable from itself in one or more steps. We show that structural cyclicity is decidable in deterministic polynomial time.”

Structurally Cyclic Petri Nets (9 pages)
https://arxiv.org/abs/1510.08331
by F Drewes, J Leroux – ‎2015

### On understanding a problem, without solving it

One can understand a problem without solving it. So one can understand why something is a problem and why a solution would be challenging, without being able to come up with a solution. I don’t know whether the best experts in computer science understand P != NP in this sense. I don’t understand it yet, but I have the deeply rooted belief that it can be understood.

And understanding for me also means to be able to read work that others have done, and to see and feel the connection to the questions which interest me.

### Conclusions?

This post is too long, but I didn’t really care this time. Since the reader is expected to read at least some of the linked material, it is implicitly even much longer than its nominal length (which is too long) indicates. The initial goal was to write it in order to be able to focus again on more urgent topics. But of course, this didn’t really worked out that way. Still, this post was always intended to be mainly a collection of links to papers I already read or still plan to read in order make progress on understanding ALogTime, LogCFL, and threshold circuits. And I also knew that I would try to justify myself for believing that this topic is worth studying.

Do I believe that the dreams of fast solutions can come true? I don’t know. Uzi Vishkin seems to have made a much more serious effort than I ever will, and even argued his case from an economic perspective:

Alas, the software spiral is now broken: (a) nobody is building hardware that provides improved performance on the old serial software base; (b) there is no broad parallel computing application software base for which hardware vendors are committed to improve performance; and (c) no agreed-upon architecture currently allows application programmers to build such software base for the future

Could proof of work problems for crytocurrency mining provide the incentive to make dreams come true? Who knows, money is important, after all. But personally, I am rather a believer in “constant dripping wears away the stone,” so I think in the long run, it is quite possible that things progress. Probably in ways I cannot even imagine today.

Logic, Logic, and Logic
This entry was posted in automata. Bookmark the permalink.

### 5 Responses to ALogTime, LogCFL, and threshold circuits: dreams of fast solutions

1. gentzen says:

Surprisingly, this post has been a continuous joy for me to read, since I wrote it one month ago. Of course, I don’t read the post itself, but the papers it mentions. Let me begin by an update on the following open question:

What I want to say is that I have read Buss’ paper in principle, but didn’t yet work thoroughly enough through section 7, and I don’t understand yet why Buss says (page 10): “However, we shall use the yet stronger property (c) of deterministic log time reducibility. Although it is not transitive, it is perhaps the strongest possible reasonable notion of reducibility.” Why is DLogTime not transitive?

I worked thoroughly through section 7 in the meantime, and I understand now why DLogTime is not transitive. I also learned to appreciate Emil Jeřábek’s detailed answer on the fine grained relation between LH and AC0 in this context. (Especially the part: “Additionally, we require that the ATM makes only one such query, at the very end of the computation, and returns the answer of the oracle as the result of the computation.”) And I found a later paper by S. Buss on the same subject:

Algorithms for Boolean Formula Evaluation and for Tree Contraction (19 pages)
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean3/paper.pdf
by S Buss – Oct 1991

Those papers would have been easy to find, if…
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean2
https://www.math.ucsd.edu/~sbuss/ResearchWeb/Boolean3

2. gentzen says:

At the end of the section “Parallel and distributed computing”, I wrote:

(It should be clear that I didn’t really try to read those texts, and just browsed through them.)

Well, I worked through the survey “Parallel Algorithms” by G. Blelloch and B. Maggs in the meantime, and finished the first quarter of “Thinking in Parallel: Some Basic Data-Parallel Algorithms and Techniques” by U. Vishkin. The terms EREW, CREW, CRCW (E=exclusive, C=concurent, R=read, W=write) are explained, but their detailed relation to NC1, L, NL, SAC1, and AC1 is not discussed. I found such a discussion (including the interesting fact that L is in EREW1 by viewing L as directed tree reachability) in section 3 “Models of Parallel Computation” of

A Survey of Parallel Algorithms for Shared-Memory Machines (70 pages)
https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5865.html
https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/CSD-88-408.pdf
by Richard M. Karp and Vijaya Ramachandran – March 1988

3. gentzen says:

At the end of section “Weak reductions and isomorphism theorems”, I wrote:

Another line of attack for making the many-one reductions weaker is to only allow isomorphisms.

I didn’t read those papers yet, but … by Neil Immerman gives a nice survey of those results, which I read for a start.

Immerman’s survey presentation was from Jan 26, 2017. But the subject is actually much wider than I gathered from that survey presentation and the two papers I linked to. There are good recent survey papers which give a better impression how wide that subject really is:

The Isomorphism Conjecture for NP (23 pages)
https://www.cse.iitk.ac.in/users/manindra/survey/Isomorphism-Conjecture.pdf
by Manindra Agrawal – Dec 19, 2009

Investigations Concerning the Structure of Complete Sets (14 pages)
http://ftp.cs.rutgers.edu/pub/allender/isomorphism.pdf
by Eric Allender – 2014

4. gentzen says:

In section “3.9 Parallel Computation Thesis” in “A Survey of Parallel Algorithms for Shared-Memory Machines” by R. Karp and V. Ramachandran linked to in one of the comments above, there is the following paragraph:

A similar technique shows that deterministic space S(n) is in EREW(c^S(n), S(n)). In this case the computation of the space-bounded Turing machine can be modeled as a directed tree, and an input is accepted if and only if the final configuration is reachable from the initial one. Since reachability in a directed tree is a special case of expression evaluation, the tree contraction algorithm of Section 2.2 gives the required result.

The paragraph confused me, because both expression evaluation and tree contraction can be done in NC^1. However, NC^1 = Logspace would be a major result. And I didn’t see how those arguments would suffice to show that Logspace is in EREW^1, if they should be insufficient to show that NC^1 = Logspace. Now I finally took the time to investigate this question in detail. My conclusion is that Logspace is really in EREW^1, but that the proof is missing a sort step at the beginning, which makes crucial use of the fact that EREW^1 can compare two numbers in constant time, which explains why that proof doesn’t go through for NC^1.

A smaller issue with that paragraph was that the tree contraction algorithm of Section 2.2 only worked for binary trees, even so there was a remark that it is easy to generalize for arbitrary trees. OK, but would that generalized algorithm still be in EREW^1? A newer paper made the generalization precise and also highlighted the points where things can go wrong:
A Practical Tree Contraction Algorithm for Parallel Skeletons on Trees of Unbounded Degree
Akimasa Morihata, Kiminori Matsuzaki – 2011
The conclusion is that it can indeed be done in EREW^1. Can this algorithm also be implemented in NC^1? I guess so. However, I will first work through the three papers by Sam Buss (et al.) on that problem again, before I form my own opinion and conclusion. Basically, my guess is that EREW^1 like pointer jumping/manipulation can be emulated in NC^1 while CREW^1 like pointer jumping cannot.

Unrelated, already in 2. Jan. 2018, the following paragraph from section “3.4 Relation between circuits and PRAMS” caught my interest:

It is shown in [HoKlPi] that any bounded fan-in circuit of size S(n) and depth D(n) can be converted into an equivalent circuit of size O(S(n)) and depth O(D(n)) having both bounded fan-in and bounded fan-out. Using this result, it is easy to see that NC^k ⊆ EREW^k.

Then I read the references paper. At the moment, that paper is freely available:
Bounding Fan-out in Logical Networks
Howard James Hoover, Maria M Klawe, Nicholas J Pippenger – 1984
It is funny. It uses a sort of Huffman compression to determine the optimal ordering that minimizes the depth.