![]() |
Computer Science-Related Articles
This page contains Computer Science related papers or articles I have
produced, many derived ultimately from articles posted on the USENET throughout
the 1980's, 1990's and 2000's.
This site is currently in the 2nd phase of construction. There is currently, the equivalent of only 900 pages of material are on-line, with much more awaiting conversion and inclusion, and further material still associated with 5 software distributions that are in the process of being prepared for release or re-release, including the C-BC Programming Language and C-AS 8051 assembler.
Comments and feedback.
Return to the Federation Archive.
From the Older UWM Site
8051 Software Developer's Toolkit
Regular Expression and Finite State Automata Software
The C-BC Programming Language
Contents
A Treatise on Formal Language and Automata Theory
Relational Methods in Computer Science (10) and Applications of Kleene Algebra (5), 2008
The Algebraic Approach to Formal Language and Automata Theory
Background
This archive is an outgrowth of a long-term programme whose goal is to recast
the entire field of formal language and automata theory (along with the rest
of computer science), once again, onto a purely algebraic foundation, as a
branch of mathematics, as had originally been intended throughout the 1950's
and early 1960's.
In large measure the developments that have taken place here, dating from the 1980's on, were a reaction to the terribly inadequate state that the general field had been in throughout the 1970's onward. At a critical juncture, in the period 1988-1989, I decided to take on the task of reformulating (and significantly generalizing) the classical theory from scratch. To put it bluntly: Computer Science did not exist as anything other than a woefully insufficient presaging shadow of what was (and still is) yet to be, at that time.
There are notable exceptions to that generalization, including those who had already been pursuing the field from an algebraic foundation (e.g. Kozen), or from a category-theoretic perspective (e.g. Curien, Lambek). I am not acting alone. But others of this or like mind are still few and far between. Moreover, many of those who had started out in the right direction (e.g. Sch�tzenberger, Brzozowski, Landin) ended up frequently taking a left turn at some critical juncture, frequently missing important insights and syntheses which lay directly ahead.
Counted in this list is the field of Denotational Semantics, which can only be described as a mistaken reaction to the apparent failures of Landin's programme of the 1960's and a complete turn in the wrong direction. Though Denotational Semantics does bear fruit; the fruit it bears is like the labored growth of a tree standing alone in the middle of a desert — just a few paces away from the fringes of a vast oasis. It would also not be going too far to say that this is also the perception of the field, from within.
The following excerpt underscores the nature of the situation and illustrates both the importance and significance of the enterprise that have taken place here and elsewhere. It gives a snapshot of the state of the field at the turn of the 21st century:
It is this vision that the Treatise on Formal Language and Automata Theory and its precursors The Algebraic Approach (1999-2003) and The Untold Story of Formal Languages (1993-1996) ultimately provide the fulfillment of.
Though the Computer Science stands over (and subsumes) its applications in various engineering fields; it is not a branch of engineering. The distinction becomes particularly clear in times such as these, where the day-to-day particulars of technology are and have been undergoing rapid transition. Those who are couched in an engineering world will be, and continually are, subject to the doom of rapid obsolescence, because of the very premise of adopting that vantage point. The field, itself, however is evanescent and has been around, with much the same issues as preoccupy the world at present, for over 2000 years; counting amongst its earliest forebears the logicians who were the Stoic philosophers of old (the real-life basis of the Vulcans of Star Trek).
An excellent case in point is the approach that had been adopted to the field of database engineering. At the turn of the 21st century, the field resides in a state in which it is locked up into a de facto standard, which reads like a Computer Science or "language lawyer" text borne of the 1960's or 1970's, borne of those who came from that period (and are still in it). It Byzantine virtues make the recently dumped European constitution look like light reading on the beach in comparison.
The consequences of this state of affairs with the ossified geekification of the field is that it is very difficult for anyone to make entry into the field, or anyone who has entered the field to make headway. This is not only reflected, for instance, in the high pay (i.e. hazard pay) nowadays assigned to the task; but the pandemic of failures, lockups, overruns, etc. that have been bought with that pay. For instance, in 2007 in the state of Wisconsin, the databases for the Department of Motor Vehicles, the Department of Revenue, for the system to be used in elections in the state, for parts of the Wisconsin Job Network have all been curtailed. At the federal level, the problems the IRS has had with its database system are well-known.
Meanwhile the promise of the future that lies in the paradigms of Categorical Programming (which the so-called Object Oriented revolution was never anything more than a caricature of) and Logic Programming (e.g. Datalog) remain unfulfilled.
One of the most difficult aspects of this long-term programme has been degeekifying the underlying languages and notations used in the formalisms in the field at present, and divesting them of the particulars that had dominated the computer engineering world of the late 20th and early 21st century. Over time, particularly since the 1990's, others have headed toward the same conclusion and have begun to take on the same effort (notably in Physics). Not the least of these reasons has been the emergence of numerous representations (outside the classical world of the digital electronics of boolean processors) in the natural world of the general phenomenon, captured by the underlying theory, of computation; the most significant example being quantum computation or photonics.
People are slowly coming to realise that computation is not about the latest day-to-day affairs in the programming language du jour, or widget of the hour. These things have come and gone in a kaleidoscope of changes dominating the latter part of the previous century and earliest part of the present century. It is about what pervades, and subsumes, all these developments: that which mathematicians had proceeded to capture in the earliest parts of the 20th century, and which others (notably including Leibnitz) had concerned themselves with long before.
A Treatise on Formal Language and Automata Theory
(Early draft of chapter/topic outline)
This includes only a topic outline. The material slated for evetual inclusion
is being drawn from the various articles currently in the process of being
converted and uploaded to this site. The central focus of the treatise is the
consummation of two separate programmes pursued over the past 20 years:
The Untold Story of Formal Languages
This is a compilation of the following sets of articles series from USENET
| The Equational Approach | comp.compilers, comp.theory | 1993 September 28 - October 4 | (parts 1, 2, 3) |
| The Untold Story of Formal Language Theory | comp.theory | 1995 July 20 - August 6 | (parts 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11) |
| The Untold Story of Formal Languages | sci.math.research | 1996 January 31 - February 27 | (parts 1, 2, 3, 4, 5) |
(Preface, sections 1-2; PDF, 281k)
Preface and USENET Sources
1. The Algebraic Representation of Formal Languages
An analogy exists between the Fock spaces of classical and quantum physics
and the state spaces of automata based on stacks, queues and other lists.
Corresponding, respectively, to the create and annihilate operators are the
"enqueue" and "dequeue" processes; to the vacuum state, the empty queue.
Based on this correspondence, algebras can be devised that embody the actions
of machine and automata models that employ such devices. This section reviews
two such algebras, termed the polycyclic and Bra-Ket
algebras, as well as providing the background to the underlying algebraic
theories employed throughout this treatment. A remarkable property of the
Bra-Ket algebra, explored at the end of this section, is its isomorphism to
its own matrix algebras.
Contents
1.1. Background
1.2. Quantales, Dioids and Kleene Algebras
1.3. Algebraic Preliminaries
1.4. Extending Dioids to Quantales
1.5. The Fock Space Representation
1.6. Stack Algebra and Matrix Representations
1.7. An Expression Algebra for Context-Free Languages and Transductions
1.8. The Bra-Ket Algebra
1.9. The Self-Similarity Property
2. Regular Expressions -> DFA's
The conversion of regular expressions to finite automata may be regarded as
the process of transforming the expression into a linear system of inequalities
which the expression is the least solution of. The process leads directly to
algorithms that effect the conversion by carrying out algebraic computation on
regular expressions. The algorithms have served as the basis of
the (RegEx software cache, and the REX utility).
REX is a utility whose functionality is a large superset of the standard POSIX
utility GREP, as well as that of GNU-GREP. Amongst its features is its ability
to process intersections, differences and even anagrams. (The software set is
slated for re-release in this archive and is currently undergoing upgrade to a
new cache for processing context-free expressions).
Contents
2.1. Preliminaries
2.2. Introduction
2.3. Automata as Systems of Equations
2.4. The Factoring Property ("Taylor's Theorem")
2.5. Factoring Regular Expressions
2.6. Calculation Method
2.7. Examples
2.8. Complexity
2.9. Conversion to an NFA
2.10. Derivation Sequences as Inequalities
2.11. Algebraic Manipulation of Systems
2.12. Infinite Automata and Recursive Systems of Equations
(Sections 3-5; PDF, 313k)
3. The Algebraic Representation of Automata
One of the central features of the Untold Story series is that it represents both a continuation and completion of the vision originally laid out by von Neumann, himself. This is true both in the sense just cited and in the sense of incorporating both of the primary strands of von Neumann's thought — Mathematical Physics and Computer Science — in the overall synthesis that this series presents.
This section features the introduction of an algebra for state diagrams. Treating a state diagram as a graphical representation of a matrix, one can define the operations of addition and multiplication over them. The result is an extension of the cycle notation used for representing groups.
The classes of automata seen in classical formal language and automata theory may all be seen as instances of a general "infinite state automata" model. What distinguishes each class is (a) the factoring of its state space into the product of a "control" space and "device" space; (b) the imposition of "selection" rules constraining the allowable transitions in the device space and (c) the imposition of "symmetry" rules governing the transitions in device space that allow an set of transitions to be generated from a finite set. Features (b) and (c) play roles analogous to that played by selection rules and symmetry in non-linear dynamics.
In this section, the notion of automata and state diagrams as graphical representations of systems of inequalities is developed. This concept leads us directly to the issue of operator algebras and their matrix representations.
Contents
3.1. State Diagrams, Automata and Matrices
3.2. Infinite State Automata
3.3. Varieties of Infinite State Automata
3.4. Representation of an Automaton as a System of Inequalities
3.5. Connection to the Operator and Matrix Algebras
4. Context-Free Expressions and the Bra-Ket Algebra
The class of infinite state automata corresponding to the 1-stack automata can
be directly converted via into a finite linear system of inequalities over an
algebra that incorporates the Bra-Ket algebra. The resulting expressions
extend the classical theory of regular expressions up to type 2 in the
Chomsky hierarchy and so may be termed context-free expressions.
A process for converting the non-linear systems of inequalities that represent grammars in extended BNF is also developed. A central feature of this conversion is the process of linearization, which may be likened to that seen in mathematical physics of taking the spinor "square root" of a vector. With the use of the bra-ket operators, the non-linear systems representing context-free grammars are decoupled into first-order systems.
Contents
4.1. The Bra-Ket Algbera; Tensor Products
4.2. Infinite State Automata and Bra-Ket Notation: Example
4.3. Reduction of Push-Down Automata to Regular Bra-Ket Grammars
4.4. Embedding Context-Free Languages in C2 >< RX*
4.5. Example
4.6. Context-Free Expressions and Fixed-Point Systems: Examples
4.7. Converting EBNF Grammars and Fixed-Point Systems to Bra-Ket Form
4.8. Optimizations
5. Turing Expressions and Translation Expressions
Taking this process one step further, we can go further and also treat
machine and automata models that employ multiple stacks or double-ended
queues. Along the way we reproduce the classical result that links open-ended
queues to stacks, and double-ended tapes to twin-stacks.
An operator algebra for Turing machines and Turing transducers is constructed from the Bra-Ket algebra. The parsing/translation problem is formulated in the general setting, where we show that the complexity of the representation of a set of outputs corresponding to a given input under translation is linear. Though not well-known, this is actually a somewhat trivial consequence of Kolmogorov complexity theory. The representation provided here is a realization of the theoretical result.
Contents
5.1. An Expression Algebra for Turing Languages and Transductions
5.2. Conversion of an Open-Ended Queue to a Stack
5.3. Reduction of Turing Machines via the Bra-Ket Algebra
5.4. Linear Bounded Turing Machines and Context-Sensitive Expressions
5.5. Translation Systems and the Markov Machine
(Sections 6-9; PDF, 326k)
6. The Hardest Context-Free Language
A context-free language exists over a monoid M, which we call the "Hardest CFL", HL,
that possesses a universal property that for every other context-free language L over every
other monoid N maps into HL under a suitably-defined monoid homomorphism hL: N -> M,
while the complement of L maps outside of HL. This result was originally
demonstrated by Greibach in the 1960's. A more direct proof is provided here
that employs context-free expressions.
Contents
6.1. History
6.2. The Mapping
6.3. Equivalence of CFL Membership to Membership in HL
6.4. Examples
7. Grammars as Fixed-Point Systems
Previous attempts at developing a formalism for context-free expressions
in the 1970's (Yntema, Gruska and McWhirtier, independently of one another),
centered on the notion of fixed-point systems and the least fixed point
operator. We state the fundamental properties of the "fixed point" calculus,
and then provide a means of calculating fixed points within the algebra of
context-free expressions, based on the method developed in section 4.
It is here that we also begin to see that link between the context-free expression algebra and the celebrated Chomsky-Sch�tzenberger theorem. Here, we explain how such an algebra arises from the theorem itself.
The difference between the two methods of representation may be likened to that of solving an equation 4x = 3 by writing x = 3/4, versus that of solving the equation by writing out a decimal expansion 1.333... Here, the analogous role for decimals is played by context-free expressions. Just as it is true that linear systems will lead to the repeating decimals that identify rational numbers, so it is that the linear systems here will lead to rational or regular expressions. The innovation we present is that we go one step further and also write down "decimal" solutions for non-linear systems, where the corresponding expressions need not be regular or rational.
Along the way, we also find that the notion of parse trees and parse forests comes under the scope of our synthesis. Regarding each branch of a parse tree as representing an inequality, a single parse tree, and AND-OR tree and even an infinite set of parse trees can be represented by a single (generally non-linear) system. This system, in turn, possesses a least fixed-point solution. This solution is none other than the representation of set of parses or translations, alluded to in section 5.
Contents
7.1. History
7.2. Fixed-Point Systems
7.3. Generalizing Parse Trees and Forests as Systems of Inequalities and O(n) Parsing
7.4. Solving Context-Free Systems: History
7.5. The Gruska/McWhirtier/Yntema Approach
7.6. The Current Approaches
7.7. The Fixed-Point Operator
7.8. Basic Properties of the Least Fixed Point Operator
7.9. Schur's Lemma
7.10. Fixed-Point Solution: Example
7.11. Solving a Fixed-Point System
7.12. Fixed-Point Solutions: Further Examples
8. Direct Proof of Parikh's Theorem
Published also as Parikh's Theorem in Commutative Kleene Algebra.
The commutative version of the Chomsky-Sch�tzenberger theorem is the Parikh theorem. The methods developed for solving fixed point systems for non-linear systems substantially simplify when brought over the setting of commutative algebras. In addition, the Brzozowski derivatives of the non-commutative theory now become bona fide derivation operators.
The result is that we are able to formally define a calculus for commutative Kleene algebras in which a version of Taylor's Theorem, the Chain Rule and the theorem on total differentials all hold true. With this new background, the least fixed-point operator of section 7 becomes trivial to implement. In detail, if f(x) is an algebraic function of x, then the least solution to the inequality x ≥ f(x) will just be f'(f(0))* f(0).
We conjecture that a similar result also holds for systems of dimension 2 or more, namely that the 2nd order iteration of the operator x |-> f'(x)* f(0) starting from x = f(0) will suffice, where f'(x) now denotes the Jacobian of f(x). This requires the notion of solving systems under a homomorphic reduction of a given algebra. Though the concepts have not yet been developed to completion (neither here nor in the Parikh paper 2006 update), it seems clear that the second order iteration will suffice to solve the general system. This has been verified in the case of 2-dimensional system by Kozen c. 2000
Contents
8.1. Background
8.2. Kleene Algebras
8.3. Abelian Kleene Algebras
8.4. Differential Operators
8.5. Parikh's Theorem
9. Properties of Context-Free Systems
This section has not yet been incorporated into this archive (though the
USENET orignals can be found via the links provided). The normal form theorem
states that every context-free expressions can be converted to a regular
expression over the operators and "operator-free" context-free expressions,
in which "the kets occur to the left of the bras". Thus, for instance, a
normal form for (a <1| + |1> b)* would be (N |1> b)* N (a <1| N)*,
where N is the "operator-free" expression given by <0| (a <1| + |1> b)* |0>.
The conversion implements a process of transforming context-free expressions into context-free grammars. The process generalizes the classical result whereby pushdown automata are transformed into grammars. Here, the notion of grammar is replaced by that of the "operator-free" expression, while the automaton is replaced by a linear system that has the expression to be converted as its least solution.
Over a pure operator algebra, this conversion in turn leads to a representation that implements a shortest-paths formulation, reminiscent of Valiant's O(nlog 7) complexity solution for the context-free recognition problem.
Contents
9.1. D-CFL < N-CFL
9.2. Converting CFE --> CFG: The Normal Form Theorem
9.3. Closure Properties for CFL's
9.4. CFL-RL Intersection
The Algebraic Approach to Formal Language and Automata Theory
Derived ultimately from the Algebraic Approach, present on the
web in 1999-2003, this is the core of what will eventually become the
treatise on formal language and automata theory. This material is also drawn, in part,
from the Algebraic Foundation for Formal Language Theory series
from comp.theory, 2004 May 5-20.
Related articles
Algebraic LR Parsing (PDF, 63k)
Notes on the reformulation of the LR parsing paradigm within the algebraic approach.
The Syntax of the Programming Language C
Full Prolog Syntax
These formulations of the syntax of various programming languages will be used as case study examples, in followup articles, to illustrate the algebraic methods developed here.
Preface and References (PDF, 46k)
0. The Algebraic Point of View (PDF, 250k)
Supplement:: Measures, Normed Monoids and P = NP (PDF, 31k)
There are three fundamental modes of representation that appear in the theory of formal languages and automata: (E) the language or translation expression (e.g. regular expressions); (G) the grammar or transition system; and (A) the automaton, machine or transducer. All three modes may be brought within a unifying framework, in which modes (G) and (A) serve as various means of representing systems of inequalities between the objects that reside in partially ordered algebras that house the expressions of mode (E).
The principle of finiteness or minimality is generally asserted in association with these modes of representation, stating that the entity being generated or processed by the grammar, transition system, automaton, machine or transducer, should comprise no more than what can be derived by a finite number of steps. Rendered in algebraic form, this principle states that the entity should be identified as the system's least solution.
The expressions of the first mode of representation are then the entities, themselves: it is the partially ordered algebras from which the expressions are drawn. The ordering relation then serves to encapculate the concepts of derivation, inference, reduction or instantiation, bringing them all under a common framework.
Contents
0.0. Set-Theoretical Preliminaries and Notational Conventions
0.1. Algebraic Preliminaries
0.2. Grammars as Systems of Inequalities
0.3. Automata as Systems of Inequalities
1. Dioids and Quantales (PDF, 283k)
The notion of the "word", prevalent in formal language and automata theory
is encapsulated algebraically by a structure known as a semigroup, whose
distinguishing feature is that it possesses a product that satisfies the
law of associativity (ab)c=a(bc). The product is naturally intepreted as
the operation of combining words together, or of sequentially combining
processes. A dioid contains, in addition, the partial ordering relation
that serves to encapsulate the notions of derivations or reductions. Its
distinguishing feature is that is defines a "least ordered" entity, which
might be likened to the notion of failure, or error, and
a "least upper bound", which takes the "sum" or "least" of two or more
entities. This might be regarded as the precursor, at a primitive level,
of the notion of non-deterministic branching.
The different varieties of dioid are distinguished by what range of entities shall have least upper bounds. The results are algebras, with the sum distributing over the product, but satisfying the identity a + a = a. In the most general case, if one allows sums over arbitrary (even infinite) sets, the resulting structure will be what is already known from quantum theory and non-linear dynamics as a quantale. Other more restricted alternatives, however, may be defined. Indeed, the totality of all the possibilities forms a hierarchy that mimics the Chomsky hierarchy and recalls the concept of "abstract families of language" from the classical theory of formal languages and automata.
In this section, the hierarchy is shown to possess a family of "extension" maps, which allow any of the varieties of dioid to be extended to a larger variety, including all the way to the level of a quantale. The process of "quantale completion" recalls, in many ways, that of completion of lattices or partial orderings by the use of ideals. Indeed, a simple formulation for the notion of ideals exists here, which allows this construction to be carried out in full generality. That is what this section will focus on.
Of later interest will be two particular constructions: those that exist between the varieties of dioids that play the analogous role of all the types 0-3 of the Chomsky hierarchy. The actual extension maps will serve as the basis for "fixed-point" theorems that embody the classical Chomsky-Sch�tzenberger theorems and provide the foundation for extending the classical theory of regular expressions to a general theory of expressions for languages, translations and other processes.
Contents
1.1. Dioids, Quantales and Kleene Algebras
1.2. Ideals and Quantales
1.3. The Dioid-Quantale Hierarchy
1.4. Analytic Operators and Fixed-Point Theorems
1.5. Monads, Analytic Dioids and Power Series
2. Kleene Algebras and Regular Expressions
(2.1-2.7.2 PDF, 501k)
One of the more fundamental discoveries made in recent times in the field of
formal language and automata theory was the existence of a finite axiomatization
for the theory of regular expressions. Though it had been known since the 1970's
that no finite set of identity equations can completely produce all the
identities that hold for regular expressions; it turns out that the dioid
is sufficient to accomplish the task — provided one adopts as an axiom
that the least solution to the inequality x ≥ a + bx + xc should be
x = b* a c*. In fact, that's all that's required!
(Well, actually, we'd also need 0* = 1). The distinguishing feature
of Kleene algebras, with respect to this axiom set, is the solvability of
more general finite linear systems within them (as well as non-linear systems
of a restricted type).
The first proof that this axiom base, or any equivalent to it, sufficed to completely embody the regular expressions emerged in the mid 1990's. However, instead of directly incorporating the classical proof of Kleene's theorem given by Brzozowski in the early 1960's, it employed a somewhat standard textbook technique for converting regular expressions to automata. The method used here algebraizes the construction by Brzozowski. What we will find is that any map h from a set X to a Kleene algebra K will lead to a linear system of inequalities over the algebra K which, though infinite, will still possess a least solution. That solution, in fact, is none other than the map which converts a regular expression over the alphabet X into an element of the Kleene algebra K. This actually shows that the equations between regular expressions are precisely those equations provable under the Kleene algebra axioms.
The extra operator, called the Kleene star may be thought of as embodying an infinite branching so that x* would mean 1 + x + x2 + x3 + .... In a process point of view, this would be interpreted as the action of repeating x a undetermined number of times — thus encapculating the notion of iteration at a very primitive level.
However, the process point of view does not really fit the Kleene algebra mould. Instead, it is more in line with the notion of processes as changing or transforming states. This naturally leads to a generalization of the Kleene algebra concept analogous to the generalization of groups to groupoids. Hence, the name Kleene algebroid. It also serves as a necessary precursor to rounding out the completeness proof. A benefit that emerges from this more generalized setting is that we now have a direct means of embodying the graphs, called state diagrams, which are central to the theory of automata, machines and transducers. Thus, whereas the theory of Kleene algebras serves as a foundation for expressions (our first mode of representation, (E)), the theory of Kleene algebroids serves as a foundation for automata, transducers and machines (our third mode, (A)).
Contents
2.1. Basic Kleene Algbera Theory
2.2. Kleene Algebroids
2.3. Linear Systems and Matrix Algebroids
2.4. Derivatives and Brzozowski Systems
2.5. Intertwining Maps; Subsets and Minimalization
2.6. Completeness of Kleene Algebras
2.7. Derivatives and Kleene Algebroids
2.8. Kleene Algebroid Completeness
2.9. Free Extensions and Substitutions
3. Context-Free Expressions (PDF, 353k)
The theory of context-free expressions is formulated here. The central feature
of this theory is the appearance of an algebra which ultimately arises out of
an analogy between stacks and queues and Fock spaces in mathematical Physics.
These notes, originated from section 3 of the 1999-2003 Algebraic
Approach web series; which, here, form sections 3.1 - 3.7. The original
treatement made tacit use of the contents that later came to be developed in
section 1, and needed to be held off pending the completion of section 1.
Contents
3.1. Introduction
3.2. Basic Constructions
3.3. Conversion of Grammars to Expressions
3.4. The Normal Form Theorem
3.5. Schur's Lemma
3.6. The Fixed Point Theorem
3.7. Fixed-Point Calculus
3.8. Algebraic Chomsky-Sch�tzenberger Theorems
3.9. O(n) Parsing and Recognition by "Shortest Paths" Bra-Ket Reduction
4. Algebraic Parsing Theory and Translation Expressions
Since the algebraic approach programme is ultimately grounded in the study
of parsing theory, it is only natural that the focus be brought back
to these applications.
The methodology of LR parsing originated in the 1960's by Donald Knuth. It provides a means of representing deterministic context-free languages and processes (indeed, it is complete in this sense), as well as that of effectively handling non-deterministic languages and processes. Expanding on the latter application, the approach was generalized by Tomita in the mid 1980's to what is known as generalized LR (or GLR) parsing. A distinguishing feature of the method was the efficient handling of the potentially large number of structures that may be derived from the parse or translation of a highly ambiguous input.
One can go much further with this development. In the general theory of context-free languages and translations, the possibility of infinite ambiguity arises. Various ad hoc restrictions have been employed in the numerous methods in existence to deal with or otherwise evade this issue. However, the notion underlying the possibility actually holds the kernel of the resolution: the infinity comprising the translations or parses of a given item are of such a form that they, themselves, form a language — a context-free language. Thus, a means that provides an omnibus representation for outputs, translations or parses is tantamount to a means for embodying context-free languages, themselves; i.e. context-free expressions.
It turns out that such a structure, no matter what the size of the set it denotes, can be represented by a context-free expression that, itself, possesses a complexity linear in the size of the item being processed. The existence of linear complexity representations, of course, is already a well-known consequence of Kolmogorov complexity theory. Here, we find a realization of the theoretical result.
Context-free expressions, in fact, will serve to directly embody such "output" or "translation" sets. What this means is that there is an effective procedure for enumerating the elements of a set represented by a context-free expression or testing for its emptiness that matches the best-known process for context-free recognition — Valiant's algorithm. In fact, the algorithm is naturally represented by a process for reducing expressions in the pure bra-ket operator algebra, and arises as a consequence of the normal form theorem.
Another important application of the foregoing is the mathematical derivation or "calculation" of parsers or processing mechanisms for languages and processes specified by context-free languages or transductions. An important case is the design and implementation of the syntax for a given language. This application is illustrated in detail in several case studies.
Contents
4.1. Parsing, Translation and Enumeration
4.2. Enumeration and O(n) Parsing (Supplement: PDF, 70k)
4.3. Algebraic LR-Infinity Parsing (Supplement: PDF, 206k)
4.4. Parsing as Context-Free/Regular Intersection (Supplement: PDF, 51k)
4.5. Valiant, Shortest-Paths and Evaluation of Bra-Ket Expressions
4.6. Solving Context-Free Systems — Case Studies
5. Turing and Translation Expressions
This section, like the previous, has yet to be compiled and included here.
However, some of what is to appear is already present under the
Untold Story of Formal Languages series.
Parikh's Theorem in Commutative Kleene Algebra
(PDF, 242K)
1999 Original:
(bibtex,
abstract,
postscript)
This is a 2006 update of the 1999 paper presented to the LICS '99 conference by
the co-author Dexter Kozen. Working within the algebraic approach to formal
language and automata theory, the 1999 paper presents Parikh's Theorem as
a fixed point theorem for commutative Kleene algebras and then provides the
closed form solution. An alternate closed form solution is provided in the
2006 update as a conjecture. The 2006 update also incorporates some of the
elements of the non-commutative generalizations of the Parikh theorem known
as the Chomsky-Sch�tzenberger theorems, and discusses the key issues
surrounding this prospective generalization. These Chomsky-Sch�tzenberger theorems, in turn, serve as the algebraic foundation for a generalized theory
of language expressions.
Much of the original paper, sections 1-4, arose in the much larger context that comprised the 1996 sci.math.research article series Untold Story of Formal Languages, its predecessor 1995 series from comp.theory, and the 1993 Equational Approach series, also from comp.theory; all of which has been integrated here into the Untold Story of Formal Languages series. Section 5 was entirely from Kozen, who explicitly formulated and resolved the problem of finding a direct closed-form solution to the multivariate system. The paper was all written up and presented by Kozen in the 1999 IEEE LICS conference, and in a seminar at Cornell. Even at the present time, the content stands out as distinctly futuristic in flavor � as also does much of the development in Kleene algebra theory by Kozen � and caught the reviewers by surprise.
Additional elements of the content from the 1996 edition absent in the 1999 paper, as well as other elements from the 1995 series, which contains a still-unpublished proof of another closed-form solution have been added and are still being added. This is, therefore, being updated substantially from the original 1999 LICS paper.
In addition a new element, section 6, has been added to incorporate the various non-commutative fixed point theorems that can be established in the finitary Kleene algebra without resort to the infinitary power of *-continuous Kleene algebras and to begin to address the not-so-well-known non-commutative generalizations of Parikh�s Theorem which go under the name of the Chomsky-Sch�tzenberger Theorems.
Contents
Abstract
1. Introduction
2. Commutative Kleene Algebra
3. Polynomials and Derivations
4. A Generalization of Parikh's Theorem
5. A Closed Form Solution
6. A Non-Commutative Parikh's Theorem?
Context-Free Expressions
(PDF, 171k)
This is a translation from German and revision of the Dominik L�cke's 2003
Context-Free Subsets of Monoids carried out in early 2007. In
turn, the 2003 paper is largely drawn off of part 3 of The Algebraic
Approach. Numerous comments and clarifying remarks have been added.
Only the first 3 sections, and part of the 4 are translated here, so far. It's
doubtful that much of anything beyond section 4 will be translated here since
(a) the material is already being subsumed by the Untold Story of Formal
Languages series, (b) by the Algebraic Approach series.
Much of what is here, below, in the translation (particularly the material on
the Fock space representations) is additional material not present in the
2003 original
This paper introduces the notion of Context-Free Expressions in a form developed originally in a series of USENET articles in comp.theory, comp.compilers, sci.math.research and elsewhere over a period of time ranging from 1991 up to the present.
Contents
1. Introduction
2. The von Neumann Correspondence
3. Matrix Algebras
4. Bra-Ket Notation
Stacks as Algebras
(PDF, 99k)
In this article, the genesis of a formal "Dirac" bra-ket notation as an
algebraic representation for stacks is discussed in detail. The bra and
kets in this formalism are analogous to the raising and lowering operators
of an N-body Maxwell-Boltzmann Fock extension of a Hilbert space. Here, an
analogy is drawn between raising operator | v> corresponding to a Hilbert
space vector v and the push operation. The analogue of the
the lowering operator <v| is the dual pop and test operator.
The basis of the analogy is the following characterization result. If T is the basis of a Hilbert space, then the basis of the corresponding Maxwell-Boltzmann Fock space is the stack language whose symbol set is T. Unlike the Fermi-Dirac and Einstein-Boltzmann Fock spaces, there are no special (anti-)symmetry properties, so that the vectors of the Maxwell-Boltzmann Fock space are just words formed out of the vectors of the original Hilbert space. The raising operator then adds a new element to a given word, the lowering operator erases the last element added — i.e., the last-in-first-out protocol that characterizes a stack.
It is ultimately out of this that the analogue Dirac notation for context-free expressions, Turing expressions and translation expressions arise.
Notes on the Algebraic Theory of Context Free, Translation and Turing Expressions
(PDF, 97k)
Partly on the inspiration of the 1987 edition of Derick Wood's Theory of Computation
(particularly, the sections on Brzozowski derivatives, context-free expressions
and translation expressions), a programme was established here to reinvent
formal language and automata theory on an algebraic foundation suitable for the
incorporation of context-free expressions and other language and transduction
expressions higher up the Chomsky hierarchy.
This article, derived from a letter sent to Derick Wood, provides the outline of the key elements of this programme.
The Algebraic Basis of Context-Free and Turing Expressions
(PDF, 69k)
(Supplement to section 3 of the Algebraic Approach)
Corresponding to the Chomsky hierarchy is a much more expansive hierarchy of algebraic structures known as dioids. This hierarchy provides an ordering of dioid categories related to one another by adjunctions. At the bottom are the dioids (idempotent semirings) and at the top, quantales. The algebra corresponding to Chomsky type 3 is what is already known as the *-continuous Kleene algebra. Those corresponding to Chomsky types 2 and 0 are heretofore unnamed.
The extensions of the theory of regular expressions to Chomsky types 0 and 2 amounts to writing down the algebraic analogue of the Chomsky-Sch�tzenberger theorems. They may be realized as instances of a large family of adjunctions relating a hierarchy of algebras known as dioids; and the theorems, themselves, results governing the solutions of non-commutative fixed-point systems or of more general non-linear systems.
SEQCD - Where Landin Went Wrong
(PDF, 41k)
This article discusses a significant departure from and correction to Landin's
failed attempt in the 1960's to estblish a correspondence between
functional-imperative programming and a procedural semantics for the Lambda
Calculus. The failure in Landin's approach centered on the issue of
transparently resolving cyclic control flow structures. The insight developed
here lies in the realisation that the consummation of the programme necessarily
involves an infinitary extension of the Lambda calculus. Ultimately, this serves
as the basis of the SEQCD machine.
Eager and Normal-Order Evaluation and the SECD-2 Machine
(PDF, 76k)
A formal definition of eager and normal order evaluation is presented and, from it, a definition for the SECD-2 machine rigorously derived.
The SECD-2 machine is the non-cyclic core of the SEQCD machine.
Stochastic Expressions and State Spaces
(PDF, 64k)
This article further expands the von Neumann correspondence, linking von Neumann's formalism for state spaces to a power series representation for expressions in formal language theory.
Contents
1. Convex Algebra
2. States and Stochastic Automata
3. The von Neumann Correspondence
4. Stochastic Expressions as Mixed States
Stochastic Finite Automata
(PDF, 78k)
This model is developed as a generalization both of Markov Models and Hidden Markov Models. A stochastic finite automaton may be thought of as a finite automaton whose transition arcs are labelled with probabilities, and its states with "final state" probabilities, subject to the normalization condition that the total of all the transition probabilities for transitions emanating from a state, plus the state's final state probability should be 1. A Hidden Markov model is conceived of as a limiting case where the final state probabilities tend to 0.
The typical applications of the Hidden Markov Model — (1) estimating transition probabilities from sample sets, (2) finding the most likely path, (3) determining the probability of a given sequence — all generalize to stochastic finite automata. In particular, the Baum-Welch algorithm has both a generalization and cleaner formulation in this broader context.
Contents
1. Definitions
2. Forward and Backward Probabilities
3. Standard Applications
4. Factorable Stochastic Automata
5. Hidden Markov Models
Typed Lambda Calculus and Totally Recursive Functions
(PDF, 91k)
The representation of recursive functions within the Lambda calculus as combinators is discussed here in depth.
Adjoints, Monads and Comonads
(PDF, 45k)
The definition of adjoints, monads and comonads and the related elements of category theory.
CAM-2 - The Categorical Abstract Machine(s)
(PDF, 68k)
The CAM can be rigorously defined without having to post its elements ab initio.
Doing so will not only eliminate its awkward handling of the combonators
associated with the functional type; but also provide a general template that
will allow a suitable edition of a CAM to easily be derived for general
extensions or modifications of the underlying language.
Categorical Lambda Calculus Without Types
(PDF, 67k)
A discussion of an elaboration on the Lambek-Scott treatment of the untyped
categorical Lambda Calculus using C-monoids.
The Syntax of the Programming Language C
(PDF, 45k)
This article provides a remarkably simple and concise formulation of the syntax
for the programming language C in tabular form. The contrasting feature is that
all the redundancies of a typical LALR(1) grammar have been cleared up. This is
not meant to be used with the older parser generator technology (e.g. YACC, Bison
or any LR-based parser), but rather in conjunction with the algebraic methods
developed elsewhere throughout this archive.
The Syntax of SQL
(PDF, 54k)
The syntax of a subset of the programming language SQL, which is the standard
language for relational databases, is presented here.
Full Prolog Syntax
(PDF, 46k)
A syntax of the Prolog programming language. There are significant
simplifications, compared to the standard, incorporated here, without a loss
of content.
Combinatory Logic
(PDF, 143k)
An introductory article, dating from 1984-1985, on combinatory logic, with
a particular focus on the development of a functional calculus for recursive
functions, with a formal definition of differentiation and integration
included. A significant novelty, also, is the omnibus treatment of the
abstraction algorithm.
Related article: What is KI? (PDF, 26k)
Contents
1. Combinator Types
2. Recursion
What is an Operating System?
(PDF, 97k)
This is a prescient account, written in 2001-2002, on the present state and
future of operating systems. Much of what was foretold has already come to
pass, particularly with the evolution of cell phones and other hand-held
devices into minicomputers and web-accessing devices; and with the
integration of the cell phone network and internet.
The WSMK is the "World's Smallest Multitasking Kernel". Originating in the 8-bit 8051 world, it has been used in a variety of embedded applications where extensive multitasking has been a central feature or requirement. Included in this portfolio are the designs of multiprocessor factory testing systems, multi-modular electronic displays, stepper motor controllers running multiple units concurrently, progressive jackpot displays and other Las Vegas style gaming displays, etc.
Contents
1. The Present State: UNIX/LINUX
2. WSMK: A Pre-Kernel for an OS
3. The Future of Operating Systems
Garbage Collection with Cyclic Structures
(PDF, 143k)
Memory management and recycling with applications that have cyclic structures
has been a major issue in Computer Science. When there are cycles involved,
it's difficult to tell if something is linked to something that's actually
being used, or whether it's part of an orphan cycle.
A language that synthesizes imperative and functional programming must be grounded in an environment that has this kind of memory management. This article is, therefore, a precursor to the developments that are the subject matter of the second part of the treatise.
This is the description, written c. 1995, of a method, which combines the Martinez mark-scan method with the Brownbridge method involving strong and weak pointers. The method is one amongst many cut from the same cloth that solve the orphan cycle problem by assigning and maintaining an acyclic subgraph amongst the set of linked data.
Contents
1. Eager Reference Count Garbage Collection
2. Lazy Reference Count Garbage Collection (with memoizing)
3. GC References