free web hosting | website hosting | Web Hosting | Free Website Submission | shopping cart | php hosting

Computer Science-Related Articles
This page contains Computer Science related papers or articles I have produced, many derived ultimately from articles posted on the USENET throughout the 1980's, 1990's and 2000's.

This site is currently in the 2nd phase of construction. There is currently, the equivalent of only 900 pages of material are on-line, with much more awaiting conversion and inclusion, and further material still associated with 5 software distributions that are in the process of being prepared for release or re-release, including the C-BC Programming Language and C-AS 8051 assembler.

Comments and feedback.
Return to the Federation Archive.

From the Older UWM Site
8051 Software Developer's Toolkit
Regular Expression and Finite State Automata Software
The C-BC Programming Language

Contents
A Treatise on Formal Language and Automata Theory
Relational Methods in Computer Science (10) and Applications of Kleene Algebra (5), 2008
The Algebraic Approach to Formal Language and Automata Theory
0. The Algebraic Point of View 1. Dioids and Quantales 2. Kleene Algebras and Regular Expressions
3. Context Free Expressions 4. Algebraic Parsing Theory and Translation Expressions 5. Turing and Translation Expressions
The Untold Story of Formal Languages
0. Preface and USENET Sources 1. The Algebraic Representation of Formal Languages
2. Regular Expressions -> DFA's 3. The Algebraic Representation of Automata
4. Context-Free Expressions and the Bra-Ket Algebra 5. Turing Expressions and Translation Expressions
6. The Hardest Context-Free Language 7. Grammars as Fixed-Point Systems
8. Direct Proof of Parikh's Theorem 9. Properties of Context-Free Systems
Measures, Normed Monoids and P = NP
Parikh's Theorem in Commutative Kleene Algebras
Context-Free Expressions
Stacks as Algebras
Notes on the Algebraic Theory of Context Free, Translation and Turing Expressions
The Algebraic Basis of Context-Free and Turing Expressions
SEQCD - Where Landin Went Wrong
Eager and Normal-Order Evaluation and the SECD-2 Machine
Stochastic Expressions and State Spaces
Stochastic Finite Automata
Typed Lambda Calculus and Totally Recursive Functions
Adjoints, Monads and Comonads
CAM-2 - The Categorical Abstract Machine(s)
Categorical Lambda Calculus Without Types
The Syntax of the Programming Language C
The Syntax of SQL
Full Prolog Syntax
Combinatory Logic
What is an Operating System?
Garbage Collection with Cyclic Structures


Background
This archive is an outgrowth of a long-term programme whose goal is to recast the entire field of formal language and automata theory (along with the rest of computer science), once again, onto a purely algebraic foundation, as a branch of mathematics, as had originally been intended throughout the 1950's and early 1960's.

In large measure the developments that have taken place here, dating from the 1980's on, were a reaction to the terribly inadequate state that the general field had been in throughout the 1970's onward. At a critical juncture, in the period 1988-1989, I decided to take on the task of reformulating (and significantly generalizing) the classical theory from scratch. To put it bluntly: Computer Science did not exist as anything other than a woefully insufficient presaging shadow of what was (and still is) yet to be, at that time.

There are notable exceptions to that generalization, including those who had already been pursuing the field from an algebraic foundation (e.g. Kozen), or from a category-theoretic perspective (e.g. Curien, Lambek). I am not acting alone. But others of this or like mind are still few and far between. Moreover, many of those who had started out in the right direction (e.g. Sch�tzenberger, Brzozowski, Landin) ended up frequently taking a left turn at some critical juncture, frequently missing important insights and syntheses which lay directly ahead.

Counted in this list is the field of Denotational Semantics, which can only be described as a mistaken reaction to the apparent failures of Landin's programme of the 1960's and a complete turn in the wrong direction. Though Denotational Semantics does bear fruit; the fruit it bears is like the labored growth of a tree standing alone in the middle of a desert — just a few paces away from the fringes of a vast oasis. It would also not be going too far to say that this is also the perception of the field, from within.

The following excerpt underscores the nature of the situation and illustrates both the importance and significance of the enterprise that have taken place here and elsewhere. It gives a snapshot of the state of the field at the turn of the 21st century:

2001 Dagstuhl conference on Kleene algebras
Recent algebraic versions of classical results in formal language theory, e.g. Parikh's theorem, point to the exciting possibility of a general algebraic theory that subsumes classical combinatorial automata and formal language theory. Although some movement in this direction took place in the 1970's [mainly by Sch�tzenberger, Salomaa & Soittola, Kuich], much of this work was concerned with encoding specific combinatorial models in specific algebraic models such as semirings of formal power series. In contrast, recent results point to a much more general, purely axiomatic theory in the spirit of modern algebra.
Applications of Kleene Algebra, seminar 01081, Schloss Dagstuhl, Wagern, Germany, 2001 February 18-23

It is this vision that the Treatise on Formal Language and Automata Theory and its precursors The Algebraic Approach (1999-2003) and The Untold Story of Formal Languages (1993-1996) ultimately provide the fulfillment of.

Though the Computer Science stands over (and subsumes) its applications in various engineering fields; it is not a branch of engineering. The distinction becomes particularly clear in times such as these, where the day-to-day particulars of technology are and have been undergoing rapid transition. Those who are couched in an engineering world will be, and continually are, subject to the doom of rapid obsolescence, because of the very premise of adopting that vantage point. The field, itself, however is evanescent and has been around, with much the same issues as preoccupy the world at present, for over 2000 years; counting amongst its earliest forebears the logicians who were the Stoic philosophers of old (the real-life basis of the Vulcans of Star Trek).

An excellent case in point is the approach that had been adopted to the field of database engineering. At the turn of the 21st century, the field resides in a state in which it is locked up into a de facto standard, which reads like a Computer Science or "language lawyer" text borne of the 1960's or 1970's, borne of those who came from that period (and are still in it). It Byzantine virtues make the recently dumped European constitution look like light reading on the beach in comparison.

The consequences of this state of affairs with the ossified geekification of the field is that it is very difficult for anyone to make entry into the field, or anyone who has entered the field to make headway. This is not only reflected, for instance, in the high pay (i.e. hazard pay) nowadays assigned to the task; but the pandemic of failures, lockups, overruns, etc. that have been bought with that pay. For instance, in 2007 in the state of Wisconsin, the databases for the Department of Motor Vehicles, the Department of Revenue, for the system to be used in elections in the state, for parts of the Wisconsin Job Network have all been curtailed. At the federal level, the problems the IRS has had with its database system are well-known.

Meanwhile the promise of the future that lies in the paradigms of Categorical Programming (which the so-called Object Oriented revolution was never anything more than a caricature of) and Logic Programming (e.g. Datalog) remain unfulfilled.

One of the most difficult aspects of this long-term programme has been degeekifying the underlying languages and notations used in the formalisms in the field at present, and divesting them of the particulars that had dominated the computer engineering world of the late 20th and early 21st century. Over time, particularly since the 1990's, others have headed toward the same conclusion and have begun to take on the same effort (notably in Physics). Not the least of these reasons has been the emergence of numerous representations (outside the classical world of the digital electronics of boolean processors) in the natural world of the general phenomenon, captured by the underlying theory, of computation; the most significant example being quantum computation or photonics.

People are slowly coming to realise that computation is not about the latest day-to-day affairs in the programming language du jour, or widget of the hour. These things have come and gone in a kaleidoscope of changes dominating the latter part of the previous century and earliest part of the present century. It is about what pervades, and subsumes, all these developments: that which mathematicians had proceeded to capture in the earliest parts of the 20th century, and which others (notably including Leibnitz) had concerned themselves with long before.


A Treatise on Formal Language and Automata Theory (Early draft of chapter/topic outline)
This includes only a topic outline. The material slated for evetual inclusion is being drawn from the various articles currently in the process of being converted and uploaded to this site. The central focus of the treatise is the consummation of two separate programmes pursued over the past 20 years:

  1. The algebraization of formal language and automata theory. This development is an evolution the early developments begun in the 1950's and 1960's to cast the theory of grammars in algebraic form as an equational theory. In this framework, a grammar is treated as a system of inequalities over a suitably defined partially ordered algebra. The solutions to such systems are the "expressions" representing the languages, transductions or more general relations, themselves. These expressions include and extend the well-established formalism for regular expressions further up the Chomsky hierarchy to "context-free expressions", "turing expressions" and "translation expressions".
  2. The functionalization of imperative languages and the synthesis of an encompassing framework for functional, imperative and logic programming languages based on an infinitary extension of the Lambda Calculus. The objects of this extended Lambda Calculus are infinite terms which directly embody, in their syntactic structure, control flow structures. Through this 'static' formalism, the 'dynamic' features of programming languages and programming language semantics are encapsulated within an algebraic formalism.

The Untold Story of Formal Languages
This is a compilation of the following sets of articles series from USENET
The Equational Approachcomp.compilers, comp.theory1993 September 28 - October 4 (parts 1, 2, 3)
The Untold Story of Formal Language Theorycomp.theory1995 July 20 - August 6 (parts 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
The Untold Story of Formal Languagessci.math.research1996 January 31 - February 27 (parts 1, 2, 3, 4, 5)
as well as other related material developed both before and since. This series is being prepared for incorporation into the Algebraic Approach Series. The result of this, and the addition of several other large sets of series, will be the Treatise, itself.

(Preface, sections 1-2; PDF, 281k)
Preface and USENET Sources

1. The Algebraic Representation of Formal Languages
An analogy exists between the Fock spaces of classical and quantum physics and the state spaces of automata based on stacks, queues and other lists. Corresponding, respectively, to the create and annihilate operators are the "enqueue" and "dequeue" processes; to the vacuum state, the empty queue. Based on this correspondence, algebras can be devised that embody the actions of machine and automata models that employ such devices. This section reviews two such algebras, termed the polycyclic and Bra-Ket algebras, as well as providing the background to the underlying algebraic theories employed throughout this treatment. A remarkable property of the Bra-Ket algebra, explored at the end of this section, is its isomorphism to its own matrix algebras.

Contents
1.1. Background
1.2. Quantales, Dioids and Kleene Algebras
1.3. Algebraic Preliminaries
1.4. Extending Dioids to Quantales
1.5. The Fock Space Representation
1.6. Stack Algebra and Matrix Representations
1.7. An Expression Algebra for Context-Free Languages and Transductions
1.8. The Bra-Ket Algebra
1.9. The Self-Similarity Property

2. Regular Expressions -> DFA's
The conversion of regular expressions to finite automata may be regarded as the process of transforming the expression into a linear system of inequalities which the expression is the least solution of. The process leads directly to algorithms that effect the conversion by carrying out algebraic computation on regular expressions. The algorithms have served as the basis of the (RegEx software cache, and the REX utility). REX is a utility whose functionality is a large superset of the standard POSIX utility GREP, as well as that of GNU-GREP. Amongst its features is its ability to process intersections, differences and even anagrams. (The software set is slated for re-release in this archive and is currently undergoing upgrade to a new cache for processing context-free expressions).

Contents
2.1. Preliminaries
2.2. Introduction
2.3. Automata as Systems of Equations
2.4. The Factoring Property ("Taylor's Theorem")
2.5. Factoring Regular Expressions
2.6. Calculation Method
2.7. Examples
2.8. Complexity
2.9. Conversion to an NFA
2.10. Derivation Sequences as Inequalities
2.11. Algebraic Manipulation of Systems
2.12. Infinite Automata and Recursive Systems of Equations

(Sections 3-5; PDF, 313k)
3. The Algebraic Representation of Automata

We are very far from possessing a theory of automata which deserves that name, that is, a properly mathematical-logical theory.
—John von Neumann, The General and Logical Theory of Automata

One of the central features of the Untold Story series is that it represents both a continuation and completion of the vision originally laid out by von Neumann, himself. This is true both in the sense just cited and in the sense of incorporating both of the primary strands of von Neumann's thought — Mathematical Physics and Computer Science — in the overall synthesis that this series presents.

This section features the introduction of an algebra for state diagrams. Treating a state diagram as a graphical representation of a matrix, one can define the operations of addition and multiplication over them. The result is an extension of the cycle notation used for representing groups.

The classes of automata seen in classical formal language and automata theory may all be seen as instances of a general "infinite state automata" model. What distinguishes each class is (a) the factoring of its state space into the product of a "control" space and "device" space; (b) the imposition of "selection" rules constraining the allowable transitions in the device space and (c) the imposition of "symmetry" rules governing the transitions in device space that allow an set of transitions to be generated from a finite set. Features (b) and (c) play roles analogous to that played by selection rules and symmetry in non-linear dynamics.

In this section, the notion of automata and state diagrams as graphical representations of systems of inequalities is developed. This concept leads us directly to the issue of operator algebras and their matrix representations.

Contents
3.1. State Diagrams, Automata and Matrices
3.2. Infinite State Automata
3.3. Varieties of Infinite State Automata
3.4. Representation of an Automaton as a System of Inequalities
3.5. Connection to the Operator and Matrix Algebras

4. Context-Free Expressions and the Bra-Ket Algebra
The class of infinite state automata corresponding to the 1-stack automata can be directly converted via into a finite linear system of inequalities over an algebra that incorporates the Bra-Ket algebra. The resulting expressions extend the classical theory of regular expressions up to type 2 in the Chomsky hierarchy and so may be termed context-free expressions.

A process for converting the non-linear systems of inequalities that represent grammars in extended BNF is also developed. A central feature of this conversion is the process of linearization, which may be likened to that seen in mathematical physics of taking the spinor "square root" of a vector. With the use of the bra-ket operators, the non-linear systems representing context-free grammars are decoupled into first-order systems.

Contents
4.1. The Bra-Ket Algbera; Tensor Products
4.2. Infinite State Automata and Bra-Ket Notation: Example
4.3. Reduction of Push-Down Automata to Regular Bra-Ket Grammars
4.4. Embedding Context-Free Languages in C2 >< RX*
4.5. Example
4.6. Context-Free Expressions and Fixed-Point Systems: Examples
4.7. Converting EBNF Grammars and Fixed-Point Systems to Bra-Ket Form
4.8. Optimizations

  • 4.8.1. Performing Equivalence Tests
  • 4.8.2. Removing Uninformative Operators
  • 4.8.3. Other Optimizations
  • 5. Turing Expressions and Translation Expressions
    Taking this process one step further, we can go further and also treat machine and automata models that employ multiple stacks or double-ended queues. Along the way we reproduce the classical result that links open-ended queues to stacks, and double-ended tapes to twin-stacks.

    An operator algebra for Turing machines and Turing transducers is constructed from the Bra-Ket algebra. The parsing/translation problem is formulated in the general setting, where we show that the complexity of the representation of a set of outputs corresponding to a given input under translation is linear. Though not well-known, this is actually a somewhat trivial consequence of Kolmogorov complexity theory. The representation provided here is a realization of the theoretical result.

    Contents
    5.1. An Expression Algebra for Turing Languages and Transductions
    5.2. Conversion of an Open-Ended Queue to a Stack
    5.3. Reduction of Turing Machines via the Bra-Ket Algebra
    5.4. Linear Bounded Turing Machines and Context-Sensitive Expressions
    5.5. Translation Systems and the Markov Machine

    (Sections 6-9; PDF, 326k)
    6. The Hardest Context-Free Language
    A context-free language exists over a monoid M, which we call the "Hardest CFL", HL, that possesses a universal property that for every other context-free language L over every other monoid N maps into HL under a suitably-defined monoid homomorphism hL: N -> M, while the complement of L maps outside of HL. This result was originally demonstrated by Greibach in the 1960's. A more direct proof is provided here that employs context-free expressions.

    Contents
    6.1. History
    6.2. The Mapping
    6.3. Equivalence of CFL Membership to Membership in HL
    6.4. Examples

    7. Grammars as Fixed-Point Systems
    Previous attempts at developing a formalism for context-free expressions in the 1970's (Yntema, Gruska and McWhirtier, independently of one another), centered on the notion of fixed-point systems and the least fixed point operator. We state the fundamental properties of the "fixed point" calculus, and then provide a means of calculating fixed points within the algebra of context-free expressions, based on the method developed in section 4.

    It is here that we also begin to see that link between the context-free expression algebra and the celebrated Chomsky-Sch�tzenberger theorem. Here, we explain how such an algebra arises from the theorem itself.

    The difference between the two methods of representation may be likened to that of solving an equation 4x = 3 by writing x = 3/4, versus that of solving the equation by writing out a decimal expansion 1.333... Here, the analogous role for decimals is played by context-free expressions. Just as it is true that linear systems will lead to the repeating decimals that identify rational numbers, so it is that the linear systems here will lead to rational or regular expressions. The innovation we present is that we go one step further and also write down "decimal" solutions for non-linear systems, where the corresponding expressions need not be regular or rational.

    Along the way, we also find that the notion of parse trees and parse forests comes under the scope of our synthesis. Regarding each branch of a parse tree as representing an inequality, a single parse tree, and AND-OR tree and even an infinite set of parse trees can be represented by a single (generally non-linear) system. This system, in turn, possesses a least fixed-point solution. This solution is none other than the representation of set of parses or translations, alluded to in section 5.

    Contents
    7.1. History
    7.2. Fixed-Point Systems
    7.3. Generalizing Parse Trees and Forests as Systems of Inequalities and O(n) Parsing
    7.4. Solving Context-Free Systems: History
    7.5. The Gruska/McWhirtier/Yntema Approach
    7.6. The Current Approaches

  • 7.6.1. The Context-Free Expression Algebra
  • 7.6.2. The Chomsky-Sch�tzenberger Theorem

  • 7.7. The Fixed-Point Operator
    7.8. Basic Properties of the Least Fixed Point Operator
    7.9. Schur's Lemma
    7.10. Fixed-Point Solution: Example
    7.11. Solving a Fixed-Point System
    7.12. Fixed-Point Solutions: Further Examples
  • 7.12.1. A Simple Expressions Grammar
  • 7.12.2. A Non-Trivial Optimization
  • 8. Direct Proof of Parikh's Theorem
    Published also as Parikh's Theorem in Commutative Kleene Algebra.

    The commutative version of the Chomsky-Sch�tzenberger theorem is the Parikh theorem. The methods developed for solving fixed point systems for non-linear systems substantially simplify when brought over the setting of commutative algebras. In addition, the Brzozowski derivatives of the non-commutative theory now become bona fide derivation operators.

    The result is that we are able to formally define a calculus for commutative Kleene algebras in which a version of Taylor's Theorem, the Chain Rule and the theorem on total differentials all hold true. With this new background, the least fixed-point operator of section 7 becomes trivial to implement. In detail, if f(x) is an algebraic function of x, then the least solution to the inequality x ≥ f(x) will just be f'(f(0))* f(0).

    We conjecture that a similar result also holds for systems of dimension 2 or more, namely that the 2nd order iteration of the operator x |-> f'(x)* f(0) starting from x = f(0) will suffice, where f'(x) now denotes the Jacobian of f(x). This requires the notion of solving systems under a homomorphic reduction of a given algebra. Though the concepts have not yet been developed to completion (neither here nor in the Parikh paper 2006 update), it seems clear that the second order iteration will suffice to solve the general system. This has been verified in the case of 2-dimensional system by Kozen c. 2000

    Contents
    8.1. Background
    8.2. Kleene Algebras
    8.3. Abelian Kleene Algebras
    8.4. Differential Operators
    8.5. Parikh's Theorem

    9. Properties of Context-Free Systems
    This section has not yet been incorporated into this archive (though the USENET orignals can be found via the links provided). The normal form theorem states that every context-free expressions can be converted to a regular expression over the operators and "operator-free" context-free expressions, in which "the kets occur to the left of the bras". Thus, for instance, a normal form for (a <1| + |1> b)* would be (N |1> b)* N (a <1| N)*, where N is the "operator-free" expression given by <0| (a <1| + |1> b)* |0>.

    The conversion implements a process of transforming context-free expressions into context-free grammars. The process generalizes the classical result whereby pushdown automata are transformed into grammars. Here, the notion of grammar is replaced by that of the "operator-free" expression, while the automaton is replaced by a linear system that has the expression to be converted as its least solution.

    Over a pure operator algebra, this conversion in turn leads to a representation that implements a shortest-paths formulation, reminiscent of Valiant's O(nlog 7) complexity solution for the context-free recognition problem.

    Contents
    9.1. D-CFL < N-CFL
    9.2. Converting CFE --> CFG: The Normal Form Theorem
    9.3. Closure Properties for CFL's
    9.4. CFL-RL Intersection


    The Algebraic Approach to Formal Language and Automata Theory
    Derived ultimately from the Algebraic Approach, present on the web in 1999-2003, this is the core of what will eventually become the treatise on formal language and automata theory. This material is also drawn, in part, from the Algebraic Foundation for Formal Language Theory series from comp.theory, 2004 May 5-20.

    Related articles
    Algebraic LR Parsing (PDF, 63k)
    Notes on the reformulation of the LR parsing paradigm within the algebraic approach.
    The Syntax of the Programming Language C
    Full Prolog Syntax
    These formulations of the syntax of various programming languages will be used as case study examples, in followup articles, to illustrate the algebraic methods developed here.

    Preface and References (PDF, 46k)

    0. The Algebraic Point of View (PDF, 250k)
    Supplement:: Measures, Normed Monoids and P = NP (PDF, 31k)

    There are three fundamental modes of representation that appear in the theory of formal languages and automata: (E) the language or translation expression (e.g. regular expressions); (G) the grammar or transition system; and (A) the automaton, machine or transducer. All three modes may be brought within a unifying framework, in which modes (G) and (A) serve as various means of representing systems of inequalities between the objects that reside in partially ordered algebras that house the expressions of mode (E).

    The principle of finiteness or minimality is generally asserted in association with these modes of representation, stating that the entity being generated or processed by the grammar, transition system, automaton, machine or transducer, should comprise no more than what can be derived by a finite number of steps. Rendered in algebraic form, this principle states that the entity should be identified as the system's least solution.

    The expressions of the first mode of representation are then the entities, themselves: it is the partially ordered algebras from which the expressions are drawn. The ordering relation then serves to encapculate the concepts of derivation, inference, reduction or instantiation, bringing them all under a common framework.

    Contents
    0.0. Set-Theoretical Preliminaries and Notational Conventions
    0.1. Algebraic Preliminaries

  • 0.1.1. Semigroups and Monoids
  • 0.1.2. Semi-Lattices
  • 0.1.3. Semirings and Dioids
  • 0.1.4. Examples of Dioids
  • 0.1.5. Kleene Algebras

  • 0.2. Grammars as Systems of Inequalities
  • 0.2.1. Background
  • 0.2.2. Grammars
  • 0.2.3. Algebraic and Context-Free Grammars
  • 0.2.4. Fixed-Point Systems
  • 0.2.5. Normed Monoids and Context-Sensitive Grammars
  • 0.2.6. General Grammars and Relational Systems of Inequalities

  • 0.3. Automata as Systems of Inequalities
  • 0.3.1. Recognizers, Generators and Transducers
  • 0.3.2. Finite State Automata
  • 0.3.3. Varieties of Infinite State Automata
  • 1. Dioids and Quantales (PDF, 283k)
    The notion of the "word", prevalent in formal language and automata theory is encapsulated algebraically by a structure known as a semigroup, whose distinguishing feature is that it possesses a product that satisfies the law of associativity (ab)c=a(bc). The product is naturally intepreted as the operation of combining words together, or of sequentially combining processes. A dioid contains, in addition, the partial ordering relation that serves to encapsulate the notions of derivations or reductions. Its distinguishing feature is that is defines a "least ordered" entity, which might be likened to the notion of failure, or error, and a "least upper bound", which takes the "sum" or "least" of two or more entities. This might be regarded as the precursor, at a primitive level, of the notion of non-deterministic branching.

    The different varieties of dioid are distinguished by what range of entities shall have least upper bounds. The results are algebras, with the sum distributing over the product, but satisfying the identity a + a = a. In the most general case, if one allows sums over arbitrary (even infinite) sets, the resulting structure will be what is already known from quantum theory and non-linear dynamics as a quantale. Other more restricted alternatives, however, may be defined. Indeed, the totality of all the possibilities forms a hierarchy that mimics the Chomsky hierarchy and recalls the concept of "abstract families of language" from the classical theory of formal languages and automata.

    In this section, the hierarchy is shown to possess a family of "extension" maps, which allow any of the varieties of dioid to be extended to a larger variety, including all the way to the level of a quantale. The process of "quantale completion" recalls, in many ways, that of completion of lattices or partial orderings by the use of ideals. Indeed, a simple formulation for the notion of ideals exists here, which allows this construction to be carried out in full generality. That is what this section will focus on.

    Of later interest will be two particular constructions: those that exist between the varieties of dioids that play the analogous role of all the types 0-3 of the Chomsky hierarchy. The actual extension maps will serve as the basis for "fixed-point" theorems that embody the classical Chomsky-Sch�tzenberger theorems and provide the foundation for extending the classical theory of regular expressions to a general theory of expressions for languages, translations and other processes.

    Contents
    1.1. Dioids, Quantales and Kleene Algebras

  • 1.1.1. Dioids, Quantales and Adjoints
  • 1.1.2. A Hierarchy of Dioid Algebras
  • 1.1.3. Analytic Operators
  • 1.1.4. Closure Under Substitutions
  • 1.1.5. Closure Under Inverse Morphisms

  • 1.2. Ideals and Quantales
  • 1.2.1. Ideals, Basic Properties
  • 1.2.2. Defining a Quantale Structure on Ideals
  • 1.2.2.1. Products
  • 1.2.2.2. Sums
  • 1.2.2.3. Quantale Structure
  • 1.2.2.4. Morphisms
  • 1.2.2.5. Free Quantale Extensions

  • 1.3. The Dioid-Quantale Hierarchy
    1.4. Analytic Operators and Fixed-Point Theorems
  • 1.4.1. Rational Dioids
  • 1.4.2. *-Complete Kleene Algebras
  • 1.4.3. Context-Free and Turing Dioids: Chomsky-Sch�tzenberger Adjunctions
  • 1.4.4. The Chomsky Hierarchy
  • 1.4.5. Rational Transductions and Inverse Morphisms in Algebraic Form
  • 1.4.5.1. The Classical Rational Transduction Construction
  • 1.4.5.2. The Classical Inverse Morphism Construction

  • 1.5. Monads, Analytic Dioids and Power Series

    2. Kleene Algebras and Regular Expressions (2.1-2.7.2 PDF, 501k)
    One of the more fundamental discoveries made in recent times in the field of formal language and automata theory was the existence of a finite axiomatization for the theory of regular expressions. Though it had been known since the 1970's that no finite set of identity equations can completely produce all the identities that hold for regular expressions; it turns out that the dioid is sufficient to accomplish the task — provided one adopts as an axiom that the least solution to the inequality x ≥ a + bx + xc should be x = b* a c*. In fact, that's all that's required! (Well, actually, we'd also need 0* = 1). The distinguishing feature of Kleene algebras, with respect to this axiom set, is the solvability of more general finite linear systems within them (as well as non-linear systems of a restricted type).

    The first proof that this axiom base, or any equivalent to it, sufficed to completely embody the regular expressions emerged in the mid 1990's. However, instead of directly incorporating the classical proof of Kleene's theorem given by Brzozowski in the early 1960's, it employed a somewhat standard textbook technique for converting regular expressions to automata. The method used here algebraizes the construction by Brzozowski. What we will find is that any map h from a set X to a Kleene algebra K will lead to a linear system of inequalities over the algebra K which, though infinite, will still possess a least solution. That solution, in fact, is none other than the map which converts a regular expression over the alphabet X into an element of the Kleene algebra K. This actually shows that the equations between regular expressions are precisely those equations provable under the Kleene algebra axioms.

    The extra operator, called the Kleene star may be thought of as embodying an infinite branching so that x* would mean 1 + x + x2 + x3 + .... In a process point of view, this would be interpreted as the action of repeating x a undetermined number of times — thus encapculating the notion of iteration at a very primitive level.

    However, the process point of view does not really fit the Kleene algebra mould. Instead, it is more in line with the notion of processes as changing or transforming states. This naturally leads to a generalization of the Kleene algebra concept analogous to the generalization of groups to groupoids. Hence, the name Kleene algebroid. It also serves as a necessary precursor to rounding out the completeness proof. A benefit that emerges from this more generalized setting is that we now have a direct means of embodying the graphs, called state diagrams, which are central to the theory of automata, machines and transducers. Thus, whereas the theory of Kleene algebras serves as a foundation for expressions (our first mode of representation, (E)), the theory of Kleene algebroids serves as a foundation for automata, transducers and machines (our third mode, (A)).

    Contents
    2.1. Basic Kleene Algbera Theory

  • 2.1.1. Kleene Algebra Axiomatizations
  • 2.1.2. Generalized Kleene Algebras
  • 2.1.2.1. One-Sided Kleene Algebras
  • 2.1.2.2. Equational and Implicational Theories of Regular Expressions
  • 2.1.3. Basic Kleene Algebra Properties
  • 2.1.4. Regular Expressions and Rational Subsets
  • 2.1.5. *-Completeness and Continuity
  • 2.1.6. Commutative Kleene Algebras
  • 2.1.7. Subtractive Kleene Algebras
  • 2.1.8. Matrix Algebras and Linear Systems
  • 2.1.9. Kleene's Theorem and Equational Completeness

  • 2.2. Kleene Algebroids
  • 2.2.1. State Diagrams, Rational Closures and Automata
  • 2.2.2. Categories, Di-Algebroids, and Kleene Algebroids
  • 2.2.3. Examples of Kleene Algebroids
  • 2.2.4. Basic Kleene Algeboid Properties
  • 2.2.5. Analytic Di-Algebroids and Quantaloids

  • 2.3. Linear Systems and Matrix Algebroids
  • 2.3.1. Matrix Algebroids
  • 2.3.2. Solutions of Linear Systems
  • 2.3.3. Block Decompositions and the Lifting Theorem

  • 2.4. Derivatives and Brzozowski Systems
  • 2.4.1. Equational Completeness of Kleene Algebras
  • 2.4.2. Basic Properties of the Brzozowski Derivative
  • 2.4.3. Closed Families
  • 2.4.4. Brzozowski Systems

  • 2.5. Intertwining Maps; Subsets and Minimalization
  • 2.5.1. Intertwining Maps
  • 2.5.2. Applications of Intertwining Maps to Automata Constructions
  • 2.5.2.1. Subset Construction
  • 2.5.2.2. Equivalence Reduction

  • 2.6. Completeness of Kleene Algebras
  • 2.6.1. Preservation of Sums
  • 2.6.2. Preservation of Products and Stars
  • 2.6.3. Generalized Kleene's Theorem

  • 2.7. Derivatives and Kleene Algebroids
  • 2.7.1. Brzozowski Derivatives
  • 2.7.2. Generalized Closure
  • 2.7.3. Generalized Brzozowski Systems

  • 2.8. Kleene Algebroid Completeness
    2.9. Free Extensions and Substitutions

    3. Context-Free Expressions (PDF, 353k)
    The theory of context-free expressions is formulated here. The central feature of this theory is the appearance of an algebra which ultimately arises out of an analogy between stacks and queues and Fock spaces in mathematical Physics. These notes, originated from section 3 of the 1999-2003 Algebraic Approach web series; which, here, form sections 3.1 - 3.7. The original treatement made tacit use of the contents that later came to be developed in section 1, and needed to be held off pending the completion of section 1.

    Contents
    3.1. Introduction
    3.2. Basic Constructions

  • 3.2.1. Tensor Products
  • 3.2.2. Matrix Algebras over 2
  • 3.2.3. The Bra-Ket Algebra

  • 3.3. Conversion of Grammars to Expressions
    3.4. The Normal Form Theorem
  • 3.4.1. Fixed-Point Lemmas
  • 3.4.2. Normal Form Decomposition
  • 3.4.3. A Calculus for Normal Forms
  • 3.4.3.1. Decomposition
  • 3.4.3.2. Fixed-Point Property

  • 3.5. Schur's Lemma
  • 3.5.1. Canonical Matrix Representation
  • 3.5.2. Reduced Normal Forms

  • 3.6. The Fixed Point Theorem
    3.7. Fixed-Point Calculus
  • 3.7.1. CEX - The Context-Free Expression Filter
  • 3.7.1.1. CEX Syntax
  • 3.7.1.2. CEX Semantics
  • 3.7.2. A Calculus for the Fixed-Point Operator
  • 3.7.3. Context-Free Kleene Algebras
  • 3.7.3.1. Context-Freeness and Continuity
  • 3.7.3.2. Construction of Fixed-Point Closures

  • 3.8. Algebraic Chomsky-Sch�tzenberger Theorems
  • 3.8.1. The Classical Chomsky-Sch�tzenberger Theorem
  • 3.8.2. The Chomsky-Sch�tzenberger Construction for Monoids
  • 3.8.3. The Chomsky-Sch�tzenberger Construction for General Algebras
  • 3.8.4. Turing Machines and Turing Expressions

  • 3.9. O(n) Parsing and Recognition by "Shortest Paths" Bra-Ket Reduction

    4. Algebraic Parsing Theory and Translation Expressions
    Since the algebraic approach programme is ultimately grounded in the study of parsing theory, it is only natural that the focus be brought back to these applications.

    The methodology of LR parsing originated in the 1960's by Donald Knuth. It provides a means of representing deterministic context-free languages and processes (indeed, it is complete in this sense), as well as that of effectively handling non-deterministic languages and processes. Expanding on the latter application, the approach was generalized by Tomita in the mid 1980's to what is known as generalized LR (or GLR) parsing. A distinguishing feature of the method was the efficient handling of the potentially large number of structures that may be derived from the parse or translation of a highly ambiguous input.

    One can go much further with this development. In the general theory of context-free languages and translations, the possibility of infinite ambiguity arises. Various ad hoc restrictions have been employed in the numerous methods in existence to deal with or otherwise evade this issue. However, the notion underlying the possibility actually holds the kernel of the resolution: the infinity comprising the translations or parses of a given item are of such a form that they, themselves, form a language — a context-free language. Thus, a means that provides an omnibus representation for outputs, translations or parses is tantamount to a means for embodying context-free languages, themselves; i.e. context-free expressions.

    It turns out that such a structure, no matter what the size of the set it denotes, can be represented by a context-free expression that, itself, possesses a complexity linear in the size of the item being processed. The existence of linear complexity representations, of course, is already a well-known consequence of Kolmogorov complexity theory. Here, we find a realization of the theoretical result.

    Context-free expressions, in fact, will serve to directly embody such "output" or "translation" sets. What this means is that there is an effective procedure for enumerating the elements of a set represented by a context-free expression or testing for its emptiness that matches the best-known process for context-free recognition — Valiant's algorithm. In fact, the algorithm is naturally represented by a process for reducing expressions in the pure bra-ket operator algebra, and arises as a consequence of the normal form theorem.

    Another important application of the foregoing is the mathematical derivation or "calculation" of parsers or processing mechanisms for languages and processes specified by context-free languages or transductions. An important case is the design and implementation of the syntax for a given language. This application is illustrated in detail in several case studies.

    Contents
    4.1. Parsing, Translation and Enumeration
    4.2. Enumeration and O(n) Parsing (Supplement: PDF, 70k)
    4.3. Algebraic LR-Infinity Parsing (Supplement: PDF, 206k)
    4.4. Parsing as Context-Free/Regular Intersection (Supplement: PDF, 51k)
    4.5. Valiant, Shortest-Paths and Evaluation of Bra-Ket Expressions
    4.6. Solving Context-Free Systems — Case Studies

    5. Turing and Translation Expressions
    This section, like the previous, has yet to be compiled and included here. However, some of what is to appear is already present under the Untold Story of Formal Languages series.


    Parikh's Theorem in Commutative Kleene Algebra (PDF, 242K)
    1999 Original: (bibtex, abstract, postscript)
    This is a 2006 update of the 1999 paper presented to the LICS '99 conference by the co-author Dexter Kozen. Working within the algebraic approach to formal language and automata theory, the 1999 paper presents Parikh's Theorem as a fixed point theorem for commutative Kleene algebras and then provides the closed form solution. An alternate closed form solution is provided in the 2006 update as a conjecture. The 2006 update also incorporates some of the elements of the non-commutative generalizations of the Parikh theorem known as the Chomsky-Sch�tzenberger theorems, and discusses the key issues surrounding this prospective generalization. These Chomsky-Sch�tzenberger theorems, in turn, serve as the algebraic foundation for a generalized theory of language expressions.

    Much of the original paper, sections 1-4, arose in the much larger context that comprised the 1996 sci.math.research article series Untold Story of Formal Languages, its predecessor 1995 series from comp.theory, and the 1993 Equational Approach series, also from comp.theory; all of which has been integrated here into the Untold Story of Formal Languages series. Section 5 was entirely from Kozen, who explicitly formulated and resolved the problem of finding a direct closed-form solution to the multivariate system. The paper was all written up and presented by Kozen in the 1999 IEEE LICS conference, and in a seminar at Cornell. Even at the present time, the content stands out as distinctly futuristic in flavor � as also does much of the development in Kleene algebra theory by Kozen � and caught the reviewers by surprise.

    Additional elements of the content from the 1996 edition absent in the 1999 paper, as well as other elements from the 1995 series, which contains a still-unpublished proof of another closed-form solution have been added and are still being added. This is, therefore, being updated substantially from the original 1999 LICS paper.

    In addition a new element, section 6, has been added to incorporate the various non-commutative fixed point theorems that can be established in the finitary Kleene algebra without resort to the infinitary power of *-continuous Kleene algebras and to begin to address the not-so-well-known non-commutative generalizations of Parikh�s Theorem which go under the name of the Chomsky-Sch�tzenberger Theorems.

    Contents
    Abstract

  • Update Notes

  • 1. Introduction
    2. Commutative Kleene Algebra
    3. Polynomials and Derivations
  • 3.1 Polynomials over a commutative Kleene algebra
  • 3.2 Kleene Derivations

  • 4. A Generalization of Parikh's Theorem
    5. A Closed Form Solution
  • 5.1 Taylor Iteration
  • 5.2 Convergence of the Taylor Iteration
  • 5.3 General Solution (Contains a new conjecture, still unproven)

  • 6. A Non-Commutative Parikh's Theorem?
  • 6.1 Non-Commutative Fixed-Point Solutions
  • 6.2 The Chomsky-Sch�tzenberger Theorem
  • 6.3 Context-Free Expressions
  • Acknowledgements and References

    Context-Free Expressions (PDF, 171k)
    This is a translation from German and revision of the Dominik L�cke's 2003 Context-Free Subsets of Monoids carried out in early 2007. In turn, the 2003 paper is largely drawn off of part 3 of The Algebraic Approach. Numerous comments and clarifying remarks have been added. Only the first 3 sections, and part of the 4 are translated here, so far. It's doubtful that much of anything beyond section 4 will be translated here since (a) the material is already being subsumed by the Untold Story of Formal Languages series, (b) by the Algebraic Approach series. Much of what is here, below, in the translation (particularly the material on the Fock space representations) is additional material not present in the 2003 original

    This paper introduces the notion of Context-Free Expressions in a form developed originally in a series of USENET articles in comp.theory, comp.compilers, sci.math.research and elsewhere over a period of time ranging from 1991 up to the present.

    Contents
    1. Introduction
    2. The von Neumann Correspondence
    3. Matrix Algebras

  • 3.1. Dioids and Kleene Algebras
  • 3.2. Matrix Algebras
  • 3.3. Matrix Representations of Type 3 Languages

  • 4. Bra-Ket Notation
  • 4.0. Background: The Chomsky-Sch�tzenberger Theorem
  • 4.1. Hilbert Spaces

  • Stacks as Algebras (PDF, 99k)
    In this article, the genesis of a formal "Dirac" bra-ket notation as an algebraic representation for stacks is discussed in detail. The bra and kets in this formalism are analogous to the raising and lowering operators of an N-body Maxwell-Boltzmann Fock extension of a Hilbert space. Here, an analogy is drawn between raising operator | v> corresponding to a Hilbert space vector v and the push operation. The analogue of the the lowering operator <v| is the dual pop and test operator.

    The basis of the analogy is the following characterization result. If T is the basis of a Hilbert space, then the basis of the corresponding Maxwell-Boltzmann Fock space is the stack language whose symbol set is T. Unlike the Fermi-Dirac and Einstein-Boltzmann Fock spaces, there are no special (anti-)symmetry properties, so that the vectors of the Maxwell-Boltzmann Fock space are just words formed out of the vectors of the original Hilbert space. The raising operator then adds a new element to a given word, the lowering operator erases the last element added — i.e., the last-in-first-out protocol that characterizes a stack.

    It is ultimately out of this that the analogue Dirac notation for context-free expressions, Turing expressions and translation expressions arise.


    Notes on the Algebraic Theory of Context Free, Translation and Turing Expressions (PDF, 97k)
    Partly on the inspiration of the 1987 edition of Derick Wood's Theory of Computation (particularly, the sections on Brzozowski derivatives, context-free expressions and translation expressions), a programme was established here to reinvent formal language and automata theory on an algebraic foundation suitable for the incorporation of context-free expressions and other language and transduction expressions higher up the Chomsky hierarchy.

    This article, derived from a letter sent to Derick Wood, provides the outline of the key elements of this programme.


    The Algebraic Basis of Context-Free and Turing Expressions (PDF, 69k)
    (Supplement to section 3 of the Algebraic Approach)

    Corresponding to the Chomsky hierarchy is a much more expansive hierarchy of algebraic structures known as dioids. This hierarchy provides an ordering of dioid categories related to one another by adjunctions. At the bottom are the dioids (idempotent semirings) and at the top, quantales. The algebra corresponding to Chomsky type 3 is what is already known as the *-continuous Kleene algebra. Those corresponding to Chomsky types 2 and 0 are heretofore unnamed.

    The extensions of the theory of regular expressions to Chomsky types 0 and 2 amounts to writing down the algebraic analogue of the Chomsky-Sch�tzenberger theorems. They may be realized as instances of a large family of adjunctions relating a hierarchy of algebras known as dioids; and the theorems, themselves, results governing the solutions of non-commutative fixed-point systems or of more general non-linear systems.


    SEQCD - Where Landin Went Wrong (PDF, 41k)
    This article discusses a significant departure from and correction to Landin's failed attempt in the 1960's to estblish a correspondence between functional-imperative programming and a procedural semantics for the Lambda Calculus. The failure in Landin's approach centered on the issue of transparently resolving cyclic control flow structures. The insight developed here lies in the realisation that the consummation of the programme necessarily involves an infinitary extension of the Lambda calculus. Ultimately, this serves as the basis of the SEQCD machine.


    Eager and Normal-Order Evaluation and the SECD-2 Machine (PDF, 76k)
    A formal definition of eager and normal order evaluation is presented and, from it, a definition for the SECD-2 machine rigorously derived. The SECD-2 machine is the non-cyclic core of the SEQCD machine.


    Stochastic Expressions and State Spaces (PDF, 64k)
    This article further expands the von Neumann correspondence, linking von Neumann's formalism for state spaces to a power series representation for expressions in formal language theory.

    Contents
    1. Convex Algebra
    2. States and Stochastic Automata
    3. The von Neumann Correspondence
    4. Stochastic Expressions as Mixed States


    Stochastic Finite Automata (PDF, 78k)
    This model is developed as a generalization both of Markov Models and Hidden Markov Models. A stochastic finite automaton may be thought of as a finite automaton whose transition arcs are labelled with probabilities, and its states with "final state" probabilities, subject to the normalization condition that the total of all the transition probabilities for transitions emanating from a state, plus the state's final state probability should be 1. A Hidden Markov model is conceived of as a limiting case where the final state probabilities tend to 0.

    The typical applications of the Hidden Markov Model — (1) estimating transition probabilities from sample sets, (2) finding the most likely path, (3) determining the probability of a given sequence — all generalize to stochastic finite automata. In particular, the Baum-Welch algorithm has both a generalization and cleaner formulation in this broader context.

    Contents
    1. Definitions
    2. Forward and Backward Probabilities
    3. Standard Applications

  • 3.1. Word Probabilities
  • 3.2. Finding the Most Likely Path
  • 3.3. Parameter Estimation (or: Training a Stochastic Automaton)

  • 4. Factorable Stochastic Automata
    5. Hidden Markov Models

    Typed Lambda Calculus and Totally Recursive Functions (PDF, 91k)
    The representation of recursive functions within the Lambda calculus as combinators is discussed here in depth.


    Adjoints, Monads and Comonads (PDF, 45k)
    The definition of adjoints, monads and comonads and the related elements of category theory.


    CAM-2 - The Categorical Abstract Machine(s) (PDF, 68k)
    The CAM can be rigorously defined without having to post its elements ab initio. Doing so will not only eliminate its awkward handling of the combonators associated with the functional type; but also provide a general template that will allow a suitable edition of a CAM to easily be derived for general extensions or modifications of the underlying language.


    Categorical Lambda Calculus Without Types (PDF, 67k)
    A discussion of an elaboration on the Lambek-Scott treatment of the untyped categorical Lambda Calculus using C-monoids.


    The Syntax of the Programming Language C (PDF, 45k)
    This article provides a remarkably simple and concise formulation of the syntax for the programming language C in tabular form. The contrasting feature is that all the redundancies of a typical LALR(1) grammar have been cleared up. This is not meant to be used with the older parser generator technology (e.g. YACC, Bison or any LR-based parser), but rather in conjunction with the algebraic methods developed elsewhere throughout this archive.


    The Syntax of SQL (PDF, 54k)
    The syntax of a subset of the programming language SQL, which is the standard language for relational databases, is presented here.


    Full Prolog Syntax (PDF, 46k)
    A syntax of the Prolog programming language. There are significant simplifications, compared to the standard, incorporated here, without a loss of content.


    Combinatory Logic (PDF, 143k)
    An introductory article, dating from 1984-1985, on combinatory logic, with a particular focus on the development of a functional calculus for recursive functions, with a formal definition of differentiation and integration included. A significant novelty, also, is the omnibus treatment of the abstraction algorithm.

    Related article: What is KI? (PDF, 26k)
    Contents
    1. Combinator Types

  • 1.1. Permuters
  • 1.2. Groupers
  • 1.3. Replicators
  • 1.4. Deletors
  • 1.5. Summary

  • 2. Recursion
  • 2.1. The Recursion Hierarchy
  • 2.2. Formal Integration and Differentiation
  • 2.3. Representation of Primitive Recursion as Integration

  • What is an Operating System? (PDF, 97k)
    This is a prescient account, written in 2001-2002, on the present state and future of operating systems. Much of what was foretold has already come to pass, particularly with the evolution of cell phones and other hand-held devices into minicomputers and web-accessing devices; and with the integration of the cell phone network and internet.

    The WSMK is the "World's Smallest Multitasking Kernel". Originating in the 8-bit 8051 world, it has been used in a variety of embedded applications where extensive multitasking has been a central feature or requirement. Included in this portfolio are the designs of multiprocessor factory testing systems, multi-modular electronic displays, stepper motor controllers running multiple units concurrently, progressive jackpot displays and other Las Vegas style gaming displays, etc.

    Contents
    1. The Present State: UNIX/LINUX

  • 1.1. Basic Structures
  • 1.2. Command Shell
  • 1.3. Programmer's Interface

  • 2. WSMK: A Pre-Kernel for an OS
  • 2.1. WSMK as an Event-Driven Pre-Kernel
  • 2.2. The Kernel Routines and Configurations
  • 2.2.1. Stand-Alone
  • 2.2.2. Hosted
  • 2.2.3. Stand-Alone/Hosted with Foreground
  • 2.3. The Kernel Architecture
  • 2.4. Including a Priority Structure

  • 3. The Future of Operating Systems
  • 3.1. Radically Event-Driven, Real-Time Architecture
  • 3.2. No File System
  • 3.3. Concurrent Multithreaded Systems Language, Unified with Command Layer Shell
  • 3.4. An Imperative-Functional Language in a Category-Theoretic Paradigm
  • 3.5. Algebraic Programming
  • 3.6. Devices/Stream Integration, Concurrency, "Channels", etc.
  • 3.7. A Synthesis of a Network and Publishing Layer + Integration of the Web
  • 3.8. Applications Primarily for Off-Desktop Real-Time Usage

  • Garbage Collection with Cyclic Structures (PDF, 143k)
    Memory management and recycling with applications that have cyclic structures has been a major issue in Computer Science. When there are cycles involved, it's difficult to tell if something is linked to something that's actually being used, or whether it's part of an orphan cycle.

    A language that synthesizes imperative and functional programming must be grounded in an environment that has this kind of memory management. This article is, therefore, a precursor to the developments that are the subject matter of the second part of the treatise.

    This is the description, written c. 1995, of a method, which combines the Martinez mark-scan method with the Brownbridge method involving strong and weak pointers. The method is one amongst many cut from the same cloth that solve the orphan cycle problem by assigning and maintaining an acyclic subgraph amongst the set of linked data.

    Contents
    1. Eager Reference Count Garbage Collection
    2. Lazy Reference Count Garbage Collection (with memoizing)
    3. GC References

  • 3.1. Proceedings
  • 3.2. Articles
  • 3.3. Technical Reports