# Dynamic Membership for Regular Languages

Antoine Amarilli

Louis Jachiet

JNMI 2022

### Organization of the talk

• Context
• Classifying regular languages
• Relying on semigroups theory
• Goal
• Present the dynamic word problems
• Present the overall methodology on the way

### Some definitions

• A regular language is a language that can be described by a simple expression
• e.g. $$(ab)^* = \{ ab, abab, ababab, \cdots \}$$
• and computed by a simple machine

### Setting

• A word of fixed length $$n>0$$
• … over a fixed size alphabet $$\Sigma=\{a,b,c,\ldots\}$$
• but the word is dynamic

### Goal

• Check that the dynamic word satisfies some constraints.
• Constraints are defined by languages.

### Example

Checking membership to the regular language $$(ab)^*$$.

### Modelization of the problem

• Complexity of maintaining the information:

• Fix a language $$L$$
• Evaluate the cost of checking if the dynamic word is in $$L$$ at each change
• The cost is with respect to the dynamic word size $$n$$
• The word is initialized at some value
• we perform a linear preprocessing
• We use the RAM model with $$O(\log n)$$ register size

• It is the dynamic word problem for regular languages.

### A first (inefficient) algorithm

• Given a regular language $$L$$ – e.g. $$(ab)^*$$

• Execute at each update a deterministic automaton:

• Incremental Complexity: $$O(n)$$

### Transition functions

Given an automata, each word can be associated to a function from states to states

$$\tau_{\epsilon}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 2\end{cases}$$

$$\tau_{a}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 2\\ 2\mapsto 0\end{cases}$$

$$\tau_{b}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 1\end{cases}$$

$$\tau_{ab}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 0\end{cases}$$

$$\tau_{ba}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 2\end{cases}$$

$$\tau_{bb}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 0\end{cases}$$

The set of all transitions functions is the transition monoid of the automaton.

### A more efficient strategy

• It works for all regular languages
• It provides $$O(\log n)$$ complexity
• Actually we can slightly improve it to $$O(\frac{\log n}{\log \log n})$$

### prefix-XOR problem

• $$(ab^*a+b)^*c(a+b)^* = \textrm{Parity}\cdot c (a+b)^*$$

• an incremental complexity in $$\Omega(\frac{\log n}{\log\log n})$$

• The lower bound is from Fredman and Saks, 1989 in the cell-probe model.

### Yet another implementation

• A word belongs to $$(ab)^*$$ if and only if:
• all its $$a$$’s are in even positions
• all its $$b$$’s are in odd positions
• We get a $$O(1)$$ algorithm:
• maintains counter of the number of $$a$$’s and $$b$$’s in even and odd positions

### To summarize

• The dynamic word problem for regular languages:
• All regular languages are in $$O(\frac{\log n}{\log\log n})$$
• Some are in $$O(1)$$ – e.g. $$(ab)^*$$
• Some are in $$O(\log\log n)$$ – e.g. $$(ac^*b+c)^*$$
• Some are in $$\Omega(\frac{\log n}{\log\log n})$$ – e.g. $$(ab^*a+b)^*c\Sigma^*$$

### The algebraic method

• Aim to understand properties on regular languages.

• Syntactic properties:

• Is it definable in some logics under some syntactic restrictions? – e.g. $$\text{first-order logic}$$
• Computability properties:

• Is it computable with this very nice restriction/modification of the automata model? – e.g. $$\text{2wpo-automata}$$
• Complexity properties:

• Does the language belong to some complexity class? –e.g. $$\textrm{AC}^0$$

### How it works

• The syntactic monoid is the transition monoid of the minimal automaton.
• The syntactic morphism is the mapping from words to the syntactic monoid
• Under some mild assumptions, checking properties only depends on the syntactic morphism
• We can explore these notions with my buggy online software

### The mild assumptions

• When all the following conditions are met, we have a variety of languages

• $$\cap, \cup$$
• left quotient: $$u^{-1}L = \{v \mid uv \in L\}$$
• right quotient: $$L u^{-1} = \{v \mid vu \in L\}$$
• It is better to have closure under:

• complement
• inverse image of free monoid morphisms.

### Co-example

• $$L = (aa)^* c a^*$$ is in $$O(1)$$
• $$K = (ab^*a + b)^* c (a+b)^*$$ is in $$\Omega(\frac{\log n}{\log \log n})$$
• $$\psi\colon \begin{cases}b\mapsto \epsilon\\ \sigma\mapsto \sigma\text{ otherwise }\end{cases}$$
• $$K = \psi^{-1}(L)$$

### Conclusion

• The algebraic approach will not provide exact results
• We can still relies on it to get an approximate classification

### The monoid simplification

• $$M$$ is a fixed finite monoid
• The dynamic word problem for the monoid $$M$$:
• a dynamic word in $$M^*$$
• we maintain its product in $$M$$

### Language to monoids

• Given a regular language:
• We compute its syntactic monoid
• Execute the best possible algorithm for it
• $$\to$$ The language case reduces in $$O(1)$$ to the monoid case
• We use monoid as an intermediate to find the best possible algorithm

### Frandsen, Miltersen, and Skyum. 1997

• If the monoid is commutative then it is $$O(1)$$. ($$xy=yx$$)

• If the monoid is aperiodic then it is in $$O(\log\log n)$$. ($$x^{\omega+1}=x^\omega$$)

• If the monoid is not in $$\mathbf{SG}$$ it is in $$\Omega(\frac{\log{n}}{\log\log n})$$. ($$x^{\omega+1}yx^\omega \neq x^\omega y x^{\omega+1}$$)

• They left open the full characterization and its extensions to regular languages

### What I will present now

• Found missing $$O(1)$$-cases
• Up to some conjecture, we have them all.
• Found all regular languages in $$O(\log\log n)$$, closing the gap of Frandsen et al.
• Extend it to regular languages, lifting the monoid simplification.

### Nilpotent computation – an example

• $$L=c^* a c^* a c^* b c^*$$

• A $$O(1)$$ algorithm:

• Incremental maintenance of the set of all positions with non-$$c$$ letters
• If this set has cardinality $$3$$ check membership to the language.
• Extends to any finite set of words separated by neutral letters

### The class $$\mathbf{ZG}$$

• $$\mathbf{ZG}$$ monoids deal with a mix of commutative and nilpotent computation.
• They are defined by the equation: ($$x^{\omega + 1}y = y x^{\omega + 1}$$)
• The dynamic word problem for $$\mathbf{ZG}$$ is in $$O(1)$$.

### Do we have more monoids in $$O(1)$$?

• It is a colored priority queue:

• An associative array from $$\mathbb{N} \to E$$, with $$E$$ finite.
• $$\text{Add}(k, e)$$ and $$\text{Remove}(k)$$ operations, $$k\in \mathbb{N}, e\in E$$
• $$\text{MinColor}$$ returns the color associated to the min key.
• A classical priority queue can implement a colored priority queue

• A non-constant lower bound for priority queue is a long-standing open problem.

• A direct application of the equations of $$\mathbf{ZG}$$.

• If a monoid is not in $$\mathbf{ZG}$$ then: $$O(1)$$ reduction colored priority queue

### The BAD: $$\mathbf{SG}$$

• Predecessor data structure

• a set of values $$v\in \mathbb{N}$$
• $$\text{Add}(v)$$, $$\text{Remove}(v)$$
• $$\text{Predecessor}(v)$$ returns the maximal value smaller than $$v$$
• Implemented in $$O(\log\log n)$$ with van Emde Boas trees

• We reduce in $$O(1)$$ for monoids to $$\textbf{SG}$$ to this data structure.

• The proof is technical and based on Green’s relations

• The main technical contribution.

### Summary: the monoid case

• Given a monoid:
• The Good: If it is $$\mathbf{ZG}$$ then $$O(1)$$
• The Bad: Otherwise if it is in $$\mathbf{SG}$$ then $$O(\log\log n)$$ and probably not in $$O(1)$$
• The Ugly: Otherwise in $$\Theta(\frac{\log n}{\log\log n})$$

### Does it always get the appropriate algorithm?

• Recall that we use monoid as an intermediate to find good algorithm for languages

• The following languages have the same syntactic monoids:

• $$L = (aa)^* c a^*$$ is in $$O(1)$$
• $$K = (ab^*a + b)^* c (a+b)^*$$ is in $$\Omega(\frac{\log n}{\log \log n})$$
• Not always!

### The semigroup case

• Recall that a semigroup is like a monoid but possibly without a neutral element.
• All monoids are semigroups but the converse is not true
• Given a regular languages $$L$$
• Its syntactic semigroup is the transition functions of nonempty words
• The dynamic word problem for semigroup:
• We fix a finite semigroup $$S$$
• We have a dynamic word in $$S^+$$
• we maintain the total product.

### From languages to semigroup

• given a regular language:

• we compute its syntactic monoid semigroup
• execute the best possible algorithm for it
• $$\to$$ the language case reduces in $$O(1)$$ to the monoid semigroup case
• Algorithm for monoids give algorithm for semigroups:

• by adding identity to non-monoid semigroup: $$S\mapsto S^1$$
• In the $$O(\log\log n)$$, $$S^1$$ and $$S$$ are $$O(1)$$-equivalent to each other.

• In the $$O(1)$$, it is not the case!

### Co-example

• Take the semigroup $$S=\{(1, 1), (1, 2), (2, 1), (2, 2), 0\}$$ with:
• $$(i, j)\cdot (k, p)=\begin{cases}(i, p)\text{ if }j=k\\0\text{ otherwise}\end{cases}$$
• e.g:
• $$(1, 2)\cdot (1, 2) = 0$$
• $$(1, 2)\cdot(2, 1)\cdot(1, 1) = (1, 1)$$
• The monoid $$S^1$$ is not is $$\textbf{ZG}$$ but we can find a $$O(1)$$ algorithm for $$S$$

### Co-example

• Take the semigroup $$S=\{(1, 1), (1, 2), (2, 1), (2, 2), 0\}$$ with:

• $$(i, j)\cdot (k, p) = \begin{cases}(i, p)\text{ if }j=k\\0\text{ otherwise}\end{cases}$$
• The product $$(i_0, j_0)\cdot (i_1, j_1)\cdots (i_n, j_n)$$

• is $$0$$ if $$j_i\neq i_{i+1}$$ for some $$i$$
• is equal to $$(i_0, j_n)$$ otherwise
• We can count the sliding windows of size $$2$$ and conclude

• It doesn’t work for $$S^1$$ because of the neutral element.

### What are $$O(1)$$ semigroups then?

• It is a well studied operation called the wreath product by $$\mathbf{D}$$:
• e.g. In logical fragments, this operation matches with adding successor relations.
• Intuitively, we execute the semigroups over sliding windows of letters
• It maps varieties of monoids to varieties of semigroups
• Our complexity classes are closed under this operation
• The resulting class is equivalent to a classe denoted by $$\mathbf{LZG}$$: $$(exe)^{\omega + 1}ye = ey(exe)^{\omega+ 1}$$ with $$e^2=e$$
• Result from a companion paper by Amarilli and Paperman: The locality of $$\mathbf{ZG}$$

### Summary: the semigroup case

• Given a monoid:
• The Good: If it is $$\mathbf{LZG}$$ then $$O(1)$$
• The Bad: Otherwise if it is in $$\mathbf{SG}$$ then $$O(\log\log n)$$ and probably not in $$O(1)$$
• The Ugly: Otherwise in $$\Theta(\frac{\log n}{\log\log n})$$

### Co-example

• $$(aa)^* b (aa)^*$$ semigroup is Ugly (not in $$\mathbf{SG}$$).
• $$O(1)$$ algorithm:
• count the $$b$$ at even positions and odd positions
• accept if there are one $$b$$ at an even position.

### Modulo position informations

• We can always enriche semigroups with modulo information
• It is a studied operation (lucky!)
• It related to the stable semigroup of the syntactic morphism
• We enter the twilight zone of semigroup theory
• An operator denoted by $$\mathbf{Q}$$ (for quasi)
• Here, it is enough to conclude!

### End of the story

• Given a regular language:
• The Good: If it is in $$\mathbf{QLZG}$$ then its dynamic word problem is in $$O(1)$$
• The Bad: Otherwise if it is in $$\mathbf{QSG}$$ and its dynamic word problem is in $$O(\log\log n)$$ and probably not in $$O(1)$$.
• The Ugly: Otherwise it is in $$\Theta(\frac{\log n}{\log\log n})$$
• All works well because $$\mathbf{Q}$$ and $$\mathbf{L}$$ operator behave well.

### Conclusion

• Semigroup theory tooling has been surprisingly efficient

• We managed to classify the precise complexity of algorithmic problems parametrized by regular languages

• We have linked algebraic analysis to data-structure complexity

• It is reminiscent of what happens in CSP.

• Is there an algebraic parametrized complexity theory waiting to be discovered?

### Open problems

• Is there more hierarchy within $$O(\log \log n)$$?
• Can we extend the framework and the tooling to insertion/deletion (probably yes)
• Can we generalize to non-linear structures?
• How to prove non-constant lower bounds for priority queues in the RAM model?