Dynamic Membership for Regular Languages

Antoine Amarilli

Louis Jachiet

Charles Paperman

JNMI 2022

Organization of the talk

  • Context
    • Classifying regular languages
    • Relying on semigroup theory
  • Goal
    • Present the dynamic word problem
    • Present the overall methodology along the way

Some definitions

  • A regular language is a language that can be described by a simple expression
  • e.g. \((ab)^* = \{ \epsilon, ab, abab, ababab, \ldots \}\)
  • and computed by a simple machine

Setting

  • A word of fixed length \(n>0\)
  • … over a fixed-size alphabet \(\Sigma=\{a,b,c,\ldots\}\)
  • but the word is dynamic

A Dynamic Word

Goal

  • Check that the dynamic word satisfies some constraints.
  • Constraints are defined by languages.

Example

Checking membership to the regular language \((ab)^*\).

Formalization of the problem

  • Complexity of maintaining the information:

    • Fix a language \(L\)
    • Evaluate the cost of checking whether the dynamic word is in \(L\) after each change
    • The cost is measured with respect to the dynamic word size \(n\)
    • The word is initialized to some value
    • we perform a linear-time preprocessing
  • We use the RAM model with \(O(\log n)\) register size

  • It is the dynamic word problem for regular languages.

A first (inefficient) algorithm

  • Given a regular language \(L\) – e.g. \((ab)^*\)

  • At each update, rerun a deterministic automaton on the whole word:

  • Incremental Complexity: \(O(n)\)
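As an illustrative sketch (the class name and state encoding are ours, using the three-state automaton for \((ab)^*\) whose transition functions appear on the next slide):

```python
# A minimal sketch of the naive approach: after every letter change,
# rerun a deterministic automaton for (ab)^* over the whole word.
# States: 1 = initial/accepting, 2 = just read an a, 0 = sink.
DELTA = {1: {"a": 2, "b": 0}, 2: {"a": 0, "b": 1}, 0: {"a": 0, "b": 0}}

class NaiveDynamicWord:
    def __init__(self, word):
        self.word = list(word)

    def update(self, i, letter):
        """Change position i, then recheck membership in O(n)."""
        self.word[i] = letter
        state = 1  # initial state
        for c in self.word:
            state = DELTA[state][c]
        return state == 1  # accept iff we end in state 1

w = NaiveDynamicWord("abab")
print(w.update(0, "b"))  # "bbab" is not in (ab)^*
print(w.update(0, "a"))  # "abab" is in (ab)^*
```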

Transition functions

Given an automaton, each word can be associated with a function from states to states:

\(\tau_{\epsilon}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 2\end{cases}\)

\(\tau_{a}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 2\\ 2\mapsto 0\end{cases}\)

\(\tau_{b}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 1\end{cases}\)

\(\tau_{ab}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 0\end{cases}\)

\(\tau_{ba}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 2\end{cases}\)

\(\tau_{bb}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 0\end{cases}\)

The set of all transition functions is the transition monoid of the automaton.

A more efficient strategy

  • It works for all regular languages
  • It provides \(O(\log n)\) complexity
  • Actually we can slightly improve it to \(O(\frac{\log n}{\log \log n})\)
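The slide does not spell the strategy out; a standard way to obtain \(O(\log n)\) (our own sketch, using the transition functions \(\tau_a, \tau_b\) from the previous slide) is a balanced tree whose internal nodes store composed transition functions, so an update only recomputes the path to the root:

```python
# Sketch of the O(log n) strategy: a segment tree whose leaves hold the
# transition function of each letter; internal nodes hold compositions.
# Transition functions from the previous slide, as tuples (images of 0, 1, 2):
TAU = {"a": (0, 2, 0), "b": (0, 0, 1)}
IDENTITY = (0, 1, 2)  # tau_epsilon, used to pad to a power of two

def compose(f, g):
    """Transition function of uv, given those of u (f) and v (g)."""
    return tuple(g[f[q]] for q in range(3))

class SegmentTree:
    def __init__(self, word):
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        self.tree = [IDENTITY] * (2 * self.size)
        for i, c in enumerate(word):
            self.tree[self.size + i] = TAU[c]
        for i in range(self.size - 1, 0, -1):
            self.tree[i] = compose(self.tree[2 * i], self.tree[2 * i + 1])

    def update(self, i, letter):
        """Change one letter; recompute the O(log n) ancestors."""
        i += self.size
        self.tree[i] = TAU[letter]
        while i > 1:
            i //= 2
            self.tree[i] = compose(self.tree[2 * i], self.tree[2 * i + 1])
        return self.tree[1][1] == 1  # start in state 1, accept in state 1

t = SegmentTree("abab")
print(t.update(2, "b"))  # "abbb": rejected
print(t.update(2, "a"))  # "abab": accepted
```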

Can we do better?

No!

prefix-XOR problem

  • \((ab^*a+b)^*c(a+b)^* = \textrm{Parity}\cdot c (a+b)^*\)

  • It has an incremental complexity in \(\Omega(\frac{\log n}{\log\log n})\)

  • The lower bound is from Fredman and Saks, 1989 in the cell-probe model.

Yet another implementation

  • A word of even length belongs to \((ab)^*\) if and only if:
    • all its \(a\)’s are at even positions (counting from \(0\))
    • all its \(b\)’s are at odd positions
  • We get an \(O(1)\) algorithm:
    • maintain counters of the number of \(a\)’s and \(b\)’s at even and odd positions
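The counter algorithm above can be sketched as follows (class name ours; alphabet restricted to \(\{a,b\}\), positions counted from \(0\)):

```python
# Sketch of the O(1) algorithm: membership of a fixed-length word in (ab)^*
# reduces to "every a sits at an even position, every b at an odd one"
# (0-indexed), provided the length is even.
class CounterMembership:
    def __init__(self, word):
        self.word = list(word)
        # Number of misplaced letters (a at odd position, or b at even one).
        self.bad = sum(1 for i, c in enumerate(word)
                       if (c == "a") != (i % 2 == 0))

    def update(self, i, letter):
        """O(1): adjust the count of misplaced letters."""
        old_bad = (self.word[i] == "a") != (i % 2 == 0)
        new_bad = (letter == "a") != (i % 2 == 0)
        self.bad += new_bad - old_bad
        self.word[i] = letter
        return self.bad == 0 and len(self.word) % 2 == 0

m = CounterMembership("abab")
print(m.update(1, "a"))  # "aaab": a at odd position 1 -> rejected
print(m.update(1, "b"))  # "abab": accepted
```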

To summarize

  • The dynamic word problem for regular languages:
    • All regular languages are in \(O(\frac{\log n}{\log\log n})\)
    • Some are in \(O(1)\) – e.g. \((ab)^*\)
    • Some are in \(O(\log\log n)\) – e.g. \((ac^*b+c)^*\)
    • Some are in \(\Omega(\frac{\log n}{\log\log n})\) – e.g. \((ab^*a+b)^*c\Sigma^*\)

Some languages are good, some are bad, some are ugly

Back to the algebraic approach!

The algebraic method

  • Aims to understand properties of regular languages.

  • Syntactic properties:

    • Is it definable in some logic under syntactic restrictions? – e.g. \(\text{first-order logic}\)
  • Computability properties:

    • Is it computable with this very nice restriction/modification of the automata model? – e.g. \(\text{2wpo-automata}\)
  • Complexity properties:

    • Does the language belong to some complexity class? –e.g. \(\textrm{AC}^0\)

How it works

  • The syntactic monoid is the transition monoid of the minimal automaton.
  • The syntactic morphism is the mapping from words to the syntactic monoid.
  • Under some mild assumptions, checking properties only depends on the syntactic morphism
  • We can explore these notions with my buggy online software

The mild assumptions

  • When a class of languages is closed under all the following operations, we call it a variety of languages

    • \(\cap, \cup\)
    • left quotient: \(u^{-1}L = \{v \mid uv \in L\}\)
    • right quotient: \(L u^{-1} = \{v \mid vu \in L\}\)
  • It is better to also have closure under:

    • complement
    • inverse image of free monoid morphisms.

Co-example

  • \(L = (aa)^* c a^*\) is in \(O(1)\)
  • \(K = (ab^*a + b)^* c (a+b)^*\) is in \(\Omega(\frac{\log n}{\log \log n})\)
  • \(\psi\colon \begin{cases}b\mapsto \epsilon\\ \sigma\mapsto \sigma\text{ otherwise }\end{cases}\)
  • \(K = \psi^{-1}(L)\)

Conclusion

  • The algebraic approach will not provide exact results
  • We can still rely on it to get an approximate classification

The monoid simplification

  • \(M\) is a fixed finite monoid
  • The dynamic word problem for the monoid \(M\):
    • a dynamic word in \(M^*\)
    • we maintain its product in \(M\)

Language to monoids

  • Given a regular language:
    • We compute its syntactic monoid
    • Execute the best possible algorithm for it
    • \(\to\) The language case reduces in \(O(1)\) to the monoid case
  • We use monoids as an intermediate step to find the best possible algorithm

Frandsen, Miltersen, and Skyum. 1997

  • If the monoid is commutative then it is in \(O(1)\). (\(xy=yx\))

  • If the monoid is aperiodic then it is in \(O(\log\log n)\). (\(x^{\omega+1}=x^\omega\))

  • If the monoid is not in \(\mathbf{SG}\) it is in \(\Omega(\frac{\log{n}}{\log\log n})\). (\(x^{\omega+1}yx^\omega \neq x^\omega y x^{\omega+1}\))

  • They left open the full characterization and its extension to regular languages
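The commutative case can be sketched as follows (entirely our own illustration): maintain how many times each element occurs; since the monoid is fixed and finite, powers are eventually periodic, so the product can be recomputed from the counts in constant time (the per-update loop is over the fixed monoid, not over the word).

```python
# Sketch of the O(1) algorithm for a fixed commutative monoid: the product
# only depends on how many times each element occurs, and powers in a
# finite monoid are eventually periodic.
from collections import Counter

class CommutativeProduct:
    def __init__(self, elements, op, identity, word):
        self.op, self.identity = op, identity
        self.word = list(word)
        self.counts = Counter(word)
        # Precompute each element's power sequence up to its first repeat.
        self.powers = {}
        for m in elements:
            seq, index = [identity], {identity: 0}
            while True:
                nxt = op(seq[-1], m)
                if nxt in index:
                    self.powers[m] = (seq, index[nxt])  # (prefix, loop start)
                    break
                index[nxt] = len(seq)
                seq.append(nxt)

    def _power(self, m, c):
        seq, k = self.powers[m]
        if c < len(seq):
            return seq[c]
        p = len(seq) - k  # period length
        return seq[k + (c - k) % p]

    def update(self, i, m):
        self.counts[self.word[i]] -= 1
        self.counts[m] += 1
        self.word[i] = m
        prod = self.identity
        for x, c in self.counts.items():  # fixed monoid: O(1) iterations
            if c > 0:
                prod = self.op(prod, self._power(x, c))
        return prod

# Example: addition modulo 3 (a commutative monoid).
mod3 = CommutativeProduct([0, 1, 2], lambda a, b: (a + b) % 3, 0, [1, 1, 2])
print(mod3.update(0, 0))  # word 0,1,2 -> sum is 0 mod 3
print(mod3.update(2, 1))  # word 0,1,1 -> sum is 2 mod 3
```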

What I will present now

  • We found the missing \(O(1)\) cases
  • Up to some conjecture, we have them all.
  • We found all monoids in \(O(\log\log n)\), closing the gap left by Frandsen et al.
  • We extended everything to regular languages, lifting the monoid simplification.

Nilpotent computation – an example

  • \(L=c^* a c^* a c^* b c^*\)

  • An \(O(1)\) algorithm:

    • Incremental maintenance of the set of all positions with non-\(c\) letters
    • If this set has cardinality \(3\), check membership to the language.
  • Extends to any finite set of words separated by neutral letters
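A sketch of this algorithm (class name ours); the key point is that membership only has to be decided when the set of non-\(c\) positions has exactly 3 elements, and sorting 3 elements is a constant-time operation:

```python
# Sketch of the O(1) algorithm for L = c^* a c^* a c^* b c^*: maintain the
# set of positions carrying a non-c letter; when it has exactly 3 elements,
# check that the letters read in position order are a, a, b.
class NilpotentMembership:
    def __init__(self, word):
        self.word = list(word)
        self.support = {i for i, c in enumerate(word) if c != "c"}

    def update(self, i, letter):
        self.word[i] = letter
        if letter == "c":
            self.support.discard(i)
        else:
            self.support.add(i)
        if len(self.support) != 3:
            return False
        p, q, r = sorted(self.support)  # constant-size sort
        return (self.word[p], self.word[q], self.word[r]) == ("a", "a", "b")

w = NilpotentMembership("cacacbc")
print(w.update(0, "c"))  # c a c a c b c -> accepted
print(w.update(3, "b"))  # c a c b c b c -> rejected
```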

The class \(\mathbf{ZG}\)

  • \(\mathbf{ZG}\) monoids deal with a mix of commutative and nilpotent computation.
  • They are defined by the equation: (\(x^{\omega + 1}y = y x^{\omega + 1}\))
  • The dynamic word problem for \(\mathbf{ZG}\) is in \(O(1)\).

Do we have more monoids in \(O(1)\)?

  • The obstacle is the colored priority queue:

    • An associative array from \(\mathbb{N} \to E\), with \(E\) finite.
    • \(\text{Add}(k, e)\) and \(\text{Remove}(k)\) operations, \(k\in \mathbb{N}, e\in E\)
    • \(\text{MinColor}\) returns the color associated to the minimal key.
  • A classical priority queue can implement a colored priority queue

  • A non-constant lower bound for priority queues is a long-standing open problem.

  • If a monoid is not in \(\mathbf{ZG}\), the colored priority queue reduces in \(O(1)\) to its dynamic word problem – a direct application of the equations of \(\mathbf{ZG}\).
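A colored priority queue can indeed be built on a classical binary heap, e.g. with lazy deletion; this particular implementation is our own sketch, the slide only claims that a classical priority queue suffices:

```python
# Sketch: a colored priority queue on top of a classical binary heap.
# Stale heap entries (removed or overwritten keys) are skipped lazily.
import heapq

class ColoredPriorityQueue:
    def __init__(self):
        self.heap = []    # candidate (key, color) pairs, possibly stale
        self.color = {}   # current key -> color mapping

    def add(self, k, e):
        self.color[k] = e
        heapq.heappush(self.heap, (k, e))

    def remove(self, k):
        self.color.pop(k, None)  # the heap entry becomes stale

    def min_color(self):
        while self.heap:
            k, e = self.heap[0]
            if self.color.get(k) == e:
                return e
            heapq.heappop(self.heap)  # discard stale entry
        return None

q = ColoredPriorityQueue()
q.add(3, "red"); q.add(1, "blue"); q.add(2, "green")
print(q.min_color())  # "blue"
q.remove(1)
print(q.min_color())  # "green"
```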

The BAD: \(\mathbf{SG}\)

  • Predecessor data structure

    • a set of values \(v\in \mathbb{N}\)
    • \(\text{Add}(v)\), \(\text{Remove}(v)\)
    • \(\text{Predecessor}(v)\) returns the maximal stored value smaller than \(v\)
  • Implemented in \(O(\log\log n)\) with van Emde Boas trees

  • We reduce in \(O(1)\) the dynamic word problem for monoids in \(\textbf{SG}\) to this data structure.

  • The proof is technical and based on Green’s relations

  • The main technical contribution.
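For illustration, here is the predecessor interface backed by a sorted list and binary search (our own sketch; this gives \(O(\log n)\) operations, whereas a van Emde Boas tree over keys in \(\{0,\ldots,n-1\}\) achieves the \(O(\log\log n)\) stated above):

```python
# The predecessor data structure, sketched with a sorted list and bisect.
import bisect

class PredecessorStructure:
    def __init__(self):
        self.values = []  # kept sorted

    def add(self, v):
        i = bisect.bisect_left(self.values, v)
        if i == len(self.values) or self.values[i] != v:
            self.values.insert(i, v)

    def remove(self, v):
        i = bisect.bisect_left(self.values, v)
        if i < len(self.values) and self.values[i] == v:
            self.values.pop(i)

    def predecessor(self, v):
        """Largest stored value strictly smaller than v, or None."""
        i = bisect.bisect_left(self.values, v)
        return self.values[i - 1] if i > 0 else None

p = PredecessorStructure()
for v in (5, 12, 20):
    p.add(v)
print(p.predecessor(13))  # 12
p.remove(12)
print(p.predecessor(13))  # 5
```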

Summary: the monoid case

  • Given a monoid:
    • The Good: If it is in \(\mathbf{ZG}\) then \(O(1)\)
    • The Bad: Otherwise if it is in \(\mathbf{SG}\) then \(O(\log\log n)\) and probably not in \(O(1)\)
    • The Ugly: Otherwise in \(\Theta(\frac{\log n}{\log\log n})\)

Does it always get the appropriate algorithm?

  • Recall that we use monoids as an intermediate step to find good algorithms for languages

  • The following languages have the same syntactic monoid:

    • \(L = (aa)^* c a^*\) is in \(O(1)\)
    • \(K = (ab^*a + b)^* c (a+b)^*\) is in \(\Omega(\frac{\log n}{\log \log n})\)
  • Not always!

The semigroup case

  • Recall that a semigroup is like a monoid but possibly without a neutral element.
  • All monoids are semigroups but the converse is not true
  • Given a regular language \(L\):
    • Its syntactic semigroup is the set of transition functions of nonempty words
  • The dynamic word problem for a semigroup:
    • We fix a finite semigroup \(S\)
    • We have a dynamic word in \(S^+\)
    • we maintain the total product.

From languages to semigroups

  • Given a regular language:

    • we compute its syntactic semigroup
    • execute the best possible algorithm for it
    • \(\to\) the language case reduces in \(O(1)\) to the semigroup case
  • Algorithms for monoids give algorithms for semigroups:

    • by adding an identity to a non-monoid semigroup: \(S\mapsto S^1\)
  • For the \(O(\log\log n)\) class, \(S^1\) and \(S\) are \(O(1)\)-equivalent to each other.

  • For the \(O(1)\) class, this is not the case!

Co-example

  • Take the semigroup \(S=\{(1, 1), (1, 2), (2, 1), (2, 2), 0\}\) with:
    • \((i, j)\cdot (k, p)=\begin{cases}(i, p)\text{ if }j=k\\0\text{ otherwise}\end{cases}\)
  • e.g:
    • \((1, 2)\cdot (1, 2) = 0\)
    • \((1, 2)\cdot(2, 1)\cdot(1, 1) = (1, 1)\)
  • The monoid \(S^1\) is not in \(\textbf{ZG}\) but we can find an \(O(1)\) algorithm for \(S\)

  • The product \((i_0, j_0)\cdot (i_1, j_1)\cdots (i_n, j_n)\)

    • is \(0\) if \(j_k\neq i_{k+1}\) for some \(k\)
    • is equal to \((i_0, j_n)\) otherwise
  • We can count the mismatched sliding windows of size \(2\) and conclude

  • It doesn’t work for \(S^1\) because of the neutral element.
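The sliding-window counting for \(S\) can be sketched like this (class name ours; we count "defects", i.e. zero letters and mismatched adjacent pairs, and only two windows change per update):

```python
# Sketch of the O(1) algorithm for S: the product is 0 as soon as some
# letter is 0 or some adjacent pair (.., j)(k, ..) has j != k; otherwise
# it equals (i_0, j_n). We maintain the number of such defects.
class SlidingWindowProduct:
    def __init__(self, word):
        self.word = list(word)  # letters: pairs (i, j), or the element 0
        self.defects = sum(self._defect(i) for i in range(len(word)))

    def _defect(self, i):
        """1 if position i breaks the product (zero letter, or mismatch
        with its left neighbour), else 0."""
        if self.word[i] == 0:
            return 1
        if i > 0 and self.word[i - 1] != 0 and self.word[i - 1][1] != self.word[i][0]:
            return 1
        return 0

    def update(self, i, letter):
        """O(1): only the windows at positions i and i+1 can change."""
        touched = [p for p in (i, i + 1) if p < len(self.word)]
        self.defects -= sum(self._defect(p) for p in touched)
        self.word[i] = letter
        self.defects += sum(self._defect(p) for p in touched)
        if self.defects:
            return 0
        return (self.word[0][0], self.word[-1][1])

w = SlidingWindowProduct([(1, 2), (2, 1), (1, 1)])
print(w.update(2, (1, 2)))  # (1,2)(2,1)(1,2) -> product (1, 2)
print(w.update(1, (1, 1)))  # (1,2)(1,1)(1,2) -> mismatch, product 0
```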

What are \(O(1)\) semigroups then?

  • We can always do the sliding-window trick for free
  • It is a well-studied operation called the wreath product by \(\mathbf{D}\):
    • e.g. in logical fragments, this operation matches adding the successor relation.
    • Intuitively, we run the semigroup over sliding windows of letters
    • It maps varieties of monoids to varieties of semigroups
  • Our complexity classes are closed under this operation
  • The resulting class is a known class denoted \(\mathbf{LZG}\): \((exe)^{\omega + 1}ye = ey(exe)^{\omega+ 1}\) with \(e^2=e\)
  • Result from a companion paper by Amarilli and Paperman: The locality of \(\mathbf{ZG}\)

Summary: the semigroup case

  • Given a semigroup:
    • The Good: If it is in \(\mathbf{LZG}\) then \(O(1)\)
    • The Bad: Otherwise if it is in \(\mathbf{SG}\) then \(O(\log\log n)\) and probably not in \(O(1)\)
    • The Ugly: Otherwise in \(\Theta(\frac{\log n}{\log\log n})\)

Are we done yet?

Not yet, but almost!

Co-example

  • The semigroup of \((aa)^* b (aa)^*\) is Ugly (not in \(\mathbf{SG}\)).
  • An \(O(1)\) algorithm:
    • count the \(b\)’s at even positions and at odd positions
    • accept if there is exactly one \(b\) and it sits at an even position.
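A sketch of this counting algorithm (class name ours; alphabet restricted to \(\{a,b\}\), positions counted from \(0\), fixed odd length):

```python
# Sketch of the O(1) algorithm for (aa)^* b (aa)^* over {a, b}: a word of
# fixed odd length belongs iff it contains exactly one b, at an even
# position (0-indexed).
class ParityMembership:
    def __init__(self, word):
        self.word = list(word)
        self.even_b = sum(1 for i, c in enumerate(word) if c == "b" and i % 2 == 0)
        self.odd_b = sum(1 for i, c in enumerate(word) if c == "b" and i % 2 == 1)

    def update(self, i, letter):
        """O(1): adjust the parity counters for position i."""
        delta = (letter == "b") - (self.word[i] == "b")
        if i % 2 == 0:
            self.even_b += delta
        else:
            self.odd_b += delta
        self.word[i] = letter
        return (self.even_b, self.odd_b) == (1, 0) and len(self.word) % 2 == 1

w = ParityMembership("aabaa")
print(w.update(4, "a"))  # "aabaa": one b, at even position 2 -> accepted
print(w.update(3, "b"))  # "aabba": b at odd position -> rejected
```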

Modulo position information

  • We can always enrich semigroups with modulo information
    • It is a well-studied operation (lucky us!)
    • It is related to the stable semigroup of the syntactic morphism
    • We enter the twilight zone of semigroup theory
  • An operator denoted by \(\mathbf{Q}\) (for quasi)
  • Here, it is enough to conclude!

End of the story

  • Given a regular language:
    • The Good: If it is in \(\mathbf{QLZG}\) then its dynamic word problem is in \(O(1)\)
    • The Bad: Otherwise, if it is in \(\mathbf{QSG}\), then its dynamic word problem is in \(O(\log\log n)\) and probably not in \(O(1)\).
    • The Ugly: Otherwise it is in \(\Theta(\frac{\log n}{\log\log n})\)
  • All works well because the \(\mathbf{Q}\) and \(\mathbf{L}\) operators behave well.

Demo

Conclusion

  • Semigroup theory tooling has been surprisingly effective

  • We managed to classify the precise complexity of algorithmic problems parametrized by regular languages

  • We have linked algebraic analysis to data-structure complexity

  • It is reminiscent of what happens in CSP.

  • Is there an algebraic parametrized complexity theory waiting to be discovered?

Open problems

  • Is there more hierarchy within \(O(\log \log n)\)?
  • Can we extend the framework and the tooling to insertions/deletions? (probably yes)
  • Can we generalize to non-linear structures?
  • How to prove non-constant lower bounds for priority queues in the RAM model?