Antoine Amarilli

Louis Jachiet

Charles Paperman

JNMI 2022

- Context
  - Classifying *regular languages*
  - Relying on *semigroup* theory
- Goal
  - Present the *dynamic word problem*
  - Present the overall methodology on the way

- A *regular language* is a language that can be described by a *simple expression*
  - e.g. \((ab)^* = \{ ab, abab, ababab, \ldots \}\)
- and computed by a simple machine

- A word of fixed length \(n>0\) …
- … over a fixed-size alphabet \(\Sigma=\{a,b,c,\ldots\}\)
- but the word is *dynamic*: letters can be changed one at a time

- Check that the dynamic word satisfies *some* constraints.
- Constraints are defined by *languages*.

Checking membership in the regular language \((ab)^*\).

Complexity of *maintaining* the information:

- Fix a language \(L\)
- Evaluate the cost of checking whether the dynamic word is in \(L\) at *each change*
- The cost is with respect to the *dynamic word* size \(n\)
- The word is initialized at some value
- We perform a linear-time preprocessing

We use the *RAM model* with \(O(\log n)\) register size.

**It is the** *dynamic word problem* for *regular languages*.

Given a regular language \(L\) – e.g. \((ab)^*\)

Execute at each update a *deterministic automaton* over the whole word.

Incremental complexity: \(O(n)\)
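A minimal sketch (my own, not from the slides) of this naive solution for \((ab)^*\): re-run a DFA over the whole word after every letter change. The state numbering matches the transition tables shown below (state \(1\) is both initial and accepting, \(0\) is a sink).

```python
# Naive O(n)-per-update solution for (ab)^*.
# States: 1 = start/accepting, 2 = just read an 'a', 0 = reject sink.
DELTA = {
    1: {"a": 2, "b": 0},
    2: {"a": 0, "b": 1},
    0: {"a": 0, "b": 0},
}

def accepts(word):
    """Run the DFA for (ab)^* over the whole word: O(n) time."""
    state = 1
    for letter in word:
        state = DELTA[state][letter]
    return state == 1

word = list("abab")
assert accepts(word)
word[0] = "b"             # a single letter change ...
assert not accepts(word)  # ... but we re-scan the whole word
```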

Given an automaton, each word can be associated with a function from *states* to *states*:

\(\tau_{\epsilon}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 2\end{cases}\)

\(\tau_{a}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 2\\ 2\mapsto 0\end{cases}\)

\(\tau_{b}\colon\begin{cases}0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 1\end{cases}\)

\(\tau_{ab}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 1\\ 2\mapsto 0\end{cases}\)

\(\tau_{ba}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 2\end{cases}\)

\(\tau_{bb}\colon\begin{cases} 0\mapsto 0\\ 1\mapsto 0\\ 2\mapsto 0\end{cases}\)

The set of *all* transition functions is the *transition monoid* of the automaton.
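As a quick sanity check (my own sketch, not from the slides): the function of a concatenated word is the composition of the functions of its parts, e.g. \(\tau_{ab} = \tau_b \circ \tau_a\), which reproduces the tables above.

```python
# Transition functions of the automaton above, as dicts state -> state.
tau_a = {0: 0, 1: 2, 2: 0}
tau_b = {0: 0, 1: 0, 2: 1}

def compose(f, g):
    """Function of the word uv, given f = tau_u and g = tau_v:
    first apply f, then g."""
    return {state: g[f[state]] for state in f}

assert compose(tau_a, tau_b) == {0: 0, 1: 1, 2: 0}  # the table for tau_ab
assert compose(tau_b, tau_a) == {0: 0, 1: 0, 2: 2}  # the table for tau_ba
assert compose(tau_b, tau_b) == {0: 0, 1: 0, 2: 0}  # the table for tau_bb
```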

- It works for *all regular languages*
- It provides \(O(\log n)\) complexity
- Actually we can slightly improve it to \(O(\frac{\log n}{\log \log n})\)

\((ab^*a+b)^*c(a+b)^* = \textrm{Parity}\cdot c (a+b)^*\)

This language has an incremental complexity in \(\Omega(\frac{\log n}{\log\log n})\).

The lower bound is from *Fredman and Saks, 1989*, in the cell-probe model.

- A word belongs to \((ab)^*\) if and only if:
  - all its \(a\)’s are in *even* positions
  - all its \(b\)’s are in *odd* positions
- We get a \(O(1)\) algorithm:
  - maintain counters of the number of \(a\)’s and \(b\)’s in *even* and *odd* positions
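A sketch of this \(O(1)\) algorithm (my own variant: a single counter of misplaced letters, equivalent to the per-letter counters of the slide, over the alphabet \(\{a,b\}\) with \(n\) even and 0-indexed positions):

```python
# O(1)-per-update membership in (ab)^*: the word is in (ab)^* iff
# no 'a' sits at an odd position and no 'b' sits at an even position.
class DynamicAB:
    def __init__(self, word):
        self.word = list(word)
        self.bad = sum(1 for i, c in enumerate(word)
                       if (c == "a") != (i % 2 == 0))

    def set_letter(self, i, letter):   # O(1) per update
        if (self.word[i] == "a") != (i % 2 == 0):
            self.bad -= 1
        self.word[i] = letter
        if (letter == "a") != (i % 2 == 0):
            self.bad += 1

    def member(self):
        return self.bad == 0

w = DynamicAB("abab")
assert w.member()
w.set_letter(1, "a")   # "aaab": an 'a' at odd position 1
assert not w.member()
w.set_letter(1, "b")
assert w.member()
```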

- The dynamic word problem for regular languages:
- All regular languages are in \(O(\frac{\log n}{\log\log n})\)
- Some are in \(O(1)\) – e.g. \((ab)^*\)
- Some are in \(O(\log\log n)\) – e.g. \((ac^*b+c)^*\)
- Some are in \(\Omega(\frac{\log n}{\log\log n})\) – e.g. \((ab^*a+b)^*c\Sigma^*\)

Aim: understand properties of regular languages.

Syntactic properties:

*Is it definable in some logic under some syntactic restrictions?* – e.g. \(\text{first-order logic}\)

Computability properties:

*Is it computable with this very nice restriction/modification of the automata model?* – e.g. \(\text{2wpo-automata}\)

Complexity properties:

*Does the language belong to some complexity class?* – e.g. \(\textrm{AC}^0\)

- The *syntactic monoid* is the *transition monoid* of the *minimal automaton*.
- The *syntactic morphism* is the mapping from words to the *syntactic monoid*.
- Under some *mild assumptions*, checking properties only depends on the *syntactic morphism*.
- We can explore these notions with my buggy online software.

When all the following closure conditions are met, we have a *variety of languages*:

- \(\cap, \cup\)
- left quotient: \(u^{-1}L = \{v \mid uv \in L\}\)
- right quotient: \(L u^{-1} = \{v \mid vu \in L\}\)

It is better to have closure under:

- complement
- inverse image of free monoid morphisms.

- \(L = (aa)^* c a^*\) is in \(O(1)\)
- \(K = (ab^*a + b)^* c (a+b)^*\) is in \(\Omega(\frac{\log n}{\log \log n})\)
- \(\psi\colon \begin{cases}b\mapsto \epsilon\\ \sigma\mapsto \sigma\text{ otherwise }\end{cases}\)
- \(K = \psi^{-1}(L)\)

- The algebraic approach will not provide exact results
- We can still rely on it to get an *approximate classification*

- \(M\) is a fixed *finite* monoid
- The dynamic word problem for the monoid \(M\):
  - a dynamic word in \(M^*\)
  - we maintain its product in \(M\)

- Given a regular language:
  - We compute its syntactic monoid
  - Execute the best possible algorithm for it
  - \(\to\) The language case reduces in \(O(1)\) to the monoid case
- We use monoids as an intermediate to find *the best possible algorithm*

If the monoid is *commutative* (\(xy=yx\)) then it is in \(O(1)\).

If the monoid is *aperiodic* (\(x^{\omega+1}=x^\omega\)) then it is in \(O(\log\log n)\).

If the monoid is **not** in \(\mathbf{SG}\) (\(x^{\omega+1}yx^\omega \neq x^\omega y x^{\omega+1}\)) then it is in \(\Omega(\frac{\log{n}}{\log\log n})\).

*They left open the full characterization and its extensions to regular languages.*

- Found missing \(O(1)\) cases
- Up to some conjecture, we have them all.
- Found all regular languages in \(O(\log\log n)\), closing the gap of Frandsen et al.
- Extended it to regular languages, lifting the monoid simplification.

\(L=c^* a c^* a c^* b c^*\)

An \(O(1)\) algorithm:

- Incremental maintenance of the set of all positions with non-\(c\) letters
- If this set has cardinality \(3\), check membership in the language.
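A sketch of this algorithm (my own, assuming the alphabet \(\{a,b,c\}\)): keep a map from positions to their non-\(c\) letters; membership only needs to be checked when the map has exactly three entries, which takes constant time.

```python
# O(1)-per-update membership in L = c^* a c^* a c^* b c^*:
# maintain the (position -> letter) map of all non-'c' positions.
class DynamicL:
    def __init__(self, word):
        self.non_c = {i: x for i, x in enumerate(word) if x != "c"}

    def set_letter(self, i, letter):   # O(1) per update
        if letter == "c":
            self.non_c.pop(i, None)
        else:
            self.non_c[i] = letter

    def member(self):
        if len(self.non_c) != 3:       # len() is O(1)
            return False
        # Sorting three keys is O(1): the letters must read a, a, b in order.
        letters = [self.non_c[i] for i in sorted(self.non_c)]
        return letters == ["a", "a", "b"]

w = DynamicL("cacacbc")
assert w.member()
w.set_letter(5, "a")   # "cacacac" has no 'b'
assert not w.member()
```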

Extends to any *finite set of words separated by neutral letters*.

- \(\mathbf{ZG}\) monoids deal with a mix of *commutative* and *nilpotent* computation.
- They are defined by the equation \(x^{\omega + 1}y = y x^{\omega + 1}\).
- The dynamic word problem for \(\mathbf{ZG}\) is in \(O(1)\).

It is a *colored priority queue*:

- An associative array from \(\mathbb{N} \to E\), with \(E\) finite.
- \(\text{Add}(k, e)\) and \(\text{Remove}(k)\) operations, \(k\in \mathbb{N}, e\in E\)
- \(\text{MinColor}\) returns the color associated to the minimum key.
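A sketch of the interface above (my own, illustrating the next slide's point that a classical priority queue implements a colored one): a binary min-heap of keys plus a key-to-color map, with lazy deletion.

```python
import heapq

# Colored priority queue built on a classical binary heap (lazy deletion).
class ColoredPQ:
    def __init__(self):
        self.heap = []     # min-heap of keys
        self.color = {}    # live key -> color

    def add(self, k, e):
        heapq.heappush(self.heap, k)
        self.color[k] = e

    def remove(self, k):
        del self.color[k]  # the stale key stays in the heap for now

    def min_color(self):
        while self.heap and self.heap[0] not in self.color:
            heapq.heappop(self.heap)   # drop lazily deleted keys
        return self.color[self.heap[0]] if self.heap else None

q = ColoredPQ()
q.add(3, "red"); q.add(1, "blue"); q.add(2, "green")
assert q.min_color() == "blue"
q.remove(1)
assert q.min_color() == "green"
```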

A classical priority queue can implement a *colored priority queue*.

A non-constant lower bound for *priority queues* is a long-standing open problem.

A direct application of the equations of \(\mathbf{ZG}\).

If a monoid is not in \(\mathbf{ZG}\), then there is an \(O(1)\) reduction from the *colored priority queue* to its dynamic word problem.

*Predecessor data structure*:

- a set of values \(v\in \mathbb{N}\)
- \(\text{Add}(v)\), \(\text{Remove}(v)\)
- \(\text{Predecessor}(v)\) returns the maximal value smaller than \(v\)

Implemented in \(O(\log\log n)\) with *van Emde Boas trees*.

We reduce in \(O(1)\) the dynamic word problem for monoids in \(\textbf{SG}\) to this data structure.
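To fix the interface, here is a simple sorted-list sketch of the predecessor structure (my own; \(O(\log n)\) queries and \(O(n)\) inserts, so it only illustrates the operations, while van Emde Boas trees achieve the \(O(\log\log n)\) bound used in the talk):

```python
import bisect

# Predecessor data structure over a sorted Python list.
class PredecessorSet:
    def __init__(self):
        self.values = []   # kept sorted

    def add(self, v):
        i = bisect.bisect_left(self.values, v)
        if i == len(self.values) or self.values[i] != v:
            self.values.insert(i, v)

    def remove(self, v):
        i = bisect.bisect_left(self.values, v)
        if i < len(self.values) and self.values[i] == v:
            self.values.pop(i)

    def predecessor(self, v):
        """Maximal stored value strictly smaller than v, or None."""
        i = bisect.bisect_left(self.values, v)
        return self.values[i - 1] if i > 0 else None

s = PredecessorSet()
for v in (10, 3, 7):
    s.add(v)
assert s.predecessor(8) == 7
s.remove(7)
assert s.predecessor(8) == 3
assert s.predecessor(2) is None
```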

The proof is technical and based on *Green’s relations*; it is the main technical contribution.

- Given a monoid:
  - *The Good:* If it is in \(\mathbf{ZG}\) then \(O(1)\)
  - *The Bad:* Otherwise, if it is in \(\mathbf{SG}\), then \(O(\log\log n)\) and *probably not in* \(O(1)\)
  - *The Ugly:* Otherwise, in \(\Theta(\frac{\log n}{\log\log n})\)

Recall that we use monoids as an intermediate to find *good algorithms* for languages.

The following languages have the **same** syntactic monoid:

- \(L = (aa)^* c a^*\) is in \(O(1)\)
- \(K = (ab^*a + b)^* c (a+b)^*\) is in \(\Omega(\frac{\log n}{\log \log n})\)

…

*Not always!*

- Recall that *a semigroup* is like a monoid but possibly without a neutral element.
- All monoids are semigroups but the converse is not true.
- Given a regular language \(L\):
  - Its *syntactic semigroup* is the set of *transition functions* of nonempty words
- The dynamic word problem for a semigroup:
  - We fix a finite semigroup \(S\)
  - We have a dynamic word in \(S^+\)
  - We maintain the total product.

Given a regular language:

- we compute its syntactic ~~monoid~~ semigroup
- execute the best possible algorithm for it
- \(\to\) the language case reduces in \(O(1)\) to the ~~monoid~~ semigroup case

Algorithms for monoids give algorithms for semigroups:

- by adding an identity to a non-monoid semigroup: \(S\mapsto S^1\)

In the \(O(\log\log n)\) case, \(S^1\) and \(S\) are \(O(1)\)-equivalent to each other.

In the \(O(1)\) case, it is *not* so!

- Take the semigroup \(S=\{(1, 1), (1, 2), (2, 1), (2, 2), 0\}\) with:
  - \((i, j)\cdot (k, p)=\begin{cases}(i, p)\text{ if }j=k\\0\text{ otherwise}\end{cases}\)
- e.g.:
  - \((1, 2)\cdot (1, 2) = 0\)
  - \((1, 2)\cdot(2, 1)\cdot(1, 1) = (1, 1)\)
- The monoid \(S^1\) is *not* in \(\textbf{ZG}\), but we can find an \(O(1)\) algorithm for \(S\)

Take the semigroup \(S=\{(1, 1), (1, 2), (2, 1), (2, 2), 0\}\) with:

- \((i, j)\cdot (k, p) = \begin{cases}(i, p)\text{ if }j=k\\0\text{ otherwise}\end{cases}\)

The product \((i_0, j_0)\cdot (i_1, j_1)\cdots (i_n, j_n)\)

- is \(0\) if \(j_k\neq i_{k+1}\) for some \(k\)
- is equal to \((i_0, j_n)\) otherwise

We can *count* the sliding windows of size \(2\) (the pairs with \(j_k\neq i_{k+1}\)) and *conclude*.

It doesn’t work for \(S^1\) because of the *neutral element*.
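The counting idea above can be sketched as follows (my own code): maintain the number of adjacent pairs that violate \(j_k = i_{k+1}\); the total product is \(0\) iff at least one pair violates, and \((i_0, j_n)\) otherwise. A letter change touches only two windows, so each update is \(O(1)\).

```python
# O(1)-per-update product in the semigroup S of pairs (i, j) with zero.
class DynamicS:
    def __init__(self, word):
        self.word = list(word)          # list of pairs (i, j)
        self.violations = sum(
            1 for x, y in zip(word, word[1:]) if x[1] != y[0])

    def set_element(self, k, pair):     # O(1): only two windows change
        w = self.word
        if k > 0:
            self.violations -= w[k - 1][1] != w[k][0]
        if k + 1 < len(w):
            self.violations -= w[k][1] != w[k + 1][0]
        w[k] = pair
        if k > 0:
            self.violations += w[k - 1][1] != w[k][0]
        if k + 1 < len(w):
            self.violations += w[k][1] != w[k + 1][0]

    def product(self):
        if self.violations:
            return 0
        return (self.word[0][0], self.word[-1][1])

w = DynamicS([(1, 2), (2, 1), (1, 1)])
assert w.product() == (1, 1)            # the example product from the slide
w.set_element(1, (1, 2))                # (1,2)(1,2)(1,1) has a violation
assert w.product() == 0
```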

- We can always do the *sliding windows trick* for free
- It is a *well-studied operation* called the *wreath product* by \(\mathbf{D}\):
  - e.g. in logical fragments, this operation matches adding *successor relations*.
  - Intuitively, we execute the semigroup over *sliding windows* of letters
  - It maps varieties of *monoids* to varieties of *semigroups*
- Our complexity classes are closed under this operation
- The resulting class is equivalent to a class denoted \(\mathbf{LZG}\): \((exe)^{\omega + 1}ye = ey(exe)^{\omega+ 1}\) with \(e^2=e\)
- Result from a *companion paper* by Amarilli and Paperman: *The locality of \(\mathbf{ZG}\)*

- Given a semigroup:
  - *The Good:* If it is in \(\mathbf{LZG}\) then \(O(1)\)
  - *The Bad:* Otherwise, if it is in \(\mathbf{SG}\), then \(O(\log\log n)\) and *probably not in* \(O(1)\)
  - *The Ugly:* Otherwise, in \(\Theta(\frac{\log n}{\log\log n})\)

- The syntactic semigroup of \((aa)^* b (aa)^*\) is *Ugly* (not in \(\mathbf{SG}\)).
- Yet there is an \(O(1)\) algorithm:
  - count the \(b\)’s at even positions and at odd positions
  - accept if there is exactly one \(b\) overall and it is at an even position.
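A sketch of this counting algorithm (my own, assuming the alphabet \(\{a,b\}\), 0-indexed positions, and \(n\) odd):

```python
# O(1)-per-update membership in (aa)^* b (aa)^*:
# maintain counts of b's at even and at odd positions.
class DynamicAABAA:
    def __init__(self, word):
        self.word = list(word)
        self.b_even = sum(1 for i, c in enumerate(word)
                          if c == "b" and i % 2 == 0)
        self.b_odd = sum(1 for i, c in enumerate(word)
                         if c == "b" and i % 2 == 1)

    def set_letter(self, i, letter):    # O(1) per update
        for c, delta in ((self.word[i], -1), (letter, +1)):
            if c == "b":
                if i % 2 == 0:
                    self.b_even += delta
                else:
                    self.b_odd += delta
        self.word[i] = letter

    def member(self):
        # Exactly one b, at an even position (everything else is a).
        return self.b_even == 1 and self.b_odd == 0

w = DynamicAABAA("aabaa")
assert w.member()
w.set_letter(2, "a")    # "aaaaa": no b at all
assert not w.member()
w.set_letter(1, "b")    # "abaaa": b at an odd position
assert not w.member()
```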

- We can always enrich semigroups with *modulo information*:
  - It is a *studied operation* (lucky!)
  - It is related to the *stable semigroup* of the *syntactic morphism*
  - We enter the *twilight zone* of *semigroup theory*
- An operator denoted by \(\mathbf{Q}\) (for *quasi*)
- Here, it is enough to conclude!

- Given a *regular language*:
  - *The Good:* If it is in \(\mathbf{QLZG}\) then its dynamic word problem is in \(O(1)\)
  - *The Bad:* Otherwise, if it is in \(\mathbf{QSG}\), then its dynamic word problem is in \(O(\log\log n)\) and *probably not in* \(O(1)\).
  - *The Ugly:* Otherwise, it is in \(\Theta(\frac{\log n}{\log\log n})\)

- Everything works out because the \(\mathbf{Q}\) and \(\mathbf{L}\) operators behave well.

Semigroup theory tooling has been surprisingly effective.

We managed to classify the precise complexity of algorithmic problems parametrized by regular languages.

We have linked algebraic analysis to data-structure complexity.

It is reminiscent of what happens in *CSP*.

*Is there an algebraic parametrized complexity theory waiting to be discovered?*

- Is there more hierarchy within \(O(\log \log n)\)?
- Can we extend the framework and the tooling to insertion/deletion? (*probably yes*)
- Can we generalize to non-linear structures?
- How to prove non-constant *lower bounds* for priority queues in the RAM model?

*TUDASTIC* – TUtorials on DAta Structures for Text Indexation and Compression

- 1st edition: *9–10 May @ Lille*
- stringology, bioinformatics, algorithms and databases
- free registration (call for participation soon on the list)
- Florent Capelli, Camille Marchet, Charles Paperman