Proposition Stage/PhD/Master thesis
Compilation of a DSL based on transducers to SIMD optimized programs
Efficient Data Processing
Streaming data processing is a crucial approach that focuses on traversing data to extract pertinent information. Applications ranges from network packet manipulation to analysing DNA. Modern data-processing tools heavily depend on efficient implementations that harness hardware acceleration to achieve high performance. This acceleration can sometimes be achieved through automatic compilation, but frequently demands expert developers to craft optimizations by hand.
One critical facet of this optimization process involves SIMD optimization, where data is packed into chunks and processed with minimal branching in the code, often using bitvector operations. These optimizations are at the core of numerous well-known software applications, such as regular expression matching in tools like ripgrep, JSON parsing in libraries like SimdJSON, and even fundamental operations like string encoding and decoding (unicode parsing). Developing these optimizations requires a broad skill set and is a testament to the expertise of programmers worldwide.
Exploring a Restricted Programming Language in this Internship
During this internship, we will explore the design and implementation of a specialized programming language for stream processing and its compilation to efficient SIMD code. The technics will take inspiration of real software design (such as rsonpath) and will be based on abstract automata theory and logic approach. Initially we will focus on a limited expressivity class, named LTL, whose theoretical properties are well understood.
Following this initial exploration, depending on the interest of the student, we will focus either on: - Extending our design to other language constructs that admit efficient vectorised implementation - Formally study the link between our language and finite state transducers and their expressivity - Output real-world efficient code in existing instruction sets.
Tasks
- Play with handcrafted efficient SIMD code
- Design a small subset of the language and a toy compilation scheme
- Benchmarks the resulting code on concrete example
- Eventually: Study algebraically the class of function designed
- Eventually: Expand the language with more operations
Location
This internship will be co-advise by Gabriel Radanne (INRIA Lyon) and Charles Paperman (Université de Lille) and can be located either in Lille or in Lyon or in both depending on the choice of the student.
Candidate profile
The candidate should ideally be familiar with formal approaches in programming language design, notably type systems, semantics, and logic. From the practical point of view, a basic experience in software programming and usage of collaborative tools such that git. This internship strongly relies on the fact that practical implementation should have strong theoretical foundations and that further refinements of the theory should get inspiration from the practical side. We expect the candidate to agree with this philosophy
Bibliography
- Parsing Gigabytes of JSON per Second
- Supporting Descendants in SIMD-Accelerated JSONPath
- Validating UTF-8 in less than one instruction per byte
- Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
- Vectorial languages and linear temporal logic.
- An algebraic approach to vectorial programs
Compiled the: ven. 09 févr. 2024 10:12:27 CET