Incremental parsing and streaming
Nomadic Labs
Since Big Data has become a common use case in the industry, and as the amount of
information transferred on the web keeps increasing, sending large data over the
network is a frequent problem. Transmitting big payloads over HTTP connections, for
instance, is generally a poor solution: it monopolizes bandwidth (causing a lack of
responsiveness for real-time applications) and raises memory issues. Tezos nodes, the
cornerstone of the Tezos blockchain, use web services that rely on a streaming
solution to address this need.
However, streams add complexity on the receiving end: to be efficient, the receiver
must consume and process data incrementally, as it arrives. Incremental parsers serve
this purpose by performing syntax analysis chunk by chunk. They can partially parse an
incomplete document, then wait for the rest of the input in order to complete the
decoding. In contrast, a non-incremental parser has to repeatedly re-parse stream
prefixes until it successfully parses a complete document, which is much less
efficient.
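The contrast above can be illustrated with a minimal OCaml sketch. It is not the Tezos
deserialization API: the `step` type, `parse_int` and `feed` are hypothetical names
introduced here, loosely inspired by the continuation-based interface of
attoparsec-style libraries. The parser reads a decimal integer terminated by `;` and,
when a chunk ends too early, returns a continuation that resumes from the state reached
so far instead of re-parsing the prefix.

```ocaml
(* Hypothetical result type for an incremental parser. *)
type 'a step =
  | Done of 'a * string             (* parsed value and unconsumed input *)
  | Partial of (string -> 'a step)  (* more input needed: resume with the next chunk *)
  | Fail of string                  (* error message *)

(* Incrementally parse a decimal integer terminated by ';'.
   [acc] is the value read so far; [seen_digit] records whether at least
   one digit has been consumed. Both survive across chunk boundaries. *)
let parse_int : string -> int step =
  let rec go acc seen_digit chunk =
    let len = String.length chunk in
    let rec loop i acc seen_digit =
      if i = len then
        (* Chunk exhausted: suspend, keeping the accumulated state. *)
        Partial (go acc seen_digit)
      else
        match chunk.[i] with
        | '0' .. '9' as c ->
          loop (i + 1) ((acc * 10) + Char.code c - Char.code '0') true
        | ';' when seen_digit ->
          Done (acc, String.sub chunk (i + 1) (len - i - 1))
        | c -> Fail (Printf.sprintf "unexpected character %C" c)
    in
    loop 0 acc seen_digit
  in
  go 0 false

(* Feed successive chunks to a suspended parser until it stops asking
   for input or the chunks run out. *)
let rec feed step chunks =
  match (step, chunks) with
  | Partial k, chunk :: rest -> feed (k chunk) rest
  | _ -> step

let () =
  (* The document "12345;rest" arrives in three chunks. *)
  match feed (parse_int "12") [ "34"; "5;rest" ] with
  | Done (n, leftover) -> Printf.printf "%d, leftover %S\n" n leftover
  | Partial _ -> print_endline "need more input"
  | Fail msg -> print_endline msg
```

Each character is examined exactly once, whatever the chunk boundaries are. A
non-incremental parser in the same situation would have to be re-run on "12", then
"1234", then "12345;rest".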
Internship goals
The goal of this internship is to improve the deserialization layer of the Tezos node
to enable incremental parsing. Building upon existing work, the intern will design an
incremental parser and use it to optimize stream processing in the Tezos client
component.
Requirements
The successful applicant should have a good knowledge of the OCaml programming
language, and be able to work independently and to understand academic papers. Their
work will consist of proposing solutions to the problems they encounter and
implementing these solutions.
Sources
Popular implementations
• attoparsec (Haskell)
• Atto (Scala)
Papers and Articles
• Differentiating parsers
• Parsec
Internship Context
You will work at the Nomadic Labs offices in Paris.
As a participant in a large-scale open-source project, you will have to rapidly learn
to use collaborative tools (Git, merge requests, issues, GitLab, continuous
integration, documentation) and to communicate about your work. The final results
might be presented at an international conference or workshop.
You will have a designated advisor at Nomadic Labs. You will be expected to work
independently and to propose carefully considered solutions to the problems you have
to solve. You will be encouraged to seek advice from members of the team.
Intellectual Property
All material produced (essays, documentation, code, etc.) will be released under an
open-source license (e.g. MIT or CC).