Outline

Radon overview / introduction of bg Nomenclatura Distinzione chiara architettura

Meccanismo interrupt

Abstraction layer

Divisione moduli servizi librerie

TODO

  • Background Radon
  • Background
  • Related Works

Background

What is it? A model and platform

  • Model for execution of portable distributed applications
  • Platform that implements the model with a specification

We model the architecture, deployment, scheduling, state, operating system and interactions We propose an execution model (cooperative interrupts etc) We propose a ā€œguidelineā€ for reuse and modularity of applications and components

We show how this model and platform we can successfully implement complex applications We show how this approach makes such applications portable We show how this approach simplifies development We discuss the abstraction level for reusable modules and show how we can provide high and low level reusability (but mid-level is difficult) We discuss how this model allows for seamless technological evolution and update of applications through updates or reconfigurations in the runtime We show how this approach provides strong performance when compared to reference and alternatives (JVM?)


  • Architecture
    • Distributed application decomposition
      • Structure of a distributed application
      • Multiple threads
      • Multiple processes
      • Multiple machines
      • Multiple data-centers
    • Structure of application
      • Application
        • Tasks
      • High-Level
        • Modules
      • Low-Level
        • Libraries
        • Runtime
    • Mapping threads to Tasks (atoms?)
    • Structure of radon
  • Architectural model
    • Runtime v Atoms
      • System code v User code
  • Interface
    • Deployment
      • Assignment
    • Scheduling
      • Config
      • Daemon vs Dynamic
      • Policies
        • Static
        • On-Demand
        • On-Demand-Expire
      • Recovery policies
    • Atoms
      • Language
        • Rust (main)
        • C / C++
        • Go
        • Typescript, .NET, Lua, Zig
        • Python, Java, Elixir, … (unstable)
      • Compilation
      • Execution model
        • Cooperative single thread
      • Runtime interface
        • Syscalls
      • Reusable modules
        • Services
        • Extensions
    • Ingress
      • Rest
        • Dispatching (scheduling)
  • Technology
    • Runtime
      • Rust
      • Wasm
      • Cranelift
      • LLVM
    • Communication
      • TCP
      • Axum
      • Tokio
    • Serialization
      • Bincode
    • Storage
      • sled
      • rocksdb
    • Messaging
      • Dashmap
      • flume
  • Component implementation
    • Naming
    • Deployment
      • SSH-based
      • TAR bundle

Definitions

Runtime developers: write the runtime Module developers: write reusable modules, either services or extensions Application developers: write atoms, use runtime functions, can use modules

Runtime: executable platform that can load and execute atoms, provides implementation of runtime functions Atom: application task, single threaded, unit of scheduling Engine/WASM Engine: component that can execute atoms compiled to WASM Module: reusable software package that provides functionalities to be used by application developers to write their atoms. Service: module providing functionality as an atom or collection of atoms, provided by module developers Extension: module providing functionality as a library that atoms can use directly in their code, relies on runtime function (or possibly services) to provide advanced functionality

Runtime function: syscall Exported function: atom function that the runtime can call through FFI Runtime specification: set of runtime function signatures and their execution contracts. Interface agreement between runtime and atoms Atom instance: Physical execution unit of an atom, may have multiple instances for an atom Daemon atom: atom executed on instantiation, known name Reactive atom: atom executed in response to events, generated name


Overview

We propose a model for writing and executing distributed applications. In this model, applications are written as a set of dynamic, communicating entities, which we name ā€˜atoms’, that are executed on a static runtime.

In this division, the runtime provides efficient implementations of commonly used functions, which we call ā€˜runtime functions’, and atoms, can make use of these functions to implement their application-specific logic.

The execution of real-world applications happen by first executing one or more interconnected runtimes, then loading the application logic that is defined within atoms. Atoms are designed to be portable and programmable in different programming languages, encouraging modularity, reuse and ease of integration.

The interface between the runtime and atoms can become a contract to be implemented in different runtimes, with different technologies and tradeoffs. This allows for the same application to be seamlessly executed on a different technological stack with different tradeoffs, without development cost on the application side.

To do this, we must define a set of operations that characterize most distributed applications and that are sufficient to encode the complex interaction patterns of real world applications.

This approach takes inspiration from Operating Systems, where executables can rely on the system calls provided by the OS to enact their logic. We expand this idea and adapt it to the scenario of distributed applications, we include higher level abstractions, such as messaging and naming.

We propose a platform that implements a version of this interface and show how it enables the realization of complex application patterns, and it does so while simplifying development and providing excellent performance when compared to a handcrafted approach. (Note: the interface we present could be extensible or extended in future work)

Architecture

To act our plan, we first need to understand the architectural characteristics of distributed applications.

A distributed application is composed by a set of connected entities that need to exchange information and coordinate in some form to complete a task.

We map this to a set of atoms running on multiple interconnected runtimes. Runtimes are executed based on the available computing resources, and may run on different architectures. They are optimized for the target architecture and may include implementation choices that benefit the specific hardware target. On the other hand, atoms are target agnostic, they only encode the domain logic, and they only need a single version that will run on all supported targets.

Runtimes can be deployed in relation to the changes in available computing resources, instead atoms are dynamically deployed in relation to application demands. (Note: our current proposal focuses on single-tenant applications, future work could add isolation features to support multi-tenancy)

The physical connections are mediated by the runtimes and may be enacted using configurable technologies, this is transparent to atoms, which only need to reason on a message based abstraction. It is also a runtime concern to keep track of existing atoms and route messages to the correct destination.

Interface design

Designing and modeling the interface We need to define a boundary for what can be done by an atom and what should be done by the runtime.

The first distinction is the reliance on network or disk As atoms are meant to be portable and must run seamlessly independently on the platform architecture, all input-output operations must be mediated by the runtime, as different platforms will have different ways to perform IO.

This decision does not only come from a technological limitation, but it is a deliberate decision, to provide a common abstraction over the most critical parts of a distributed application, communication.

This logic also applies to the other functionalities we decided to delegate to the runtime. We develop an interface that is higher level than a normal operating system, containing the most important foundational operations required to develop a distributed application.

We decide to include an embedded Key-Value-based data store in the runtime interface. This provides a way to manage persistent state, and can be utilized as the base layer to develop more advanced database abstractions (see lit. TiKV w/ Rocksdb)

The division aims to split the effort of domain experts and systems experts. Domain experts worry about the logic without needing to worry about system concerns, while systems experts can perform optimizations that will benefit the application logic without modifying it.

As another effect of defining this interface, we facilitate the development of reusable modules that interact seamlessly with any atom due to the common interface abstraction layer. This is another goal of our proposed model: as the runtime function interface cannot realistically aim to give all of the tools that a distributed application needs, we propose a system of reusable modules that can be developed to distribute higher level reusable components.

Reusable modules can be written either as services or as runtime extensions. Services are composed of one or more atoms that the application developer’s atoms can interact with using messages. Extensions, instead, are libraries that application developers can import and use to develop their atoms, they provide functions that atoms can use directly from code and that use the runtime provided primitives internally to expose extended functionalities.

Modularity and Reuse

Evolving specification Component mix and match Possibility for runtime config and swap to tune the applications Possibility of different runtime cooperation Heterogeneous deployment(?)

Application programming interface

In this section we describe the interface that application developers can use to write their application, including the programming interface for atoms, their execution model and the associated scheduling primitives.

Atoms can be developed in any language that supports compilation to WebAssembly. Languages that have first-class support for WASM compilation are Rust, C, C++ and Go, with many other languages providing varying levels of production-readiness, including Typescript, .NET, Lua, Zig, Python, Java, Elixir, …

Atoms and the runtime interact with a set of exported and imported function. Atoms must export an invoke function that implements the main logic of the component. At the same time, they can import the runtime functions to be used to define the logic of the invoke function.

(Note: atoms also need to export an alloc and a dealloc function that the runtime can use to send data to the WASM main memory, but this can be included in a redistributable library for each supported language)

In summary, a developer must write their application in the language they prefer, export the invoke function and compile the project as a WASM module. The module can be loaded by any runtime to execute the application.

Execution model

Embedded threads and execution

Atom execution model Interrupts Futures

Scheduling

We give ways to define how each atom should be scheduled and executed. First of all, atoms are divided in two fundamental categories: daemon atoms and reactive atoms. Daemon atoms start their execution immediately after they are created, and they have a static, known name. Reactive atoms will execute when they are invoked by another entity and have a name generated at runtime.

To understand how atoms are scheduled, we must define the concept of instantiation. We define an Atom Instance as the physical execution unit for an atom. An atom may be instantiated one or more times. Instances of the same atom share the code, but execute as distinct entities, with separate memory spaces, that may execute concurrently.

We define scheduling policies that regulate the way in which atoms are instantiated within a runtime. The first and simplest of scheduling policies is the one policy: atoms with this policy will only be instantiated once. This is the standard way to instance a daemon atom. Then we have policies that make multiple instances of an atom: the round-robin policy instantiates a fixed amount of atoms, and distributes invocations in round robin order. The on-demand creates a new instance for each invocation. Lastly, the on-demand-expire creates a new instance only if none is available, with instances being reused until an expiration criteria is met (time or number of number of invocations).

Scheduling policies alter the way that invocations and messages are distributed among instances.

Note: do we want to allow different scheduling policies for daemons? When considering daemon atoms, using a one scheduling policy will ensure that every message will be handled by the same instance. It is also possible to create multiple instances for a daemon atom using different scheduling policies, however in this case they will share the same name and the associated message inbox. This means messages will be processed by any one of the active instances.

In case of reactive daemons, policies with fixed instance count (one and round-robin), will signal an error on invocation if all instances are already in use, while the on-demand policies will simply create new instances to match the demand.

(Note: we are missing the deployment explanation, to decide where to put it)

Deployment

Needs some work

Paired with the scheduling strategies, we provide a candidate deployment system. Starting from a set of WASM modules and a configuration file, it can bootstrap a distributed deployment of runtimes with a set of corresponding atoms.

Our current implementation allows to define the set of hosts in the deployment with some configuration parameters and associated tags. Then, we define a set templates for atoms to be deployed on the runtimes. Other than specifying the parameters for the scheduling of the atom (scheduling policy, WASM module, daemon or reactive, …), the template describes on which runtimes the atoms should be deployed and if the definition should be replicated. The name in a template can make use of special tokens that are substituted on deployment, this can be used to assign a different name to an atom based on the host it is located on or an incremental identifier for multiple copies with the same name.

Since our name resolution supports pattern matching through regular expressions, we can use the template structure to match the atoms we want.

Runtime functions

In the current version of the runtime specification, we include runtime functions for the following categories of operations:

  • Messaging
  • Naming
  • Local persistent storage
  • OS primitives (time, sleep, …)
  • Runtime interaction (spawning atoms, …)

The messaging functions include one-to-one and one-to-many communication. From the application’s point of view, there is no difference between messaging an atom that is executed on the same runtime, or one that is in a remote location, the actual implementation may instead adopt optimized strategies to make use of this information.

Messages can be sent either without ordering guarantees, or with FIFO ordering with respect to the pair sender-receiver. The send operations are non-blocking, if the destination is a valid atom, the message will be queued and sent as soon as possible. On the other hand, receiving a message is more complex as it must integrate with the interrupt concept.

When an atom issues a receive message operation, its execution may be temporarily diverted to handle pending interrupts or futures. We use the possibly blocking nature of the receive operation to inject the processing of asynchronous handlers (interrupts and futures).

(Note: could be interesting to add pubsub)

We choose a hierarchical naming structure with three levels. We assign a flat name to every running runtime, and we assign a base name to each atom at deployment time. Then we define a name for each running atom as a composition of the runtime name on which it is deployed, its base name and finally an optional discriminator. The discriminator is added to reactive atoms to uniquely identify each atom instance, daemons instead have no discriminator and are retrievable using only their base name. If the runtime name is left unspecified, the response will be any atom that matches the atom name, independently from its physical location.

Other than resolving names implicitly when sending messages, we provide a way to query the naming system to discover services. In the current implementation we choose to allow for regex-based queries to find all of the atoms that match a specified expression.

Concerning storage, we include runtime functions to interact with a fast embedded data store. This is the main way in which an atom can persist information to durable storage. We choose a Key-Value abstraction as it is versatile and can be used to implement higher level storage abstractions (see lit. tikv).

Given that atoms don’t have direct access to the operating system’s functionality, we need a set of runtime functions to implement them. Consequently, we expose functions to query the system real-time clock, secure random number generation and other basic OS features.

Finally, in order to support dynamic reconfiguration and complex application patterns, we expose a set of runtime functions to interact directly with the runtime. These functions include the ability to spawn other atoms and invoke a co-located atom.

Recovery

As every application may fail due to unexpected circumstances, we add a way to handle atom faults and possibly perform recovery operations. Each atom can specify a recovery policy that will determine the way in which the runtime will handle an unrecoverable exception. By default, the None policy will signal the exception and stop the atom, reporting the error to the runtime, this is the simplest case as the rest of the atoms will keep running as if no fault had happened. However, this may not be the correct way to deal with a fault as it may compromise the application’s correctness. Alternatively, a developer can choose the escalate (Nota: forse servirebbe una escalate che tira giù anche roba sugli altri runtimes) policy which will immediately stop the runtime and all atoms currently running on it, this is appropriate for an unrecoverable fault that compromises the correctness of the whole application. On the other hand, if the fault is recoverable, the Restart and Recover policies will attempt to continue the execution by restarting the failed atom. The restart policy will create a new instance of the failed atom and run it with the same name. In addition to this, the recover policy expects the definition of an exported ā€˜recover’ function in the atom definition, this function will be called before restarting the execution of an atom, in this function, developers can code any operation required to fix the state of the application before resuming with the regular atom logic.

Ingress

In order to interact with the external world, a runtime can include one or more options for ingress of data or events. The specific structure of an ingress implementation does not need to follow a specific structure, as long as it interacts with the atoms using the invoke interface (and if needed loading data in the embedded data store).

In the current implementation we chose to integrate an HTTP ingress that maps each request to a reactive atom using the HTTP path to determine the name of the atom, and passing the body of the request as argument to the invocation. The HTTP endpoint will then respond with the result of the invocation when it is complete.

TODO

Runtime implementation

Runtime core technology Wasm engine Atom compilation Atom execution Threading Daemons and shit FFI details Passing data Compilation steps

Runtime function technology and implementation Messaging Mbox Interrupts and Futures implementation Remote communication Network stack Communication strategy, pools multiplexing and serialization Local storage KV Naming Deployment REST integration (?)

Writing Atoms

Strategies, Interrupt based programming Futures patterns

Model validation: writing distributed apps

asd

Programming evaluation

Lines of code Cost of deployment Number of Technologies required The cost of portability on deployment and development The cost of programming language compatibility The cost of heterogeneous integration of components (Da investigare: wasm compiled libs!) Futureproofing through runtime updates of old applications