Robert Važan

Transparent reactive programming

Reactive programming is a confusing concept, because everyone understands it differently. It covers such a wide range of programming paradigms and architectures that it is almost meaningless. That's why it is important to clarify which kind of reactive programming one has in mind. Here I will try to clearly define one variety of reactive programming, which is usually called transparent reactive programming.

This topic is of particular interest to me, because I am developing Hookless, a transparent reactive programming library for Java. As part of its development, I have explored various reactive programming concepts and approaches. Here's what I found.

A long, long time ago, people were mostly writing batch programs. Interactive UIs, especially GUIs, however necessitated a very different way to structure programs, which came to be known as event-driven programming. An event-driven program mostly consists of event handlers, each executing actions in response to some event. It was the first paradigm that was somewhat reactive. Event-driven programming however creates complexity and bugs. As the program gets bigger and its data structures more complicated and redundant, event handlers get longer and more complicated as they try to update all the data structures that are affected by the event. The complexity will inevitably become a hiding place for endless data and UI synchronization bugs.

The basic idea of reactive programming was to model the application as a flow of information from normalized core data structures through computed redundant data structures all the way to formatted data shown in the UI. You could still implement it using events, but now all the events were data change events. With reactive programming, redundant data is not updated by user action events directly but rather by subsequent data change events. Even the UI surface can be seen as a redundant data structure derived from underlying model data. The various parts of the program together form a dependency graph, which is a distinguishing feature of reactive programming.
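
The flow from core data through redundant data to the UI can be sketched with plain data change events. This is only an illustration under stated assumptions: `ObservableValue` and its `onChange` method are hypothetical names invented for this example, not part of any particular library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical holder that fires a data change event on every write.
class ObservableValue<T> {
    private T value;
    private final List<Consumer<T>> listeners = new ArrayList<>();

    ObservableValue(T initial) { value = initial; }

    T get() { return value; }

    void set(T newValue) {
        value = newValue;
        // Iterate over a copy so that listeners may subscribe during notification.
        for (Consumer<T> listener : new ArrayList<>(listeners))
            listener.accept(newValue);
    }

    void onChange(Consumer<T> listener) { listeners.add(listener); }
}

class ChangeEventDemo {
    public static void main(String[] args) {
        // Normalized core data.
        ObservableValue<Integer> price = new ObservableValue<>(10);
        ObservableValue<Integer> quantity = new ObservableValue<>(3);
        // Redundant computed data.
        ObservableValue<Integer> total = new ObservableValue<>(price.get() * quantity.get());
        // Formatted data shown in the UI.
        ObservableValue<String> label = new ObservableValue<>("Total: " + total.get());

        // Edges of the dependency graph, wired by hand.
        price.onChange(p -> total.set(p * quantity.get()));
        quantity.onChange(q -> total.set(price.get() * q));
        total.onChange(t -> label.set("Total: " + t));

        // A user action handler touches only the core data.
        quantity.set(5);
        System.out.println(label.get()); // Total: 50
    }
}
```

Note that the user action handler no longer knows about `total` or `label`. The price is that every edge of the dependency graph still has to be subscribed manually.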

There is a related concept of dataflow programming, which also organizes computation around a dependency graph. To picture a dataflow or flow-oriented program, think of hardware circuits or merge sort on large streams. Some people see reactive programming as a subset of dataflow programming, but the two concepts overlap a lot and the difference is more in focus and applications. Dataflow programming focuses on throughput and parallelism and it finds applications in hardware and clusters, whereas reactive programming focuses on responsiveness and it is most commonly applied to user interfaces.

The MVVM (model-view-viewmodel) pattern structures programs as a dependency graph, but it is not necessarily reactive. Normalized core data structures are the model, computed intermediate data structures are the viewmodel, and the UI surface is the view. Since an MVVM program is already a dependency graph, MVVM programs are a good fit for reactive programming. Data binding is usually just a specialized reactive connection between view and viewmodel properties in MVVM. Reactive programming can however be used to describe more than just UIs. And even if we consider only UIs, there are many more ways to structure reactive user interfaces than MVVM.

When you hear someone talking about high-performance reactive programming, they are most likely talking about stream-oriented reactive programming, in which every node in the dependency graph is a stream of values arriving over time. RxJava and similar Rx libraries for other languages are an example of stream-oriented reactive programming. The advantage of streams is that they allow you to perform neat transformations, especially time-based rules like delay or rate-limiting, and they let you reliably process all values that pass through the stream unless you have explicitly filtered them out.

Streams however add a lot of boilerplate to the program if all you care about is the latest value in the stream. This is what sync-oriented or state-oriented reactive programming is about. It exposes only the latest value at every node in the dependency graph. Time-based rules are still possible, but they don't come as naturally. And sync-oriented reactive libraries often mercilessly throw away unread values if the reader is not fast enough, which means you can no longer reliably process every value ever produced. Sync-oriented reactive programming is however very practical in user interfaces, where the latest value is usually what the user wants to see.
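
To make the "latest value only" behavior concrete, here is a minimal sketch of a sync-oriented node. The class `LatestValue` and its methods are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sync-oriented node: it keeps the latest value and nothing else.
class LatestValue<T> {
    private final AtomicReference<T> latest;

    LatestValue(T initial) { latest = new AtomicReference<>(initial); }

    // Overwrites the previous value even if nobody has read it yet.
    void publish(T value) { latest.set(value); }

    T read() { return latest.get(); }
}

class LatestValueDemo {
    public static void main(String[] args) {
        LatestValue<Integer> temperature = new LatestValue<>(20);
        // The producer publishes three readings before the reader gets to run.
        temperature.publish(21);
        temperature.publish(22);
        temperature.publish(23);
        // The reader sees only the latest reading; 21 and 22 are gone.
        System.out.println(temperature.read()); // 23
    }
}
```

A stream would have delivered all three readings. For a thermometer widget, dropping the first two is exactly what we want.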

Reactive programming can either use push change propagation, in which changes are blindly pushed from producers to subscribed consumers, or push-pull propagation, which means consumers pull messages on their own schedule upon lightweight notification from producers. Some push systems use backpressure, which is essentially a roundabout way to do push-pull change propagation. Sync-oriented reactive programming does not need backpressure, because it can just discard older values, which ensures memory consumption is always bounded. It nevertheless benefits from push-pull, because processing every change may still tax the CPU.
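
Push-pull propagation can be sketched as follows, assuming a single-threaded program. The producer pushes only a content-free notification, and the consumer pulls the value when it is ready. `Source` and `LazyConsumer` are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical producer: stores the latest value and sends content-free notifications.
class Source {
    private int value;
    private final List<Runnable> subscribers = new ArrayList<>();

    int get() { return value; }

    void set(int newValue) {
        value = newValue;
        for (Runnable subscriber : subscribers)
            subscriber.run(); // push only the fact that something changed
    }

    void subscribe(Runnable subscriber) { subscribers.add(subscriber); }
}

// Hypothetical consumer: remembers that it is stale and pulls on its own schedule.
class LazyConsumer {
    private final Source source;
    private boolean dirty = true;
    private int cached;
    private int recomputations;

    LazyConsumer(Source source) {
        this.source = source;
        source.subscribe(() -> dirty = true); // the notification is just a flag flip
    }

    // Called when the consumer is ready, for example once per UI frame.
    int refresh() {
        if (dirty) {
            dirty = false;
            cached = source.get() * 2; // stand-in for an expensive computation
            ++recomputations;
        }
        return cached;
    }

    int recomputations() { return recomputations; }
}

class PushPullDemo {
    public static void main(String[] args) {
        Source clicks = new Source();
        LazyConsumer view = new LazyConsumer(clicks);
        for (int i = 1; i <= 1000; ++i)
            clicks.set(i); // 1000 cheap notifications, no recomputation
        System.out.println(view.refresh());        // 2000
        System.out.println(view.recomputations()); // 1
    }
}
```

With pure push, the expensive computation would have run 1000 times. Here it runs once, which is where the CPU savings come from.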

To sum up what we discussed so far, user interface programming generally benefits from sync-oriented reactive programming with push-pull change propagation. That's the overall architecture, but how do we go about implementing it without writing too much code?

This is where functional reactive programming (FRP) comes in. In functional reactive programming, the dependency graph is constructed using predefined functional operators (e.g. map, filter, ...) that take one or more reactive input nodes and return a new reactive output node. Operators can be additionally parameterized with pure functions, which implement application logic. Building programs in this way is quite restrictive, but the advantage is that reactivity is fully encapsulated in the operators and the dependency graph is implicitly constructed as the operators are applied. Functional reactive programming can be used with both stream-oriented and sync-oriented reactive programming, but it is more common in stream-oriented reactive programming.
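
The operator style can be sketched in a few lines. This is a toy, stream-oriented illustration and not the API of RxJava or any other library; `EventNode` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical reactive node: a stream of values pushed to subscribers.
class EventNode<T> {
    private final List<Consumer<? super T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<? super T> subscriber) { subscribers.add(subscriber); }

    void emit(T value) {
        for (Consumer<? super T> subscriber : subscribers)
            subscriber.accept(value);
    }

    // Operator: returns a new node that carries transformed values.
    <R> EventNode<R> map(Function<? super T, ? extends R> function) {
        EventNode<R> output = new EventNode<>();
        subscribe(value -> output.emit(function.apply(value)));
        return output;
    }

    // Operator: returns a new node that carries only matching values.
    EventNode<T> filter(Predicate<? super T> predicate) {
        EventNode<T> output = new EventNode<>();
        subscribe(value -> { if (predicate.test(value)) output.emit(value); });
        return output;
    }
}

class OperatorDemo {
    public static void main(String[] args) {
        EventNode<String> input = new EventNode<>();
        List<String> log = new ArrayList<>();
        // Applying operators implicitly builds the dependency graph.
        input.map(String::trim)
             .filter(text -> !text.isEmpty())
             .map(text -> "search: " + text)
             .subscribe(log::add);
        input.emit("  cats ");
        input.emit("   ");
        input.emit("dogs");
        System.out.println(log); // [search: cats, search: dogs]
    }
}
```

The application logic lives entirely in the pure functions passed to `map` and `filter`. All subscription bookkeeping is hidden inside the operators.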

Functional reactive programming, besides being severely restrictive in how the program is structured, usually also encourages construction of a static dependency graph, which is constructed once and then left to run as is. Applications, especially user interfaces, however often switch between views (windows, forms, pages, tabs) and even change view structure as the user selects different options. Such applications naturally have a dynamic dependency graph that may change after every user action.

Probably the most famous reactive user interface is the spreadsheet. Spreadsheet cells form dependency graphs. Computed cells have a reactive dependency on all cells used in their expression. When one cell changes, all dependent cells are immediately updated. With what we learned so far, we can easily classify the spreadsheet as a sync-oriented reactive application. What people often don't know is that cell positions used in expressions can themselves be computed, for example with a function such as INDIRECT. This means that spreadsheets have a dynamic dependency graph that may change whenever any data in the spreadsheet changes. We can also tell that the spreadsheet does not use functional reactive programming. Cell expressions are not pure functions. They read other cells and they can even choose which cells to read.

What the spreadsheet implements is generally called transparent reactive programming. Dependencies are discovered automatically by observing what data is accessed. The dependency graph is constructed when the code runs for the first time and it is rebuilt during every subsequent run. Transparent reactive programming therefore naturally supports a dynamic dependency graph. Nodes in the dependency graph are defined as ordinary functions that read values of other nodes and return the value of the current node. These reads only care about the latest value, so transparent reactive programming is sync-oriented reactive programming.
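
Automatic dependency discovery can be sketched in under a hundred lines. The sketch below is a single-threaded toy built on stated assumptions and is not how Hookless or any other library is implemented; `Node`, `Var`, and `Computed` are hypothetical names. The trick is a global stack of running computations: every read registers the computation on top of the stack as a reader.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical base class for anything that can be read reactively.
abstract class Node {
    // Computations currently being evaluated, innermost on top (single-threaded sketch).
    static final Deque<Computed<?>> RUNNING = new ArrayDeque<>();
    final Set<Computed<?>> readers = new LinkedHashSet<>();

    // Called on every read: whoever is running right now depends on this node.
    void track() {
        Computed<?> reader = RUNNING.peek();
        if (reader != null) {
            readers.add(reader);
            reader.dependencies.add(this);
        }
    }

    // Called on every change: mark all current readers as stale.
    void changed() {
        for (Computed<?> reader : new ArrayList<>(readers))
            reader.invalidate();
    }
}

// Hypothetical reactive variable holding core data.
class Var<T> extends Node {
    private T value;
    Var(T initial) { value = initial; }
    T get() { track(); return value; }
    void set(T newValue) { value = newValue; changed(); }
}

// Hypothetical computed node defined by an ordinary function.
class Computed<T> extends Node {
    final Set<Node> dependencies = new LinkedHashSet<>();
    private final Supplier<T> function;
    private T cached;
    private boolean valid;
    private int runs;

    Computed(Supplier<T> function) { this.function = function; }

    T get() {
        track();
        if (!valid) {
            // Forget edges from the previous run; this run rediscovers them.
            for (Node dependency : dependencies)
                dependency.readers.remove(this);
            dependencies.clear();
            RUNNING.push(this);
            try {
                cached = function.get();
            } finally {
                RUNNING.pop();
            }
            valid = true;
            ++runs;
        }
        return cached;
    }

    void invalidate() {
        if (valid) {
            valid = false;
            changed(); // propagate staleness to nodes that read this one
        }
    }

    int runs() { return runs; }
}

class TransparentDemo {
    public static void main(String[] args) {
        Var<Boolean> useNickname = new Var<>(false);
        Var<String> fullName = new Var<>("Ada Lovelace");
        Var<String> nickname = new Var<>("Ada");
        // Ordinary code: no operators and no explicit subscriptions.
        Computed<String> greeting = new Computed<>(() ->
            "Hello, " + (useNickname.get() ? nickname.get() : fullName.get()));

        System.out.println(greeting.get());  // Hello, Ada Lovelace
        nickname.set("Countess");            // not read so far, so nothing is invalidated
        greeting.get();
        System.out.println(greeting.runs()); // 1
        useNickname.set(true);               // the next run reads a different set of nodes
        System.out.println(greeting.get());  // Hello, Countess
        fullName.set("Augusta Ada King");    // no longer a dependency
        greeting.get();
        System.out.println(greeting.runs()); // 2
    }
}
```

The `greeting` function chooses which variable to read, just like a spreadsheet cell with a computed reference, and the graph follows along because edges are dropped and rediscovered on every run. Invalidation is pushed eagerly while recomputation is pulled lazily in `get()`, so this sketch is also an instance of push-pull propagation.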

Transparent reactive programming is therefore ideal for user interfaces. No wonder that all the popular reactive JavaScript frameworks use it: Vue.js, React, Knockout.js, Svelte, Meteor. Libraries supporting transparent reactive programming are also available in other languages. Off the top of my head: Assisticant in .NET and Streamlit in Python. Java, which is of particular interest to me, only has Quasar dataflow as far as I know and that one is too simplistic and requires bytecode instrumentation. That's why I am developing Hookless, a transparent reactive programming library for Java.

One of the easiest ways to implement transparent reactivity is to just perform a full refresh after every change. Videogames do this when they just rerender the whole screen every time they need to update it. It's simple and many apps don't need more. Full refresh is one extreme end of the scale of granularity of reactive programming. On the other end of the scale, there are experimental programming languages that make every variable reactive. Reactive programming however introduces overhead, both computational and in code complexity, which makes variable-level reactivity impractical. Applications have to be designed for some reasonable granularity, usually refreshing one small document (think JSON file) at a time.

So to sum it up, here's what characterizes transparent reactive programming (TRP):

Probably the simplest way to describe transparent reactive programming is to liken it to spreadsheets. Like in spreadsheets, if you change one value, all values computed from it are updated automatically. Except that transparent reactive programming lets you use arbitrary code to compute values.