New project: croupier
Intro to Dataflow Programming
This post introduces a new project, Croupier, a library for dataflow programming.
What is that? It's a programming paradigm where you don't specify the sequence in which your code will execute.
Instead, you create a number of "tasks", declare how the data flows from one task to another, and provide the initial data. The system then runs as many or as few of the tasks as needed, in whatever order it deems best.
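To make that concrete, here is a minimal sketch in Python (not Croupier's API). The task names, the task helper and the run function are all hypothetical and exist only for illustration. Each task declares which tasks it needs data from, and the runner works out what to execute and in what order.

```python
ran = []  # records the order in which tasks actually executed


def task(name, needs, func):
    """Build a hypothetical task record that logs when it runs."""
    def run_logged(*args):
        ran.append(name)
        return func(*args)
    return {"needs": needs, "run": run_logged}


# Tasks are declared in no particular order; only the data flow is stated.
tasks = {
    "count": task("count", ["lower"], lambda text: len(text.split())),
    "lower": task("lower", ["load"], lambda text: text.lower()),
    "shout": task("shout", ["load"], lambda text: text.upper()),
    "load": task("load", [], lambda: "Hello Dataflow"),
}


def run(target, tasks, results):
    """Run target after everything it needs, reusing cached results."""
    if target not in results:
        spec = tasks[target]
        args = [run(dep, tasks, results) for dep in spec["needs"]]
        results[target] = spec["run"](*args)
    return results[target]


word_count = run("count", tasks, {})
```

Asking for "count" runs "load", "lower" and "count", in that order. The "shout" task never runs, because nothing that was requested depends on it.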
Examples
Put that way it looks scary and complex, but it's simple enough that almost every programmer has run into a tool based on this principle:
make
When you create a Makefile, you declare a number of "targets", "dependencies" and "commands" (among other things), and then when you run make a_target it's make that decides which of those commands need to run, how and when.
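The core of that decision can be sketched in a few lines of Python. This is a simplification that assumes a plain timestamp comparison, not a description of everything make does, and the is_stale function name is made up for this example.

```python
import os


def is_stale(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Any dependency modified after the target means the target must be rebuilt.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

A make-like tool would run a target's commands only when a check like this says the target is out of date, after first bringing its dependencies up to date.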
Let's consider a more complex example: a static site generator.
Usually, these take a collection of markdown files with metadata such as title, date, tags, etc, and use that to produce a collection of HTML and other files that constitute a website.
Now, let's consider it from the POV of dataflow programming with a simplified version that only takes markdown files as inputs and builds a "blog" out of them.
For each post in a file foo.md there will be a /foo.html.
But if that file has tags tag1 and tag2, then the contents of that file will affect the output files /tags/tag1.html and /tags/tag2.html.
And if one of those tags is new, then it will affect /tags/index.html.
And if the post itself is new, then it will be in /index.html.
And also in an RSS feed. And the RSS feeds for the tags!
As you can see, adding or modifying a file can trigger a cascade of changes in the site, which you can model as dataflow.
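The cascade described above can be written down as a small Python function. This is only a sketch of the simplified blog, not how Nikola does it. The function name and its parameters are hypothetical, and the RSS paths (/rss.xml and /tags/NAME.xml) are assumptions, since the feed locations are not specified here.

```python
def affected_outputs(slug, tags, known_posts, known_tags):
    """List the output files touched by adding or editing one post."""
    outputs = [f"/{slug}.html"]
    # Every tag page that lists this post depends on its contents.
    outputs += [f"/tags/{tag}.html" for tag in tags]
    # A tag nobody used before changes the tag index.
    if any(tag not in known_tags for tag in tags):
        outputs.append("/tags/index.html")
    # A brand new post shows up in the front page and in the feeds.
    if slug not in known_posts:
        outputs.append("/index.html")
        outputs.append("/rss.xml")  # assumed path for the main feed
        outputs += [f"/tags/{tag}.xml" for tag in tags]  # assumed tag feed paths
    return outputs


# A new post with one existing tag and one new tag touches eight files.
new_post = affected_outputs("foo", ["tag1", "tag2"], {"bar"}, {"tag1"})

# Editing an existing post with known tags touches only two.
edited = affected_outputs("bar", ["tag1"], {"bar"}, {"tag1"})
```

Everything not in the returned list can be left alone, which is what makes a partial rebuild possible.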
That's the approach used by Nikola, a static site generator I wrote. Because it's implemented as dataflow, it can build only what's needed, which in most cases is just a tiny fragment of the whole site.
That is done via doit, an awesome tool more people should know about, because a lot more people should know about dataflow programming itself.
So, what is Croupier?
It's a library I am writing for dataflow programming in the Crystal language!
Here's an example of it in use, from the docs, which should be self-explanatory if you have a passing knowledge of Crystal or Ruby:
require "croupier"

# task1: declared with input.txt as its input and fileA as its output
b1 = ->{
  puts "task1 running"
  File.read("input.txt").downcase
}

Croupier::Task.new(
  name: "task1",
  output: "fileA",
  inputs: ["input.txt"],
  proc: b1
)

# task2: declared with fileA (the output of task1) as its input
b2 = ->{
  puts "task2 running"
  File.read("fileA").upcase
}

Croupier::Task.new(
  name: "task2",
  output: "fileB",
  inputs: ["fileA"],
  proc: b2
)

# Run the declared tasks
Croupier::Task.run_tasks
Why?
Because I want to write a fast SSG in Crystal, and because dataflow programming is a fundamental tool in my toolkit.
Anything else?
I will probably also write a simple make-like tool, just as a playground for Croupier.