Getting started
To get started with Respresso’s flow, we will walk through the basic concepts with some samples and usage examples.
What is a flow?
A flow is a declarative data flow represented by a dependency graph. In this graph every node represents a data sink and/or a data source with an optional transformation capability, and edges are data dependencies that act as pipes.
The resolution of the dependencies and the execution of the nodes are fully automatic and may be done in parallel. (It is currently executed serially.) During execution a node is executed at most once, when all of its dependencies are resolved, and its result is kept and used to serve data to other nodes. If the execution does not require a node’s result, that node is skipped; this is determined by the dependencies.
The definition of a flow is represented by an XML file designed to be readable and modifiable by hand.
Nodes
In a flow XML you can specify only three types of nodes, but when you use them the system automatically creates several helper nodes when necessary.
Note
Every node must have a unique ID within a flow. The IDs you create must contain only alphanumeric characters.
Warning
Every node’s port represents data which must be an object. Any other type, such as an array or a primitive, must be wrapped in an object.
Default nodes
When a flow is instantiated some nodes will be automatically created.
- Flow’s input node: This is a data source node which represents the data passed as the input of the flow. It can be targeted by @input.
- Flow’s output node: This is a data sink and source node which represents the data connected to its input. This should be used to create the output of the flow. It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node. It can be targeted by @output.
- Inspector node: In some cases you may want to inspect the data present in the flow. This node can help you determine what is going on at runtime. Any data targeted to this node is piped to the output under the same path as the source node’s ID. It can be targeted by @inspector.
- Output with inspector node: When you want to analyze the data in a flow you may want to use the output of this node. The output node’s output is written to the output field and the inspector’s output is written to the inspected field. It can be targeted by @outputWithInspector.
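For example, a minimal flow that simply pipes its input straight to its output needs only the default nodes. This is a sketch, assuming an empty nodes section is allowed:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes/>
    <connections>
        <!-- Pipe the flow's input object to the flow's output unchanged -->
        <connection from="@input" to="@output"/>
    </connections>
</flow>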
Processor
This node represents a transformation. It must reference an existing processor in our system by its name and version.
It has the following ports:
- Input: This is the data on which the transformation should be applied. Usually this should not contain any parameters that modify the behaviour of the transformation. This port cannot be targeted directly; it is tied to a helper node.
- Config: The object provided on this port should be used to modify the behaviour of the transformation. Usually this is static and not changed at runtime, although changing it is possible. This port cannot be targeted directly; it is tied to a helper node.
- Output: This represents the result of the transformation. This port is a data source provided by the processor node. It can be targeted by <node_id>/@output.
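To illustrate the ports, here is a sketch of a single processor wired between the flow’s input and output. The processor name SomeProcessor is a placeholder, not a real processor:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- "SomeProcessor" is a placeholder name for illustration -->
        <processor id="transform" name="SomeProcessor" version="1"/>
    </nodes>
    <connections>
        <!-- Feeds the processor's input port via its helper input node -->
        <connection from="@input" to="transform/@input"/>
        <!-- Reads the processor's output port -->
        <connection from="transform/@output" to="@output"/>
    </connections>
</flow>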
Helper nodes
When declaring a processor node, the system automatically generates the following nodes:
- Input node: This is a data sink and source node which is automatically connected to the processor’s input port. It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node. It can be targeted by <node_id>/@input.
- Dynamic config node: This is a data sink and source node which is automatically connected to the processor’s config port. It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node. It can be targeted by <node_id>/@config. You may connect some data to this node to change the processor’s behaviour at runtime.
- Static config node: This is a data source node which is automatically connected to the dynamic config node’s root using mergeType="merge". The data represented by this node is the config looked up from configId, or the XML tag’s text content. It can be targeted by <node_id>/@staticConfig, but this should not be used.
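As a sketch, static and dynamic config can be combined like this. The processor names and the config fields are made up for illustration:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- The JSON text content backs the static config node (export/@staticConfig) -->
        <processor id="export" name="SomeProcessor" version="1">
            {"format": "json"}
        </processor>
        <processor id="settings" name="SomeOtherProcessor" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="settings"/>
        <connection from="@input" to="export"/>
        <!-- Runtime data merged over the static config at export/@config -->
        <connection from="settings" to="export/@config"/>
        <connection from="export" to="@output"/>
    </connections>
</flow>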
Subflow
This node represents the result of another flow. It must reference that flow by its ID; the most appropriate version of it is looked up and used.
You can also set which node should be executed; the default is the @output node.
Helper input node
When declaring a subflow node, the system automatically generates a helper input node, which is connected to the referenced flow’s @input node. It can be targeted by <node_id>/@input.
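A sketch of a subflow declaration follows; the sub-flow tag is covered below, but the exact attribute names for referencing the other flow are best checked against the XSD (flow and node here are assumptions):

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- "commonResources" is a hypothetical flow ID; the attribute names
             "flow" and "node" are assumptions, check the XSD for the real ones -->
        <sub-flow id="common" flow="commonResources" node="@output"/>
    </nodes>
    <connections>
        <!-- Targets the automatically generated helper input node -->
        <connection from="@input" to="common/@input"/>
        <connection from="common" to="@output"/>
    </connections>
</flow>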
Data node
This node represents a data sink and a data source without any transformation, but it can fuse its inputs into a single object according to its connections. You can use this node to create a single input for multiple nodes, while that input may itself be fused from multiple sources, like a temporary value. It can also hold a default value, declared as a JSON object in the node’s text content. With this you can feed static data into the flow where you need it (for example the same config for multiple processors, or some input to send to a CI or a custom converter).
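A sketch of a data node with a default value feeding two processors, assuming the tag is simply named data (check the XSD for the exact tag name); the JSON fields and processor names are illustrative:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- Default value declared as a JSON object in the text content -->
        <data id="sharedConfig">
            {"locale": "en", "prettyPrint": true}
        </data>
        <processor id="first" name="SomeProcessor" version="1"/>
        <processor id="second" name="SomeProcessor" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="first"/>
        <connection from="@input" to="second"/>
        <!-- The same static data feeds both processors' configs -->
        <connection from="sharedConfig" to="first/@config"/>
        <connection from="sharedConfig" to="second/@config"/>
        <connection from="first" to="@output" mergeType="merge"/>
        <connection from="second" to="@output" mergeType="merge"/>
    </connections>
</flow>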
Connections
In a flow XML you can specify connections that define dependencies and data pipes between nodes.
Every connection must have a source node, specified by the from attribute, and a target node, specified by the to attribute.
In addition you can specify a mapping between the two objects: you may select which data should be read from the source and where the system should write the selected data at the target node.
From
This attribute must contain a node ID which exists in this flow. It references the node that will serve as the source of the data. It resolves to the node’s @output helper node if it exists: <node_id> is an alias for <node_id>/@output.
To
This attribute must contain a node ID which exists in this flow. It references the node where the data will be written. This node must be a data sink node! It resolves to the node’s @input helper node if it exists: <node_id> is an alias for <node_id>/@input.
Read
You can select a subtree of the data read from the source node and pipe only that to the target node. This attribute must contain a valid StructurePath read operation. Defaults to the object’s root.
Write
You can write the piped data to a subtree of the target node’s input object. This attribute must contain a valid StructurePath write operation. Defaults to the object’s root.
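For example, a connection that picks a subtree from the source and writes it under a different path at the target could look like this (the paths are illustrative; consult the StructurePath documentation for the exact syntax):

<!-- Reads only result/files from "android" and writes it under android/files at the output -->
<connection from="android" to="@output" read="result/files" write="android/files"/>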
Merge type
In some cases you may want to specify how the piped data is written to the target object. You may choose from the following options:
- override - Replaces the target subtree, whether or not the new value exists. This is the default merge type.
- merge - Deeply merges the new object into the target object. Any non-object value which exists in both objects will be replaced by the source value.
- none - Disables data piping for this connection. It can be used to define dependencies between nodes to determine the execution order.
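The three options in a sketch (the node IDs are illustrative):

<!-- Replaces whatever is already at the target path (the default) -->
<connection from="a" to="sink" write="data" mergeType="override"/>
<!-- Deeply merges b's output into the same path -->
<connection from="b" to="sink" write="data" mergeType="merge"/>
<!-- Pipes no data; only forces "trigger" to execute before "sink" -->
<connection from="trigger" to="sink" mergeType="none"/>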
Note
The data pipes are executed in order of declaration.
That means that if you specify two override connections to the same field, the second pipe’s data will be present in the input object before the execution.
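For example, with the two declarations below, the target node sees the value coming from second, because its pipe runs later (the node IDs are illustrative):

<connection from="first" to="target" write="value"/>
<!-- Declared later, so this override wins -->
<connection from="second" to="target" write="value"/>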
Structure of a flow XML
The flow is represented as an XML. You may want to use our XSD to get help while editing and validating the XML: https://app.respresso.io/public/schema/flow.xsd
The flow XML’s root tag must be a flow. Inside it you can define the nodes and connections tags.
Processor
You can define multiple Processor nodes inside the nodes, using the processor tag. The static config can be provided via the configId attribute; when present, it is used instead of the text content.
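A sketch of both ways to provide the static config (the configId value and the processor name are illustrative):

<!-- Config looked up by ID; used instead of any text content -->
<processor id="export" name="SomeProcessor" version="1" configId="someStoredConfig"/>
<!-- Config given inline as the tag's text content -->
<processor id="export2" name="SomeProcessor" version="1">
    {"format": "json"}
</processor>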
Sub flow
You can define multiple Subflow nodes inside the nodes, using the sub-flow tag.
Connection
You can define multiple Connections inside the connections, using the connection tag.
Sample
<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <processor id="android" name="AndroidLocalizationExporter" version="1"/>
        <processor id="custom" name="HttpProcessor" version="1"/>
        <processor id="webhook" name="WebhookProcessor" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="android"/>
        <connection from="@input" to="custom"/>
        <connection from="android" to="webhook" mergeType="none"/>
        <connection from="custom" to="webhook" mergeType="none"/>
        <connection from="android" to="@output" mergeType="merge"/>
        <connection from="custom" to="@output" mergeType="merge"/>
        <connection from="webhook" to="@output" mergeType="none"/>
    </connections>
</flow>
In the example above (and assuming that the @output node is executed) the following happens:
- The flow’s input is written to the android and custom nodes’ inputs.
- The android and custom nodes both start executing.
- When the android and custom nodes have both executed, the webhook node starts executing without any data piping.
- When webhook has executed, the results of android and custom are merged into an object at the @output node. This will be the result of the flow.
We have visualized this:
Flow visualization. (#No. indicates the execution order in case of sequential execution.)
So why does this happen, viewed from a high level?
- We need the result of @output, so we need to resolve all of its dependencies: webhook/@output, custom/@output and android/@output.
- webhook/@output cannot be executed because it has unresolved dependencies, but android/@output and custom/@output can be executed because all of their dependencies are resolved.
- In a parallel execution environment android/@output and custom/@output could be executed in parallel, but for simplicity’s sake we order them in a random sequence.
- When both nodes have executed successfully, all dependencies of webhook/@output are resolved, so we can execute it.
- After the successful execution of webhook/@output, all dependencies of @output are resolved, so the execution can stop as the result is produced.
But what happens on the edges and under the hood?
You may want to know how this works. Below you can find an explanation, but feel free to skip this section if you do not plan on using the flow’s advanced features. For better understanding we have visualized the full graph:
Detailed flow visualization.
As you can see, it contains all the helper nodes mentioned above, but don’t panic: you don’t have to deal with them.
Here is how you should interpret the symbols:
- Ovals without numbering are non-executed nodes representing static content initialized at flow initialization (except @output, because it has a data fusion feature).
- Ovals with numbering are data fusion nodes, which fuse multiple inputs into a single object according to their connections. The number indicates the execution order of that node.
- Rectangles are custom logic implementations (aka Processors).
- Edges represent the data flow. In this case every connection selects the source’s root object and writes to the target’s root object.
Edge types:
- Solid: overrides the target path.
- Dashed: merges the selected value into the target path.
- Dotted: no data is passed; defines only dependencies between nodes.
Every node is executed (resolved) when its every dependency is already resolved (its output value has been computed). When multiple nodes can be executed at the same time, a random order is chosen.
The <node_id>/@config, <node_id>/@input and @output nodes fuse every dependency’s selected data into their output according to the connection attributes. These nodes are also executed only once, just like every other node.