Getting started

To get started with Respresso’s flows, we will walk through the basic concepts using some samples and example usage.

What is a flow?

A flow is a declarative data flow represented by a dependency graph. In this graph, every node represents a data sink and/or a data source with an optional transformation capability, and the edges are data dependencies that act as pipes.

The resolution of the dependencies and the execution of the nodes are fully automatic and may be done in parallel. (Execution is currently serial.) During execution, a node is executed at most once, when all of its dependencies are resolved, and its result is kept and used to serve data to other nodes. If producing the requested result does not require a node, that node is skipped; this is determined by the dependencies.

The definition of a flow is represented by an XML file designed to be readable and modifiable by hand.

Nodes

In a flow XML you can specify only three types of nodes, but when you use them the system automatically creates a number of helper nodes as necessary.

Note

Every node must have a unique ID within a flow. IDs that you create must contain only alphanumeric characters.

Warning

Every node’s port represents data which must be an object. Any other type, such as an array or a primitive, must be wrapped in an object (for example, pass { "items": [1, 2, 3] } instead of [1, 2, 3]).

Default nodes

When a flow is instantiated, some nodes are automatically created.

  • Flow’s input node:
    This is a data source node which represents the data passed as the input of the flow.
    It can be targeted by @input
  • Flow’s output node:
    This is a data sink and source node which represents the data connected to its input. It should be used to create the output of the flow.
    It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node.
    It can be targeted by @output
  • Inspector node:
    In some cases you may want to inspect the data present in the flow. This node helps you determine what is going on at runtime.
    Any data piped to this node appears in its output under a path equal to the source node’s ID.
    It can be targeted by @inspector
  • Output with inspector node:
    When you want to analyze the data in a flow you may want to use the output of this node.
    The output node’s result is written to the output field and the inspector’s result is written to the inspected field.
    It can be targeted by @outputWithInspector
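
As a minimal sketch of how these targets are used (borrowing a processor from the sample at the end of this page), the following flow feeds its input to a processor, publishes the result, and also inspects it:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <processor id="android" name="AndroidLocalizationExporter" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="android"/>
        <connection from="android" to="@output"/>
        <!-- The processor's result also shows up in the inspector,
             under a path equal to the source node's ID ("android") -->
        <connection from="android" to="@inspector"/>
    </connections>
</flow>

Executing the @outputWithInspector node of this flow would produce an object with an output field (the flow’s result) and an inspected field (the inspected data).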

Processor

This node represents a transformation. It must reference an existing processor in our system by its name and version.

It has the following ports:

  • Input:
    This is the data on which the transformation should be applied. Usually it should not contain any parameters that modify the behaviour of the transformation.
    This port cannot be targeted directly; it is tied to a helper node.
  • Config:
    The object provided on this port should be used to modify the behaviour of the transformation. Usually it is static and not changed at runtime, although that is possible.
    This port cannot be targeted directly; it is tied to a helper node.
  • Output:
    This represents the result of the transformation. This port represents a data source provided by this processor node.
    It can be targeted by <node_id>/@output

Helper nodes

When you declare a processor node, the system automatically generates the following nodes:

  • Input node:
    This is a data sink and source node which is automatically connected to the processor’s input port.
    It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node.
    It can be targeted by <node_id>/@input
  • Dynamic config node:
    This is a data sink and source node which is automatically connected to the processor’s config port.
    It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node.
    It can be targeted by <node_id>/@config. You may connect some data to this node to change the processor’s behaviour at runtime.
  • Static config node:
    This is a data source node which is automatically connected to the dynamic config node’s root using mergeType="merge".
    The data represented by this node is the config looked up from configId, or the XML tag’s text content.
    It can be targeted by <node_id>/@staticConfig, but you should not need to target it directly.
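
To make these ports concrete, here is a sketch of a processor that combines a static config (its text content) with a value piped to its @config node at runtime. The JSON fields and the paths are made-up placeholders, and the read/write attributes assume a simple StructurePath syntax:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- The JSON text content becomes this node's @staticConfig,
             which is merged into its @config automatically -->
        <processor id="webhook" name="WebhookProcessor" version="1">
            { "retries": 3 }
        </processor>
    </nodes>
    <connections>
        <!-- Regular data goes to the @input helper node (aliased by "webhook") -->
        <connection from="@input" to="webhook"/>
        <!-- Runtime configuration is piped to the @config helper node -->
        <connection from="@input" read="webhookUrl" to="webhook/@config" write="url"/>
        <connection from="webhook" to="@output"/>
    </connections>
</flow>

At execution time, the static { "retries": 3 } and the runtime url value are fused into the processor’s final config object.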

Subflow

This node represents the result of another flow. It must reference that flow by its ID, and the best matching version of it is looked up and used. You also have the option of setting which node should be executed; the default is the @output node.

Helper input node

When you declare a subflow node, the system automatically generates a helper input node.

This is a data sink and source node which provides the data for the subflow’s @input node.
It has unlimited input ports, and the data received on these ports is fused into a single object according to the connections targeting this node.
It can be targeted by <node_id>/@input
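
For example, a subflow node is wired up much like a processor. The flowId attribute below is a hypothetical name for the referenced flow’s ID; check the XSD for the exact attribute names:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- "flowId" is an assumed attribute name for the referenced flow's ID -->
        <sub-flow id="localize" flowId="sharedLocalizationFlow"/>
    </nodes>
    <connections>
        <!-- Data piped here is fused and served as the subflow's @input -->
        <connection from="@input" to="localize/@input"/>
        <connection from="localize" to="@output"/>
    </connections>
</flow>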

Data node

This node represents a data sink and a data source without any transformation, but it can fuse its inputs into a single object according to its connections. You can use this node to provide a single input to multiple nodes, where that input may itself be created by fusing multiple inputs, like a temporary value. It can also hold a default value, declared as a JSON object in the node’s text content. With this you can feed static data into the flow where you need it (for example, the same config for multiple processors, or some input to send to a CI or a custom converter).
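
As a sketch, the flow below uses a data node to share one static config between two processors. The data tag name and the JSON content are assumptions for illustration; the XSD defines the exact tag:

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <!-- "data" is an assumed tag name; its JSON text content is the default value -->
        <data id="sharedConfig">
            { "locale": "en-US" }
        </data>
        <processor id="android" name="AndroidLocalizationExporter" version="1"/>
        <processor id="webhook" name="WebhookProcessor" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="android"/>
        <connection from="@input" to="webhook"/>
        <!-- The same static object configures both processors -->
        <connection from="sharedConfig" to="android/@config" mergeType="merge"/>
        <connection from="sharedConfig" to="webhook/@config" mergeType="merge"/>
        <connection from="android" to="@output" mergeType="merge"/>
        <connection from="webhook" to="@output" mergeType="merge"/>
    </connections>
</flow>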

Connections

In a flow XML you can specify connections that define dependencies and data pipes between nodes.

Every connection must have a source node, called from, and a target node, called to. In addition, you can specify a mapping between the two objects: which data should be read from the source, and where the system should write the selected data at the target node.

From

This attribute must contain a node ID which exists in this flow. This ID references the node which serves as the source of the data.

If you do not specify an exact port, the node’s @output helper node is used if it exists.
As a result, in this attribute <node_id> is an alias for <node_id>/@output.

To

This attribute must contain a node ID which exists in this flow. This ID references the node where the data will be written. This node must be a data sink node!

If you do not specify an exact port, the node’s @input helper node is used if it exists.
As a result, in this attribute <node_id> is an alias for <node_id>/@input.
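
In other words (using node IDs from the sample below), these two connections are equivalent:

<!-- Shorthand -->
<connection from="android" to="webhook"/>
<!-- Fully qualified equivalent -->
<connection from="android/@output" to="webhook/@input"/>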

Read

You can select a subtree of the data read from the source node and pipe only that to the target node. This attribute must contain a valid StructurePath read operation. It defaults to the object’s root.

Write

You can write the piped data to a subtree of the target node’s input object. This attribute must contain a valid StructurePath write operation. It defaults to the object’s root.
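
Combined, read and write let you pipe just a subtree to a specific place. The example below assumes a simple path syntax for StructurePath and made-up field names:

<!-- Read only the "translations" subtree from the source's output
     and write it under "strings" in the target's input object -->
<connection from="android" to="webhook" read="translations" write="strings"/>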

Merge type

In some cases you may want to specify how to write the piped data to the target object.

You may choose from the following options:
  • override - This replaces the target subtree regardless of whether the new value exists or not. This is the default merge type.

  • merge - This deeply merges the new object into the target object. Any non-object value that exists in both objects will be replaced by the source value.

  • none - This disables data piping for this connection. It can be used to define dependencies between nodes to determine the execution order, as shown in the sketch below.
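
As a sketch, the three options side by side (the node IDs and paths are placeholders):

<!-- override is the default: replaces whatever is at the target path -->
<connection from="defaults" to="webhook/@config"/>
<!-- merge: deeply merges the piped object into the target object -->
<connection from="overrides" to="webhook/@config" mergeType="merge"/>
<!-- none: no data is piped, only an execution-order dependency is created -->
<connection from="android" to="webhook" mergeType="none"/>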

Note

The data pipes are executed in order of declaration. This means that if you specify two override connections to the same field, the second pipe’s data will be present in the input object when the node executes.
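
For example, with the two connections below (the node IDs and the write path are hypothetical), the target’s input will hold the data piped from second, because it is declared later:

<connection from="first" to="webhook" write="payload"/>
<connection from="second" to="webhook" write="payload"/>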

Structure of a flow XML

The flow is represented as XML. You may want to use our XSD to get help while editing and validating the XML: https://app.respresso.io/public/schema/flow.xsd

The flow XML’s root tag must be flow. Inside it you can define the nodes and connections tags.

Processor

You can define multiple Processor nodes inside the nodes tag, using the processor tag.
This tag can have an optional text body, but it must be valid JSON. This value will be parsed and served by the <node_id>/@staticConfig node, which is piped directly to <node_id>/@config.
You can also reference a config using the configId attribute. When present, it is used instead of the text content.
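
Both configuration styles might look like this (the JSON content and the config ID are placeholders):

<!-- Inline static config as JSON text content -->
<processor id="webhook" name="WebhookProcessor" version="1">
    { "retries": 3 }
</processor>
<!-- Referencing a stored config; it is used instead of any text content -->
<processor id="slack" name="WebhookProcessor" version="1" configId="mySlackConfig"/>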

Sub flow

You can define multiple Subflow nodes inside the nodes tag, using the sub-flow tag.

Connection

You can define multiple Connections inside the connections tag, using the connection tag.

Sample

<flow xmlns="https://app.respresso.io/public/schema/flow.xsd">
    <nodes>
        <processor id="android" name="AndroidLocalizationExporter" version="1"/>
        <processor id="custom" name="HttpProcessor" version="1"/>
        <processor id="webhook" name="WebhookProcessor" version="1"/>
    </nodes>
    <connections>
        <connection from="@input" to="android"/>
        <connection from="@input" to="custom"/>
        <connection from="android" to="webhook" mergeType="none"/>
        <connection from="custom" to="webhook" mergeType="none"/>
        <connection from="android" to="@output" mergeType="merge"/>
        <connection from="custom" to="@output" mergeType="merge"/>
        <connection from="webhook" to="@output" mergeType="none"/>
    </connections>
</flow>

In the example above (and assuming that the @output node is executed) the following happens:

  1. The flow’s input will be written to the android and custom nodes’ inputs.

  2. The android and custom nodes will both start executing.

  3. When the android and custom nodes have both finished executing, the webhook node will start executing without any data piping.

  4. When webhook has finished executing, the results of android and custom will be merged into a single object at the @output node. This will be the result of the flow.

We have visualized this:

Flow visualization. (#No. indicates the execution order in the case of sequential execution.)

So why does this happen, from a high-level view?

  • We need the result of @output so we need to resolve all of its dependencies: webhook/@output, custom/@output, android/@output

  • webhook/@output cannot be executed yet because it has unresolved dependencies. But android/@output and custom/@output can be executed because all of their dependencies are resolved.

  • In a parallel execution environment android/@output and custom/@output could be executed in parallel, but for simplicity’s sake we order them in a random sequence.

  • When both nodes are executed successfully, we have resolved all dependencies of webhook/@output so we can execute it.

  • After successful execution of webhook/@output all dependencies of @output are resolved, so the execution can stop as the result is produced.

But what happens on the edges and under the hood?

You may want to know how this works. Below you can find an explanation, but feel free to skip this section if you do not plan to use the flow’s advanced features. For a better understanding, we have visualized the full graph:

Detailed flow visualization.

As you can see, all the helper nodes mentioned above are present, but don’t panic: you do not need to deal with them directly.

Here is how you should interpret the symbols:

  • Ovals without numbering are non-executed nodes representing static content initialized when the flow is instantiated. (Except @output, because it has a data fusion feature.)

  • Ovals with numbering are data fusion nodes which fuse multiple inputs into a single object according to their connections. The numbering indicates the execution order of that node.

  • Rectangles are custom logic implementations (aka Processors).

  • Edges represent the data flow. In this case every connection selects the source’s root object and writes to the target’s root object.

  • Edge types:

    • Solid: Override target path

    • Dashed: Merge selected value to target path

    • Dotted: No data is passed. Defines only dependencies between nodes.

Every node is executed (resolved) once every one of its dependencies is already resolved (its output value has already been computed). When multiple nodes can be executed at the same time, a random order is chosen.

The <node_id>/@config, <node_id>/@input and @output nodes fuse every dependency’s selected data into their output according to the connection attributes. These nodes are also executed only once, just like every other node.