Turbocharge Your .NET Application Performance with Dataflow Parallel Processing

Exploring the transformation from a serial ETL and notification solution, relying on multithreading and pointers, to a parallelized and highly optimized system using the .NET Dataflow library.

By Rodrigo Bergara, Basilio Farach, Nahuel Perales, SOUTHWORKS


In the realm of data processing, optimization opportunities often lie untapped. This blog post explores the transformation from a serial ETL and notification solution, relying on multithreading and pointers, to a parallelized and highly optimized system using the .NET Dataflow library.

We showcase a case study where the performance of an application processing large datasets is significantly improved by leveraging the Dataflow Parallel library.


Data processing is a recurring need in the industry, and on many projects it is a missed opportunity for optimization. With the .NET Dataflow library, parallelism can be designed into a solution from the very beginning instead of being added later as an afterthought.

We will present a scenario in which we enhance our application's performance and streamline the processing of extensive data, around 80 million records, through the utilization of .NET Dataflow for code parallelization.

Serial Solution

The initial solution involves a single-threaded Load process that sequentially handles data grouping, uploading batches to storage, and finalizing the remaining data. Although parallelization could enhance this process, it introduces challenges such as data transfer, synchronization, and object locking.

In this scenario, a Load process receives data from the Transformation step. The responsibilities of the Load class include:

  • Grouping data into batches
  • Uploading batches to storage
  • Compiling the remaining data into a final batch
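For comparison with the Dataflow version, the serial Load described above can be sketched roughly as follows. This is only an illustration: the `Record` type, the `upload` delegate, and the class name are hypothetical stand-ins, since the original implementation is not shown in this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record shape; the post does not show the real data model.
public record Record(int Id, string Payload);

public sealed class SerialLoader
{
    private readonly int _batchSize;
    private readonly Action<IReadOnlyList<Record>> _upload; // stand-in for the storage upload
    private List<Record> _current = new();

    public SerialLoader(int batchSize, Action<IReadOnlyList<Record>> upload)
    {
        _batchSize = batchSize;
        _upload = upload;
    }

    // Called once per record coming out of the Transformation step.
    public void Load(Record record)
    {
        _current.Add(record);
        if (_current.Count == _batchSize) Flush(); // the caller waits while the upload runs
    }

    // Called once at the end to upload whatever is left as the final batch.
    public void Finish()
    {
        if (_current.Count > 0) Flush();
    }

    private void Flush()
    {
        _upload(_current);
        _current = new List<Record>();
    }
}
```

Every step runs on the calling thread, so grouping stops whenever an upload is in progress. That coupling is what the Dataflow blocks described next are meant to remove.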

Introduction to Dataflow

The .NET Dataflow library (TPL Dataflow, in the System.Threading.Tasks.Dataflow namespace) is a practical toolkit for handling parallel tasks without making things overly complicated. It manages the technicalities of parallelization so that you can focus on other development tasks.

Imagine it as a pipeline model that streamlines the heavy lifting, allowing you to concentrate on the essential logic. What's neat? Not only does it take care of parallelization, it also ensures smooth communication between the different blocks. You get to decide how your data moves and how it is buffered.

Plus, it manages threads for you and makes good use of the available hardware. Whether you prefer a straightforward pipeline or want to experiment with networked graphs, Dataflow is a reliable companion that simplifies the process.

Key aspects of the Dataflow model:

  • In-process message passing, with both synchronous and asynchronous communication
  • Explicit control over how data is buffered and how it flows between blocks
  • Data is buffered and work is scheduled as data becomes available
  • Threads are managed for you to increase throughput
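To make the point about explicit control over buffering more concrete, here is a minimal sketch of a block with a bounded capacity. The class name and the gate used to keep the first item busy are illustrative assumptions, not part of the original solution.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BufferingDemo
{
    public static async Task<(bool First, bool Second, bool Third)> RunAsync()
    {
        // The gate keeps the first item "in progress" until we open it.
        var gate = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

        var worker = new ActionBlock<int>(
            _ => gate.Task, // asynchronous delegate that stays busy until the gate opens
            new ExecutionDataflowBlockOptions { BoundedCapacity = 1 });

        bool first = worker.Post(1);   // accepted: the block has room
        bool second = worker.Post(2);  // declined right away: the single slot is taken

        Task<bool> pending = worker.SendAsync(3); // waits asynchronously for a free slot
        gate.SetResult(true);                     // let the first item finish
        bool third = await pending;               // accepted once the slot frees up

        worker.Complete();
        await worker.Completion;
        return (first, second, third);
    }
}
```

With a bounded block, `Post` returns false instead of waiting, while `SendAsync` postpones the message until there is room. That makes `SendAsync` the usual choice when the producer should slow down to the pace of the consumer.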

When to use the Dataflow model:

  • Coarse-grained dataflows where substantial or well-defined work is needed.
  • For data processing scenarios, both linear pipelines and networked graphs.

How it works

The core behavior of Dataflow blocks is to buffer and process data in different ways.

  • Source blocks are the source of data. You read from them (ISourceBlock<TOutput>).
  • Target blocks are the receivers of data. You write to them (ITargetBlock<TInput>).
  • Propagator blocks act as both source and target (IPropagatorBlock<TInput, TOutput>).
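The three roles can be seen on a single predefined block. The short sketch below uses a `BufferBlock<T>`, which is a propagator, and looks at it once through its target interface and once through its source interface; the class and variable names are only for illustration.

```csharp
using System.Threading.Tasks.Dataflow;

public static class BlockRolesDemo
{
    public static int Run()
    {
        // BufferBlock<T> is a propagator: it is both a target and a source.
        var buffer = new BufferBlock<int>();

        ITargetBlock<int> target = buffer; // the side you write to
        ISourceBlock<int> source = buffer; // the side you read from

        target.Post(41);             // write a message to the target
        return source.Receive() + 1; // read it back from the source
    }
}
```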

To create a Dataflow pipeline, you need to complete the following steps:

  1. Block Creation: Generate the necessary blocks by either utilizing predefined ones or crafting custom blocks tailored to your requirements.
  2. Pipeline Linking: Connect the blocks within the pipeline using the "LinkTo" method, forming a structured flow for data processing.
  3. Completion Handling: Implement mechanisms to handle pipeline completion, ensuring that the entire process is gracefully managed and finalized.
  4. Data Posting: Introduce data into the pipeline by posting it to the first block (for example, with Post or SendAsync), which starts the flow of information and the subsequent processing.
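The four steps above can be sketched with two predefined blocks. This is a deliberately small example with made-up names, not the pipeline from the case study.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class MinimalPipeline
{
    public static async Task<List<string>> RunAsync(IEnumerable<int> input)
    {
        var results = new List<string>();

        // 1. Block creation: one propagator and one target.
        var format = new TransformBlock<int, string>(n => $"item-{n}");
        var collect = new ActionBlock<string>(s => results.Add(s));

        // 2 and 3. Linking, with completion set up to flow downstream automatically.
        format.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        // 4. Data posting: feed the head of the pipeline.
        foreach (var n in input) format.Post(n);

        // Signal that no more data will arrive, then wait for the last block to drain.
        format.Complete();
        await collect.Completion;
        return results;
    }
}
```

Without `PropagateCompletion = true`, calling `Complete` on the first block would not complete the second one, and awaiting `collect.Completion` would never return.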

Dataflow Integration

This section details the configuration of the blocks and introduces a custom block (Discriminator) composed of an ActionBlock, a BroadcastBlock, and several BatchBlocks for efficient data grouping.

Prior to engaging with Dataflow, it is crucial to recognize that the underlying model relies on well-defined, decoupled tasks. This implies that the tasks within the system are carefully defined and operate independently, encouraging modularity and flexibility in the parallelized processing. Understanding this foundational aspect is key to harnessing the full potential of Dataflow, as it allows for effective coordination and execution of diverse tasks within the parallelized workflow.


Blocks can be configured to adapt to specific requirements. In this case, we specify the bounded capacity, cancellation token, and maximum degree of parallelism for both the Transform and Action blocks. For the Batch blocks, we additionally set the maximum number of groups. Configuring each block this way keeps it operating within the desired limits and makes the overall Dataflow system more adaptable and efficient.
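As a rough illustration of these settings, the sketch below builds one execution block and one batch block from option objects. All the numeric values and names are placeholders chosen for the example; the values used in the real project are not given in this post.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class BlockOptionsDemo
{
    public static (TransformBlock<int, int> Transform, BatchBlock<int> Batch) Create(CancellationToken token)
    {
        // Options shared by execution blocks (TransformBlock, ActionBlock, ...).
        var executionOptions = new ExecutionDataflowBlockOptions
        {
            BoundedCapacity = 1_000,                              // placeholder: max items held by the block
            CancellationToken = token,                            // lets the caller cancel the block
            MaxDegreeOfParallelism = Environment.ProcessorCount   // placeholder: concurrent delegate calls
        };

        // Options for grouping blocks such as BatchBlock.
        var groupingOptions = new GroupingDataflowBlockOptions
        {
            CancellationToken = token,
            MaxNumberOfGroups = 100                               // placeholder: max batches the block will produce
        };

        var transform = new TransformBlock<int, int>(n => n * 2, executionOptions);
        var batch = new BatchBlock<int>(500, groupingOptions);    // placeholder batch size
        return (transform, batch);
    }
}
```

Because the transform block is bounded here, `Post` can return false when it is full; a producer that must not lose items would normally use `SendAsync` instead.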

Define blocks

The solution with the Dataflow library includes a CustomBlock (Discriminator) that receives the data from the Transformation phase and groups each piece of data received to continue processing.

The custom block is composed of an ActionBlock that functions as a router to store data correctly, a broadcast block to share copies of the Batches, and a Dictionary of BatchBlocks to organize the data.

Initialization: CustomBatchBlock is set up to group incoming data into batches based on Batch and ID, with a specified batch size.

Block Setup: It uses a dictionary to discriminate batches by key, an action block to process and route data, and a broadcast block to share multiple batches.

Processing: Incoming values are processed by the Action Block, ensuring each ID is processed only once, and then sent to the appropriate batch.

Completion: When processing is complete, it triggers the completion of the individual batches, ensuring all data is processed and broadcast.
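The original custom block is not reproduced in this post, so the following is only a sketch of how such a block could be assembled from the pieces described above. The `Item` type, its `Key` and `Id` properties, and the `DiscriminatorBlock` name are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical item shape: Key selects the batch, Id is used to skip duplicates.
public record Item(string Key, int Id);

public static class DiscriminatorBlock
{
    public static IPropagatorBlock<Item, Item[]> Create(int batchSize)
    {
        var batches = new Dictionary<string, BatchBlock<Item>>();
        var seenIds = new HashSet<int>();

        // Batches are treated as read-only, so the cloning function returns the same array.
        var output = new BroadcastBlock<Item[]>(items => items);

        // Router: with the default MaxDegreeOfParallelism of 1, the dictionary and
        // the set are only touched by one delegate invocation at a time.
        var router = new ActionBlock<Item>(item =>
        {
            if (!seenIds.Add(item.Id)) return;        // each ID is handled only once
            if (!batches.TryGetValue(item.Key, out var batch))
            {
                batch = new BatchBlock<Item>(batchSize);
                batch.LinkTo(output);                 // completion is handled below
                batches.Add(item.Key, batch);
            }
            batch.Post(item);
        });

        _ = CompleteAsync();
        return DataflowBlock.Encapsulate(router, output);

        async Task CompleteAsync()
        {
            try { await router.Completion; }
            catch (Exception ex) { ((IDataflowBlock)output).Fault(ex); return; }

            foreach (var b in batches.Values) b.Complete(); // flushes partial batches
            await Task.WhenAll(batches.Values.Select(b => b.Completion));
            output.Complete();
        }
    }
}
```

`DataflowBlock.Encapsulate` exposes the router as the target side and the broadcast block as the source side, so callers see a single propagator. Completion is wired by hand because several batch blocks feed one output, and the output must only complete after all of them have drained.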

Inside CustomBatchBlock, a BatchBlock is used to group data by ID.

  • Combines input data into batches for propagation.
  • The size of each batch is specified when the block is created.
  • The last, possibly smaller, batch is formed either by signaling that no more data will arrive or by forcing it in code.

BatchBlock was used to group data and process it in batches.
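A small, self-contained sketch of this behavior, using integers instead of the real records (the names and sizes are illustrative):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchBlockDemo
{
    public static async Task<List<int[]>> RunAsync()
    {
        var batch = new BatchBlock<int>(3); // each batch holds up to 3 items

        for (var i = 1; i <= 7; i++) batch.Post(i);

        // Signal that no more data will arrive; the leftover item forms the last batch.
        batch.Complete();

        var batches = new List<int[]>();
        while (await batch.OutputAvailableAsync())
        {
            while (batch.TryReceive(out var items)) batches.Add(items);
        }
        return batches; // [1,2,3], [4,5,6], [7]
    }
}
```

Calling `TriggerBatch()` instead of `Complete()` forces out the buffered items while leaving the block open for more input.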

We use the BroadcastBlock to share copies of the batches.

  • Efficiently duplicates input data to multiple linked targets.
  • It accepts messages and broadcasts them to all linked targets, making it useful for scenarios where the same data needs to be processed or consumed by multiple components.
  • Minimizes redundancy by duplicating messages without re-executing source logic for each target.
  • Ideal for scenarios requiring shared data among multiple consumers.
  • Use when there is no need for separate processing logic for each target, reducing computational overhead.
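The sketch below shows one broadcast block feeding two independent consumers. The consumers and their logic are invented for the example.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BroadcastDemo
{
    public static async Task<(List<int> A, List<int> B)> RunAsync()
    {
        var seenByA = new List<int>();
        var seenByB = new List<int>();

        // The delegate is the cloning function; returning the value itself is fine for ints.
        var broadcast = new BroadcastBlock<int>(n => n);

        var consumerA = new ActionBlock<int>(n => seenByA.Add(n));
        var consumerB = new ActionBlock<int>(n => seenByB.Add(n * 10));

        var link = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(consumerA, link);
        broadcast.LinkTo(consumerB, link);

        for (var i = 1; i <= 3; i++) broadcast.Post(i);

        broadcast.Complete();
        await Task.WhenAll(consumerA.Completion, consumerB.Completion);
        return (seenByA, seenByB);
    }
}
```

One caveat worth keeping in mind: a broadcast block does not wait for slow consumers. If a linked target has a bounded capacity and is full when a message is offered, that target misses the message, so the consumers here are left unbounded.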

The TransformBlock takes a single input item at a time, which is buffered. A delegate processes the item and returns a result, either directly or as a task, and that result is placed in the output buffer. Processing of an item is not complete until the returned task finishes.

  • Used to process each input item and produce exactly one output item.
  • Can buffer data internally until delegates are available.
  • Can buffer output data.

TransformBlock was used to manipulate data.
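As an illustration, the sketch below parses strings into integers with an asynchronous delegate. The parsing step and the delay are stand-ins for the real transformation, which is not shown in this post.

```csharp
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TransformBlockDemo
{
    public static async Task<int[]> RunAsync(string[] lines)
    {
        var parse = new TransformBlock<string, int>(
            async text =>
            {
                await Task.Delay(5);            // stand-in for real asynchronous work
                return int.Parse(text.Trim());  // exactly one output per input
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        foreach (var line in lines) parse.Post(line);
        parse.Complete();

        // One result is produced for every input, in input order by default.
        var results = new int[lines.Length];
        for (var i = 0; i < results.Length; i++)
        {
            results[i] = await parse.ReceiveAsync();
        }

        await parse.Completion;
        return results;
    }
}
```

Even with four items processed concurrently, the block releases its outputs in the order the inputs arrived unless `EnsureOrdered` is set to false.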

The TransformManyBlock has the distinctive feature of accepting a single input item and yielding zero to many results. Two tasks unfold within this block. The first processes the input item and generates a collection of results. The second extracts the items from this collection one by one and delivers them to the output buffer. The delegates can be synchronous or asynchronous.

  • Used to process each input item and produce zero to many output items.
  • Can buffer data internally until delegates are available.
  • Can buffer output data.
  • The degree of parallelism is 1 by default; adjust it accordingly.

TransformManyBlock was used to manipulate data.
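A minimal sketch of the "zero to many" behavior, splitting lines of text into words (the splitting logic is only an example):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TransformManyDemo
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> lines)
    {
        var words = new List<string>();

        // Each line yields as many outputs as it has words; an empty line yields none.
        var split = new TransformManyBlock<string, string>(
            text => text.Split(' ', StringSplitOptions.RemoveEmptyEntries));

        var collect = new ActionBlock<string>(w => words.Add(w));
        split.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var line in lines) split.Post(line);
        split.Complete();

        await collect.Completion;
        return words;
    }
}
```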

The ActionBlock is used as the last step of the process: once a batch has been loaded, it sends a notification at the end of the pipeline.

  • Used to perform an action whenever data is available.
  • Can buffer data internally until delegates are available.
  • The degree of parallelism is 1 by default; adjust it accordingly.
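The notification step could look roughly like the sketch below. The `sendNotification` delegate is a hypothetical placeholder for whatever notification call the real solution makes, and the degree of parallelism is an arbitrary example value.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class NotificationDemo
{
    // sendNotification is a hypothetical stand-in for the real notification call.
    public static ActionBlock<string> Create(Func<string, Task> sendNotification) =>
        new ActionBlock<string>(
            sendNotification,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

    public static async Task<int> RunAsync(int batchCount)
    {
        // The delegate may run on several threads at once, so use a thread-safe collection.
        var sent = new ConcurrentBag<string>();

        var notify = Create(async batchName =>
        {
            await Task.Delay(1); // stand-in for network latency
            sent.Add(batchName);
        });

        for (var i = 0; i < batchCount; i++) notify.Post($"batch-{i}");

        notify.Complete();
        await notify.Completion;
        return sent.Count;
    }
}
```

An ActionBlock has no output, so it is always the end of a branch. Awaiting its `Completion` is how the caller learns that every notification has been sent.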

Linking the blocks to create the pipeline is straightforward: each block is connected to the next one with LinkTo.
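The sketch below links four blocks in the same order as the stages described in this post: transform, batch, load, and notify. The arithmetic inside each block is a placeholder for the real work, and all names are illustrative.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class LinkedPipeline
{
    public static async Task<List<int>> RunAsync(IEnumerable<int> input, int batchSize)
    {
        var totals = new List<int>();

        var transform = new TransformBlock<int, int>(x => x * x);           // placeholder transformation
        var batch = new BatchBlock<int>(batchSize);                         // groups items
        var load = new TransformBlock<int[], int>(items => items.Sum());    // placeholder for the upload
        var notify = new ActionBlock<int>(total => totals.Add(total));      // placeholder for the notification

        // Link each block to the next and let completion flow down the chain.
        var link = new DataflowLinkOptions { PropagateCompletion = true };
        transform.LinkTo(batch, link);
        batch.LinkTo(load, link);
        load.LinkTo(notify, link);

        foreach (var n in input) await transform.SendAsync(n);

        transform.Complete();
        await notify.Completion;
        return totals;
    }
}
```

Completing the first block is enough to shut the whole chain down: each block finishes its buffered work, the batch block emits its final partial batch, and completion moves on to the next block.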

A visual representation of the pipeline with the used blocks.

To start processing, data is sent to _batch, a BatchBlock that buffers the data initially.

To end processing, a last batch is received and used to flush the remaining buffered data and complete the blocks.
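A minimal sketch of this start and end handling, assuming a `_batch` field like the one mentioned above; the class, the `_upload` block, and the use of integers as records are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public sealed class BatchedLoader
{
    private readonly BatchBlock<int> _batch;
    private readonly ActionBlock<int[]> _upload; // hypothetical stand-in for the upload step

    public List<int> UploadedSizes { get; } = new();

    public BatchedLoader(int batchSize)
    {
        _batch = new BatchBlock<int>(batchSize);
        _upload = new ActionBlock<int[]>(items => UploadedSizes.Add(items.Length));
        _batch.LinkTo(_upload, new DataflowLinkOptions { PropagateCompletion = true });
    }

    // Start: every incoming record is posted to _batch, which buffers it.
    public void Load(int record) => _batch.Post(record);

    // End: force out buffered records even if there are fewer than batchSize,
    // then complete the blocks and wait for the last upload to finish.
    public Task FinishAsync()
    {
        _batch.TriggerBatch();
        _batch.Complete();
        return _upload.Completion;
    }
}
```

`Complete` alone would also flush the remaining items. `TriggerBatch` is shown because it matches the description above and can be used on its own when the block needs to stay open.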


With the help of .NET Dataflow's features, we were able to divide the tasks in the process and execute them simultaneously. This approach of parallelization enabled us to use the full potential of the available hardware resources, leading to a significant decrease in the overall execution time. The benefits of parallelization are noticeable in situations where the workload is inherently parallelizable. Tasks that previously were carried out sequentially can now be processed concurrently, resulting in a considerable increase in throughput. This improvement not only enhances the user experience by providing faster results but also enables the handling of larger datasets or more simultaneous requests.

Additionally, the simplicity of expressing parallelism through .NET Dataflow made the codebase more maintainable and readable. The framework's high-level abstractions simplified the implementation of parallel processing, allowing the development team to concentrate on individual tasks' logic rather than the intricate details of managing concurrency.