Data Generation From Visual Pipelines

A free and open-source library to generate and validate datasets with full transparency.

Why DataGenFlow

Transform your ideas into workflows using custom blocks and validate data with ease.

Easy to Extend

Add custom blocks in minutes with auto-discovery. Drop your file into user_blocks/ and it is automatically available, with no configuration needed.
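To illustrate how directory-based auto-discovery can work in general, here is a minimal sketch. It is not DataGenFlow's actual loader: BaseBlock, its name attribute, execute(), and discover_blocks() are hypothetical names chosen for the example. The sketch imports every .py file in a folder with importlib and registers any subclass of the base class it finds.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


class BaseBlock:
    """Hypothetical base class; DataGenFlow's real base class may look different."""

    name = "base"

    def execute(self, state: dict) -> dict:
        raise NotImplementedError


def discover_blocks(folder: str) -> dict:
    """Import every .py file in `folder` and register BaseBlock subclasses by `name`."""
    registry = {}
    for path in sorted(Path(folder).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        # Expose BaseBlock to the user file so this sketch needs no installed package.
        module.BaseBlock = BaseBlock
        spec.loader.exec_module(module)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if issubclass(obj, BaseBlock) and obj is not BaseBlock:
                registry[obj.name] = obj
    return registry


# Usage: drop a file into a temporary stand-in for user_blocks/ and discover it.
with tempfile.TemporaryDirectory() as user_blocks:
    Path(user_blocks, "uppercase.py").write_text(
        "class UppercaseBlock(BaseBlock):\n"
        "    name = 'uppercase'\n"
        "\n"
        "    def execute(self, state):\n"
        "        return {'upper': state['content'].upper()}\n"
    )
    blocks = discover_blocks(user_blocks)

print(sorted(blocks))  # ['uppercase']
```

The discovered class stays usable after the temporary folder is deleted because the module has already been executed in memory.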

Faster Development

Visual pipeline builder eliminates boilerplate code. Connect blocks and they automatically share data through accumulated state.
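The idea of blocks sharing data through accumulated state can be sketched in a few lines. This assumes each block is a plain function that reads the current state and returns a dict of new keys. run_pipeline(), word_count() and is_short() are hypothetical illustrations, not DataGenFlow's API.

```python
from typing import Callable, List

# Hypothetical block signature: receives the current state, returns new keys.
Block = Callable[[dict], dict]


def run_pipeline(blocks: List[Block], seed: dict) -> dict:
    """Run blocks in order, merging each block's output into one shared state."""
    state = dict(seed)  # copy so the caller's seed is left untouched
    for block in blocks:
        state.update(block(state))  # later blocks see everything produced so far
    return state


def word_count(state: dict) -> dict:
    return {"word_count": len(state["content"].split())}


def is_short(state: dict) -> dict:
    # Reads a key written by the previous block; no manual wiring needed.
    return {"is_short": state["word_count"] < 20}


result = run_pipeline([word_count, is_short], {"content": "Electric cars reduce emissions."})
print(result)
# {'content': 'Electric cars reduce emissions.', 'word_count': 4, 'is_short': True}
```

Because every block writes into the same dictionary, adding a block never requires changing the ones before it; the trade-off is that blocks must agree on key names.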

Simple to Use

Intuitive drag-and-drop interface, no training required. Build complex data generation workflows without writing orchestration code.

Full Transparency

Complete execution traces for debugging. See exactly how each result was generated with full visibility into every pipeline step.
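One way to get per-step visibility is to snapshot the state before each block runs and record what the block returned. The sketch below assumes function-style blocks; run_with_trace() and the trace fields (block, input, output, duration_s) are hypothetical and not necessarily what DataGenFlow stores.

```python
import copy
import time


def run_with_trace(blocks, seed):
    """Run blocks in order and record what each one received and produced."""
    state = dict(seed)
    trace = []
    for block in blocks:
        before = copy.deepcopy(state)  # snapshot so later updates don't alter the trace
        start = time.perf_counter()
        output = block(state)
        trace.append({
            "block": block.__name__,
            "input": before,
            "output": output,
            "duration_s": time.perf_counter() - start,
        })
        state.update(output)
    return state, trace


# Hypothetical stand-ins for a generator and a validator block.
def generate(state):
    return {"generated": {"title": "Electric Vehicles"}}


def validate(state):
    return {"valid": isinstance(state["generated"], dict)}


final, trace = run_with_trace([generate, validate], {"content": "Electric cars reduce emissions."})
for step in trace:
    print(step["block"], "->", sorted(step["output"]))
# generate -> ['generated']
# validate -> ['valid']
```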

How it Works

Starting from a seed file, build or customize a pipeline to generate the desired data for your use case.

1. Define Seed

Start with text content that your pipeline will process.

seed.json
{
  "repetitions": 3,
  "metadata": {
    "content": "Electric cars reduce emissions but require charging infrastructure."
  }
}
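A seed like this can be read as "run the pipeline 3 times, each time starting from this metadata". That interpretation of repetitions is an assumption, and expand_seed() is a hypothetical helper, so treat the following as a sketch of the idea rather than the library's loader.

```python
import copy
import json

# The seed.json example, embedded as a string.
SEED_JSON = """
{
  "repetitions": 3,
  "metadata": {
    "content": "Electric cars reduce emissions but require charging infrastructure."
  }
}
"""


def expand_seed(seed: dict) -> list:
    """Assumed semantics: one initial state per repetition, copied from `metadata`."""
    count = seed.get("repetitions", 1)
    # Deep-copy per run so one run's edits can't leak into another.
    return [copy.deepcopy(seed["metadata"]) for _ in range(count)]


runs = expand_seed(json.loads(SEED_JSON))
print(len(runs))  # 3
```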

2. Build Pipeline

Design your workflow using drag-and-drop blocks. Each block adds data to the accumulated state.

Pipeline Diagram
StructuredGenerator  →  JSONValidatorBlock
         ↓                       ↓
     generated       +  valid, parsed_json
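The diagram shows the validator reading generated and adding valid and parsed_json. A hypothetical stand-in for that step could look like the sketch below; the function name and the choice to accept either a JSON string or an already-parsed object are assumptions, not JSONValidatorBlock's documented behavior.

```python
import json


def json_validator_block(state: dict) -> dict:
    """Hypothetical stand-in for JSONValidatorBlock: checks that `generated` is valid JSON."""
    raw = state.get("generated")
    try:
        # Accept an already-parsed object, or parse a JSON string produced by a model.
        parsed = raw if isinstance(raw, (dict, list)) else json.loads(raw)
    except (TypeError, json.JSONDecodeError):
        return {"valid": False, "parsed_json": None}
    return {"valid": True, "parsed_json": parsed}


ok = json_validator_block({"generated": '{"title": "Electric Vehicles"}'})
bad = json_validator_block({"generated": "not json"})
print(ok)   # {'valid': True, 'parsed_json': {'title': 'Electric Vehicles'}}
print(bad)  # {'valid': False, 'parsed_json': None}
```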

3. Review Results

Review results using keyboard shortcuts, and configure the view so the fields you care about are easy to see.

Review Configuration
Keyboard Shortcuts:
  A → Accept  |  R → Reject  |  U → Pending
  E → Edit    |  N → Next    |  P → Previous

Field Configuration:
  Primary:   [parsed_json, valid]
  Secondary: [metadata.content]
  Hidden:    [... all other fields]
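The field configuration amounts to picking values out of a record by (possibly dotted) name, and the status shortcuts map keys to statuses. The sketch below assumes field names resolve against the accumulated state plus the record's metadata; STATUS_KEYS, resolve(), review_view() and apply_shortcut() are hypothetical names, not the app's API.

```python
# Assumed mapping for the three status shortcuts (A, R, U); navigation keys omitted.
STATUS_KEYS = {"a": "accepted", "r": "rejected", "u": "pending"}


def resolve(view: dict, path: str):
    """Follow a dotted path like 'metadata.content'; return None if any key is missing."""
    value = view
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def review_view(record: dict, primary: list, secondary: list) -> dict:
    """Split a record into primary and secondary fields; everything else stays hidden."""
    # Assumption: names resolve against the accumulated state plus the record's metadata.
    view = {**record["accumulated_state"], "metadata": record["metadata"]}
    return {
        "primary": {p: resolve(view, p) for p in primary},
        "secondary": {p: resolve(view, p) for p in secondary},
    }


def apply_shortcut(record: dict, key: str) -> dict:
    """Update the record's status if `key` is one of the status shortcuts."""
    status = STATUS_KEYS.get(key.lower())
    if status is not None:
        record["status"] = status
    return record


record = {
    "status": "pending",
    "metadata": {"content": "Electric cars reduce emissions..."},
    "accumulated_state": {"valid": True, "parsed_json": {"title": "Electric Vehicles"}},
}
view = review_view(record, ["parsed_json", "valid"], ["metadata.content"])
print(view["secondary"])  # {'metadata.content': 'Electric cars reduce emissions...'}
apply_shortcut(record, "A")
print(record["status"])  # accepted
```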

4. Export Data

Export your data in JSONL format, filtered by status (accepted, rejected, pending).

dataset.jsonl
{
  "id": 71,
  "metadata": {
    "content": "Electric cars reduce emissions..."
  },
  "status": "accepted",
  "accumulated_state": {
    "generated": {
      "title": "Electric Vehicles",
      "description": "Analysis of EVs..."
    },
    "valid": true,
    "parsed_json": {...}
  },
  "created_at": "2025-10-25T10:30:00",
  "updated_at": "2025-10-25T10:31:15"
}
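In JSONL, each exported record sits on its own line. The sketch below shows a status-filtered export using only the standard json module; export_jsonl() and the sample records are hypothetical and only illustrate the format, not DataGenFlow's exporter.

```python
import io
import json


def export_jsonl(records, status, out):
    """Write one JSON object per line for each record whose status matches."""
    count = 0
    for record in records:
        if record["status"] == status:
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Illustrative sample records (not real output).
records = [
    {"id": 71, "status": "accepted", "metadata": {"content": "Electric cars reduce emissions..."}},
    {"id": 72, "status": "rejected", "metadata": {"content": "..."}},
]
buf = io.StringIO()  # stands in for an open file
n = export_jsonl(records, "accepted", buf)
print(n)  # 1
```

Writing to any file-like object keeps the function easy to test; in practice `out` would be a file opened in text mode.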

Get Started in Under 2 Minutes

You can start right away, locally or with Docker, using just a few commands:

run.sh
make setup
make dev
make run
# Open http://localhost:8000
# Set up the .env file for more advanced usage!

That's it! No complex configuration required. Free and open source.

Found a bug or have an idea? We welcome contributions from the community!