EVL – ETL Tool


Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2021 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

EVL Overview

ETL (Extract–Transform–Load) system

ETL processing usually consists of three main parts:

  • ETL itself (ETL jobs) – to process data
  • Orchestration (ETL workflows) – to manage ETL jobs, handle dependencies between jobs, await file delivery, provide information about processing via e-mail or SNMP traps, etc.
  • Scheduling – to fire ETL workflows at a given time on a given day

Orchestration and Scheduling are quite often referred to together as a Scheduler, but let’s distinguish these two parts of an ETL system to follow the Unix philosophy: “do one thing, and do it well”.

DAG = Directed Acyclic Graph

../images/DAG

Both ETL jobs and ETL workflows consist of one or more directed acyclic graphs, with the following meaning:

            jobs                        workflows
vertices    data modifying components   jobs, other workflows
edges       data flows                  successor
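
To make the graph view concrete, here is a minimal Python sketch of a workflow described as a directed acyclic graph. It is only an illustration, not EVL syntax or EVL's implementation: the job names and the execution_order helper are hypothetical, and the ordering relies on graphlib from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflow: each key is a vertex (a job), each value is the set
# of jobs that must finish first, i.e. the incoming edges of that vertex.
predecessors = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(graph):
    """Return one valid run order for the graph.

    Raises graphlib.CycleError if the graph contains a cycle,
    which is exactly what "acyclic" rules out.
    """
    return list(TopologicalSorter(graph).static_order())


order = execution_order(predecessors)
print(order)
```

Any order returned this way runs every job after all of its predecessors; "load" and "report" have no edge between them, so either may come first.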

The main difference in approach, between jobs and workflows, is:

  • When an ETL job fails, the whole job must be restarted.
  • When an ETL workflow fails, it can either be restarted from the beginning or continue from the last failure(s).

So an ETL workflow like this:

../images/DAG-workflow

might be restarted from the red job. Green (i.e. successful) ones will be skipped.
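
The restart behaviour can be sketched in a few lines of Python. This is an assumption-laden illustration rather than how EVL does it: run_workflow, make_job, the job names and the state dictionary are all hypothetical, and the sketch simply stops at the first failing job instead of letting independent branches continue.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(predecessors, actions, done=None):
    """Run jobs in dependency order, skipping those already in `done`.

    Returns (done, failed): the set of successful jobs and the name of
    the first failed job, or None if everything succeeded.
    """
    done = set(done or ())
    for job in TopologicalSorter(predecessors).static_order():
        if job in done:
            continue  # "green" job from an earlier run: skip it
        try:
            actions[job]()
        except Exception:
            return done, job  # "red" job: stop and keep what succeeded
        done.add(job)
    return done, None


# Hypothetical three-job workflow in which "transform" fails the first time.
calls = []
state = {"fail_transform": True}


def make_job(name):
    def job():
        if name == "transform" and state["fail_transform"]:
            raise RuntimeError("transform failed")
        calls.append(name)
    return job


predecessors = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
actions = {name: make_job(name) for name in predecessors}

# First run stops at the failing job.
first_done, first_failed = run_workflow(predecessors, actions)

# Pretend the problem was fixed, then continue from the failure.
state["fail_transform"] = False
done, failed = run_workflow(predecessors, actions, first_done)
print(calls)
```

On the second call "extract" is not executed again, because it is already recorded as successful; only "transform" and "load" run.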

EVL parts

Considering the above theory, EVL splits an ETL system into three main entities:

../images/EVL_overview

All three entities are supposed to be tracked by Git or any other version control system.