EVL – ETL Tool

Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Introduction
Release Notes
- Version 1.0
- Version 1.1
- Version 1.2
- Version 1.3
- Version 2.0
- Version 2.1
- Version 2.2
- Version 2.3
- Version 2.4
- Version 2.5
- Version 2.6
Installation and Settings
- Linux – RPM
- Linux – DEB
- Other Unix systems
- Settings
  - Compiler
  - Project
- Text Editor
  - Vim
EVL Overview
- EVL Jobs
- EVL Workflows
- Scheduling
Main EVL Command
- Usage
- Examples
- Options
- Environment
- evl project
- evl run
- evl workflow
EVD and Data Types
- EVD Structure
- EVD Options
- Default Values
- Compound Types
- String
- Integral Types
- Decimal
  - Declaration in mapping
  - Manipulation, comparison
- Float and Double
- Date and Time
Components Common
- Common Options
Basic Components
- Assign
- Cat
- Cmd
- Component
- Cut
- Departition
- Echo
- Filter
- Gather
- Generate
- Head
- Lookup
- Merge
- Partition
- Sort
- Sortgroup
- Tail
- Tee
- Trash
- Uniq
- Validate
- Watcher
Mapping Components
- Aggreg
- Join
- Map
Read Components
- Read
- Readevd
- Readjson
- Readkafka
- Readmysql
- Readora
- Readparquet
- Readpg
- Readqvd
- Readtd
- Readxls
- Readxlsx
- Readxml
Run SQL Components
- Runmysql
- Runora
- Runpg
Write Components
- Write
- Writeevd
- Writejson
- Writekafka
- Writeora
- Writeparquet
- Writepg
- Writeqvd
- Writeqvx
- Writetd
- Writexlsx
- Writexml
- Writemysql
Commands
- Cancel
- Cp
- Chmod
- Crontab
- End
- Fr
- Log
- Ls
- Mail
- Manager
- Mkdir
- Mv
- Rm
- Set
- Skip
- Sleep
- Spark
- Status
- Test
- Wait
EVM Mappings
- Output Functions
- String Functions
- Checksum Functions
- IP Addresses Functions
  - IPv4 Functions
  - IPv6 Functions
- Randomization Functions
- Anonymization Functions
Joins and Lookups
- Lookup tables
  - Declaration and load
  - Methods
Utils
- csv2evd
- csv2qvd
- json2evd
- qvd2csv
- qvd2evd
- evl_increment_run_id
- qvd-header
EVM Functions Index
EVD Data Types Index
Variables Index
General Index

EVL Overview

ETL (Extract–Transform–Load) system

ETL processing usually consists of three main parts:

- ETL itself (ETL jobs): to process data
- Orchestration (ETL workflows): to manage ETL jobs, handle job dependencies, wait for file delivery, report on processing via e-mail or SNMP traps, etc.
- Scheduling: to fire ETL workflows at a given time on a given day

Orchestration and Scheduling are quite often lumped together as a "Scheduler", but let's keep these two parts of the ETL system separate, following the Unix philosophy: “do one thing, and do it well”.

DAG = Directed Acyclic Graph

Both ETL jobs and ETL workflows consist of one or more directed acyclic graphs, with the following meaning:

          jobs                        workflows
vertices  data-modifying components   jobs, other workflows
edges     data flows                  successor relations
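To make the DAG idea concrete, here is a minimal Python sketch. It is not part of EVL and does not show how EVL stores or runs workflows. A hypothetical workflow is kept as an adjacency list of successor edges, and Kahn's algorithm derives an execution order in which every job comes after all of its predecessors. The same algorithm rejects a graph that contains a cycle, which is exactly what "acyclic" rules out. The job names are made up for illustration.

```python
from collections import deque

# Hypothetical workflow: each key is a job, each value lists its successors.
# Every job must appear as a key, even if it has no successors.
workflow = {
    "extract_orders": ["transform_orders"],
    "extract_customers": ["transform_orders"],
    "transform_orders": ["load_dwh"],
    "load_dwh": [],
}

def topological_order(graph):
    """Return the vertices in an order that respects every edge (Kahn's algorithm).

    Raises ValueError if the graph contains a cycle, i.e. it is not a DAG.
    """
    # Count incoming edges (unfinished predecessors) for each vertex.
    indegree = {v: 0 for v in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1

    # Start with vertices that have no predecessors.
    ready = deque(v for v, d in indegree.items() if d == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        # "Finishing" v releases its successors.
        for s in graph[v]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)

    # Vertices left over are stuck on a cycle.
    if len(order) != len(graph):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order

print(topological_order(workflow))
# ['extract_orders', 'extract_customers', 'transform_orders', 'load_dwh']
```

The two extract jobs have no predecessors, so they could also run in parallel. A real orchestrator would dispatch every job in `ready` at once rather than one at a time.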

The main difference in approach between jobs and workflows is:

- When an ETL job fails, the whole job must be restarted.
- When an ETL workflow fails, it can either be restarted from the beginning or continue from the last failure(s).

So a failed ETL workflow might be restarted from the failed job, while the jobs that already finished successfully will be skipped.
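The restart rule for workflows can be illustrated with a short Python sketch. It is not EVL's implementation. The job names, the "ok"/"failed" status values, the `previous_run` record and the `run_job` callback are all hypothetical. Jobs that succeeded earlier are skipped. A job runs only when all of its predecessors have succeeded. Jobs blocked by a new failure are left for the next restart, so several independent failures can be resumed at once.

```python
# Hypothetical workflow, listed in dependency order: (job, jobs it waits for).
workflow = [
    ("extract_orders", []),
    ("extract_customers", []),
    ("transform_orders", ["extract_orders", "extract_customers"]),
    ("load_dwh", ["transform_orders"]),
]

# Hypothetical statuses recorded by the previous, failed run.
# Jobs missing from the dict never started.
previous_run = {
    "extract_orders": "ok",
    "extract_customers": "ok",
    "transform_orders": "failed",
}

def restart(workflow, previous_run, run_job):
    """Resume a workflow from its last failure(s).

    run_job(name) executes one job and returns True on success.
    Returns (jobs executed in this run, updated status dict).
    """
    status = dict(previous_run)  # copy, so the previous record is not modified
    executed = []
    for job, preds in workflow:
        if status.get(job) == "ok":
            continue  # succeeded in an earlier run: skip it
        if any(status.get(p) != "ok" for p in preds):
            continue  # a predecessor has not succeeded: leave for a later restart
        executed.append(job)
        status[job] = "ok" if run_job(job) else "failed"
    return executed, status

ran, status = restart(workflow, previous_run, run_job=lambda name: True)
print(ran)  # ['transform_orders', 'load_dwh']
```

Skipping is decided purely from the recorded statuses. This only works because each job's outcome is stored outside the job itself. An ETL job, by contrast, keeps no such per-component record and must be rerun as a whole.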

EVL parts

Considering the above, EVL splits the ETL system into three main entities:

All three entities are meant to be tracked in Git or another version control system.