EVL

Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Main EVL Command

All command-line EVL functionality is handled by the main command ‘evl’, which runs EVL jobs or workflows and dispatches the other EVL subcommands.

The most common usage is to run a job or workflow the short way:

evl run/<job>.evl

which is just a shortcut to:

evl run run/<job>.evl

Similarly,

evl workflow/<workflow>.ewf

is just a shortcut to:

evl run workflow/<workflow>.ewf [--odate=<yyyymmdd>] \
                                [-s|--progress]

The full versions of the ‘evl’ command invocation are then:

evl run ( run/<job>.evl | <script>.sh )

run the specified EVL <job>, or any shell <script>. For details, check ‘man evl-run’.

evl ( run | continue | restart | skip | status ) workflow/<workflow>.ewf

For details, check ‘man evl-workflow’.

evl project

handle EVL projects: create a new or a sample one, source variables from project.sh, or get particular project variables. For details, check ‘man evl-project’.

evl <evl_command>

call a particular EVL command/component, like ‘sort’ or ‘readjson’. All available EVL components and commands are listed below and each has its own man page which explains usage and arguments. To see the man page for a command, run ‘man evl-<evl_command>’.
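
For example, to print the short usage of the ‘sort’ component:

evl sort --usage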

evl init

initialize the EVL installation under your (i.e. non-root) user. To be run only once per user.

Usage

evl
  ( run/<job>.evl | workflow/<workflow>.ewf )
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl run
  <job>.evl [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl ( run | continue | restart | skip | status )
  <workflow>.ewf [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl run
  <script>.sh [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [<option>...]

evl
  ( <evl_command> | <evl_component> ) [<option>...]

evl project
  ( new | sample | get | set ) <project_dir>

evl
  --expiration-date
evl
  [ <evl_command> | init | project | run | continue | restart | skip | status ]
  ( --help | --usage | --version )

Examples

  1. To run a job with yesterday’s Ordering Date:
    evl run/staging.invoices.evl --odate=yesterday
    
  2. To run a workflow with yesterday’s Ordering Date:
    evl workflow/staging.ewf --odate=yesterday
    

Options

Run job/workflow options:

-o, --odate=<yyyymmdd>

run the EVL job/workflow with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

in an EVL job it shows the number of records passed through each component; in an EVL workflow it shows the state of each component. Refreshed every ‘EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Commands for base and mapping components:

aggreg

aggregate (and map) records by key

assign

assign the content of input flow or file into specified variable

cat

concatenate flows or files

cmd

run any system command, with the possibility to connect it to a flow

comp

run custom EVL component

cut

remove columns from input flow or file

depart

gather or merge partitioned flows or files into one partition

echo

write an argument into output flow or file

filter

split flows according to a condition or just filter records out

gather

gather multiple flows or files into one in round-robin fashion

generate

create artificial records

head

output the first part of input flow or file

join

join sorted inputs

lookup

create and remove shared lookup

map

generic mapping

merge

merge sorted inputs (by keeping the sort)

partition

partition input flow or file

sort

sort (and possibly deduplicate) records of input flow or file

sortgroup

sort input flow or file within a group

tac

write flow or file in reverse

tail

output the last part of input flow or file

tee

replicate input flow or file

trash

send flow(s) to /dev/null

uniq

deduplicate sorted input flow or file

validate

check data types and possibly filter out invalid records

watcher

capture flow content into a text file, for debugging purposes

Commands for read components:

read

generic file reader; handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘xls’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI schemes for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)

readpg

read PostgreSQL table into flow or file

readtd

read Teradata table into flow or file

readjson

parse JSON input

readkafka

consume Kafka topic

readora

read Oracle table into flow or file

readparquet

read Parquet files

readqvd

read and parse QVD (QlikView, Qlik Sense) file

readxls

read XLS (MS Excel) sheet

readxlsx

read XLSX (MS Excel) sheet

readxml

parse XML input

Commands for write components:

write

generic file writer; handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘bz2’, ‘zip’) and URI schemes for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)

writepg

write flow or file into PostgreSQL table

writetd

write flow or file into Teradata table

writejson

write input as JSON

writekafka

produce Kafka topic

writeora

write flow or file into Oracle table

writeparquet

write flow or file into Parquet files

writeqvd

write flow or file into QVD (QlikView, Qlik Sense) file

writeqvx

write flow or file into QVX (QlikView, Qlik Sense) file

writexlsx

write flow or file into XLSX (MS Excel) files

writexml

write input as XML

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

--version

print version and exit

--expiration-date

return an expiration date of this version of EVL, empty output means no expiration

Environment

The list of all EVL variables with their default values. One can change these values in the ‘~/.evlrc’ file or in the project’s ‘project.sh’.
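
For example, a ‘~/.evlrc’ overriding a few of the defaults described below might look like this (an illustrative sketch; the temporary path is hypothetical):

EVL_ENV=TEST
EVL_COLOURS=0
EVL_PROGRESS_REFRESH_SEC=5
EVL_TMP_PATH="/data/tmp"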

EVL_BUILD_COMP=1

whether to build the job every time it runs. In production it is usually safe to set this to ‘0’, so the job is built only the first time, and afterwards only when the source files change.

EVL_COLOURS=1

terminal output uses colours; in case this causes trouble, switch it off by setting the environment variable ‘EVL_COLOURS=0’

EVL_COMPILER=gcc

mappings are compiled by either GCC or Clang. This variable specifies which one to use. Possible values are:

EVL_COMPILER=gcc
EVL_COMPILER=clang

If this variable is not set, GCC is used by default on Linux systems, and Clang on Windows and Mac.

GCC must be at least version 7.4 and Clang at least version 6.0.

EVL_COMPILER_PATH

path to GCC’s or Clang’s ‘bin’, ‘include’, ‘lib’ and ‘lib64’ folders. Leave empty to use the system-wide GCC/Clang.

EVL_DEBUG_LEVEL=4

specify a number between 0 and 9 saying how detailed EVL debug messages should be. A higher number means more detail: ‘0’ means no debug messages at all, ‘9’ means the maximum allowed level with lots of messages.

EVL_DEBUG_FAIL_RECORD_NUMBER=2

the number of records to show on failure when ‘EVL_DEBUG_MODE=1’

EVL_DEBUG_MODE=0

if set to 1, it checks whether you try to assign a NULL value to a non-nullable field, and provides the most recently processed records in case of a failure. This slows down processing, so use it only in development, or switch it on temporarily in production when investigating data problems.
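
For example, to switch it on temporarily for a single run (a sketch; the one-shot shell assignment works as long as ‘project.sh’ does not override the variable):

EVL_DEBUG_MODE=1 evl run/staging.invoices.evl --odate=yesterday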

EVL_DEFAULT_FIELD_SEPARATOR='|'

when a field in an EVD file has no ‘sep=’ attribute, this character is used instead. It can be any of the first 128 ASCII characters.

EVL_DEFAULT_RECORD_SEPARATOR

when the last field in an EVD file has no ‘sep=’ attribute, this character is used instead. It can be any of the first 128 ASCII characters. By default a Linux newline is used:

EVL_DEFAULT_RECORD_SEPARATOR=$'\n'

To use the Windows end of line (i.e. ‘\r\n’) instead, use the components’ options ‘--text-input-dos-eol’ and/or ‘--text-output-dos-eol’.

EVL_ENV=DEV

to specify an environment, usually one of ‘DEV’, ‘TEST’ or ‘PROD’.

EVL_ENVSUBST_EVM=1

whether to replace ‘$...’ and ‘${...}’ in ‘EVM’ mapping files with environment variables (via the envsubst utility)

EVL_FASTEXPORT_SLEEP, EVL_FASTEXPORT_TENACITY, EVL_FASTEXPORT_SESSIONS

Teradata FastExport options.

EVL_FASTLOAD_ERROR_LIMIT, EVL_FASTLOAD_SESSIONS

Teradata FastLoad options.

EVL_FR=1

if set to 0, the EVL File Register is not used; it only provides debug messages but does nothing.

EVL_FR_LOG_FILE

file to be used for storing information for EVL File Register.

EVL_KAFKA_CONSUMER_COMMAND, EVL_KAFKA_PRODUCER_COMMAND

paths to Kafka consumer and producer commands.

EVL_LOG_PATH="$HOME/evl-log"

path to logs from job and workflow runs. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_MAIL_SEND=1

send e-mails by default in case of failures in a workflow or via the command ‘Mail’. To switch this off, for example in non-production environments, set ‘EVL_MAIL_SEND=0’.

EVL_MONITOR_SQLITE_PATH="$EVL_LOG_PATH"

path to SQLite database which is necessary for EVL Manager. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_NICE=1

each EVL command and component is fired prefixed by:

eval nice -n $EVL_NICE

To change the priority of EVL processes, i.e. to make EVL jobs "nicer", set ‘EVL_NICE’ to a value between 0 and 19. A higher number means the processes run with a lower priority. For details, check ‘man nice’.
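
For example, to make all EVL processes of a project run with a lower priority, one might set in ‘project.sh’:

EVL_NICE=10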

EVL_ODATE

when no ‘--odate=’ option is given when running a job or workflow, the Ordering Date is taken from this variable. So calling:

evl run/some_job.evl --odate=20200628

is the same as:

export EVL_ODATE=20200628
evl run/some_job.evl

EVL_PARTITIONS

to specify how many partitions to use in the ‘Partition’ component. This EVL installation allows at most ‘1024’ partitions.

EVL_PROGRESS_REFRESH_SEC=2

when the ‘--progress’ option is used, the state is refreshed every 2 seconds by default. To change this default, set this variable to another number of seconds. The possible range is 1 to 30.

EVL_PROJECT_LOG_DIR

by default the project’s log directory is set to:

EVL_PROJECT_LOG_DIR="$EVL_LOG_PATH/<project_name>"

EVL_PROJECT_TMP_DIR

by default the project’s temporary directory is set to:

EVL_PROJECT_TMP_DIR="$EVL_TMP_PATH/<project_name>"

EVL_RUN_ID_FILE

path to the file which stores the incremental ‘RUN_ID’, a unique ID of each job or workflow run. It is unique within a project. By default it is:

EVL_RUN_ID_FILE="$EVL_PROJECT_LOG_DIR/evl_run_id.hwm"

EVL_TMP_PATH="/tmp"

path to the (local) temporary directory used by jobs and workflows. Place this folder on the same mount point as the data, to make the ‘mv’ command as fast as possible. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_WATCHER=0

whether or not the ‘Watcher’ component is silent. In production this is usually set to ‘0’, but in development, when ‘Watcher’ is used to investigate interim data, it is fine to set it to ‘1’. Check ‘man evl-watcher’ for more details.

evl project

<project_dir> is the name of the directory with some EVL project. Either a full or a relative path can be specified. The last folder in the <project_dir> path is taken as the project name. Prefer lowercase letters for the project name; numbers, capital letters, underscores and dashes are also possible.

Projects can be included in other projects. But remember that the parent’s project.sh is not automatically included (i.e. sourced) by the subproject’s one.
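
One possible way to include the parent’s settings is to source them explicitly at the top of the subproject’s ‘project.sh’ (a sketch, not an EVL requirement; the path is illustrative):

# in stage/sap/project.sh (illustrative): pull in the parent project settings
source "$(dirname "${BASH_SOURCE[0]}")/../project.sh"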

new <project_dir> [<project_dir>...]

create the <project_dir> directory with the standard subfolder structure and a default project.sh configuration file.

sample <project_dir> [<project_dir>...]

create the <project_dir> directory with sample data and sample jobs and workflows.

get [--path] <variable_name> [<project_dir>]

get the value of <variable_name>, based on the project.sh configuration file. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified. With the option ‘--path’, the path is returned in a clean form (i.e. no multiple slashes, no trailing slash, no ‘/./’, no spaces or tabs at the beginning or end).
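
For example, in a shell script the cleaned path can be captured via command substitution (the variable name is illustrative):

log_dir=$(evl project get --path EVL_PROJECT_LOG_DIR)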

set [<project_dir> [<project_dir>...] ]

source the project.sh configuration file variables into the environment. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified.

To drop the whole project, simply delete its folder recursively.

Synopsis

evl project
  ( new | sample | set ) <project_dir>... [-v|--verbose]

evl project
  get [--path] <variable_name> [<project_dir>] [-v|--verbose]

evl project
  ( --help | --usage | --version )

Options

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

--version

print version and exit

Examples

  1. To create three main projects with a couple of subprojects:
    # shared to all projects
    evl project new shared
    
    evl project new stage       # shared stuff only for "stage" projects
    evl project new stage/sap stage/tap stage/erp stage/signaling
    
    evl project new dwh         # shared stuff only for "dwh" projects
    evl project new dwh/usage dwh/billing dwh/party dwh/contract dwh/product
    
    evl project new mart        # shared stuff only for "mart" projects
    evl project new mart/marketing mart/sales
    
  2. To create new project with sample data and jobs:
    evl project sample my_sample
    
  3. To get the project’s log directory path (i.e. EVL_PROJECT_LOG_DIR):
    evl project get --path EVL_PROJECT_LOG_DIR
    
  4. To set the project variables into environment:
    evl project set stage/sap
    

which simply does this:

source stage/sap/project.sh

evl run

The EVL Run command can be invoked from the command line or from within an EVL workflow.

evl run ( <job>.evl | <workflow>.ewf | <script>.sh )...

standalone command-line usage

Run ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )...

within a workflow, i.e. in an EWS file.

In both cases it runs the <job> or any Bash <script>. If more than one is provided, they are run one after another; once one fails, the whole command fails.
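
For example, to run two jobs and a cleanup script one after another from the command line (the job and script names are illustrative):

evl run run/staging.invoices.evl run/staging.customers.evl scripts/cleanup.sh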

Synopsis

Run
  ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]

evl run
  ( <job>.evl | <workflow>.ewf | <script>.sh )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl run
  ( --help | --usage | --version )

Options

-o, --odate=<yyyymmdd>

run the EVL job with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

show the number of records passed through each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Examples

  1. To run an EVL job with yesterday’s ‘ODATE’, showing progress:
    evl run/aggreg_invoices.evl --odate=yesterday --progress
    
  2. To run an EVL job in an EVL Workflow (in an ‘ews’ file):
    Run aggreg_invoices.evl
    

evl workflow

Run the specified workflow <workflow>.ewf with the given Ordering Date <odate>.

An Ordering Date is the date for which the data are being processed. Every workflow has to run with some <odate>. When no <odate> is specified, the current date is used.

An EVL workflow consists of components, which are used in the EWS workflow structure definition file. ‘EWS’ is the EVL workflow structure file (workflow template); for details see ‘man 5 evl-ews’.

Arguments

run

run <workflow>.ewf with the Ordering Date (‘ODATE’) equal to <odate>. If a workflow with the given ‘ODATE’ has already been started in the past, it will fail. Use ‘continue’ or ‘restart’ in such cases. This command is intended to be scheduled, by cron for example.

continue

continue <workflow>.ewf with the Ordering Date equal to <odate> from the last failed step, i.e. do not run again steps that have already finished successfully. This command is useful for the usual manual restart from the point of failure.
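
For example, when a scheduled run fails part-way through:

evl run workflow/staging.ewf --odate=20200628       # fails in some step
evl continue workflow/staging.ewf --odate=20200628  # rerun from the failed step only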

restart

restart the whole <workflow>.ewf (with the given ODATE) from the beginning, no matter what the status of the workflow is. Use this command with care; it is normally not to be used in a production environment.

Ordering Date

An <odate> can have any form that the standard GNU/Linux command ‘date’ recognizes as a date. It is however recommended to use the format ‘YYYYMMDD’ or ‘yesterday’.
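
For example, both of the following are accepted; the second one builds the recommended ‘YYYYMMDD’ form with GNU ‘date’:

evl run workflow/staging.ewf --odate=yesterday
evl run workflow/staging.ewf --odate="$(date -d '2 days ago' +%Y%m%d)"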

Synopsis

evl
  ( run | continue | restart ) <workflow>.ewf
  [-o|--odate=<odate>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl
  ( run | continue | restart | workflow )
  ( --help | --usage | --version )

Options

-o, --odate=<odate>

run the EVL workflow with the specified <odate>; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

it shows the state of each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Commands

The EVL workflow structure file (‘*.ews’ file) is resolved as a Bash script. The following EVL commands can be used; see ‘man evl-<command>’ for details.

Cp

copy files, handle also HDFS

End

end an EVL job or workflow structure (‘EVS’ or ‘EWS’ files)

Fr

File Register logging, i.e. register files, mark them as processed and/or move them to archive directory.

Ls

list directory contents, handle also HDFS and AWS S3

Mail

send an e-mail

Mkdir

create directory, handle also HDFS

Mv

move (rename) files (handle also HDFS, GS, S3, SFTP)

Rm

remove files or directories (handle also HDFS, GS, S3, SFTP)

Set

set a status for a given EVL job, workflow or shell script

Skip

skip run of a given EVL job or workflow

Snmp

send an SNMP trap message

Spark

run the Scala Spark jar file or build jar file from specified source

Status

print the status of a given EVL job or workflow

Test

check file types and existence, handle also HDFS

Wait

split pieces of EVL job or workflow into stages

Run component

The EVL workflow structure file (EWS file) is resolved as a Bash script. Unlike the Commands above, which are run immediately, the ‘Run’ component is only parsed, and fired later once a ‘Wait’ or ‘End’ command is reached.

For details see ‘man evl-run’.
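
A minimal illustrative fragment of an EWS file (a sketch only; job names are hypothetical and the exact syntax is described in ‘man 5 evl-ews’):

Run staging.invoices.evl
Run staging.customers.evl
Wait    # the two Run components above are fired here, as one stage
Run dwh.load.evl
End     # the remaining Run component is fired and the workflow ends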