EVL

Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Main EVL Command

All command-line EVL functionality is handled by the main command ‘evl’, which runs EVL jobs or workflows and dispatches the other EVL subcommands.

The most common usage is to run a job or workflow the short way:

evl run/<job>.evl

which is just a shortcut to:

evl run run/<job>.evl

Similarly,

evl workflow/<workflow>.ewf

is just a shortcut to:

evl run workflow/<workflow>.ewf [--odate=<yyyymmdd>] \
                                [-s|--progress]

The full versions of the ‘evl’ command invocation are then:

evl run ( run/<job>.evl | <script>.sh )

run the specified EVL <job>, or any shell <script>. For details, check ‘man evl-run’.

evl ( run | continue | restart | skip | status ) workflow/<workflow>.ewf

For details, check ‘man evl-workflow’.

evl project

handle EVL projects: create a new or a sample one, source variables from project.sh, or get particular project variables. For details, check ‘man evl-project’.

evl <evl_command>

call a particular EVL command/component, like ‘sort’ or ‘readjson’. All available EVL components and commands are listed below and each has its own man page which explains usage and arguments. To see the man page for a command, run ‘man evl-<evl_command>’.
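
For example, to print the short usage of the ‘sort’ component:

evl sort --usage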

evl init

initialize the EVL installation under your (i.e. non-root) user. To be run only once per user.

Usage

evl
  ( run/<job>.evl | workflow/<workflow>.ewf )
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl run
  <job>.evl [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl ( run | continue | restart | skip | status )
  <workflow>.ewf [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl run
  <script>.sh [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [<option>...]

evl
  ( <evl_command> | <evl_component> ) [<option>...]

evl project
  ( new | sample | get | set ) <project_dir>

evl
  --expiration-date
evl
  [ <evl_command> | init | project | run | continue | restart | skip | status ]
  ( --help | --usage | --version )

Examples

  1. To run a job with yesterday’s Ordering Date:
    evl run/staging.invoices.evl --odate=yesterday
    
  2. To run a workflow with yesterday’s Ordering Date:
    evl workflow/staging.ewf --odate=yesterday
    

Options

Run job/workflow options:

-o, --odate=<yyyymmdd>

run the EVL job/workflow with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

in an EVL job it shows the number of records passed through each component; in an EVL workflow it shows the state of each component. Refreshed every ‘EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Commands for base and mapping components:

aggreg

aggregate (and map) records by key

assign

assign the content of input flow or file into specified variable

cat

concatenate flows or files

cmd

run any system command, with the possibility to connect it to a flow

comp

run custom EVL component

cut

remove columns from input flow or file

depart

gather or merge partitioned flows or files into one partition

echo

write an argument into output flow or file

filter

split flows according to a condition or just filter records out

gather

gather multiple flows or files into one in round-robin fashion

generate

create artificial records

head

output the first part of input flow or file

join

join sorted inputs

lookup

create and remove shared lookup

map

generic mapping

merge

merge sorted inputs (by keeping the sort)

partition

partition input flow or file

sort

sort (and possibly deduplicate) records of input flow or file

sortgroup

sort input flow or file within a group

tac

write flow or file in reverse

tail

output the last part of input flow or file

tee

replicate input flow or file

trash

send flow(s) to /dev/null

uniq

deduplicate sorted input flow or file

validate

check data types and possibly filter out invalid records

watcher

capture flow content into a text file, for debugging purposes

Commands for read components:

read

generic file reader; handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘xls’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI schemes for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)

readpg

read PostgreSQL table into flow or file

readtd

read Teradata table into flow or file

readjson

parse JSON input

readkafka

consume Kafka topic

readora

read Oracle table into flow or file

readparquet

read Parquet files

readqvd

read and parse QVD (QlikView, Qlik Sense) file

readxls

read XLS (MS Excel) sheet

readxlsx

read XLSX (MS Excel) sheet

readxml

parse XML input

Commands for write components:

write

generic file writer; handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘bz2’, ‘zip’) and URI schemes for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)

writepg

write flow or file into PostgreSQL table

writetd

write flow or file into Teradata table

writejson

write input as JSON

writekafka

produce Kafka topic

writeora

write flow or file into Oracle table

writeparquet

write flow or file into Parquet files

writeqvd

write flow or file into QVD (QlikView, Qlik Sense) file

writeqvx

write flow or file into QVX (QlikView, Qlik Sense) file

writexlsx

write flow or file into XLSX (MS Excel) files

writexml

write input as XML

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

--version

print version and exit

--expiration-date

return an expiration date of this version of EVL, empty output means no expiration

Environment

The list of all EVL variables with their default values. One can change these values in the ‘~/.evlrc’ file or in the project’s ‘project.sh’.
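
For example, a ‘~/.evlrc’ overriding a few of the defaults described below might look like this (an illustrative sketch; the temporary path is hypothetical):

EVL_ENV=TEST
EVL_COLOURS=0
EVL_PROGRESS_REFRESH_SEC=5
EVL_TMP_PATH="/data/tmp"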

EVL_BUILD_COMP=1

whether to build the job every time it runs. In production it is usually safe to set this to ‘0’, so the job is built only the first time, and afterwards only when the source files change.

EVL_COLOURS=1

terminal output uses colours; in case this causes trouble, switch it off by setting the environment variable ‘EVL_COLOURS=0’

EVL_COMPILER=gcc

mappings are compiled by either GCC or Clang. This variable specifies which one to use. Possible values are:

EVL_COMPILER=gcc
EVL_COMPILER=clang

If this variable is not set, GCC is used by default on Linux systems, and Clang on Windows and Mac.

GCC must be at least version 7.4 and Clang at least version 6.0.

EVL_COMPILER_PATH

path to GCC’s or Clang’s ‘bin’, ‘include’, ‘lib’ and ‘lib64’ folders. Leave empty to use the system-wide GCC/Clang.

EVL_DEBUG_LEVEL=4

specify a number between 0 and 9 saying how detailed EVL debug messages should be. A higher number means more detail: ‘0’ means no debug messages at all, ‘9’ means the maximum allowed level with lots of messages.

EVL_DEBUG_FAIL_RECORD_NUMBER=2

the number of records to show on failure when ‘EVL_DEBUG_MODE=1’

EVL_DEBUG_MODE=0

if set to 1, it checks whether you try to assign a NULL value to a non-nullable field, and provides the most recently processed records in case of a failure. This slows down processing, so use it only in development, or switch it on temporarily in production when investigating data problems.
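
For example, to switch it on temporarily for a single run (a sketch; the one-shot shell assignment works as long as ‘project.sh’ does not override the variable):

EVL_DEBUG_MODE=1 evl run/staging.invoices.evl --odate=yesterday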

EVL_DEFAULT_FIELD_SEPARATOR='|'

when a field in an EVD file has no ‘sep=’ attribute, this character is used instead. It can be any of the first 128 ASCII characters.

EVL_DEFAULT_RECORD_SEPARATOR

when the last field in an EVD file has no ‘sep=’ attribute, this character is used instead. It can be any of the first 128 ASCII characters. By default a Linux newline is used:

EVL_DEFAULT_RECORD_SEPARATOR=$'\n'

To use the Windows end of line (i.e. ‘\r\n’) instead, use the components’ options ‘--text-input-dos-eol’ and/or ‘--text-output-dos-eol’.

EVL_ENV=DEV

to specify an environment, usually one of ‘DEV’, ‘TEST’ or ‘PROD’.

EVL_ENVSUBST_EVM=1

whether to replace ‘$...’ and ‘${...}’ in ‘EVM’ mapping files with environment variables (via the envsubst utility)

EVL_FASTEXPORT_SLEEP, EVL_FASTEXPORT_TENACITY, EVL_FASTEXPORT_SESSIONS

Teradata FastExport options.

EVL_FASTLOAD_ERROR_LIMIT, EVL_FASTLOAD_SESSIONS

Teradata FastLoad options.

EVL_FR=1

if set to 0, the EVL File Register is not used; it only provides debug messages but does nothing.

EVL_FR_LOG_FILE

file to be used for storing information for EVL File Register.

EVL_KAFKA_CONSUMER_COMMAND, EVL_KAFKA_PRODUCER_COMMAND

paths to Kafka consumer and producer commands.

EVL_LOG_PATH="$HOME/evl-log"

path to logs from job and workflow runs. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_MAIL_SEND=1

send e-mails by default in case of failures in a workflow or via the command ‘Mail’. To switch this off, for example in non-production environments, set ‘EVL_MAIL_SEND=0’.

EVL_MONITOR_SQLITE_PATH="$EVL_LOG_PATH"

path to SQLite database which is necessary for EVL Manager. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_NICE=1

each EVL command and component is fired prefixed by:

eval nice -n $EVL_NICE

To change the priority of EVL processes, i.e. to make EVL jobs "nicer", set ‘EVL_NICE’ to a value between 0 and 19. A higher number means the processes run with a lower priority. For details, check ‘man nice’.
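
For example, to make all EVL processes of a project run with a lower priority, one might set in ‘project.sh’:

EVL_NICE=10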

EVL_ODATE

when no ‘--odate=’ option is given when running a job or workflow, the Ordering Date is taken from this variable. So calling:

evl run/some_job.evl --odate=20200628

is the same as:

export EVL_ODATE=20200628
evl run/some_job.evl

EVL_PARTITIONS

to specify how many partitions to use in the ‘Partition’ component. This EVL installation allows at most ‘1024’ partitions.

EVL_PROGRESS_REFRESH_SEC=2

when the ‘--progress’ option is used, the state is refreshed every 2 seconds by default. To change this default, set this variable to another number of seconds. The possible range is 1 to 30.

EVL_PROJECT_LOG_DIR

by default the project’s log directory is set to:

EVL_PROJECT_LOG_DIR="$EVL_LOG_PATH/<project_name>"

EVL_PROJECT_TMP_DIR

by default the project’s temporary directory is set to:

EVL_PROJECT_TMP_DIR="$EVL_TMP_PATH/<project_name>"

EVL_RUN_ID_FILE

path to the file which stores the incremental ‘RUN_ID’, a unique ID of each job or workflow run. It is unique within a project. By default it is:

EVL_RUN_ID_FILE="$EVL_PROJECT_LOG_DIR/evl_run_id.hwm"

EVL_TMP_PATH="/tmp"

path to the (local) temporary directory used by jobs and workflows. Place this folder on the same mount point as the data, to make the ‘mv’ command as fast as possible. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_WATCHER=0

whether or not the ‘Watcher’ component is silent. In production this is usually set to ‘0’, but in development, when ‘Watcher’ is used to investigate interim data, it is fine to set it to ‘1’. Check ‘man evl-watcher’ for more details.

evl project

<project_dir> is the name of the directory with some EVL project. Either a full or a relative path can be specified. The last folder in the <project_dir> path is taken as the project name. Prefer lowercase letters for the project name; numbers, capital letters, underscores and dashes are also possible.

Projects can be included in other projects. But remember that the parent’s project.sh is not automatically included (i.e. sourced) by the subproject’s one.
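
One possible way to include the parent’s settings is to source them explicitly at the top of the subproject’s ‘project.sh’ (a sketch, not an EVL requirement; the path is illustrative):

# in stage/sap/project.sh (illustrative): pull in the parent project settings
source "$(dirname "${BASH_SOURCE[0]}")/../project.sh"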

new <project_dir> [<project_dir>...]

create the <project_dir> directory with the standard subfolder structure and a default project.sh configuration file.

sample <project_dir> [<project_dir>...]

create the <project_dir> directory with sample data and sample jobs and workflows.

get [--path] <variable_name> [<project_dir>]

get the value of <variable_name>, based on the project.sh configuration file. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified. With the option ‘--path’, the path is returned in a clean form (i.e. no multiple slashes, no trailing slash, no ‘/./’, no spaces or tabs at the beginning or end).
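
For example, in a shell script the cleaned path can be captured via command substitution (the variable name is illustrative):

log_dir=$(evl project get --path EVL_PROJECT_LOG_DIR)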

set [<project_dir> [<project_dir>...] ]

source the project.sh configuration file variables into the environment. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified.

To drop the whole project, simply delete its folder recursively.

Synopsis

evl project
  ( new | sample | set ) <project_dir>... [-v|--verbose]

evl project
  get [--path] <variable_name> [<project_dir>] [-v|--verbose]

evl project
  ( --help | --usage | --version )

Options

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

--version

print version and exit

Examples

  1. To create three main projects with a couple of subprojects:
    # shared to all projects
    evl project new shared
    
    evl project new stage       # shared stuff only for "stage" projects
    evl project new stage/sap stage/tap stage/erp stage/signaling
    
    evl project new dwh         # shared stuff only for "dwh" projects
    evl project new dwh/usage dwh/billing dwh/party dwh/contract dwh/product
    
    evl project new mart        # shared stuff only for "mart" projects
    evl project new mart/marketing mart/sales
    
  2. To create new project with sample data and jobs:
    evl project sample my_sample
    
  3. To get the project’s log directory path (i.e. EVL_PROJECT_LOG_DIR):
    evl project get --path EVL_PROJECT_LOG_DIR
    
  4. To set the project variables into environment:
    evl project set stage/sap
    

which simply does this:

source stage/sap/project.sh

evl run

The EVL Run command can be invoked from the command line or from within an EVL workflow.

evl run ( <job>.evl | <workflow>.ewf | <script>.sh )...

standalone command-line usage

Run ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )...

within a workflow, i.e. in an EWS file.

In both cases it runs the <job> or any Bash <script>. If more than one is provided, they are run one after another; once one fails, the whole command fails.
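
For example, to run two jobs and a cleanup script one after another from the command line (the job and script names are illustrative):

evl run run/staging.invoices.evl run/staging.customers.evl scripts/cleanup.sh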

Synopsis

Run
  ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]

evl run
  ( <job>.evl | <workflow>.ewf | <script>.sh )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl run
  ( --help | --usage | --version )

Options

-o, --odate=<yyyymmdd>

run the EVL job with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

show the number of records passed through each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Examples

  1. To run an EVL job with yesterday’s ‘ODATE’, showing progress:
    evl run/aggreg_invoices.evl --odate=yesterday --progress
    
  2. To run an EVL job in an EVL Workflow (in an ‘ews’ file):
    Run aggreg_invoices.evl
    

evl workflow

Run the specified workflow <workflow>.ewf with the given Ordering Date <odate>.

An Ordering Date is the date for which the data are being processed. Every workflow has to run with some <odate>. When no <odate> is specified, the current date is used.

An EVL workflow consists of components, which are used in the EWS workflow structure definition file. ‘EWS’ is the EVL workflow structure file (workflow template); for details see ‘man 5 evl-ews’.

Arguments

run

run <workflow>.ewf with the Ordering Date (‘ODATE’) equal to <odate>. If a workflow with the given ‘ODATE’ has already been started in the past, it will fail. Use ‘continue’ or ‘restart’ in such cases. This command is intended to be scheduled, by cron for example.

continue

continue <workflow>.ewf with the Ordering Date equal to <odate> from the last failed step, i.e. do not run again steps that have already finished successfully. This command is useful for the usual manual restart from the point of failure.
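
For example, when a scheduled run fails part-way through:

evl run workflow/staging.ewf --odate=20200628       # fails in some step
evl continue workflow/staging.ewf --odate=20200628  # rerun from the failed step only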

restart

restart the whole <workflow>.ewf (with the given ODATE) from the beginning, no matter what the status of the workflow is. Use this command with care; it is normally not to be used in a production environment.

Ordering Date

An <odate> can have any form that the standard GNU/Linux command ‘date’ recognizes as a date. It is however recommended to use the format ‘YYYYMMDD’ or ‘yesterday’.
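
For example, both of the following are accepted; the second one builds the recommended ‘YYYYMMDD’ form with GNU ‘date’:

evl run workflow/staging.ewf --odate=yesterday
evl run workflow/staging.ewf --odate="$(date -d '2 days ago' +%Y%m%d)"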

Synopsis

evl
  ( run | continue | restart ) <workflow>.ewf
  [-o|--odate=<odate>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl
  ( run | continue | restart | workflow )
  ( --help | --usage | --version )

Options

-o, --odate=<odate>

run the EVL workflow with the specified <odate>; the environment variable ‘EVL_ODATE’ is then ignored

-s, --progress

it shows the state of each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).

-p, --project=<project_dir>

specify the project folder if it is not the current working directory

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Commands

The EVL workflow structure file (‘*.ews’ file) is resolved as a Bash script. The following EVL commands can be used; see ‘man evl-<command>’ for details.

Cp

copy files, handle also HDFS

End

end an EVL job or workflow structure (‘EVS’ or ‘EWS’ files)

Fr

File Register logging, i.e. register files, mark them as processed and/or move them to archive directory.

Ls

list directory contents, handle also HDFS and AWS S3

Mail

send an e-mail

Mkdir

create directory, handle also HDFS

Mv

move (rename) files (handle also HDFS, GS, S3, SFTP)

Rm

remove files or directories (handle also HDFS, GS, S3, SFTP)

Set

set a status for a given EVL job, workflow or shell script

Skip

skip run of a given EVL job or workflow

Snmp

send an SNMP trap message

Spark

run the Scala Spark jar file or build jar file from specified source

Status

print the status of a given EVL job or workflow

Test

check file types and existence, handle also HDFS

Wait

split pieces of EVL job or workflow into stages

Run component

The EVL workflow structure file (EWS file) is resolved as a Bash script. Unlike the Commands above, which are run immediately, the ‘Run’ component is only parsed, and fired later once a ‘Wait’ or ‘End’ command is reached.

For details see ‘man evl-run’.
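
A minimal illustrative fragment of an EWS file (a sketch only; job names are hypothetical and the exact syntax is described in ‘man 5 evl-ews’):

Run staging.invoices.evl
Run staging.customers.evl
Wait    # the two Run components above are fired here, as one stage
Run dwh.load.evl
End     # the remaining Run component is fired and the workflow ends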