EVL


Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Main EVL Command

All command line EVL functionality is handled by the main command ‘evl’, which runs EVL jobs or workflows or serves other EVL subcommands.

The most common usage is to run a job or workflow in the short form. The command

evl run/<job>.evl

is just a shortcut for:

evl run run/<job>.evl

Likewise, the command

evl workflow/<workflow>.ewf

is just a shortcut for:

evl run workflow/<workflow>.ewf [--odate=<yyyymmdd>] \
                                [-s|--progress]

The full forms of the ‘evl’ command invocation are:

evl run ( run/<job>.evl | <script>.sh )

run the specified EVL <job>, or any shell <script>. For details, check ‘man evl-run’.

evl ( run | continue | restart ) workflow/<workflow>.ewf

run, continue or restart the specified EVL <workflow>. For details, check ‘man evl-workflow’.

evl project

handle EVL projects, e.g. create a new or sample project, source variables from project.sh, or get particular project variables. For details, check ‘man evl-project’.

evl <evl_command>

calls a particular EVL command or component, like ‘sort’ or ‘readjson’. All available EVL components and commands are listed below; each has its own man page which explains its usage and arguments. To see the man page for a command, run ‘man evl-<evl_command>’.

evl init

initialize the EVL installation for your own (i.e. non-root) user. To be run only once for each user.

Usage

evl
  ( run/<job>.evl | workflow/<workflow>.ewf )
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl
  run <job>.evl
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl
  ( run | continue | restart ) <workflow>.ewf
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]

evl
  run <script>.sh
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [<option>...]

evl
  ( <evl_command> | <evl_component> ) [<option>...]

evl
  --expiration-date

evl
  [ <evl_command> ]
  ( --help | --usage | --version )

Examples

  1. To run a job with yesterday’s Order Date:
    evl run/staging.invoices.evl --odate=yesterday
    
  2. To run a workflow with yesterday’s Order Date:
    evl workflow/staging.ewf --odate=yesterday
    

Options

Run job/workflow options:

-o, --odate=<yyyymmdd>

run the EVL job/workflow with the specified Order Date; the environment variable ‘EVL_ODATE’ is then ignored

-p, --project=<project_dir>

specify project folder if not the current working one

-s, --progress

in an EVL job it shows the number of records passed through each component; in an EVL workflow it shows the state of each component. The display is refreshed every ‘EVL_PROGRESS_REFRESH_SEC’ seconds, by default every 2 seconds.

Commands for base and mapping components:

aggreg

aggregate (and map) records by key

assign

assign the content of input flow or file into specified variable

cat

concatenate flows or files

cmd

run any system command with possibility to connect to flow

comp

run custom EVL component

cut

remove columns from input flow or file

depart

gather or merge partitioned flows or files into one partition

echo

write an argument into output flow or file

filter

split flows according to a condition or just filter records out

gather

gather multiple flows or files into one in round-robin fashion

generate

create artificial records

head

output the first part of input flow or file

join

join sorted inputs

lookup

create and remove shared lookup

map

generic mapping

merge

merge sorted inputs (by keeping the sort)

partition

partition input flow or file

sort

sort (and possibly deduplicate) records of input flow or file

sortgroup

sort input flow or file within a group

tac

write flow or file in reverse

tail

output the last part of input flow or file

tee

replicate input flow or file

trash

send flow(s) to /dev/null

uniq

deduplicate sorted input flow or file

validate

check data types and possibly filter out invalid records

watcher

catch flow content into text file, debugging purpose

Commands for read components:

read

generic source reader, handle various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘xls’, ‘xlsx’ and ‘xml’), compression (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI Scheme for file storage (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’) and for DBs (‘mysql://’, ‘postgres://’, ‘oracle://’, ‘sqlite://’, ‘teradata://’)

readasn1

read ASN.1 format

readavro

read and parse Avro file format

readjson

parse JSON input

readkafka

consume Kafka topic

readmysql

read MariaDB/MySQL table into flow or file

readora

read Oracle table into flow or file

readparquet

read Parquet files

readpg

read PostgreSQL table into flow or file

readqvd

read and parse QVD (QlikView, Qlik Sense) file

readtd

read Teradata table into flow or file

readxls

read XLS (MS Excel) sheet

readxlsx

read XLSX (MS Excel) sheet

readxml

parse XML input

Commands for write components:

write

generic file writer, handle various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression (‘gz’, ‘bz2’, ‘zip’) and URI Scheme for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)

writeavro

write input as Avro file

writejson

write input as JSON

writekafka

produce Kafka topic

writemysql

write flow or file into MariaDB/MySQL table

writeora

write flow or file into Oracle table

writeparquet

write flow or file into Parquet files

writepg

write flow or file into PostgreSQL table

writeqvd

write flow or file into QVD (QlikView, Qlik Sense) file

writeqvx

write flow or file into QVX (QlikView, Qlik Sense) file

writetd

write flow or file into Teradata table

writexlsx

write flow or file into XLSX (MS Excel) files

writexml

write input as XML

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

--version

print version and exit

--expiration-date

return an expiration date of this version of EVL, empty output means no expiration

Environment

The list of all EVL variables with their default values. These values can be changed in the user’s ‘~/.evlrc’ file or in the project’s ‘project.sh’.

EVL_BUILD_COMP=1

whether to build the job every time it runs or not. In production it is mostly safe to set to ‘0’, so the job is then built only the first time, and then only if the source files changed.

EVL_COLOURS=1

terminal output uses colours; if this causes trouble, it can be switched off by setting the environment variable ‘EVL_COLOURS=0’.

EVL_COMPILER=gcc

mappings are compiled either by GCC or by Clang. This variable specifies which one to use. Possible values are:

EVL_COMPILER=gcc
EVL_COMPILER=clang

If this variable is not set, GCC is used by default on Linux systems, and Clang on Windows and Mac.

GCC must be at least version 7.4 and Clang at least version 6.0.
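
A quick pre-flight check of these minimum versions could look like the following sketch. The helper name ‘version_ge’ is hypothetical (it is not part of EVL), and it assumes a ‘sort’ that supports the ‘-V’ (version sort) option, as GNU coreutils does.

```shell
# Hypothetical helper, not part of EVL: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Minimum versions taken from the text above.
min_gcc=7.4
min_clang=6.0

# Example version strings only; in practice they could be read from the
# compiler itself, e.g. with `gcc -dumpversion`.
if version_ge "9.3.0" "$min_gcc"; then
    echo "GCC 9.3.0 is new enough"
fi
if ! version_ge "5.0" "$min_clang"; then
    echo "Clang 5.0 is too old"
fi
```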

EVL_COMPILER_PATH

path to GCC’s or Clang’s ‘bin’, ‘include’, ‘lib’ and ‘lib64’ folder. Leave empty to use system-wide GCC/Clang.

EVL_DEBUG_LEVEL=4

specify number between 0 and 9 to say how detailed evl debug messages should be. Higher number means more detailed, ‘0’ means no debug messages at all, ‘9’ means maximum allowed level with lots of messages.

EVL_DEBUG_FAIL_RECORD_NUMBER=2

the number of records to show on failure when ‘EVL_DEBUG_MODE=1’

EVL_DEBUG_MODE=0

if set to 1, EVL checks for attempts to assign a NULL value to a non-nullable field, and provides the most recently processed records in case of a failure. This slows down processing, so use it only in development, or switch it on temporarily in production when investigating data problems.

EVL_DEFAULT_FIELD_SEPARATOR='|'

when a field in an EVD file has no ‘sep=’ attribute, use this character instead. It may be any of the first 128 ASCII characters.

EVL_DEFAULT_RECORD_SEPARATOR

when the last field in an EVD file has no ‘sep=’ attribute, use this character instead. It may be any of the first 128 ASCII characters. By default a Linux newline is used:

EVL_DEFAULT_RECORD_SEPARATOR=$'\n'

but to use Windows end of line (i.e. ‘\r\n’), use components’ options ‘--text-input-dos-eol’ and/or ‘--text-output-dos-eol’.

EVL_ENV=DEV

to specify an environment, usually one of ‘DEV’, ‘TEST’ or ‘PROD’.

EVL_ENVSUBST_EVD=1

whether to replace ‘$...’ and ‘${...}’ in ‘EVD’ mapping files by environment variables (by envsubst utility)

EVL_ENVSUBST_EVM=1

whether to replace ‘$...’ and ‘${...}’ in ‘EVM’ mapping files by environment variables (by envsubst utility)

EVL_FASTEXPORT_SLEEP, EVL_FASTEXPORT_TENACITY, EVL_FASTEXPORT_SESSIONS

Teradata FastExport options.

EVL_FASTLOAD_ERROR_LIMIT, EVL_FASTLOAD_SESSIONS

Teradata FastLoad options.

EVL_FR=1

if set to 0, the EVL File Register is not used: it only produces debug messages and otherwise does nothing.

EVL_FR_LOG_FILE

file to be used for storing information for EVL File Register.

EVL_KAFKA_CONSUMER_COMMAND, EVL_KAFKA_PRODUCER_COMMAND

paths to Kafka consumer and producer commands.

EVL_LOG_PATH="$HOME/evl-log"

path to logs from job and workflow runs. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_MAIL_SEND=1

send e-mails by default when a workflow fails or when the ‘Mail’ command is used. To switch this off, for example in non-production environments, set ‘EVL_MAIL_SEND=0’.

EVL_MONITOR_SQLITE_PATH="$EVL_LOG_PATH"

path to SQLite database which is necessary for EVL Manager. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_NICE=1

each EVL command and component is fired prefixed by:

eval nice -n $EVL_NICE

To change the priority of EVL processes, i.e. to make EVL jobs "nicer", set ‘EVL_NICE’ to a value between 0 and 19. A higher number means the processes get a lower priority. For details, check ‘man nice’.

EVL_ODATE

when no ‘--odate=’ option is used when running a job or workflow, it tries to use an Order Date from this variable. So calling:

evl run/some_job.evl --odate=20200930

is the same as:

export EVL_ODATE=20200930
evl run/some_job.evl

EVL_PARTITIONS

to specify how many partitions to use in ‘Partition’ component. This EVL installation allows at most ‘1024’ partitions.

EVL_PASSFILE="$HOME/.evlpass"

contains path to file with passwords. Must have ‘600’ permissions. Structure of the file:

type:server:port:database:username:encrypted_password

So for example:

postgresql:10.0.0.10:5432:some_db:some_user:ka786_Ufzf5oaD9
oracle:10.0.0.10:1521:some_db:some_user:ka786_Ufzf5oaD9
sftp:100.10.9.8:22:/target/folder:user:LKKo-098
impala:localhost:3001:impala_user:2_lLkPl_010
hsm:212.0.0.11:288:USR_0000:162534

For details see ‘man evl-password’.
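
As a minimal sketch of preparing such a file with the required ‘600’ permissions: the path below is a temporary stand-in, and the single entry is just the documented field layout, not a real credential. How the encrypted password itself is produced is covered by ‘man evl-password’ and is not shown here.

```shell
# Stand-in location for illustration; the real default is "$HOME/.evlpass".
passfile=$(mktemp)

# Placeholder entry following the documented structure (not a real password).
printf '%s\n' 'type:server:port:database:username:encrypted_password' > "$passfile"

# Only the owner may read and write the file.
chmod 600 "$passfile"

# Show the permission string; it should read "-rw-------".
perms=$(ls -l "$passfile" | cut -c1-10)
echo "$perms"

export EVL_PASSFILE="$passfile"
```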

EVL_PROGRESS_REFRESH_SEC=2

when the ‘--progress’ option is used, the state is refreshed every 2 seconds by default. To change this default, set this variable to another number of seconds. The possible range is from 1 to 30.

EVL_PROJECT_LOG_DIR

by default project’s log directory is set to:

EVL_PROJECT_LOG_DIR="$EVL_LOG_PATH/<project_name>"

EVL_PROJECT_TMP_DIR

by default project’s temporary directory is set to:

EVL_PROJECT_TMP_DIR="$EVL_TMP_PATH/<project_name>"

EVL_RUN_ID_FILE

path to file which stores incremental ‘RUN_ID’, a unique ID of each job or workflow run. It is unique within a project. By default it is:

EVL_RUN_ID_FILE="$EVL_PROJECT_LOG_DIR/evl_run_id.hwm"

EVL_TMP_PATH="/tmp"

path to the (local) temporary directory to be used by jobs and workflows. Place this folder on the same mount point as the data, so that the ‘mv’ command is as fast as possible. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.

EVL_WATCHER=0

whether the ‘Watcher’ component is active (‘1’) or silent (‘0’). In production this is usually set to ‘0’; in development, when ‘Watcher’ is used to investigate interim data, it is fine to set it to ‘1’. Check ‘man evl-watcher’ for more details.
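
Putting several of the variables above together, a development-oriented ‘~/.evlrc’ might look like the following sketch. It assumes the file uses shell syntax with exported variables, in the same way as ‘project.sh’; the chosen values are illustrative only.

```shell
# Illustrative values only, assuming ~/.evlrc is sourced as a shell script.
export EVL_ENV=DEV
export EVL_BUILD_COMP=1              # rebuild the job on every run
export EVL_DEBUG_MODE=1              # extra NULL checks, slower processing
export EVL_DEBUG_FAIL_RECORD_NUMBER=5
export EVL_MAIL_SEND=0               # no e-mails outside production
export EVL_PROGRESS_REFRESH_SEC=5    # allowed range is 1 to 30
export EVL_WATCHER=1                 # let Watcher capture interim data
export EVL_LOG_PATH="$HOME/evl-log"
```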

evl project

<project_dir> is the name of a directory containing an EVL project. Either a full or a relative path can be specified. The last folder in the <project_dir> path is taken as the project name. Lowercase letters are preferred for the project name, but digits, capital letters, underscores and dashes are also allowed.

Projects can be nested inside other projects. But remember that the parent’s project.sh is not automatically included (i.e. sourced) by the subproject’s one.
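
Since the parent’s project.sh is not picked up automatically, a subproject that needs the parent’s settings has to source that file explicitly. The sketch below demonstrates this with throwaway files in a temporary directory; the variable names ‘STAGE_SHARED_DIR’ and ‘SAP_SOURCE’ are made up for the illustration.

```shell
# Build a throwaway parent project "stage" with a subproject "stage/sap".
root=$(mktemp -d)
mkdir -p "$root/stage/sap"

# Parent configuration (hypothetical variable).
cat > "$root/stage/project.sh" <<'EOF'
export STAGE_SHARED_DIR=/data/stage/shared
EOF

# Subproject configuration: source the parent's project.sh explicitly.
cat > "$root/stage/sap/project.sh" <<EOF
. "$root/stage/project.sh"
export SAP_SOURCE=sap_erp
EOF

# Sourcing the subproject now also brings in the parent's variables.
. "$root/stage/sap/project.sh"
echo "$STAGE_SHARED_DIR $SAP_SOURCE"
```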

new <project_dir> [<project_dir>...]

create <project_dir> directory with standard subfolders structure and default project.sh configuration file.

sample <project_dir> [<project_dir>...]

create <project_dir> directory with sample data and sample jobs and workflows.

get [--path] [--omit-newline] <variable_name> [<project_dir>]

get the value of <variable_name>, based on the project.sh configuration file. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified. With option ‘--path’, the value is returned as a clean path (i.e. no multiple slashes, no trailing slash, no ‘/./’, no spaces or tabs at the beginning or end). With option ‘--omit-newline’, the value is returned without a trailing newline.

set [<project_dir> [<project_dir>...] ]

source the variables from the project.sh configuration file into the environment. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified.

To drop the whole project simply delete the folder recursively.
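
The clean-up performed by the ‘--path’ option of ‘get’ can be pictured with the following sketch. ‘clean_path’ is a hypothetical helper that only mimics the documented rules with ‘sed’; it is not how ‘evl project get’ is implemented.

```shell
# Hypothetical helper mimicking the documented --path rules.
clean_path() {
    printf '%s\n' "$1" | sed \
        -e 's/^[[:space:]]*//' \
        -e 's/[[:space:]]*$//' \
        -e 's#//*#/#g' \
        -e ':a' -e 's#/\./#/#g' -e 'ta' \
        -e 's#\(.\)/$#\1#'
}

# Leading/trailing blanks, a double slash, "/./" and the final slash go away.
clean_path '  /data//evl-log/./my_project/  '   # prints /data/evl-log/my_project
```

The substitution for ‘/./’ is repeated in a loop because consecutive occurrences overlap, and the last expression keeps a lone ‘/’ intact.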

Synopsis

evl project
  ( new | sample | set ) <project_dir>... [-v|--verbose]

evl project
  get <variable_name> [<project_dir>]
  [--path] [--omit-newline] [-v|--verbose]

evl project
  ( --help | --usage | --version )

Options

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Examples

  1. To create three main projects with a couple of subprojects:
    # shared to all projects
    evl project new shared
    
    evl project new stage       # shared stuff only for "stage" projects
    evl project new stage/sap stage/tap stage/erp stage/signaling
    
    evl project new dwh         # shared stuff only for "dwh" projects
    evl project new dwh/usage dwh/billing dwh/party dwh/contract dwh/product
    
    evl project new mart        # shared stuff only for "mart" projects
    evl project new mart/marketing mart/sales
    
  2. To create new project with sample data, jobs and workflows:
    evl project sample my_sample
    
  3. To get the project path to log directory (i.e. EVL_PROJECT_LOG_DIR):
    evl project get --path EVL_PROJECT_LOG_DIR
    
  4. To set the project variables into environment:
    evl project set stage/sap
    

which simply does this:

source stage/sap/project.sh

evl run

The EVL Run command can be invoked from the command line or within an EVL workflow:

evl run

standalone commandline invocation

Run

within a workflow, i.e. in an ‘EWS’ file.

In both cases it runs an EVL task, i.e. a <job>, a <workflow> or any Bash <script>, or waits for a file matching <file_mask> to be delivered.

Type of the task is recognized by a file suffix:

*.evl

assumed to be an EVL job, given either as a full path or relative to the ‘run/’ subfolder of the project directory

*.ewf

assumed to be an EVL workflow, given either as a full path or relative to the ‘workflow/’ subfolder of the project directory

*.sh

assumed to be a Bash script, given either as a full path or relative to the ‘run/’ subfolder of the project directory

any other

assumed to be a file mask; the command waits for a matching file to be delivered

To wait for a file matching ‘*.evl’, ‘*.ewf’ or ‘*.sh’, the ‘--file’ option must be used.

If more than one task is provided, they are run one after another; as soon as one fails, the whole command fails.

<time> can be specified in seconds, minutes, hours or days by appending the suffix s, m, h or d to the number. If no unit is specified, seconds are assumed.
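
The <time> notation can be illustrated by a small conversion routine. ‘to_seconds’ below is a hypothetical helper written for this example only; it shows how a value such as ‘2h’ maps to seconds, does no input validation, and is not EVL’s own parser.

```shell
# Hypothetical helper: convert <time>[smhd] into seconds.
to_seconds() {
    case $1 in
        *s) echo "$(( ${1%s} ))" ;;
        *m) echo "$(( ${1%m} * 60 ))" ;;
        *h) echo "$(( ${1%h} * 3600 ))" ;;
        *d) echo "$(( ${1%d} * 86400 ))" ;;
        *)  echo "$(( $1 ))" ;;          # no unit given: seconds are assumed
    esac
}

to_seconds 90     # prints 90
to_seconds 5m     # prints 300
to_seconds 2h     # prints 7200
to_seconds 1d     # prints 86400
```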

Synopsis

Run
  ( [<time>[smhd]] [<retries>r] ( <job>|<workflow>|<script>|<file_mask> ) )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]

evl run
  ( [<time>[smhd]] [<retries>r] ( <job>|<workflow>|<script>|<file_mask> ) )...
  [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl run
  ( --help | --usage | --version )

Options

--file

with this option all arguments are treated as file masks, i.e. not as a job, workflow or script. Useful only when a file to wait for has the suffix ‘evl’, ‘ewf’ or ‘sh’.

-o, --odate=<yyyymmdd>

run the EVL job with the specified Order Date; the environment variable ‘EVL_ODATE’ is then ignored

-p, --project=<project_dir>

specify project folder if not the current working one

-s, --progress

show the number of records passed through each component, refreshed every ‘EVL_PROGRESS_REFRESH_SEC’ seconds (default is 2 seconds)

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Examples

  1. To run an EVL job in an EVL Workflow (i.e. within an ‘ews’ file), and try once more if the job fails:
    Run 1r aggreg_invoices.evl
    
  2. The following invocation checks regularly (by default every 5 minutes) for the existence of ‘/data/landing/invoices_????-??-??.csv’, and once such a file appears, fires the job ‘run/stage.invoices.evl’:
    Run 2h /data/landing/invoices_????-??-??.csv stage.invoices.evl
    

    If no file matching the mask appears within 2 hours, the command fails.

  3. Command-line invocation. To run the EVL job ‘run/aggreg_invoices.evl’ with yesterday’s ‘ODATE’, showing progress:
    evl run aggreg_invoices.evl --odate=yesterday --progress
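
The waiting behaviour from the second example (poll for a file mask, give up after a time limit) can be sketched in plain shell as follows. ‘wait_for_mask’ is a hypothetical helper, not EVL’s implementation; the 300-second default poll interval mirrors the "every 5 minutes" mentioned above, and the mask is assumed to contain no spaces.

```shell
# Hypothetical helper: wait until at least one file matches the mask.
# usage: wait_for_mask <mask> <timeout_sec> [<poll_sec>]
wait_for_mask() {
    mask=$1
    timeout=$2
    poll=${3:-300}
    waited=0
    while :; do
        # Unquoted $mask lets the shell expand the glob pattern.
        for f in $mask; do
            if [ -e "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
        # Nothing matched: fail once the time limit has been reached.
        [ "$waited" -ge "$timeout" ] && return 1
        sleep "$poll"
        waited=$((waited + poll))
    done
}

# Demonstration in a temporary directory; the file already exists,
# so the call returns at once instead of waiting up to 2 hours.
landing=$(mktemp -d)
touch "$landing/invoices_2020-09-30.csv"
wait_for_mask "$landing/invoices_????-??-??.csv" 7200 && echo "file delivered"
```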
    

evl workflow

An EVL Workflow consists of ‘Run’ components, which are used in the EWS workflow structure definition file and which fire EVL jobs or other EVL workflows, or wait for a file with a given file mask. For details about this component, see ‘man evl-run’.

‘EWS’ is the EVL workflow structure file (workflow template), for details see ‘man 5 evl-ews’.

‘EWF’ is the EVL workflow definition file (a workflow), for details see ‘man 5 evl-ewf’.

Arguments

run

run <workflow> with Order Date (‘ODATE’) equal to <odate>. If the workflow has already been started with the given ‘ODATE’ in the past, the command fails. Use ‘continue’ or ‘restart’ in such cases. This command is intended to be scheduled, for example by ‘crontab’.

continue

continue <workflow> with Order Date equal to <odate> from the last failed step, i.e. steps that already finished successfully are not run again. This command is useful for the usual manual restart from the point of failure.

restart

restart the whole <workflow> (with the given ODATE) from the beginning, regardless of the workflow’s current status. Use this command with care; it is normally not to be used in a production environment.

Order Date

is the date for which the data are being processed. Every workflow has to be run with some <odate>. When no <odate> is specified, the current date is used. An <odate> can be in any form that the standard GNU/Linux command ‘date’ recognizes as a date. The recommended forms, however, are ‘YYYYMMDD’ or ‘yesterday’.
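
The resolution order described here and under ‘EVL_ODATE’ (the ‘--odate’ option first, then the environment variable, then the current date) can be sketched as follows. ‘resolve_odate’ is a hypothetical helper, and the conversion to ‘YYYYMMDD’ relies on GNU ‘date -d’; that EVL normalises the value in exactly this way is an assumption.

```shell
# Hypothetical helper: pick the Order Date and print it as YYYYMMDD.
# $1 is the value given to --odate (may be empty); requires GNU date.
resolve_odate() {
    odate=${1:-${EVL_ODATE:-today}}
    date -d "$odate" +%Y%m%d
}

resolve_odate 2020-09-30        # option given: prints 20200930

export EVL_ODATE=20200930
resolve_odate ""                # no option, variable set: prints 20200930
unset EVL_ODATE

resolve_odate yesterday         # prints yesterday's date as YYYYMMDD
```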

Synopsis

evl
  ( run | continue | restart ) <workflow>...
  [-o|--odate=<odate>] [-p|--project=<project_dir>]
  [-s|--progress] [-v|--verbose]

evl workflow
  ( --help | --usage | --version )

Options

-o, --odate=<odate>

run the EVL workflow with the specified <odate>; the environment variable ‘EVL_ODATE’ is then ignored

-p, --project=<project_dir>

specify project folder if not the current working one

-s, --progress

show the state of each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default)

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Commands

An EVL workflow structure file (‘*.ews’ file) is resolved as a Bash script. The following EVL commands can be used; see ‘man evl-<command>’ for details.

cancel

cancel running EVL task

Chmod

change file permissions, handle also HDFS

Cp

copy files, handle also HDFS

crontab update

create/update crontab entries for EVL workflows based on crontab.sh

end

end an EVL job or workflow structure (‘EVS’ or ‘EWS’ file)

Fr

File Register logging, i.e. register files, mark them as processed and/or move them to archive directory.

log

get detail information about runs of given EVL task

Ls

list directory contents, handle also HDFS and S3

Mail

send an e-mail

manager

EVL Manager commands

Mkdir

create directory, handle also HDFS

Mv

move (rename) files (handle also HDFS, GS, S3, SFTP)

password

EVL password handling

Rm

remove files or directories (handle also HDFS, GS, S3, SFTP)

set

set status to given EVL task

skip

skip run of given EVL task

Sleep

run previously defined EVL tasks and delay for a specified amount of time

Snmp

send an SNMP trap message

Spark

run the Scala Spark jar file or build jar file from specified source

status

print the status of a given EVL task

Test

check file types and existence, handle also HDFS

Wait

split pieces of EVL job or workflow into stages

Run component

An EVL workflow structure file (EWS file) is resolved as a Bash script. Besides the commands above, which are run immediately, there is the ‘Run’ component, which is only parsed at first and fired later, once a ‘Wait’ or ‘End’ command is reached.

For details see ‘man evl-run’.