Main EVL Command
All command line EVL functionality is handled by the main command ‘evl’, which runs EVL jobs or workflows or serves other EVL subcommands.
The most common usage is to run a job or workflow in the short way:
- evl run/<job>.evl
-
is just a shortcut to:
evl run run/<job>.evl
- evl workflow/<workflow>.ewf
-
is just a shortcut to:
evl run workflow/<workflow>.ewf [--odate=<yyyymmdd>] [-s|--progress]
Full versions of ‘evl’ command invocation are then:
- evl run ( run/<job>.evl | <script>.sh )
-
run the specified EVL <job>, or any shell <script>. For details, check ‘man evl-run’.
- evl ( run | continue | restart | skip | status ) workflow/<workflow>.ewf
-
For details, check ‘man evl-workflow’.
- evl project
-
handle EVL projects: create a new or a sample one, source variables from project.sh, or get particular project variables. For details, check ‘man evl-project’.
- evl <evl_command>
-
call a particular EVL command or component, such as ‘sort’ or ‘readjson’. All available EVL components and commands are listed below; each has its own man page, which explains its usage and arguments. To see the man page for a command, run ‘man evl <evl_command>’.
- evl init
-
initialize the EVL installation for your (i.e. non-root) user. Run it only once per user.
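The relation between the short and the full forms above can be sketched as a small shell function. This is only an illustration: ‘evl_full_form’ is a hypothetical helper that prints the equivalent full invocation, it is not part of EVL and does not call ‘evl’ itself.

```shell
# Hypothetical helper, not part of EVL: print the full form of a short invocation.
evl_full_form() {
    case "$1" in
        # the short way: a job or workflow path is given directly, so prepend "run"
        run/*.evl|workflow/*.ewf) echo "evl run $*" ;;
        # anything else (run, project, init, a component, ...) is left as is
        *)                        echo "evl $*" ;;
    esac
}

evl_full_form run/staging.invoices.evl --odate=yesterday
evl_full_form workflow/staging.ewf --odate=yesterday
```

The first call prints ‘evl run run/staging.invoices.evl --odate=yesterday’, the second one ‘evl run workflow/staging.ewf --odate=yesterday’.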
Usage
evl ( run/<job>.evl | workflow/<workflow>.ewf ) [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]
evl run <job>.evl [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]
evl ( run | continue | restart | skip | status ) <workflow>.ewf [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress]
evl run <script>.sh [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [<option>...]
evl ( <evl_command> | <evl_component> ) [<option>...]
evl project ( new | sample | get | set ) <project_dir>
evl --expiration-date
evl [ <evl_command> | init | project | run | continue | restart | skip | status ] ( --help | --usage | --version )
Examples
- To run a job with yesterday's Ordering Date:
evl run/staging.invoices.evl --odate=yesterday
- To run a workflow with yesterday's Ordering Date:
evl workflow/staging.ewf --odate=yesterday
Options
Run job/workflow options:
- -o, --odate=<yyyymmdd>
-
run the EVL job or workflow with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored
- -s, --progress
-
for an EVL job, show the number of records passed through each component; for an EVL workflow, show the state of each component. The display is refreshed every ‘EVL_PROGRESS_REFRESH_SEC’ seconds (2 seconds by default).
- -p, --project=<project_dir>
-
specify project folder if not the current working one
Commands for base and mapping components:
- aggreg
-
aggregate (and map) records by key
- assign
-
assign the content of input flow or file into specified variable
- cat
-
concatenate flows or files
- cmd
-
run any system command with possibility to connect to flow
- comp
-
run custom EVL component
- cut
-
remove columns from input flow or file
- depart
-
gather or merge partitioned flows or files into one partition
- echo
-
write an argument into output flow or file
- filter
-
split flows according to a condition or just filter records out
- gather
-
gather multiple flows or files into one in round-robin fashion
- generate
-
create artificial records
- head
-
output the first part of input flow or file
- join
-
join sorted inputs
- lookup
-
create and remove shared lookup
- map
-
generic mapping
- merge
-
merge sorted inputs (by keeping the sort)
- partition
-
partition input flow or file
- sort
-
sort (and possibly deduplicate) records of input flow or file
- sortgroup
-
sort input flow or file within a group
- tac
-
write flow or file in reverse
- tail
-
output the last part of input flow or file
- tee
-
replicate input flow or file
- trash
-
send flow(s) to /dev/null
- uniq
-
deduplicate sorted input flow or file
- validate
-
check data types and possibly filter out invalid records
- watcher
-
catch flow content into text file, debugging purpose
Commands for read components:
- read
-
generic file reader, handle various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘xls’, ‘xlsx’ and ‘xml’), compression (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI Scheme for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)
- readpg
-
read PostgreSQL table into flow or file
- readtd
-
read Teradata table into flow or file
- readjson
-
parse JSON input
- readkafka
-
consume Kafka topic
- readora
-
read Oracle table into flow or file
- readparquet
-
read Parquet files
- readqvd
-
read and parse QVD (QlikView, Qlik Sense) file
- readxls
-
read XLS (MS Excel) sheet
- readxlsx
-
read XLSX (MS Excel) sheet
- readxml
-
parse XML input
Commands for write components:
- write
-
generic file writer, handle various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression (‘gz’, ‘bz2’, ‘zip’) and URI Scheme for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’)
- writepg
-
write flow or file into PostgreSQL table
- writetd
-
write flow or file into Teradata table
- writejson
-
write input as JSON
- writekafka
-
produce Kafka topic
- writeora
-
write flow or file into Oracle table
- writeparquet
-
write flow or file into Parquet files
- writeqvd
-
write flow or file into QVD (QlikView, Qlik Sense) file
- writeqvx
-
write flow or file into QVX (QlikView, Qlik Sense) file
- writexlsx
-
write flow or file into XLSX (MS Excel) files
- writexml
-
write input as XML
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- --version
-
print version and exit
- --expiration-date
-
return an expiration date of this version of EVL, empty output means no expiration
Environment
The list of all EVL variables with their default values. These values can be changed in the user's ‘~/.evlrc’ file or in the project's ‘project.sh’.
- EVL_BUILD_COMP=1
-
whether to build the job every time it runs or not. In production it is mostly safe to set to ‘0’, so the job is then built only the first time, and then only if the source files changed.
- EVL_COLOURS=1
-
terminal output uses colours; if this causes trouble, switch them off by setting the environment variable ‘EVL_COLOURS=0’
- EVL_COMPILER=gcc
-
mappings are compiled either by GCC or Clang. This variable specifies which one to use. Possible values are:
EVL_COMPILER=gcc
EVL_COMPILER=clang
If this variable is not set, GCC is used by default on Linux systems, and Clang on Windows and Mac.
GCC must be at least version 7.4 and Clang at least version 6.0.
- EVL_COMPILER_PATH
-
path to GCC’s or Clang’s ‘bin’, ‘include’, ‘lib’ and ‘lib64’ folder. Leave empty to use system-wide GCC/Clang.
- EVL_DEBUG_LEVEL=4
-
specify number between 0 and 9 to say how detailed evl debug messages should be. Higher number means more detailed, ‘0’ means no debug messages at all, ‘9’ means maximum allowed level with lots of messages.
- EVL_DEBUG_FAIL_RECORD_NUMBER=2
-
the number of records to show on failure when ‘EVL_DEBUG_MODE=1’
- EVL_DEBUG_MODE=0
-
if set to 1, EVL checks for attempts to assign a NULL value to a non-nullable field, and provides the most recently processed records in case of a failure. This slows down processing, so use it only in development, or switch it on temporarily in production when investigating data problems.
- EVL_DEFAULT_FIELD_SEPARATOR='|'
-
when a field in an EVD file has no ‘sep=’ attribute, this character is used instead. It may be any of the first 128 ASCII characters.
- EVL_DEFAULT_RECORD_SEPARATOR
-
when the last field in an EVD file has no ‘sep=’ attribute, this character is used instead. It may be any of the first 128 ASCII characters. By default a Linux newline is used:
EVL_DEFAULT_RECORD_SEPARATOR=$'\n'
but to use Windows end of line (i.e. ‘\r\n’), use components’ options ‘--text-input-dos-eol’ and/or ‘--text-output-dos-eol’.
- EVL_ENV=DEV
-
to specify an environment, usually one of ‘DEV’, ‘TEST’ or ‘PROD’.
- EVL_ENVSUBST_EVM=1
-
whether to replace ‘$...’ and ‘${...}’ in ‘EVM’ mapping files by environment variables (by envsubst utility)
- EVL_FASTEXPORT_SLEEP, EVL_FASTEXPORT_TENACITY, EVL_FASTEXPORT_SESSIONS
-
Teradata FastExport options.
- EVL_FASTLOAD_ERROR_LIMIT, EVL_FASTLOAD_SESSIONS
-
Teradata FastLoad options.
- EVL_FR=1
-
if set to 0, the EVL File Register is not used: it only provides debug messages and otherwise does nothing.
- EVL_FR_LOG_FILE
-
file to be used for storing information for EVL File Register.
- EVL_KAFKA_CONSUMER_COMMAND, EVL_KAFKA_PRODUCER_COMMAND
-
paths to Kafka consumer and producer commands.
- EVL_LOG_PATH="$HOME/evl-log"
-
path to logs from job and workflow runs. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.
- EVL_MAIL_SEND=1
-
send e-mails by default when a workflow fails or when the ‘Mail’ command is used. To switch this off, for example in non-production environments, set ‘EVL_MAIL_SEND=0’.
- EVL_MONITOR_SQLITE_PATH="$EVL_LOG_PATH"
-
path to SQLite database which is necessary for EVL Manager. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.
- EVL_NICE=1
-
each EVL command and component is fired prefixed by:
eval nice -n $EVL_NICE
To lower the priority of EVL processes, i.e. to make EVL jobs "nicer", set ‘EVL_NICE’ to a value between 0 and 19. A higher number means that processes get a lower priority. For details, check ‘man nice’.
- EVL_ODATE
-
when a job or workflow is run without the ‘--odate=’ option, the Ordering Date is taken from this variable. So calling:
evl run/some_job.evl --odate=20200628
is the same as:
export EVL_ODATE=20200628
evl run/some_job.evl
- EVL_PARTITIONS
-
the number of partitions to use in the ‘Partition’ component. This EVL installation allows at most ‘1024’ partitions.
- EVL_PROGRESS_REFRESH_SEC=2
-
when the ‘--progress’ option is used, the state is refreshed every 2 seconds by default. To change this default, set this variable to another number of seconds. The allowed range is from 1 to 30.
- EVL_PROJECT_LOG_DIR
-
by default project’s log directory is set to:
EVL_PROJECT_LOG_DIR="$EVL_LOG_PATH/<project_name>"
- EVL_PROJECT_TMP_DIR
-
by default project’s temporary directory is set to:
EVL_PROJECT_TMP_DIR="$EVL_TMP_PATH/<project_name>"
- EVL_RUN_ID_FILE
-
path to file which stores incremental ‘RUN_ID’, a unique ID of each job or workflow run. It is unique within a project. By default it is:
EVL_RUN_ID_FILE="$EVL_PROJECT_LOG_DIR/evl_run_id.hwm"
- EVL_TMP_PATH="/tmp"
-
path to a (local) temporary directory, to be used by jobs and workflows. Place this folder on the same mount point as the data, so that the ‘mv’ command is as fast as possible. The default is set in ‘/opt/EVL-2.4/etc/evlrc’.
- EVL_WATCHER=0
-
whether or not the component ‘Watcher’ is silent. In production this would be usually set to ‘0’, but in development, if ‘Watcher’ is used to investigate interim data, it is fine to set to ‘1’. Check ‘man evl-watcher’ for more details.
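As an illustration of how the variables above might be combined, here is a minimal sketch of a ‘~/.evlrc’ (or ‘project.sh’) fragment for a production-like setup. The chosen values are only an example, and the use of ‘export’ is an assumption based on the ‘EVL_ODATE’ example above; check the installed ‘evlrc’ for the conventions actually used.

```shell
# Sketch of a ~/.evlrc or project.sh fragment; the values are illustrative only.
export EVL_ENV=PROD                  # one of DEV, TEST or PROD
export EVL_BUILD_COMP=0              # build only the first time or when sources change
export EVL_DEBUG_MODE=0              # keep the extra NULL checks off for speed
export EVL_MAIL_SEND=1               # send e-mails when a workflow fails
export EVL_NICE=10                   # 0..19, a higher number means lower priority
export EVL_PROGRESS_REFRESH_SEC=5    # 1..30 seconds between --progress refreshes
export EVL_LOG_PATH="$HOME/evl-log"  # logs from job and workflow runs
export EVL_TMP_PATH="/tmp"           # ideally on the same mount point as the data
```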
evl project
<project_dir> is the name of the directory with an EVL project. Either a full or a relative path can be specified. The last folder in the <project_dir> path is taken as the project name. Lowercase letters are preferred for the project name, although numbers, capital letters, underscores and dashes are also allowed.
Projects can be nested in other projects. Remember, however, that the parent’s project.sh is not automatically included (i.e. sourced) by the subproject’s one.
- new <project_dir> [<project_dir>...]
-
create <project_dir> directory with standard subfolders structure and default project.sh configuration file.
- sample <project_dir> [<project_dir>...]
-
create <project_dir> directory with sample data and sample jobs and workflows.
- get [--path] <variable_name> [<project_dir>]
-
get the value of <variable_name>, based on the project.sh configuration file. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified. With the option ‘--path’, the path is returned in a clean form (i.e. no multiple slashes, no slash at the end, no ‘/./’, no spaces or tabs at the end or beginning).
- set [<project_dir> [<project_dir>...] ]
-
source the variables from the project.sh configuration file into the environment. ‘project.sh’ is searched for in the current directory, unless <project_dir> is specified.
To drop the whole project simply delete the folder recursively.
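The path cleaning described for ‘get --path’ can be approximated with plain ‘sed’. The following is only a sketch of those rules, not the code EVL uses; ‘clean_path’ is a hypothetical name.

```shell
# Hypothetical helper, not part of EVL: approximate the "--path" cleaning rules.
clean_path() {
    printf '%s\n' "$1" | sed -E \
        -e 's/^[[:space:]]+//' \
        -e 's/[[:space:]]+$//' \
        -e 's#/+#/#g' \
        -e ':a' -e 's#/\./#/#g' -e 'ta' \
        -e 's#(.)/$#\1#'
    # in order: trim leading blanks, trim trailing blanks, collapse repeated
    # slashes, drop every "/./" (looping until none is left), and finally
    # remove one trailing slash unless the whole path is just "/"
}

clean_path '  /var//log/./evl/  '
```

The call above prints ‘/var/log/evl’.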
Synopsis
evl project ( new | sample | set ) <project_dir>... [-v|--verbose]
evl project get [--path] <variable_name> [<project_dir>] [-v|--verbose]
evl project ( --help | --usage | --version )
Options
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- --version
-
print version and exit
Examples
- To create three main projects with a couple of subprojects:
# shared to all projects
evl project new shared
evl project new stage # shared stuff only for "stage" projects
evl project new stage/sap stage/tap stage/erp stage/signaling
evl project new dwh # shared stuff only for "dwh" projects
evl project new dwh/usage dwh/billing dwh/party dwh/contract dwh/product
evl project new mart # shared stuff only for "mart" projects
evl project new mart/marketing mart/sales
- To create new project with sample data and jobs:
evl project sample my_sample
- To get the project path to log directory (i.e. EVL_PROJECT_LOG_DIR):
evl project get --path EVL_PROJECT_LOG_DIR
- To set the project variables into environment:
evl project set stage/sap
which simply does this:
source stage/sap/project.sh
evl run
The EVL Run command can be used both from the command line and within an EVL workflow:
- evl run ( <job>.evl | <workflow>.ewf | <script>.sh )...
-
standalone commandline usage
- Run ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )...
-
within a workflow, i.e. in an EWS file.
In both cases it runs the given EVL <job> or any Bash <script>. If more than one is provided, they are run one after another; once one of them fails, the whole command fails.
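The "one after another, fail on the first failure" behaviour can be pictured with a few lines of shell. This is a toy illustration only, not how ‘evl run’ is implemented; ‘run_in_order’ is a hypothetical name and the arguments are ordinary commands standing in for jobs and scripts.

```shell
# Hypothetical helper, not part of EVL: run the given commands in order
# and stop with a non-zero status as soon as one of them fails.
run_in_order() {
    for cmd in "$@"; do
        if ! "$cmd"; then
            echo "failed: $cmd" >&2
            return 1        # later commands are not started
        fi
    done
    return 0
}

# "true" and "false" stand in for a succeeding and a failing job
run_in_order true true && echo "all finished"
run_in_order true false true || echo "whole command failed"
```

In the second call the third command is never started, because the second one has already failed.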
Synopsis
Run ( [<time>[smhd]] [<retries>r] ( --file=<mask> | <job>|<workflow>|<script> ) )... [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>]
evl run ( <job>.evl | <workflow>.ewf | <script>.sh )... [-o|--odate=<yyyymmdd>] [-p|--project=<project_dir>] [-s|--progress] [-v|--verbose]
evl run ( --help | --usage | --version )
Options
- -o, --odate=<yyyymmdd>
-
run the EVL job with the specified Ordering Date; the environment variable ‘EVL_ODATE’ is then ignored
- -s, --progress
-
show the number of records passed each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’. By default it is 2 seconds.
- -p, --project=<project_dir>
-
specify project folder if not the current working one
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- -v, --verbose
-
print to stderr info/debug messages of the component
- --version
-
print version and exit
Examples
- To run an EVL job with yesterday ‘ODATE’ showing progress:
evl run/aggreg_invoices.evl --odate=yesterday --progress
- To run an EVL job in an EVL Workflow (in an ‘ews’ file):
Run aggreg_invoices.evl
evl workflow
Run the specified workflow <workflow>.ewf with the given Ordering DATE <odate>.
An Ordering DATE is a date for which the data are being processed. Every workflow has to run with some <odate>. When no <odate> is specified, the current date is used.
EVL workflow consists of components, which are used in EWS workflow structure definition file. ‘EWS’ is EVL workflow structure file (workflow template), for details see ‘man 5 evl-ews’.
Arguments
- run
-
run <workflow>.ewf with the Ordering DATE (‘ODATE’) equal to <odate>. If the workflow has already been started with the given ‘ODATE’ in the past, it will fail. Use ‘continue’ or ‘restart’ in such cases. This command is intended to be scheduled, for example by cron.
- continue
-
continue <workflow>.ewf with the Ordering DATE equal to <odate> from the last failed step, i.e. do not run again the steps that have already finished successfully. This command is useful for the usual manual restart from the point of failure.
- restart
-
restart the whole <workflow>.ewf (with the given ODATE) from the beginning, no matter what the status of the workflow is. Use this command with care; normally it is not to be used in a production environment.
Ordering Date
An <odate> can be of any form that the standard GNU/Linux command ‘date’ can recognize as a date. It is however recommended to use the format ‘YYYYMMDD’ or ‘yesterday’.
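To see how a given <odate> would be understood, it can be passed through GNU ‘date’ directly. The sketch below also mirrors the precedence described in this manual (explicit value first, then ‘EVL_ODATE’, then the current date). ‘resolve_odate’ is a hypothetical helper, not an EVL command, and it assumes GNU ‘date’ with the ‘-d’ option.

```shell
# Hypothetical helper, not part of EVL: normalize an Ordering Date to YYYYMMDD.
# Precedence: explicit argument, then $EVL_ODATE, then today.
resolve_odate() {
    raw="${1:-${EVL_ODATE:-today}}"
    # GNU date accepts forms such as 20200628, 2020-06-28 or "yesterday"
    date -d "$raw" +%Y%m%d
}

resolve_odate 20200628      # the recommended YYYYMMDD form is kept as is
resolve_odate 2020-06-28    # another form that GNU date recognizes
resolve_odate yesterday     # relative dates work as well
```

The first two calls both print ‘20200628’; the output of the third one depends on the day it is run.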
Synopsis
evl ( run | continue | restart ) <workflow>.ewf [-o|--odate=<odate>] [-p|--project=<project_dir>] [-s|--progress] [-v|--verbose]
evl ( run | continue | restart | workflow ) ( --help | --usage | --version )
Options
- -o, --odate=<odate>
-
run the EVL workflow with the specified <odate>; the environment variable ‘EVL_ODATE’ is then ignored
- -s, --progress
-
it shows the states of each component, refreshed every ‘$EVL_PROGRESS_REFRESH_SEC’ seconds. (2 seconds by default.)
- -p, --project=<project_dir>
-
specify project folder if not the current working one
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- -v, --verbose
-
print to stderr info/debug messages of the component
- --version
-
print version and exit
Commands
An EVL workflow structure file (‘*.ews’ file) is resolved as a Bash script. The following EVL commands can be used; see ‘man evl-<command>’ for details.
- Cp
-
copy files, handle also HDFS
- End
-
end an EVL job or workflow structure (‘EVS’ or ‘EWS’ file)
- Fr
-
File Register logging, i.e. register files, mark them as processed and/or move them to archive directory.
- Ls
-
list directory contents, handle also HDFS and AWS S3
- Mail
-
send an e-mail
- Mkdir
-
create directory, handle also HDFS
- Mv
-
move (rename) files (handle also HDFS, GS, S3, SFTP)
- Rm
-
remove files or directories (handle also HDFS, GS, S3, SFTP)
- Set
-
set a status to given EVL job or workflow or shell script
- Skip
-
skip run of a given EVL job or workflow
- Snmp
-
send a SNMP trap message
- Spark
-
run the Scala Spark jar file or build jar file from specified source
- Status
-
print the status of a given EVL job or workflow
- Test
-
check file types and existence, handle also HDFS
- Wait
-
split pieces of EVL job or workflow into stages
Run component
An EVL workflow structure file (EWS file) is resolved as a Bash script. Besides the commands above, which are run immediately, there is the ‘Run’ component, which is only parsed when reached and is fired later, once a ‘Wait’ or ‘End’ command is reached.
For details see ‘man evl-run’.
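The "parsed now, fired later" behaviour can be illustrated with toy shell functions. These are stand-ins written for this illustration only: they are not the real EVL ‘Run’, ‘Wait’ and ‘End’, they merely print what would be fired, and the job names are examples.

```shell
# Toy stand-ins, NOT the real EVL implementation: they only show that
# "Run" remembers its argument and that "Wait" or "End" fires what was queued.
queue=""

Run() {
    # only remember what should run, do not start anything yet
    queue="${queue}$*
"
}

Wait() {
    # fire everything queued so far, then start a new stage
    printf '%s' "$queue" | while IFS= read -r item; do
        echo "firing: $item"
    done
    queue=""
}

End() {
    Wait
    echo "workflow finished"
}

# An EWS-like sequence: two jobs in the first stage, one in the second
Run stage_invoices.evl
Run stage_customers.evl
Wait
Run aggreg_invoices.evl
End
```

Nothing is printed by the ‘Run’ lines themselves; the two staging jobs are reported at ‘Wait’ and the aggregation job at ‘End’.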