EVL

Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Write

Write <f_in> into a <file>.

It handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI schemes (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’).

Besides the options mentioned below, which change the file suffix behaviour, one can use the generic ‘--cmd=<cmd>’ option, which calls something like ‘| <cmd> > <file>’ at the end. <cmd> can also be a pipeline. See examples below for inspiration.
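The ‘| <cmd> > <file>’ shape can be tried out with plain shell tools. The following is only a sketch, not EVL itself: ‘printf’ stands in for the records the component would emit, and the ‘cmd’ and ‘file’ variables are hypothetical names chosen for this illustration.

```shell
# Sketch only: imitates the "| <cmd> > <file>" step with standard tools.
# cmd plays the role of the value passed as --cmd=<cmd>; here it is a pipeline.
cmd="tr ';' ',' | head -n 2"
# file plays the role of <file>; the path is an arbitrary temporary location.
file="${TMPDIR:-/tmp}/evl_cmd_sketch.txt"

# printf stands in for the component's output stream.
printf 'id;value\n1;foo\n2;bar\n' | sh -c "$cmd" > "$file"

# Show what ended up in the target file.
cat "$file"
```

Because the whole string is handed to a shell, a pipeline works the same way as a single command.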

Write

is to be used in the EVS job structure definition file. <f_in> is either an input file or a flow name.

evl write

is intended for standalone usage, i.e. to be invoked from the command line and write to standard output.

EVD is the EVL data definition file; for details see evl-evd(5).

URI Scheme:

Based on the URI scheme in <file>, the component calls the appropriate utility to write the file to its destination.

no scheme, ‘file://’

assumes the local filesystem

hdfs://

calls ‘hadoop fs’ utility

gs://

calls ‘gsutil’ utility

s3://

calls ‘aws s3’ utility

sftp://

calls ‘ssh’ utility

smb://

calls ‘smbclient’ utility
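The mapping above can be summarised as a small lookup. This is a minimal sketch that only restates the list; ‘scheme_utility’ is a hypothetical helper name and not part of EVL, and it says nothing about which arguments EVL passes to each utility.

```shell
# Sketch only: returns the utility named in the list above for a given target.
# scheme_utility is a hypothetical helper, not an EVL command.
scheme_utility() {
  case "$1" in
    hdfs://*) echo "hadoop fs" ;;
    gs://*)   echo "gsutil" ;;
    s3://*)   echo "aws s3" ;;
    sftp://*) echo "ssh" ;;
    smb://*)  echo "smbclient" ;;
    # file:// and paths without any scheme are treated as local files.
    *)        echo "local filesystem" ;;
  esac
}

scheme_utility "s3://some-bucket/out.csv"
scheme_utility "/tmp/out.csv"
```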

Compression:

Compression file suffix behaviour (applied in the following order):

‘*.bz2’, ‘*.BZ2’

calls ‘bzip2 -c’

‘*.gz’, ‘*.GZ’

calls ‘gzip -c’

‘*.zip’, ‘*.ZIP’

calls ‘zip’
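The ordered suffix check can be pictured as a ‘case’ statement whose branches are tried top to bottom. This is a sketch of the rules as listed, under the assumption that only the exact lower-case and upper-case spellings shown are recognised; ‘compression_command’ is a hypothetical name.

```shell
# Sketch only: restates the compression suffix rules in the listed order.
# compression_command is a hypothetical helper, not an EVL command.
compression_command() {
  case "$1" in
    *.bz2|*.BZ2) echo "bzip2 -c" ;;
    *.gz|*.GZ)   echo "gzip -c" ;;
    *.zip|*.ZIP) echo "zip" ;;
    # No listed compression suffix matched.
    *)           echo "none" ;;
  esac
}

compression_command "example.csv.gz"
```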

File Type:

The Write component behaves according to the <file> suffix.

Suffix behaviour for specific file formats:

‘*.avro’, ‘*.AVRO’

calls ‘evl writeavro’

‘*.csv’, ‘*.CSV’, ‘*.txt’, ‘*.TXT’

writes the file with the ‘--text-output’ option; an end-of-line character other than the standard Unix one (‘\n’) can be specified by the option ‘--dos-eol’ or ‘--mac-eol’

‘*.json’, ‘*.JSON’

calls ‘evl writejson’

‘*.parquet’, ‘*.parq’, ‘*.PARQUET’, ‘*.PARQ’

calls ‘evl writeparquet’

‘*.qvd’, ‘*.QVD’

calls ‘evl writeqvd’

‘*.qvx’, ‘*.QVX’

calls ‘evl writeqvx’

‘*.xlsx’, ‘*.XLSX’

calls ‘evl writexlsx’

‘*.xml’, ‘*.XML’

calls ‘evl writexml’
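As a quick reference, the file type rules can be restated as a lookup from suffix to the writer or option named above. This is a sketch only: ‘writer_for_suffix’ is a hypothetical helper, and the fallback branch merely means that none of the listed spellings matched, not a statement about what EVL does in that case.

```shell
# Sketch only: maps a file name to the writer or option named in the list above.
# writer_for_suffix is a hypothetical helper, not an EVL command.
writer_for_suffix() {
  case "$1" in
    *.avro|*.AVRO)                     echo "evl writeavro" ;;
    *.csv|*.CSV|*.txt|*.TXT)           echo "--text-output" ;;
    *.json|*.JSON)                     echo "evl writejson" ;;
    *.parquet|*.parq|*.PARQUET|*.PARQ) echo "evl writeparquet" ;;
    *.qvd|*.QVD)                       echo "evl writeqvd" ;;
    *.qvx|*.QVX)                       echo "evl writeqvx" ;;
    *.xlsx|*.XLSX)                     echo "evl writexlsx" ;;
    *.xml|*.XML)                       echo "evl writexml" ;;
    # None of the listed suffix spellings matched.
    *)                                 echo "no suffix rule" ;;
  esac
}

writer_for_suffix "report.xlsx"
writer_for_suffix "events.parq"
```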

Synopsis

Write
  <f_in> <file> (<evd>|-d <inline_evd>) [--append]
  [--footer-file=<f_in>] [--header-file=<f_in> | -h|--header] 
  [ --avro |
    --json [--omit-null-fields] |
    --parquet |
    --qvd | --qvx |
    --xlsx |
    --xml [--document-tag=<tag>] [--record-tag=<tag>]
          [--vector-element-tag=<tag>] |
    -y|--text-output [--dos-eol] [--mac-eol]
  ]
  [--gz] [--cmd=<cmd>] [--ignore-suffix]
  [-x|--text-input] [--validate]

evl write
  <file> (<evd>|-d <inline_evd>) [--append]
  [--footer-file=<file>] [--header-file=<file> | -h|--header] 
  [ --avro |
    --json [--omit-null-fields] |
    --parquet |
    --qvd | --qvx |
    --xlsx |
    --xml [--document-tag=<tag>] [--record-tag=<tag>]
          [--vector-element-tag=<tag>] |
    -y|--text-output [--dos-eol] [--mac-eol]
  ]
  [--gz] [--cmd=<cmd>] [--ignore-suffix]
  [-x|--text-input] [--validate]
  [--verbose]

evl write
  ( --help | --usage | --version )

Options

Standard options:

-d, --data-definition=<inline_evd>

either this option or the <evd> file must be present

--footer-file=<file>

add <file> after the last written record

-h, --header

add a header line with field names. Applicable only to text files (e.g. CSV) and XLSX files.

--header-file=<file>

add <file> before the first record

--validate

without this option, no fields are checked against data types. With this option, all output fields are checked

-x, --text-input

treat the input as text, not binary

--dos-eol

use CRLF as the end of line in text output

--mac-eol

use CR as the end of line in text output

-y, --text-output

write the output as text, not binary
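The three end-of-line conventions referred to by ‘--dos-eol’, ‘--mac-eol’ and the Unix default differ only in the bytes that terminate each record. The sketch below uses plain shell tools rather than EVL; ‘record_with_eol’ and its mode names (‘dos’, ‘mac’, ‘unix’) are hypothetical and chosen only for this illustration.

```shell
# Sketch only: prints one record terminated the way each convention ends a line.
# record_with_eol is a hypothetical helper, not an EVL command.
record_with_eol() {
  case "$1" in
    dos) printf '%s\r\n' "$2" ;;   # CRLF, as with --dos-eol
    mac) printf '%s\r' "$2" ;;     # CR, as with --mac-eol
    *)   printf '%s\n' "$2" ;;     # LF, the Unix default
  esac
}

# Dump the raw bytes so the terminators are visible.
record_with_eol dos 'a;b' | od -c
record_with_eol mac 'a;b' | od -c
record_with_eol unix 'a;b' | od -c
```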

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print info/debug messages of the component to stderr

--version

print version and exit

Options changing file suffix behaviour:

--avro

regardless of the file’s suffix, write the file in Avro file format

--cmd=<cmd>

the bash command <cmd> is used to write into <file>. In that case, recognition of the file’s suffix is switched off. See examples below for inspiration.

--csv

regardless of the file’s suffix, write the file as CSV using delimiters based on EVD (same as the ‘--text-output’ option)

--gz

regardless of the file’s suffix, use ‘gzip’ to compress the file

--ignore-suffix

ignore the file’s suffix, act only based on options

--json

regardless of the file’s suffix, write the file as JSON

--parquet

regardless of the file’s suffix, write the file in Parquet columnar file format

--qvd

regardless of the file’s suffix, write the file as Qlik’s QVD file

--qvx

regardless of the file’s suffix, write the file as Qlik’s QVX file

--xml

regardless of the file’s suffix, write the file as XML

--xlsx

regardless of the file’s suffix, write the file as an MS Excel sheet

XML specific options:

--document-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl writexml’ for details.

--record-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl writexml’ for details.

--vector-element-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl writexml’ for details.

JSON specific options:

--omit-null-fields

this option is ignored for files other than JSON. Check ‘man evl writejson’ for details.

Examples

1. Standard examples of standalone usage. Write a gzipped file with a header and validated data types.

Gzip ‘example.csv’, validating data types and adding a header line:

evl write -d 'id int sep=";", value string sep="\n"' -h -xy --validate \
    < example.csv > example.csv.gz