Write
Write <f_in> into a <file>.
It handles various file types (‘Avro’, ‘json’, ‘Parquet’, ‘QVD’, ‘QVX’, ‘xlsx’ and ‘xml’), compression formats (‘gz’, ‘tar’, ‘bz2’, ‘zip’, ‘Z’) and URI schemes for files (‘file://’, ‘sftp://’, ‘hdfs://’, ‘s3://’, ‘gs://’, ‘smb://’).
Besides the options mentioned below, which change file suffix behaviour, one can use the generic ‘--cmd=<cmd>’ option, which appends something like ‘| <cmd> > <file>’ at the end. <cmd> can also be a pipeline. See the examples below for inspiration.
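The ‘| <cmd> > <file>’ step described above can be pictured with plain shell. The sketch below is not EVL code: ‘write_via_cmd’ is a hypothetical helper, the sample records are made up, and the command string is evaluated with ‘sh -c’ only for the sake of illustration.

```shell
# Hypothetical helper (not part of EVL): pipe stdin through <cmd> into <file>,
# mimicking the '| <cmd> > <file>' step that '--cmd' is described as adding.
write_via_cmd() {
  # $1 = command string (may itself be a pipeline), $2 = target file
  sh -c "$1" > "$2"
}

out=$(mktemp)
# Here <cmd> is a two-stage pipeline: sort the records, then gzip them.
printf '2;b\n1;a\n' | write_via_cmd 'sort | gzip -c' "$out"
gzip -dc "$out"   # decompress to inspect what was written
```

Because the whole string is handed to a shell, anything that reads standard input and writes standard output can take the place of ‘sort | gzip -c’.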
- Write
-
is to be used in an EVS job structure definition file. <f_out> is either an output file or a flow name.
- evl write
-
is intended for standalone usage, i.e. to be invoked from the command line and write to standard output.
EVD is the EVL data definition file; for details see evl-evd(5).
URI Scheme:
Based on the URI scheme in <file>, it calls the appropriate utility to write the file to the destination.
- no scheme, ‘file://’
-
assumes the local filesystem
- ‘hdfs://’
-
calls ‘hadoop fs’ utility
- ‘gs://’
-
calls ‘gsutil’ utility
- ‘s3://’
-
calls ‘aws s3’ utility
- ‘sftp://’
-
calls ‘ssh’ utility
- ‘smb://’
-
calls ‘smbclient’ utility
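The scheme-to-utility mapping listed above can be summarised as a prefix match. This is a minimal sketch, not EVL's source: ‘uri_utility’ is a hypothetical name, the sample paths are made up, and the function only reports which utility the list associates with a given <file> without calling it.

```shell
# Hypothetical helper (not EVL source): name the utility that the list above
# associates with the URI scheme of <file>.
uri_utility() {
  case $1 in
    hdfs://*) echo 'hadoop fs' ;;
    gs://*)   echo 'gsutil' ;;
    s3://*)   echo 'aws s3' ;;
    sftp://*) echo 'ssh' ;;
    smb://*)  echo 'smbclient' ;;
    *)        echo 'local filesystem' ;;   # no scheme or 'file://'
  esac
}

for f in /data/out.csv file:///data/out.csv s3://bucket/out.parquet sftp://host/out.json; do
  printf '%s -> %s\n' "$f" "$(uri_utility "$f")"
done
```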
Compression:
Compression file suffix behaviour (applied in the following order):
- ‘*.bz2’, ‘*.BZ2’
-
calls ‘bzip2 -c’
- ‘*.gz’, ‘*.GZ’
-
calls ‘gzip -c’
- ‘*.zip’, ‘*.ZIP’
-
calls ‘zip’
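As an illustration of suffix-driven compression, the following sketch picks ‘bzip2 -c’ or ‘gzip -c’ from the file name and otherwise writes the stream unchanged. It is an assumption-laden stand-in rather than EVL's implementation: ‘compress_to’ is a hypothetical helper, the fallback to uncompressed output is the sketch's own choice, and the ‘zip’ case is left out because a zip archive also needs an entry name.

```shell
# Hypothetical helper (not EVL source): write stdin into "$1", choosing the
# compressor from the file suffix in the order given in the list above.
compress_to() {
  case $1 in
    *.bz2|*.BZ2) bzip2 -c > "$1" ;;
    *.gz|*.GZ)   gzip -c > "$1" ;;
    *)           cat > "$1" ;;     # sketch's fallback: write uncompressed
  esac
}

dir=$(mktemp -d)
printf '1;a\n2;b\n' | compress_to "$dir/example.csv.gz"
gzip -dc "$dir/example.csv.gz"     # round-trip to check the content
```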
File Type:
The Write component behaves according to the <file> suffix.
Specific file formats suffix behaviour:
- ‘*.avro’, ‘*.AVRO’
-
calls ‘evl writeavro’
- ‘*.csv’, ‘*.CSV’, ‘*.txt’, ‘*.TXT’
-
write the file with the ‘--text-output’ option; an end-of-line character other than the standard Unix one (‘\n’) can be specified by the option ‘--dos-eol’ or ‘--mac-eol’
- ‘*.json’, ‘*.JSON’
-
calls ‘evl writejson’
- ‘*.parquet’, ‘*.parq’, ‘*.PARQUET’, ‘*.PARQ’
-
calls ‘evl writeparquet’
- ‘*.qvd’, ‘*.QVD’
-
calls ‘evl writeqvd’
- ‘*.qvx’, ‘*.QVX’
-
calls ‘evl writeqvx’
- ‘*.xlsx’, ‘*.XLSX’
-
calls ‘evl writexlsx’
- ‘*.xml’, ‘*.XML’
-
calls ‘evl writexml’
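The suffix list above amounts to a lookup from file name to writer. The sketch below restates it as a shell ‘case’ statement; ‘writer_for’ is a hypothetical name and only prints the writer instead of running it. Only the all-lowercase and all-uppercase spellings from the list are matched, so a mixed-case suffix falls through in this sketch; how EVL itself treats such names is not stated here.

```shell
# Hypothetical lookup (not EVL source): print the writer that the list above
# associates with the suffix of <file>.
writer_for() {
  case $1 in
    *.avro|*.AVRO)                     echo 'evl writeavro' ;;
    *.csv|*.CSV|*.txt|*.TXT)           echo 'text output' ;;
    *.json|*.JSON)                     echo 'evl writejson' ;;
    *.parquet|*.parq|*.PARQUET|*.PARQ) echo 'evl writeparquet' ;;
    *.qvd|*.QVD)                       echo 'evl writeqvd' ;;
    *.qvx|*.QVX)                       echo 'evl writeqvx' ;;
    *.xlsx|*.XLSX)                     echo 'evl writexlsx' ;;
    *.xml|*.XML)                       echo 'evl writexml' ;;
    *)                                 echo 'suffix not in the list' ;;
  esac
}

writer_for report.PARQ
writer_for report.Json   # mixed case is not in the list, so no match here
```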
Synopsis
Write
  <f_in> <file> (<evd>|-d <inline_evd>)
  [--append] [--footer-file=<f_in>] [--header-file=<f_in> | -h|--header]
  [ --avro | --json [--omit-null-fields] | --parquet | --qvd | --qvx | --xlsx |
    --xml [--document-tag=<tag>] [--record-tag=<tag>] [--vector-element-tag=<tag>] |
    -y|--text-output [--dos-eol] [--mac-eol] ]
  [--gz] [--cmd=<cmd>] [--ignore-suffix]
  [-x|--text-input] [--validate]

evl write
  <file> (<evd>|-d <inline_evd>)
  [--append] [--footer-file=<file>] [--header-file=<file> | -h|--header]
  [ --avro | --json [--omit-null-fields] | --parquet | --qvd | --qvx | --xlsx |
    --xml [--document-tag=<tag>] [--record-tag=<tag>] [--vector-element-tag=<tag>] |
    -y|--text-output [--dos-eol] [--mac-eol] ]
  [--gz] [--cmd=<cmd>] [--ignore-suffix]
  [-x|--text-input] [--validate] [--verbose]

evl write
  ( --help | --usage | --version )
Options
Standard options:
- -d, --data-definition=<inline_evd>
-
either this option or the file <evd> must be present
- --footer-file=<file>
-
add <file> after the last written record
- -h, --header
-
add a header line with field names. Applicable only to text files (e.g. CSV) and XLSX files.
- --header-file=<file>
-
add <file> before the first record
- --validate
-
without this option, no fields are checked against data types. With this option, all output fields are checked
- -x, --text-input
-
treat the input as text, not binary
- --dos-eol
-
treat the output as text with CRLF as the end of line
- --mac-eol
-
treat the output as text with CR as the end of line
- -y, --text-output
-
write the output as text, not binary
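To make the three end-of-line conventions concrete, the sketch below converts LF-terminated text to CRLF and to CR with ‘awk’ and dumps the bytes with ‘od -c’. It uses standard tools only and says nothing about how EVL produces these endings; ‘to_crlf’ and ‘to_cr’ are hypothetical helper names and the sample records are made up.

```shell
# Convert LF-terminated text on stdin to CRLF or CR line endings, to show
# what the DOS and classic Mac conventions look like byte by byte.
to_crlf() { awk '{ printf "%s\r\n", $0 }'; }
to_cr()   { awk '{ printf "%s\r", $0 }'; }

printf '1;a\n2;b\n' | to_crlf | od -c   # each record ends with \r \n
printf '1;a\n2;b\n' | to_cr   | od -c   # each record ends with \r only
```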
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- -v, --verbose
-
print to stderr info/debug messages of the component
- --version
-
print version and exit
Options changing file suffix behaviour:
- --avro
-
regardless of the file’s suffix, write the file in Avro file format
- --cmd=<cmd>
-
the bash command <cmd> is used to write into <file>. In this case, recognition of the file’s suffix is switched off. See the examples below for inspiration.
- --csv
-
regardless of the file’s suffix, write the file as CSV using delimiters based on the EVD (same as the ‘--text-output’ option)
- --gz
-
regardless of the file’s suffix, use ‘gzip’ to compress the file
- --ignore-suffix
-
ignore the file’s suffix and act only on the options given
- --json
-
regardless of the file’s suffix, write the file as JSON
- --parquet
-
regardless of the file’s suffix, write the file in Parquet columnar file format
- --qvd
-
regardless of the file’s suffix, write the file as Qlik’s QVD file
- --qvx
-
regardless of the file’s suffix, write the file as Qlik’s QVX file
- --xml
-
regardless of the file’s suffix, write the file as XML
- --xlsx
-
regardless of the file’s suffix, write the file as an MS Excel sheet
XML specific options:
- --document-tag=<tag>
-
this option is ignored for files other than XML. See ‘man evl writexml’ for details.
- --record-tag=<tag>
-
this option is ignored for files other than XML. See ‘man evl writexml’ for details.
- --vector-element-tag=<tag>
-
this option is ignored for files other than XML. See ‘man evl writexml’ for details.
JSON specific options:
- --omit-null-fields
-
this option is ignored for files other than JSON. See ‘man evl writejson’ for details.
Examples
1. Standard examples of standalone usage. Write a gzipped file with a header and validated data types.
Gzip ‘example.csv’, validating data types and adding a header line:
evl write -d 'id int sep=";", value string sep="\n"' -h -xy --validate \
  < example.csv > example.csv.gz
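For readers without EVL at hand, the shape of the result described for this example (a header line, then the records, gzip-compressed) can be approximated with standard tools. This is only a sketch of the output: the header text ‘id;value’ is an assumption derived from the field names and the first separator in the inline EVD, the sample records are made up, and no data type validation takes place.

```shell
dir=$(mktemp -d)
printf '1;a\n2;b\n' > "$dir/example.csv"     # made-up sample records

# Emit the assumed header line, then the records, and gzip the whole stream.
{ echo 'id;value'; cat "$dir/example.csv"; } | gzip -c > "$dir/example.csv.gz"

gzip -dc "$dir/example.csv.gz"               # inspect the compressed result
```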