EVL


Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Read

Reads the given <file>s (a file mask can be specified) and sends their content to the output <f_out>. Multiple <file>s are concatenated.

Besides the options mentioned below, which change file suffix behaviour, one can use the generic ‘--cmd=<cmd>’ option, which calls ‘echo <file>... | xargs <cmd>’ to obtain the input for this component. <cmd> can also be a pipeline (that is the reason for xargs). See examples below for inspiration.
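
The ‘echo <file>... | xargs <cmd>’ mechanism can be tried out with standard tools alone. The following is a minimal sketch, not EVL itself: the file names part1.txt and part2.txt are made up for illustration, and ‘cat’ stands in for <cmd>.

```shell
# Illustration only: imitates 'echo <file>... | xargs <cmd>' with plain tools.
# The temporary directory and the file names are hypothetical.
workdir=$(mktemp -d)
printf 'a;1\n' > "$workdir/part1.txt"
printf 'b;2\n' > "$workdir/part2.txt"

# xargs appends the file names read from stdin to the command, so this runs
# 'cat part1.txt part2.txt' and the files are concatenated in the given order.
combined=$(echo "$workdir/part1.txt" "$workdir/part2.txt" | xargs cat)
printf '%s\n' "$combined"

rm -r "$workdir"
```

Because xargs splits its input on whitespace, file names containing spaces would need extra care in such a setup.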

Read

is to be used in an EVS job structure definition file. <f_out> is either an output file or a flow name.

evl read

is intended for standalone usage, i.e. to be invoked from the command line and to write to standard output.

EVD is an EVL data definition file; for details see evl-evd(5).

Compressed file suffix behaviour (the rules are applied in the following order):

*.tgz, *.tar.gz                calls ``tar -zxO``
*.tar.Z                        calls ``tar -ZxO``
*.tar.bz2                      calls ``tar -jxO``
*.tar                          calls ``tar -xO``
*.gz, *.GZ, *.Z, *.zip, *.bz2  calls ``gunzip -c``

The Read component behaves according to the suffix of the last <file>.

Specific file formats suffix behaviour:

*.avro         calls ``evl readavro``
*.json         calls ``evl readjson``
*.parquet      calls ``evl readparquet``
*.xml          calls ``evl readxml``
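
The two suffix tables above can be read as an ordered lookup on the name of the last <file>. The following is only a sketch of that lookup, not EVL's actual implementation; the function name reader_for_files and the ‘(no suffix rule)’ fallback text are assumptions made for illustration.

```shell
# Sketch of the suffix tables as a shell 'case' lookup.
# reader_for_files is a hypothetical helper, not part of EVL.
reader_for_files() {
  # Only the last file name decides, as stated above.
  for last in "$@"; do :; done
  # 'case' takes the first matching pattern, so *.tar.gz must precede *.gz.
  case "$last" in
    *.tgz|*.tar.gz)            echo "tar -zxO" ;;
    *.tar.Z)                   echo "tar -ZxO" ;;
    *.tar.bz2)                 echo "tar -jxO" ;;
    *.tar)                     echo "tar -xO" ;;
    *.gz|*.GZ|*.Z|*.zip|*.bz2) echo "gunzip -c" ;;
    *.avro)                    echo "evl readavro" ;;
    *.json)                    echo "evl readjson" ;;
    *.parquet)                 echo "evl readparquet" ;;
    *.xml)                     echo "evl readxml" ;;
    *)                         echo "(no suffix rule)" ;;  # assumed fallback
  esac
}

reader_for_files data.tar.gz
reader_for_files a.csv b.json
reader_for_files sample.xml.gz
```

In this sketch ‘sample.xml.gz’ matches only the ‘*.gz’ rule, because the lookup sees ‘.gz’ as the final suffix and never reaches the ‘*.xml’ pattern.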

Synopsis

Read
  <file>... <f_out> (<evd>|-d <inline_evd>)
  [--footer=<n>] [--header=<n>]
  [ --avro |
    --json [--all-fields-exist] [--match-fields] |
    --parquet |
    --xml [--all-fields-exist] [--match-fields]
          [--document-tag=<tag>] [--record-tag=<tag>] [--vector-element-tag=<tag>]
  ]
  [--gz] [--tar] [--cmd=<cmd>] [--ignore-suffix]
  [-v|--validate] [-x|--text-input] [-y|--text-output]
  [--text-input-dos-eol] [--text-output-dos-eol]
  [--text-input-mac-eol] [--text-output-mac-eol]

evl read
  <file>... (<evd>|-d <inline_evd>)
  [--footer=<n>] [--header=<n>]
  [ --avro |
    --json [--all-fields-exist] [--match-fields] |
    --parquet |
    --xml [--all-fields-exist] [--match-fields]
          [--document-tag=<tag>] [--record-tag=<tag>] [--vector-element-tag=<tag>]
  ]
  [--gz] [--tar] [--cmd=<cmd>] [--ignore-suffix]
  [-v|--validate] [-x|--text-input] [-y|--text-output]
  [--text-input-dos-eol] [--text-output-dos-eol]
  [--text-input-mac-eol] [--text-output-mac-eol]
  [--verbose]

evl read
  ( --help | --usage | --version )

Options

Standard options:

-d, --data-definition=<inline_evd>

either this option or the file <evd> must be present. Example: -d 'user_sum long'

-f, --footer=<n>

skip the last <n> records. When multiple files are given, skip the last <n> records in each of them. The command ‘evl head -n-<n> --skip-parse’ is used for this.

-h, --header=<n>

skip the first <n> records. When multiple files are given, skip the first <n> records in each of them. The command ‘evl tail -<n>+(N+1) --skip-parse’ is used for this.
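
The per-file skipping described for ‘--header’ and ‘--footer’ can be imitated with standard tools when every record is one line. This is a rough sketch under that assumption, not what EVL runs internally: skip_header_footer is a hypothetical helper, the file names are made up, and ‘head -n -N’ requires GNU head.

```shell
# skip_header_footer <header_n> <footer_n> <file>...
# Hypothetical helper: drops the first and last lines of each file separately,
# then concatenates what is left. Assumes GNU head for 'head -n -N'.
skip_header_footer() {
  header_n=$1
  footer_n=$2
  shift 2
  for f in "$@"; do
    # tail -n +K starts printing at line K; head -n -N drops the last N lines.
    tail -n "+$((header_n + 1))" "$f" | head -n "-$footer_n"
  done
}

workdir=$(mktemp -d)
printf 'HEADER\n1;a\n2;b\nFOOTER\n' > "$workdir/one.csv"
printf 'HEADER\n3;c\nFOOTER\n'      > "$workdir/two.csv"

trimmed=$(skip_header_footer 1 1 "$workdir/one.csv" "$workdir/two.csv")
printf '%s\n' "$trimmed"

rm -r "$workdir"
```

The header and footer are removed from each file before concatenation, which is why the second file's header does not end up in the middle of the output.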

--validate

without this option, no fields are checked against data types. With this option, all output fields are checked

-x, --text-input

treat the input as text, not binary

--text-input-dos-eol

treat the input as text with CRLF line endings

--text-input-mac-eol

treat the input as text with CR line endings

-y, --text-output

write the output as text, not binary

--text-output-dos-eol

produce output as text with CRLF

--text-output-mac-eol

produce output as text with CR
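
For orientation, the three line-ending conventions named by the ‘-eol’ options differ only in the bytes that terminate a line: LF (Unix), CRLF (DOS) and CR (classic Mac). The sketch below shows the difference with ‘tr’ alone; it does not call EVL, and the helper names dos_to_unix and mac_to_unix are hypothetical.

```shell
# DOS (CRLF) to Unix (LF): delete every carriage return.
dos_to_unix() { tr -d '\r'; }

# Classic Mac (CR) to Unix (LF): turn each carriage return into a line feed.
mac_to_unix() { tr '\r' '\n'; }

# The same two records, written once with CRLF and once with CR terminators.
from_dos=$(printf 'id;value\r\n1;a\r\n' | dos_to_unix)
from_mac=$(printf 'id;value\r1;a\r' | mac_to_unix)

printf '%s\n' "$from_dos"
printf '%s\n' "$from_mac"
```

Note that these helpers act on every carriage return in the stream, including any inside field values, so they are only a byte-level illustration.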

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print to stderr info/debug messages of the component

--version

print version and exit

Options changing file suffix behaviour:

--avro

whatever <file>’s suffix, act as reading ‘avro’ file format

--cmd=<cmd>

the bash command <cmd> is used to read the <file>s. In that case, file suffix recognition is switched off. See examples below for inspiration.

--gz

whatever <file>’s suffix, act as reading ‘gz’, ‘Z’, ‘zip’, ‘bz2’ compressed file format

--ignore-suffix

ignore <file>’s suffix, act only based on options.

--json

whatever <file>’s suffix, act as reading ‘json’ file format

--parquet

whatever <file>’s suffix, act as reading ‘parquet’ file format

--tar

whatever <file>’s suffix, act as reading tar file

--xml

whatever <file>’s suffix, act as reading ‘xml’ file format

XML and JSON specific options:

--all-fields-exist

this option is ignored for files other than XML and JSON.

--match-fields

this option is ignored for files other than XML and JSON.

XML specific options:

--document-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

--record-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

--vector-element-tag=<tag>

this option is ignored for files other than XML. Check ‘man evl readxml’ for details.

Examples

Standard examples of standalone usage:

  1. Read tar.gz, skip header line and validate data types. Write into ‘example.csv’ the content of the tarred and gzipped source, without the header line and with validated data types:
    evl read -d 'id int sep=";", value string sep="\n"' \
             -h1 -vxy <example.csv.tar.gz >example.csv
    
  2. Gzipped xml file. The ‘--xml’ option has to be specified here, as the file name doesn’t end with ‘.xml’. The ‘--gz’ option is not necessary, as gunzip is applied automatically:
    evl read sample.xml.gz sample.evd --xml -xy >sample.csv
    

    This is basically the same as the previous invocation:

    gunzip -c sample.xml.gz | evl read --xml sample.evd -xy > sample.csv
    

Standard examples of usage in an evs file:

  1. Gzipped xml file. The same as example 2 above, but to be used in an evs file:
    Read   sample.xml.gz SOURCE_XML sample.evd --xml --text-input
    Write  SOURCE_XML    sample.csv sample.evd       --text-output