EVL – ETL Tool


Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2022 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.


Ls

(since EVL 2.0)

When an argument contains ‘hdfs://’, it is assumed to reside on an HDFS file system and the function ‘evl_hdfs_ls’ is called, which is by default ‘hadoop fs -ls’.

When an argument contains ‘s3://’, it is assumed to reside on an S3 file system and the function ‘evl_s3_ls’ is called, which is by default ‘aws s3 ls’.

Otherwise it acts as the usual ‘ls’ command.
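
The dispatch described above can be pictured as the following shell sketch; it is only an illustration of the behaviour, not the actual EVL implementation:

  # Illustrative sketch only, not EVL source code.
  ls_dispatch() {
    case "$1" in
      *hdfs://*) evl_hdfs_ls "$@" ;;   # by default: hadoop fs -ls
      *s3://*)   evl_s3_ls   "$@" ;;   # by default: aws s3 ls
      *)         ls          "$@" ;;   # plain local ls
    esac
  }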

Ls

is to be used in an EVS job structure definition file or in an EWS workflow structure definition.

evl ls

is intended for standalone usage, i.e. to be invoked from the command line.
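
For example, listing an input directory directly from the command line (the path is only illustrative):

  evl ls -lh hdfs:///data/input/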

Synopsis

Ls
  [-dhlqrRStu] (hdfs://<path> | <local_path>)...

Ls
  [-hR] s3://<bucket>/<path>

evl ls
  [-dhlqrRStu] (hdfs://<path> | <local_path>)...

evl ls
  [-hR] s3://<bucket>/<path>

evl ls
  ( --help | --usage | --version )

Options

-d, --directory

list directories themselves, not their contents

-h, --human-readable

print human readable sizes (e.g. 1K, 234M or 2G)

-l

use a long listing format; for HDFS this means ‘hdfs dfs -ls’

-q, --hide-control-chars

print ? instead of nongraphic characters

-r, --reverse

reverse order while sorting

-R, --recursive

list subdirectories recursively

-S

sort by file size, largest first

-t

sort by modification time, newest first

-u

with -lt: sort by, and show, access time; with -l: show access time and sort by name; otherwise: sort by access time, newest first
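
Options can be combined as with the usual ‘ls’. For instance, the following command (the path is only illustrative) gives a long, recursive listing of a local directory with human-readable sizes, largest files first:

  evl ls -lShR /data/input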

Examples

  1. These simple examples write the result to stdout:
    Ls hdfs:///some/path/????-??-??.csv
    Ls s3://somebucketname/path/
    Ls /some/local/machine/path/*
    
  2. To initiate a flow in an EVL job:
    INPUT_FILES=/data/input
    Run   ""    INPUT  "Ls $INPUT_FILES"
    Map   INPUT ...
    ...
    

Then, for the PROD environment, the input files would be defined, for example, as:

INPUT_FILES=hdfs:///data/input
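
The job itself does not change between environments; because the value now starts with ‘hdfs://’, Ls dispatches to ‘evl_hdfs_ls’ instead of the local ‘ls’. A minimal sketch of keeping both definitions side by side, assuming per-environment parameter files (the file names are only illustrative):

  # dev parameters (illustrative file)
  INPUT_FILES=/data/input

  # prod parameters (illustrative file)
  INPUT_FILES=hdfs:///data/input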