Sort
(since EVL 1.0)
The command takes records from stdin or <f_in>, sorts them by <key>, and writes them to stdout or <f_out>. With the -u option it also deduplicates the data. At the moment it uses only the traditional sort order (i.e. like LC_ALL=C), not a national one.
- Sort
-
is to be used in an EVS job structure definition file. <f_in> and <f_out> are either input and output files or flow names.
- evl sort
-
is intended for standalone usage, i.e. to be invoked from the command line, reading records from standard input and writing to standard output.
EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).
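The "traditional" ordering mentioned above can be illustrated with the standard POSIX sort utility run under LC_ALL=C. This is only a sketch of what byte-wise ordering looks like, using made-up sample words; it does not invoke evl and makes no claim about how evl implements its comparison.

```shell
# Illustration with the POSIX `sort` utility, not with evl itself.
# In the C locale strings are compared byte by byte, so every uppercase
# ASCII letter sorts before every lowercase one.
c_order=$(printf '%s\n' banana Apple cherry apple Banana | LC_ALL=C sort)
printf '%s\n' "$c_order"
```

With a national locale the same input would typically interleave "apple" and "Apple", which is the behaviour the component does not provide at the moment.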
Synopsis
Sort <f_in> <f_out> (<evd>|-d <inline_evd>) -k <key>
  [-u <unique-key> [-t|--keep-first] [--reject=<file>]]
  [-c|--check-sort] [-f|--file-storage] [-i|--ignore-case] [--validate]
  [-x|--text-input] [-y|--text-output]
evl sort (<evd>|-d <inline_evd>) -k <key>
  [-u <unique-key> [-t|--keep-first] [--reject=<file>]]
  [-c|--check-sort] [-f|--file-storage] [-i|--ignore-case] [--validate]
  [-x|--text-input] [-y|--text-output] [-v|--verbose]
evl sort ( --help | --usage | --version )
Options
- -c, --check-sort
-
only check if the input is sorted and fail if not
- -d, --data-definition=<inline_evd>
-
either this option or the file <evd> must be present. Example: -d 'id int, user_id string(6) enc=iso-8859-1'
- -f, --file-storage
-
store temporary files on disk instead of using memory
- -i, --ignore-case
-
ignore case when comparing key fields
- -k, --key=<key>
-
sort by a key, where <key> is a comma-separated list of fields, each with an optional sort type (the default type is ASC). Example: -k 'id,user_id DESC,modify_dt ASC'
- -r, --reject=<reject_file>
-
when used with the -u option, it writes the duplicate records into <reject_file>
- -t, --keep-first
-
when deduplicating by --unique-key, keep the first record from each group
- -u, --unique-key=<unique_key>
-
deduplicate the output by <unique_key>; keep only the last record of each group unless --keep-first is specified. Duplicate records can be captured with the -r option. Example: -u 'id,user_id'
- --validate
-
without this option, no fields are checked against data types. With this option, all output fields are checked
- -x, --text-input
-
treat the input as text, not binary
- -y, --text-output
-
write the output as text, not binary
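As a rough analogue of a key such as -k 'id,user_id DESC', the sketch below uses the POSIX sort utility on hypothetical comma-separated rows: the first field ascending, then the second field descending. It is not an evl invocation, and the numeric comparison of the first field is an assumption made for this sample data.

```shell
# Hypothetical rows in the form id,user_id (sample data for illustration).
rows='2,bob
1,carol
1,alice
2,dave'

# POSIX-sort analogue of a key like 'id,user_id DESC':
# field 1 ascending (numeric), then field 2 descending (byte order).
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k1,1n -k2,2r)
printf '%s\n' "$sorted"
```

Within each id the user_id values appear in reverse order, which mirrors how a later key field only breaks ties left by the earlier ones.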
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- -v, --verbose
-
print info/debug messages of the component to stderr
- --version
-
print version and exit
Examples
Sort the text input by the whole record (i.e. by all fields) and write the result to a text output file:
evl sort example.evd -k '' -xy <in.txt >out.txt
Deduplicate the binary input (for example from another EVL component) by keeping the first record in each group with the same id (with the lowest updated date) and write the result into output.csv and duplicates into duplicates.csv:
cat input.bin | \
  evl sort -ty -k'id,updated' -u'id' \
    -d'id int sep=",", updated date sep="\n"' -r duplicates.csv >output.csv
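To make the grouping semantics of the deduplication options more concrete, here is a small model written with awk on hypothetical text data that is already sorted by id and then by updated date. It only mimics the behaviour described for -u, --keep-first and -r; it is not how evl is implemented, and treating the non-kept records as the rejected ones is an assumption based on the option descriptions.

```shell
# Hypothetical sample data: id,updated (already sorted by id, then updated).
data='1,2024-01-01
1,2024-03-01
2,2024-02-01'

# Keep the first record of each id group (the described effect of --keep-first).
keep_first=$(printf '%s\n' "$data" | awk -F, '!seen[$1]++')

# The remaining records of each group are the duplicates (assumed to be
# what the reject file would receive in this mode).
rejected=$(printf '%s\n' "$data" | awk -F, 'seen[$1]++')

# Keep the last record of each id group (the described default of -u).
keep_last=$(printf '%s\n' "$data" | awk -F, '
  NR > 1 && $1 != prev { print last }   # group changed: emit its last record
  { prev = $1; last = $0 }              # remember the current record
  END { if (NR) print last }')          # emit the last record of the final group

printf 'first:\n%s\nrejected:\n%s\nlast:\n%s\n' "$keep_first" "$rejected" "$keep_last"
```

Because the input is ordered by updated date within each id, keeping the first record selects the lowest date and keeping the last record selects the highest.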
Check (case-insensitively) whether the text file input.txt is sorted by name, and write the output to the file output.bin in binary form (i.e. not as text):
evl sort -cix -k'name' -d'name string sep="|", personal_id int sep="\n"' \
  <input.txt >output.bin