EVL

Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Uniq

(since EVL 2.1)

Read records from stdin or <f_in> and write to stdout or <f_out> the last record of each group of records sharing the same <key>. The input must be sorted by this key.
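
The following POSIX shell/awk sketch illustrates these semantics only; it is not how EVL implements Uniq. It assumes a hypothetical text format with `|`-separated fields and the key in the first field, and keeps the last record of each run of consecutive records sharing a key. Because it only compares each record with its predecessor, it gives correct groups only on sorted input, which is why Uniq requires sorted input. The function name `keep_last` is hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only, not EVL's implementation.
# Keeps the last record of each group of consecutive records with the same key.
# Assumes '|'-separated text records with the key in field 1 (hypothetical format).
keep_last() {
  awk -F'|' '
    NR > 1 && $1 != prev_key { print prev_rec }  # key changed: emit last record of the previous group
    { prev_key = $1; prev_rec = $0 }             # remember the current record
    END { if (NR > 0) print prev_rec }           # emit the last record of the final group
  '
}

printf '1|a\n1|b\n2|c\n3|d\n3|e\n' | keep_last
# prints:
# 1|b
# 2|c
# 3|e
```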

Uniq is to be used in an EVS job structure definition file. <f_in> and <f_out> are either input and output files or flow names.

evl uniq is intended for standalone usage, i.e. to be invoked from the command line, reading records from standard input and writing to standard output.

EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).

Synopsis

Uniq
  <f_in> <f_out> (<evd>|-d <inline_evd>) -k <key> [-c|--check-sort]
  [-i|--ignore-case] [-r|--reject=<file>] [-t|--keep-first]
  [--validate] [-x|--text-input] [-y|--text-output]

evl uniq
  [<evd>] -k <key> [-c|--check-sort]
  [-i|--ignore-case] [-r|--reject=<file>] [-t|--keep-first]
  [--validate] [-x|--text-input] [-y|--text-output]
  [-v|--verbose]

evl uniq
  ( --help | --usage | --version )

Options

-c, --check-sort

check if the input is sorted and fail if not

-d, --data-definition=<inline_evd>

either this option or the file <evd> must be specified. Example: -d 'id int, user_id string(6) enc=iso-8859-1'

-i, --ignore-case

ignore case when comparing key fields

-k, --key=<key>

deduplicate by <key>, a comma-separated list of fields, each optionally followed by its sort order ASC or DESC (default is ASC). Example: -k 'id,user_id DESC,modify_dt ASC'

-r, --reject=<reject_file>

when used with option -u, write the duplicate records into <reject_file>

-t, --keep-first

keep the first record of the group instead of the last one

--validate

with this option, all output fields are checked against their data types; without it, no fields are checked

-x, --text-input

treat the input as text, not binary

-y, --text-output

write the output as text, not binary
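
To show how -c, -i, -t and --reject interact, here is a POSIX shell/awk sketch of their semantics. It is an illustration under assumptions, not EVL's implementation: records are hypothetical `|`-separated text with the key in field 1, the sort check is always on (as with -c), and keys are compared byte-wise (`LC_ALL=C`). The function `uniq_sketch` and its positional parameters KEEP_FIRST, IGNORE_CASE and REJECT_FILE are hypothetical.

```shell
#!/bin/sh
# Illustrative sketch only, not EVL's implementation.
# usage (hypothetical): uniq_sketch KEEP_FIRST IGNORE_CASE REJECT_FILE < sorted_input
#   KEEP_FIRST=1  keep the first record of each group (like -t), else keep the last
#   IGNORE_CASE=1 compare keys case-insensitively (like -i)
#   REJECT_FILE   where discarded duplicates go (like --reject); "" drops them
# Records are '|'-separated with the key in field 1 (hypothetical format).
uniq_sketch() {
  LC_ALL=C awk -F'|' -v keep_first="$1" -v icase="$2" -v reject="$3" '
    {
      key = $1 ""                          # force string (not numeric) key comparison
      if (icase) key = tolower(key)        # like -i
      if (NR > 1 && key < prev_key) {      # like -c: fail if the input is not sorted
        print "input not sorted at record " NR > "/dev/stderr"
        failed = 1
        exit 1
      }
      if (NR == 1 || key != prev_key) {    # first record of a new group
        if (NR > 1 && !keep_first) print kept   # keep-last: emit the previous group
        if (keep_first) { print $0 } else { kept = $0 }
      } else if (keep_first) {             # like -t: later records of the group are duplicates
        if (reject != "") print $0 > reject
      } else {                             # default: the previously kept record is a duplicate
        if (reject != "") print kept > reject
        kept = $0
      }
      prev_key = key
    }
    END {
      if (failed) exit 1                   # preserve the failure status
      if (NR > 0 && !keep_first) print kept  # keep-last: emit the final group
    }
  '
}

printf 'A|1\na|2\nb|3\n' | uniq_sketch 1 1 ""
# prints:
# A|1
# b|3
```

In keep-last mode the sketch must buffer one record, because it only knows a record was the last of its group when the next key differs; keep-first mode can emit each group's first record immediately.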

Standard options:

--help

print this help and exit

--usage

print short usage information and exit

-v, --verbose

print the component's info/debug messages to stderr

--version

print version and exit

Examples

  1. Deduplicate on all fields and write the result into a text output file:
    evl uniq example.evd -k'' -xy < in.txt > out.txt
    
  2. Deduplicate the binary input (for example, the output of another EVL component), keeping the first record of each group with the same id (i.e. the one with the earliest updated date), and write the result into output.csv and the duplicates into duplicates.csv:
    cat input.bin | evl uniq -ty -k'id,updated' -u'id' \
         -d'id int sep=",", updated date sep="\n"' \
         -r duplicates.csv > output.csv
    
  3. Deduplicate the text file input.txt by name, case-insensitively and checking that the input is sorted, and write the result into output.bin in binary (i.e. not as text):
    evl uniq -cix --key="name" \
             -d 'name string sep="|", personal_id int sep="\n"' \
             < input.txt > output.bin
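
Since Uniq requires sorted input, a text file like the one in Example 3 may need sorting first. One possible way, sketched here under an assumption, is GNU/POSIX `sort` with case folding on the first `|`-separated field (name). The assumption is that the byte-wise case-folded order of `LC_ALL=C sort -f` matches what EVL expects with -i; running Uniq with -c (--check-sort) would reveal a mismatch. The function name `sort_by_name_icase` is hypothetical.

```shell
#!/bin/sh
# Sketch: sort '|'-separated text case-insensitively by field 1 (name).
# Assumption: this order matches what evl uniq -i expects; verify with -c.
sort_by_name_icase() {
  # -f folds lowercase to uppercase for comparison; -k1,1 restricts the key to field 1
  LC_ALL=C sort -f -t'|' -k1,1 "$@"
}

printf 'bob|2\nalice|1\nCarol|3\n' | sort_by_name_icase
# prints:
# alice|1
# bob|2
# Carol|3
```

Its output could then be piped into the `evl uniq -cix ...` command of Example 3 in place of `< input.txt`.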