Uniq
(since EVL 2.1)
Reads standard input or <f_in> and writes to standard output or <f_out> only the last record of each group of records sharing the same <key>. The input must be sorted according to this key.
- Uniq
-
is to be used in an EVS job structure definition file. <f_in> and <f_out> are each either a file or a flow name.
- evl uniq
-
is intended for standalone usage, i.e. to be invoked from the command line, reading records from standard input and writing to standard output.
EVD and EVS are EVL definition files, for details see evl-evd(5) and evl-evs(5).
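The default behaviour described above (keep the last record of each key group, relying on sorted input) can be illustrated without EVL. The following is only a minimal sketch using awk: `keep_last` is a hypothetical helper, not part of EVL, and it assumes comma-separated text records whose first field is the key.

```shell
# Hypothetical helper, NOT part of EVL: emulates "keep the last record
# of each key group" for comma-separated text keyed on the first field.
# Like Uniq, it only works if the input is already sorted by that key.
keep_last() {
  awk -F',' '
    # Force string comparison of the key field.
    { key = $1 "" }
    # Key changed: the buffered line was the last one of the previous group.
    NR > 1 && key != prev_key { print prev_line }
    # Buffer the current line as the candidate for its group.
    { prev_key = key; prev_line = $0 }
    # Flush the last group, unless the input was empty.
    END { if (NR > 0) print prev_line }
  '
}

printf '1,a\n1,b\n2,c\n3,d\n3,e\n' | keep_last
# prints:
# 1,b
# 2,c
# 3,e
```

Because only adjacent records are compared, unsorted input would yield several output records for the same key, which is why the sort requirement matters.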
Synopsis
Uniq <f_in> <f_out> (<evd>|-d <inline_evd>) -k <key> [-c|--check-sort] [-i|--ignore-case] [--reject=<file>] [-t|--keep-first] [--validate] [-x|--text-input] [-y|--text-output]
evl uniq [<evd>] -k <key> [-c|--check-sort] [-i|--ignore-case] [--reject=<file>] [-t|--keep-first] [--validate] [-x|--text-input] [-y|--text-output] [-v|--verbose]
evl uniq ( --help | --usage | --version )
Options
- -c, --check-sort
-
check if the input is sorted and fail if not
- -d, --data-definition=<inline_evd>
-
either this option or the file <evd> must be present. Example: -d 'id int, user_id string(6) enc=iso-8859-1'
- -i, --ignore-case
-
ignore case when comparing key fields
- -k, --key=<key>
-
deduplicate by a key, where <key> is a comma-separated list of fields, each optionally followed by a sort order (the default is ASC). Example: -k 'id,user_id DESC,modify_dt ASC'
- -r, --reject=<reject_file>
-
when used with option -u, catches duplicate records into <reject_file>
- -t, --keep-first
-
keep the first record of the group instead of the last one
- --validate
-
without this option, no fields are checked against data types. With this option, all output fields are checked
- -x, --text-input
-
treat the input as text, not binary
- -y, --text-output
-
write the output as text, not binary
Standard options:
- --help
-
print this help and exit
- --usage
-
print short usage information and exit
- -v, --verbose
-
print to stderr info/debug messages of the component
- --version
-
print version and exit
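As a rough illustration of what --keep-first, --ignore-case and --check-sort mean for sorted input, here is a small awk-based sketch. The functions `keep_first` and `check_sorted` are hypothetical helpers, not EVL code; they assume comma-separated text keyed on the first field and compare keys as plain strings, which may differ from how EVL compares typed fields.

```shell
# Hypothetical helpers, NOT part of EVL.

# Rough analogue of --keep-first: print the first record of each key group.
# Pass 1 as the first argument to fold case, similar in spirit to --ignore-case.
keep_first() {
  awk -F',' -v fold="${1:-0}" '
    # Build the comparison key as a string, lower-cased if requested.
    { key = (fold ? tolower($1) : $1) "" }
    # First line overall, or first line of a new group: emit it.
    NR == 1 || key != prev_key { print }
    { prev_key = key }
  '
}

# Rough analogue of --check-sort: exit non-zero if the key field
# is not in ascending string order.
check_sorted() {
  awk -F',' '
    { key = $1 "" }
    NR > 1 && key < prev_key { exit 1 }
    { prev_key = key }
  '
}

printf 'Ann,1\nann,2\nBob,3\n' | keep_first 1
# prints:
# Ann,1
# Bob,3
```

Without case folding, `Ann` and `ann` would count as different keys and all three records would be printed.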
Examples
- Deduplicate by all fields and write the result to a text output file:
evl uniq example.evd -k'' -xy < in.txt > out.txt
- Deduplicate the binary input (for example from another EVL component) by keeping the first record
in each group with the same id (with the lowest updated date) and write the result into output.csv
and duplicates into duplicates.csv:
cat input.bin | evl uniq -ty -k'id,updated' -u'id' \
  -d'id int sep=",", updated date sep="\n"' \
  -r duplicates.csv > output.csv
- Check that the input text file input.txt is sorted, deduplicate it case-insensitively by name, and write the
result to output.bin in binary (i.e. not as text):
evl uniq -cix --key="name" \
  -d 'name string sep="|", personal_id int sep="\n"' \
  < input.txt > output.bin