EVL Anonymization


Products, services and company names referenced in this document may be either trademarks or registered trademarks of their respective owners.

Copyright © 2017–2020 EVL Tool, s.r.o.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts.

Introduction

EVL Anonymization Microservice enables fast, automated and cost-effective anonymization of data sets. It can be used to anonymize production data according to GDPR requirements, as well as to protect commercially sensitive data handed to developers, testers and other outside contractors.
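
For illustration only, the following Python sketch shows the kind of field-level anonymization such a microservice performs: identifiers are pseudonymized with a keyed hash and names are masked, so the data keeps its structure but no longer identifies real persons. This is a generic example, not EVL's implementation; the file name, column names and key are hypothetical.

  import csv
  import hashlib
  import hmac

  SECRET_KEY = b"example-key"  # hypothetical key; store and rotate securely in practice

  def pseudonymize(value):
      # Deterministic keyed hash: the same input always yields the same token,
      # so referential integrity between data sets is preserved.
      return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

  def mask(value):
      # Keep the first character, hide the rest.
      return value[:1] + "*" * (len(value) - 1) if value else value

  with open("customers.csv", newline="") as src, \
       open("customers_anonymized.csv", "w", newline="") as dst:
      reader = csv.DictReader(src)
      writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
      writer.writeheader()
      for row in reader:
          row["customer_id"] = pseudonymize(row["customer_id"])  # hypothetical columns
          row["name"] = mask(row["name"])
          writer.writerow(row)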

EVL Anonymization belongs to the portfolio of EVL Microservices, each of which provides fast, automated and cost-effective functionality for a specific business purpose.

Figure 1.1: EVL Microservices Overview

EVL Microservices are built on top of the core EVL software and retain its flexibility, robustness, high productivity, and ability to read/write various file formats and databases.

EVL

EVL originally stood for Extract–Validate–Load, but it’s now a fully featured ETL (Extract–Transform–Load) tool.

EVL is designed with the Unix philosophies of interoperability and “do one thing, and do it well” in mind.

Templates, a high level of abstraction, and the ability to dynamically create jobs make for a powerful ETL tool.

Characteristics:

  • Versatile, i.e. it cooperates with other components of the customer’s solution, solving only a particular problem.
  • High performance, written in C++.
  • Lightweight, just install an rpm/deb package or unpack a tgz archive.
  • No drag-and-drop, yet still graphical (EVL Manager) and highly efficient for development.
  • Managed access to the source code (e.g. via git).
  • Linux only, making the best use of the system.

Features:

  • Natively read/write (1):
    • File formats: CSV, JSON, XML, XLS, XLSX, Parquet, Avro, QVD/QVX, ASN.1
    • DBMS: Teradata, Oracle, PostgreSQL, SQLite, ODBC, (near future: Maria/MySQL, Snowflake, Redshift)
    • Cloud storages: Amazon S3 and Google Storage
  • Hadoop: read/write HDFS, resolve, build and run Spark jobs, Impala/Hive queries
  • Partitioning, to partition data and/or parallelize processing (a conceptual sketch follows this list)
  • Productivity boosters, to generate jobs/workflows from metadata
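
The partitioning feature mentioned above can be pictured with the following conceptual sketch: rows are assigned to partitions by hashing a key, and each partition is then processed by a separate worker. This is a generic Python illustration of the idea, not how EVL implements it; the names and the partition count are made up.

  from multiprocessing import Pool

  N_PARTITIONS = 4  # hypothetical partition count

  def partition_of(key):
      # Assign a row to a partition by hashing its key.
      return hash(key) % N_PARTITIONS

  def process_partition(rows):
      # Placeholder per-partition transformation.
      return [row.upper() for row in rows]

  if __name__ == "__main__":
      rows = ["alice", "bob", "carol", "dave", "eve"]
      partitions = [[] for _ in range(N_PARTITIONS)]
      for row in rows:
          partitions[partition_of(row)].append(row)
      # Each partition is handled by its own worker process.
      with Pool(N_PARTITIONS) as pool:
          results = pool.map(process_partition, partitions)
          print(results)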

For the most recent information about EVL and supported formats and DBs please check https://www.evltool.com.

EVL Microservices

Each of the Microservices solves a particular problem:

  • Anonymization: anonymizing production data according to GDPR requirements and other regulations before it is handed to developers, testers and outside contractors
  • Data generation: simulating data that complies with real-life data patterns for proper testing environments, application development and implementation of ETL processes
  • Validation: replacing heavy and complex testing tools in migration projects, or quick quality checks of production data
  • Staging: getting data from various sources, like Oracle, Teradata, Kafka, CSV or JSON files, and providing a historized base stage
  • QVD Utils: reading/writing QVD files without using Qlik Sense or QlikView, and providing metadata from the QVD/QVX header
  • Hadoop Utils: reading/writing the Parquet and Avro file formats, querying Impala, Hive, etc.
  • ASN.1 Decoder: decoding files from ASN.1 format into JSON with the highest performance
  • Orchestration: scheduling and monitoring sequences of jobs and workflows, awaiting file delivery, etc.; workflows can be viewed in a graphical user interface, where jobs and workflows can be started, restarted and cancelled and their statistics and logs checked

They also complement each other, so combining them yields a complete solution.

For the most recent list of EVL Microservices and additional information please visit https://www.evltool.com.


Footnotes

(1)

In fact, any source or target accessible from Linux can be used.