What is ETL?
ETL stands for Extract, Transform and Load. It is a process
used to collect data from various sources, transform the data according to
business rules and needs, and load the data into a destination database.
The need for ETL arises from the fact that in
modern computing business data resides in multiple locations and in many
incompatible formats. For example, business data might be stored on the file
system in various formats
(Word docs, PDFs, spreadsheets, plain text, etc.), as email files, or
in various database
servers such as MS SQL Server, Oracle and MySQL. Handling all
this business information efficiently is a great challenge, and ETL plays an important role
in solving this problem.
Extract, Transform and Load
The ETL process has three main steps:
Extract, Transform and Load.
Extract – The first step in the ETL process is extracting the data
from various sources. Each of the source systems may store its data in
a completely different format from the rest. The sources are usually flat files
or relational databases (RDBMS), but almost any data store can be used as a
source for an ETL process.
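
As a rough illustration of the Extract step, here is a minimal Python sketch that reads from two differently shaped sources. Everything named here is an assumption made for the example: the inlined CSV text stands in for a flat file, an in-memory SQLite database stands in for an RDBMS, and the `orders` table and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a flat file, inlined so the sketch is self-contained.
CSV_TEXT = "id,name,amount\n1,Alice,10.50\n2,Bob,20.00\n"


def extract_from_csv(text):
    """Read rows from CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_from_db(conn):
    """Read rows from a relational table into a list of dicts (typed values)."""
    conn.row_factory = sqlite3.Row
    rows = conn.execute("SELECT id, name, amount FROM orders").fetchall()
    return [dict(row) for row in rows]


# Hypothetical source 2: an in-memory SQLite database standing in for an RDBMS.
source_db = sqlite3.connect(":memory:")
source_db.execute("CREATE TABLE orders (id INTEGER, name TEXT, amount REAL)")
source_db.execute("INSERT INTO orders VALUES (3, 'Carol', 5.25)")

# Both sources now share one record shape, although the value types still differ.
extracted = extract_from_csv(CSV_TEXT) + extract_from_db(source_db)
```

Note that the CSV rows carry their amounts as strings while the database rows carry them as numbers, which is the kind of format mismatch the later steps have to reconcile.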
Transform – Once the data has been extracted and converted into the
expected format, it’s time for the next step in the ETL process, which is
transforming the data according to a set of business rules. The data
transformation may involve various operations, including but not limited to
filtering, sorting, aggregating, joining data, cleaning data, generating
calculated data based on existing values, validating data, etc.
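
Several of these operations can be sketched in a few lines of Python. This is only an illustration under assumed business rules: the field names (`customer`, `quantity`, `unit_price`) and the rules themselves are hypothetical, not part of any particular ETL tool.

```python
from collections import defaultdict

# Hypothetical extracted records; the field names are assumptions.
records = [
    {"customer": "alice ", "quantity": 2, "unit_price": 10.0},
    {"customer": "Bob", "quantity": 1, "unit_price": 20.0},
    {"customer": "ALICE", "quantity": 3, "unit_price": 5.0},
    {"customer": "", "quantity": 4, "unit_price": 1.0},  # fails validation
]


def transform(rows):
    """Validate, clean, compute, aggregate and sort the extracted rows."""
    # Validating and filtering: drop rows with no customer or a bad quantity.
    valid = [r for r in rows if r["customer"].strip() and r["quantity"] > 0]

    # Cleaning and calculating: normalise names, derive a line total.
    cleaned = [
        {
            "customer": r["customer"].strip().title(),
            "total": r["quantity"] * r["unit_price"],
        }
        for r in valid
    ]

    # Aggregating: sum the line totals per customer.
    totals = defaultdict(float)
    for r in cleaned:
        totals[r["customer"]] += r["total"]

    # Sorting: return (customer, total) pairs ordered by customer name.
    return sorted(totals.items())


result = transform(records)
print(result)  # [('Alice', 35.0), ('Bob', 20.0)]
```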
Load – The final ETL step involves loading the transformed data
into the destination, which might be a database or data warehouse.
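
A minimal sketch of the Load step, again in Python, might look like the following. It assumes an in-memory SQLite database as a stand-in for the destination database or warehouse; the `customer_totals` table and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical output of a transform step: (customer, total) pairs.
transformed = [("Alice", 35.0), ("Bob", 20.0)]


def load(conn, rows):
    """Write transformed rows into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_totals "
        "(customer TEXT PRIMARY KEY, total REAL)"
    )
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO customer_totals VALUES (?, ?)", rows
        )


# In-memory SQLite stands in for the destination database here.
warehouse = sqlite3.connect(":memory:")
load(warehouse, transformed)

loaded = warehouse.execute(
    "SELECT customer, total FROM customer_totals ORDER BY customer"
).fetchall()
```

Using `INSERT OR REPLACE` keyed on the customer name is one possible design choice: it lets the load be re-run without duplicating rows. A real warehouse load may instead append, merge or fully refresh, depending on the business rules.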
ETL Tools
IBM (IBM InfoSphere DataStage)
Oracle (Oracle Warehouse Builder)
Microsoft (SQL Server Integration Services - SSIS)