What counts as ETL?

  • I know that ETL stands for Extract, Transform and Load data into a new target database. But how broad is the scope of what still counts as ETL? For example, if I want to move a contact database with 7000 records into CRM software, does this process count as ETL as well?
      August 7, 2020 4:48 PM IST
  • ETL stands for the Extract, Transform and Load stages of data processing: extract from a data source, transform the extracted data, and load it into a target data store.

    Whenever you extract in one place and load in another, your process still counts as ETL. ETL may not involve a transform step in every scenario; sometimes it is a straightforward data load. In most scenarios, though, the data is transformed to suit the target environment or schema.

    To answer your question: yes, loading your records falls under the purview of ETL. In your case, though, there is no transform stage.
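
    A minimal sketch of such a load with no transform step, assuming the contacts arrive as a CSV export and using an in-memory SQLite table as a stand-in for the CRM's data store. The column names, sample rows, and the `contacts` table are hypothetical; a real CRM would more likely be loaded through its own import tool or API.

```python
import csv
import io
import sqlite3

# Hypothetical source export; in practice this would be a file or a query result.
SOURCE_CSV = """name,email,country
Ada Lovelace,ada@example.com,GBR
Alan Turing,alan@example.com,GBR
"""


def extract(text):
    """Extract: read contact rows from the source export."""
    return list(csv.DictReader(io.StringIO(text)))


def load(rows, conn):
    """Load: write the rows unchanged into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS contacts (name TEXT, email TEXT, country TEXT)"
    )
    conn.executemany(
        "INSERT INTO contacts (name, email, country) "
        "VALUES (:name, :email, :country)",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the CRM's data store
load(extract(SOURCE_CSV), conn)     # no transform step in between
loaded = conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0]
```

    A transform step, if one were needed, would sit between `extract` and `load`.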

      August 7, 2020 4:52 PM IST
  • As stated by Venkataraman R, you don't have a transform stage, which is why your job can't really be considered ETL.


    Normally the transform portion would include some sort of data mapping (e.g. standardizing or converting country codes: USA -> US, TUR -> TR). Aside from lots of lookup verification and mapping, you would do some general cleaning, such as removing bad data, applying proper formatting (title caps, for instance), and reworking keys in the case of a data warehouse. You can also do imputation, binning and normalization when preparing machine learning training data. But I think the most important one is removing duplicates, since duplicates can cause issues in aggregation (for example, inflated counts and sums).
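
    As a rough illustration of those steps, here is a small sketch that maps country codes, drops bad rows, applies title caps, and removes duplicates. The field names, the `COUNTRY_CODES` lookup, and the choice of email as the duplicate key are all assumptions made for the example, not something from the original question.

```python
# Hypothetical lookup table: three-letter to two-letter country codes.
COUNTRY_CODES = {"USA": "US", "TUR": "TR"}


def transform(rows):
    """Clean, map, and de-duplicate contact rows (keyed on email)."""
    seen = set()
    cleaned = []
    for row in rows:
        email = row.get("email", "").strip().lower()
        if "@" not in email:
            continue  # removal of bad data: no usable email
        if email in seen:
            continue  # removal of duplicates
        seen.add(email)
        country = row.get("country", "").strip().upper()
        cleaned.append({
            "name": row.get("name", "").strip().title(),     # title caps
            "email": email,
            "country": COUNTRY_CODES.get(country, country),  # code mapping
        })
    return cleaned


raw = [
    {"name": "jane doe", "email": "Jane@Example.com", "country": "USA"},
    {"name": "JANE DOE", "email": "jane@example.com ", "country": "usa"},
    {"name": "ali veli", "email": "ali@example.com", "country": "TUR"},
    {"name": "no email", "email": "", "country": "USA"},
]
contacts = transform(raw)
```

    Of the four input rows, the second is dropped as a duplicate of the first and the fourth is dropped as bad data, so two cleaned contacts remain.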


    It is also considered transformation if you derive a new, aggregated data set from your existing data. This means you have grouped your data together in some way (SUM/AVG/MAX) so that a tool consuming the data no longer needs to perform the aggregation itself, which reduces computational and bandwidth requirements.
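
    A small sketch of that kind of pre-aggregation, assuming hypothetical order rows with a `country` and an `amount` field. In a real pipeline this would more likely be a SQL GROUP BY, but the idea is the same.

```python
from collections import defaultdict


def aggregate(rows):
    """Group amounts by country and precompute SUM, AVG and MAX."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["country"]].append(row["amount"])
    return {
        country: {
            "sum": sum(amounts),
            "avg": sum(amounts) / len(amounts),
            "max": max(amounts),
        }
        for country, amounts in groups.items()
    }


# Hypothetical detail rows; only the summary would be loaded into the target.
orders = [
    {"country": "US", "amount": 100.0},
    {"country": "US", "amount": 50.0},
    {"country": "TR", "amount": 30.0},
]
summary = aggregate(orders)
```

    The consuming tool then reads one row per country instead of every order.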

      August 13, 2021 12:54 PM IST
  • I think it's interesting that, since this question was asked, a whole new set of tools has emerged that call themselves "Reverse ETL" and they sync data in the direction you are talking about: from the database/warehouse into things like CRM systems. For example, out of Postgres and into Salesforce or Marketo.

    The "Reverse" piece seems to be an acknowledgement that this goes in the opposite direction from the one ETL has historically gone in.

      November 20, 2021 12:33 PM IST