Posts

How to read a CSV file in Pentaho Data Integration (Kettle) (PDI)

Image
 How to read CSV File in Pentaho Data Integration Kettle (PDI) Here I am giving a use case where I read a CSV file and dump the CSV data into a database table. I am using Mysql DB on my local machine. CSV file: Preparation:  Now, for transformation, I am taking the CSV file input component and then a Table Output component Step 1: Reading CSV file Let's see how we read CSV from the CSV file input component. From the above screenshot  1. Browse the file 2. Change the file Delimiter for your file 3. Change the encoder for your file 4. Click on Get Fields to read the fields from the file 5. Preview the data to confirm everything is working 6. PDI automatically detects the data type of columns. change the type if it was wrongly detected. Step 2: Dump CSV data into a table:  Now, let's see how we dump the CSV data into a table. Now connect the CSV component to Table Output Component. 1. Click on New to give a connection to a Database. Click on Edit if you already gav...

Install Pentaho Data Integration (Kettle) (PDI)

Image
Install Pentaho Data integration Pentaho Data Integration is one of the best ETL/automation tools. We can do ETLs from various kinds of data sources that PDI is offering. We can automate various processes based on schedules we create. Currently, I am using Pentaho data Integration 8.1 community edition.  You can download the community edition from the following link. https://sourceforge.net/projects/pentaho/files/Pentaho%208.1/client-tools/ https://excellmedia.dl.sourceforge.net/project/pentaho/Pentaho%208.1/client-tools/pdi-ce-8.1.0.0-365.zip Then unzip the file and run spoon.bat for windows users & spoon.sh for Linux users.

Create User defined variables in Pentaho Data Integration (Kettle)

Image
 Create User-Defined Variables in Pentaho Data Integration (Kettle) There are various situations where we have to use user-defined variables instead of static values for several components like DB connections, quey filters etc... Here I am going to give one example where I defined DB connection details as variables and I am read the DB table using the variables as connection details. I can use those variables where I used the DB connections in transformations/Jobs. Main Job: Read XML file: Using variables to Query DB (read_DB_data):  XML file in which the connection defined:  Preparation:  Step 1: Now, first, we are reading an XML file where the connection details are given. Then we select those connections details and set them as variables.  Then We will select the fields defined in the XML file using the  Select Values component Click on each tab to get the fields from the previous step. Then we will set the fields as variables using the  Set Variab...