How to read a CSV file in Pentaho Data Integration (Kettle) (PDI)
How to read CSV File in Pentaho Data Integration Kettle (PDI)
Here I am giving a use case where I read a CSV file and dump the CSV data into a database table. I am using Mysql DB on my local machine.
CSV file:
Preparation:
Now, for transformation, I am taking the CSV file input component and then a Table Output component
Step 1: Reading CSV file
Let's see how we read CSV from the CSV file input component.
1. Browse the file
2. Change the file Delimiter for your file
3. Change the encoder for your file
4. Click on Get Fields to read the fields from the file
5. Preview the data to confirm everything is working
6. PDI automatically detects the data type of columns. change the type if it was wrongly detected.
Step 2: Dump CSV data into a table:
Now, let's see how we dump the CSV data into a table. Now connect the CSV component to Table Output Component.
1. Click on New to give a connection to a Database. Click on Edit if you already gave a connection and you want to edit/update.
2. select the target table. you can write the table name if you know the table name or just click on browse and select from the list.
3. Check the box to manually map the source(CSV columns) & destination(table columns) columns.
4.click on it to manually map the columns from source & destination.
5.Chnage the commit size.
DB connection window:
From the above screenshot
1. Give a connection name as you wish.
2. select the connection type. usually/most of the cases it is General.
3. Access type select as Native(JDBC).
4. Connection type as the type of DB you are connecting. In my case it is MySql.
5. Give connection details.
6. Click on Test to test the DB connectivity.
Mapping Columns :
1. Click on Database Fields to map the source & destination columns manually.
2. Click on Get Fields to get the fields from the source & destinations.
3. Click on Enter Fields mapping to map the columns manually. If the source & Destination column names are the same them PDI has the intelligence to map them automatically if not we should map manually. in my case, the source & destination columns are the same and they get mapped automatically.
Columns mapping window:
Comments
Post a Comment