Get a List of Files on an FTP Server and then process with Foreach Loop Container

Today we are going to address a very common FTP question: “How can I get a list of the files that are on my FTP Server?”

This is a task that can be accomplished very easily with the Task Factory Secure FTP component and this walk through will provide step by step directions. If you are interested in Task Factory and the more than 40 tasks you can find the free trial here:

Here we will use the “Get list of files” option and store all the files for a given directory into an object variable. We will use a Foreach loop container to iterate through the object variable and then an Execute SQL task to write each file name to a table in our database. This walkthrough is broken down into three sections.

  • Part 1 – Setting up the Secure FTP component to get List of Files
  • Part 2 – Setting up the For Each Loop to read the the object variable
  • Part 3 – (Optional) Setting up the Execute SQL Task to write the file names to a table.

Part 1 – Configuring the Secure FTP site to Get List of Files from server.

  1. Start a new package and drag the Secure FTP task into your control flow from the toolbox.
  2. Next right click on the Secure FTP Task and click edit.
  3. Steps 4-7 below are in screenshot 1
  4. What will this task do? Select “Get list of files” from the drop down menu.
  5. For SFTP Connection Manager either choose an existing connection manager or create a new one if don’t have one set up.
  6. For your Directory location on server you have two options:
    1. First you can choose to click the ellipsis button and hard code the directory location.
    2. Second you can choose to store the directory location in a variable
  7. Next create an object variable to store the list of file names in from the server. (objFilesList)

Screenshot 1

Part 2 – Configuring the For Each Loop to iterate through the object variable

  1. Pull a Foreach Loop container into the Control Flow and drag the precedence constraint from the Secure FTP task to the Foreach loop container.
  2. Steps 3-6 can be found in Screenshot 2
  3. Open the For Each container and click on the Collection tab.
  4. For “Enumerator” choose “Foreach From Variable Enumerator” from the drop down.
  5. Next select the object variable you specified in the Secure FTP component. (objFilesList)
  6. Finally click the Variable Mappings tab.

Screenshot 2

  1. On the Variable Mappings tab select a string variable to store the file name in. (strFileName)
  2. Make sure Index is set to 0. (Screenshot 3)
  3. Click Ok.

Screenshot 3

Optional: (This section simply shows how to map variables in an Execute SQL Task and helps with visualization of each iteration of the Foreach Loop through the list of files on our FTP.)

Part 3 – Setting up the Execute SQL Task to read the string variable

(This section has been added for completeness and is fully optional.)

Note: For this example I have created a simple table with four columns:

CREATE TABLE [dbo].[FilesOnSftp](

[ID] [int] IDENTITY(1,1) NOT NULL,

[Directory] [varchar](50) NULL,

[FileName] [varchar](50) NULL,

[DateLoaded] [date] NULL)

  1. Pull an Execute SQL Task from the toolbox into the Foreach Loop container.
  2. Create a new variable called strSftpDirectory and give it a value of /

  3. Right click on the Execute SQL Task and click Edit to bring up the Editor
  4. For “Connection” choose the connection manager to your database.
  5. Click and highlight “SQLStatement” and then click the Ellipsis button that appears on the right.

Screenshot 4

  1. Clicking the Ellipsis button will give you a pop up window, this is where we enter our Insert SQL Statement. Here is the syntax I used if you are following along with the tutorial:

    INSERT INTO dbo.FilesOnSftp

([Directory], [FileName], [DateLoaded])

VALUES(?, ?,GETDATE())

  1. Now its time to map the parameters. Click on the Parameter Mapping tab. (Screenshot 5)
  2. Click the Add button and then select “strSftpDirectory” for Variable Name.
  3. For data type choose Varchar and change the Parameter Name to 0.
  4. Click the Add button a second time and select strFileName for the Variable Name.
  5. For data type choose Varchar and change the Parameter Name to 1. Click ok to save changes.

    Screenshot 5

Advertisement

How to flatten data using the Pivot Transform in SSIS.

The Pivot Transform is a very powerful tool in SSIS. It’s a component that is overlooked and isn’t the first thing to come to mind when you need it most.

Imagine you have a table that has an employee listing for each type of phone number. For example, you have one row for business phone, personal phone, and cell phone. We want to clean this data up and have only one row per employee. Well this is the situation I was presented with recently. The great news is with the Pivot Transform in SSIS this can be accomplished fairly easy. The table below is an simplified example of the data we had to work with.

EmployeeID PhoneType PhoneNumber
1 Business 123-987-6543
1 Personal 231-789-3456
1 Cell 312-897-5643
2 Business 321-978-6534
2 Personal 132-798-5634
2 Cell 213-879-3564

 

Our end product will look like this:

EmployeeID BusinessPhone PersonalPhone CellPhone
1 123-987-6543 231-789-3456 312-897-5643
2 321-978-6534 132-798-5634 213-879-3564

 

Step 1) Set up a data flow task in the control flow.

Step 2) Open up the data flow and pull in your source component. For this example I created a table in SQL Server.

Step 3) Drag in a Pivot Transform and link it to the source component.

Step 4) Configure the Pivot Transform

I was going to go through step by step instructions on how to configure the Pivot Transform. However, there is already an awesome blog out there by Devin Knight. Please click on the link below to view his blog post.

STEP 3 in Devin’s post explains how to configure the Pivot Transform.

https://devinknightsql.com/2009/06/15/better-know-a-ssis-transform-the-pivot-transform/http://bidn.com/blogs/DevinKnight/ssis/85/better-know-a-ssis-transform-%E2%80%93-the-pivot-transform

 

Thanks for Looking!

Quickly remove invalid XML characters! (Dimension Processing Error: Illegal XML Character)

Recently I was using the Dimension Processing task inside of SSIS to perform a process add to one of my dimensions. I then found out that I had some bad data, more specifically, I had some invalid XML characters in my source data and this was causing the dimension processing task to fail. Below is a screenshot that includes the invalid character, after Mitchell before =.

image

Error Message:

image

Business Problem:

I needed to remove invalid XML characters from source data so that I could use the dimension processing task to perform a process add on the dimension.

Solution:

Sadly, I originally began this process by trying to write my own script transform to remove the invalid XML characters. Unfortunately the script I was using (Found on the internet) apparently did not account for all characters as my processing task continued to fail with the same error.

So what did I do? I turned to Task Factory of course! To solve this problem I used the Task Factory replace unwanted characters transform.

Walkthrough:

    1. Bring the Task Factory Replace Unwanted Characters Transform into your data flow and place it between your source and destination:
    2. image
    3. Next open up the Task Factory component and you will see a list of your columns on the left.
    4. On the right you will see the column “Choose Action”. Click this drop down menu for the column you wish to cleanse. This menu will provide you with 4 different cleansing actions! (List below)
      1. image
    5. Select “Replace invalid characters that cannot be part of an XML Document.

image

Conclusion:

Now when I execute my package you will notice the illegal XML character is gone!

image

Thank you Task Factory Team!

SQL Tips! Generate DDL Scripts with the data from the table. (Installment 2)

Welcome back to this series on SQL Tips. Sometimes I find myself needing to not only generate the DDL for a table but also the data that is stored in that table. For example imagine I have created and populated a few tables for a blog post. It would be much easier if I could share the table definition and the SQL statement to populate that table, Agree? Great!

Scenario

In this scenario I am going to show you how to generate the DDL to create the DimCurrency table from the AdventureWorksDW2012 database. Along with the DDL script I will also show you how you can generate the script to populate the table.

Walkthrough

    1. Open SSMS and connect to the server where your table is located at.
    2. Right click on the database where your table is stored, I’m using AdventureWorksDW2012.
    3. Next click on Tasks
    4. Next click Generate Scripts
    5. image
    6. Inside the Generate Scripts wizard click next on the introduction screen.
    7. For Choose Objects screen –> Click “Select specific database objects”.
    8. Expand the tables list.
    9. Select the DimCurrency Table.
    10. image
    11. On the “Scripting Options screen” I’m just going to load the script into a new query window. Please select “Save to new query window”. This option is found at the bottom of the screen.
    12. On the same screen select the “Advanced” button found on the right side.
    13. Now scroll down to “Types of data to script” and change it from schema only to Schema and Data.
    14. image
    15. Click Ok. Click Next, Click Next. The script will now be created and generated in a new query window. Click Finish once it has completed. Below is a partial screenshot of the completed product.
    16. image

 

As always, thanks for looking!

SQL Tips! Set your default database in SSMS (Installment 1)

In my time as a Business Intelligence consultant and trainer I have picked up some pretty cool tricks and thought I would start blogging on these tips two or three times a week as I catch myself using them.

In this blog I am going to show you how you can set your default database when you connect to a server. I find this to be an extremely helpful little tip. Please see below.

Walkthrough:

    1. Open SSMS and from the object explorer click “Connect”.
    2. From the Connect to Server prompt: Enter your server and Authentication information.
    3. Click “Options”, located at the bottom right.
    4. Next click the “Connection Properties” tab found at the top.
    5. Locate “Connect to Database” and then click the drop down menu.
    6. From the dropdown menu select <Browse> and then select the database that you want to be your default. Click Connect..
    7. Voila! Now every time you connect to that server it will default to that database instead of master!

image

As always, thanks for looking!

For Loop Container in SSIS–Does File Exist (Part 2)

In this post I am going to explain how to set up and use the For Loop container inside SSIS in a real world scenario. In a previous post I showed how to set up a script task inside SSIS to check if a File Exist. Well if the file doesn’t exists I want to keep checking over and over again until that file does exists. The For Loop container is just what I need. Let’s get started.

This post assumes you are building on my previous post, found here: Does File Exist

Walkthrough

Open the package created in Part 1 and pull a For Loop container task into the control flow.

  1. Drag the script task created in part 1 into the For Loop container.
  2. Right click on the For Loop Container and click “edit”.
  3. Under For Loop properties you will see three options.
    1. InitExpression – This value is optional. Here you can assign an initial value. (More on this shortly)
    2. EvalExpression – This value is required. As long as this expression evaluates to true the loop will continue to loop. Once it evaluates to false the loop will stop.
    3. AssignExpression – This value is optional. This value is used to change the value of the variable that you use in the EvalExpression. For example you can add 1 to each iteration of the loop.
  4. For InitExpression enter the following: @bolDoesFileExist = False.
    1. Notice I used one (=) sign here. In SSIS you use one = for assigning values.
  5. For EvalExpression enter the following: @bolDoesFileExist == False.
    1. As long as this expressions evaluates to true the loop will continue. Once the variable no longer is false (meaning that the file does exists) the loop will stop.
  6. For Assign Expression, leave this value empty. Click ok to close out the For Loop Editor.
  7. image
  8. That’s it!

You might want to also add a delay between loops, maybe 5 seconds maybe 5 minutes, completely up to you. But here is an easy way you can add that functionality.

Pull an Execute SQL Task into your For Loop container.

  1. Choose OLEDB for connection type
  2. Select any OLEDB connection, doesn’t matter which one you choose.
  3. Select Direct Input for SQLSourceType and type the following code for your delay statement:
      • WAITFOR DELAY ‘00:00:05’

image

Here is how the final package will look:

image

Thanks for looking! Enjoy.

SSIS Script Task–Does File Exists (Part 1)

So in this blog I am going to show you how you can write a quick Script Task inside of SSIS to see if a file exists. In my next blog in this two part series I will show you how you can then loop over this script task until the file does exist.

You can find Part 2 in this series here:

Business Problem:

So you know that you are going to receive a file that you need to process, and you want to process and load that file the moment it “exists”. If you run your package before the file exists the package will fail, so you want to first check to see if the file exist and then run your data flow task that will subsequently load the file.

Walkthrough

Open up your Visual Studios environment and create a new package.

  1. Pull the Script Task component into the data flow.
  2. First let’s create two variables, we will use these in our script task.
  3. image
    1. The value for the strFileName variable is the location of the file, please change yours as needed.
  4. Open the script task and make sure the Script Language is set to Visual Basic.
  5. Select “strFileName” for the read only variables.
  6. Select ‘”bolDoesFileExist” for read/write variables.
  7. image
  8. Click “Edit Script…” to open visual studios.
  9. Ok. Inside the VB project we first need to add a new class of code.
  10. In the Imports section add the following code: Imports System.IO
  11. image
  12. Now we enter our VB code. Find the section titled “Public Sub Main” and type the following code where it says ‘Add Code Here’.
  13. Dts.Variables(“bolDoesFileExist”).Value = File.Exists(Dts.Variables(“strFileName”).Value)
  14. Here is a screenshot of the final result:
  15. image
  16. Now simply click the x at the top right to close the visual studio project to close it out, you don’t have to click save.
  17. Back in the script task editor main screen click ok to compile the code. All Done!

 

Explanation of Visual Basic Code

The code inside the script task is in two parts.

  • The first part is the function “File.Exists”. The file exists function is reading the value of the strFileName variable and checking to see if that file exists. It will return a value of either True or False.
  • The second part is writing the result of the File Exists function to the variable bolDoesFileExist. So everytime this package runs the value of that variable will change based on whether that file does or does not exist.

Thanks for looking and I hope this helps. In my next post I will explain how you can loop over this task over and over again until the file does exist.