Recently I was tasked with removing special characters that randomly showed up in one column of a data set, these characters were coming from a proprietary source. The characters were the hash tag (#) and semi-colon (;). One or both characters would appear at the beginning, end or middle of the column. There are a few different ways you could deal with this type of situation, each of them varying in their degree of complexity and difficulty.
Fortunately for the customer we provided a very easy and quick solution. In this short blog we are going to take a look at the Data Cleansing transform offered in Task Factory. First let’s take a look at a before and after picture so you can visualize the data and end result.
Setting up the component
- Pull the Task Factory Data Cleansing Transform into your Data Flow.
- Connect your source data to the Data Cleansing transform.
- Right click on the Data Cleansing Transform and click Edit.
- First select the column you would like to “Replace”.
Next select Replace Specified characters or words with user defined value. (Screenshot 1)
After you select an action you will receive a parameter box where you can specify the characters you would like to replace and what you want to replace them with.
Now we will finish configuring the rest of the component. Please see screenshot below for final settings.
- Finally click ok to save your settings and execute the package!
That’s it. I tried to drag it out as much as I could but some things are just that simple. Take a look at some of the other actions included and enjoy! As always thanks for looking.