Recently I was tasked with removing special characters that randomly showed up in one column of a data set, these characters were coming from a proprietary source. The characters were the hash tag (#) and semi-colon (;). One or both characters would appear at the beginning, end or middle of the column. There are a few different ways you could deal with this type of situation, each of them varying in their degree of complexity and difficulty.
Fortunately for the customer we provided a very easy and quick solution. In this short blog we are going to take a look at the Data Cleansing transform offered in Task Factory. First let’s take a look at a before and after picture so you can visualize the data and end result.
Setting up the component
- Pull the Task Factory Data Cleansing Transform into your Data Flow.
- Connect your source data to the Data Cleansing transform.
- Right click on the Data Cleansing Transform and click Edit.
- First select the column you would like to “Replace”.
-
Next select Replace Specified characters or words with user defined value. (Screenshot 1)
-
After you select an action you will receive a parameter box where you can specify the characters you would like to replace and what you want to replace them with.
-
Now we will finish configuring the rest of the component. Please see screenshot below for final settings.
-
- Finally click ok to save your settings and execute the package!
Conclusion:
That’s it. I tried to drag it out as much as I could but some things are just that simple. Take a look at some of the other actions included and enjoy! As always thanks for looking.