Andy’s Neural Works

A wise folly can lead to great things!


A Primer on Source to Target Mapping

Published by

on

A Source to Target Map is a blueprint for putting together your systems. This is where you will build out the contract for how data will flow into (and/or out of) your system. This posting will go into the basics of creating that data contract.

Why Do This?

Confusion is a terrible thing in a project. In the more aggressive projects of this day, putting a thoughtful design down onto paper is often overlooked which creates confusion. Yes, you can break things into factory development sometimes, but you must always have a foundation and an agreed upon blueprint to build anything properly. That’s where a just effort such as mapping comes into play.

Starting off

Building a source to target map may seem straightforward. However, it does require a lot of legwork to get it done. You will need to understand all sides of the system by talking with the people who are the owners of the data. Do not expect everyone to be a technological master who can write magical SQL to do your job for you. They could be anything from an intern to a C-level staff member so have an open mind and a willingness to learn. As you work with the sources, you will become an expert on the data.

As you might guess, the documentation involves gluing together the source and target in a field by field listing. That would be a good guess but oversimplifies the final product. Let’s start with the basic definitions.

The source is defined as anything that will feed into the system. As discussed in the previous posting on “Control Your ETL,” there can be many types of feeds. It really does not matter as long as you can get access to it.

The target is any type of component where the final transformed set resides. Typically, this is some type of database system (often a warehouse) but also has numerous other possibilities. For example, it could also be a file system where the components are XML documents.

Where do you start?

When creating this documentation, you will need to layer it starting with the systems itself. First, you need to gather a listing of source and target systems along with an owner. This is your high-level component listing. Start with this listing:

  • System Name
  • System Owner
  • Location – where does it all reside? Is it a server (IP Address), directory, or other (tape drive even!)

From there, build out a simple list of objects into a data dictionary of sorts. That dictionary will be composed of the following at the minimum:

  • Data Source Name – This is the name of the system that contains the data elements/attributes. This comes from the list described above.
  • Table/File Name – The table name (fully described with database and owner) or object being utilized.
  • Field Name – The atomic container within the table/file.
  • Description – Give a good description in a few sentences.
  • Composition – Describe how the data is put together. For example, you have a date field. How is that date field listed? Is it YYYYMMDD or something entirely different? This is very important to know. Another example would be a fixed width file. You will want to list out the starting and ending positions of the field in the file.

Other possible additions to your dictionary could be:

  • Data types and lengths
  • Business entity and attributes along with key descriptions
  • Nullability

You will want to have a dictionary for all of your sources as well as the target. Do not neglect the basics. Without a proper foundation, you might open up the door to guess work.

Glue it together

Now that you have both sides, can you just list out the following and be done with it all? Can you just put down:

  • Source Table
  • Source Field
  • Target Table
  • Target Field

Is that all there is to it? No, think about what is missing. Take a simple case of a data type conversion from a date to a string. If you just list the two fields and expect the developer to understand you want the date to read “August 27, 2018” rather than “27-AUG-2018” you are gambling.

That middle piece are the rules for placement. They can be:

  • Data conversion rules
  • Transformation rules (ex: an equation)
  • Lookup/join rules
  • Data cleansing rules (ex: capitalize the start of each word in a last name)

Including these rules makes sure that all bases are covered. Yes, a field can map 1:1 with another field but that is rarely the case.

In the end, what you will have is the Source Info – Rules – Target Info mapping. Putting it into a tool is possible, but honestly, those tools can be overpriced and complicated compared to a simple spreadsheet.

Conclusion

As can be seen in other industries, putting together a blueprint is essential for building a solid structure. If you put the effort into creating a source to target map for your project, there will be no guessing and confusion as to what goes where and how. The team will have that contract to base their work upon. The “hard” work is to make sure that contract is accurate and organized. That work is essential and should not be overlooked as disorganized construction does not last.