Clinical data comes in many forms and structures, even for the same piece of information. For example, age could be reported in years, months, or even days. Without normalization, i.e. converting disparate data into a single, consistent dataset view, it is very hard to derive useful, meaningful clinical intelligence from medical data.
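To make the age example concrete, here is a minimal sketch of unit normalization. The function name `age_in_years`, the accepted unit spellings, and the 365.25-day year are illustrative assumptions, not part of any particular clinical standard.

```python
# Hypothetical helper: convert an age reported in mixed units to years.
DAYS_PER_YEAR = 365.25   # assumption: average year length including leap years
MONTHS_PER_YEAR = 12

def age_in_years(value: float, unit: str) -> float:
    """Return the age expressed in years, whatever unit it was reported in."""
    unit = unit.strip().lower()
    if unit in ("y", "yr", "year", "years"):
        return float(value)
    if unit in ("mo", "month", "months"):
        return value / MONTHS_PER_YEAR
    if unit in ("d", "day", "days"):
        return value / DAYS_PER_YEAR
    # Fail loudly rather than silently guessing an unknown unit.
    raise ValueError(f"unrecognized age unit: {unit!r}")

# Three source systems reporting age in three different units.
reported = [(40, "Years"), (18, "months"), (730.5, "days")]
normalized = [age_in_years(value, unit) for value, unit in reported]
```

Once every record is on the same unit, ages from different source systems can be compared or aggregated directly.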
This applies not only to the data-level attributes of a patient but also to the attributes that uniquely identify the patient. Identifying attributes such as PII (name, address, date of birth, SSN, insurer information, credit card, email, phone) can themselves be presented in different formats and styles, which makes it a real challenge to link all records back to one individual.
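As a rough illustration of the linking problem, the sketch below canonicalizes a few identifying fields so that two differently formatted records produce the same key. The field names, the list of date formats, the 10-digit phone rule, and the token-sorting trick for names are all simplifying assumptions; real record linkage typically needs fuzzy or probabilistic matching on top of this.

```python
import re
from datetime import datetime

# Assumption: only these date layouts appear in the source systems.
DOB_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")

def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so 'Smith, John' == 'john smith'."""
    letters_only = re.sub(r"[^a-z\s]", "", name.lower())
    return " ".join(sorted(letters_only.split()))

def normalize_dob(dob: str):
    """Return the date of birth as an ISO string, or None if no format matches."""
    for fmt in DOB_FORMATS:
        try:
            return datetime.strptime(dob.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def normalize_phone(phone: str) -> str:
    """Keep digits only; assume 10-digit national numbers and drop any prefix."""
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]

def match_key(record: dict) -> tuple:
    """Build a comparable key from the normalized identifying fields."""
    return (
        normalize_name(record["name"]),
        normalize_dob(record["dob"]),
        normalize_phone(record["phone"]),
    )

# Two hypothetical records for the same person, formatted differently.
a = {"name": "Smith, John", "dob": "03/07/1985", "phone": "(555) 123-4567"}
b = {"name": "john  smith", "dob": "1985-03-07", "phone": "+1 555-123-4567"}
same_person = match_key(a) == match_key(b)
```

Exact matching on a normalized key is the simplest possible design: it catches formatting differences but not typos, nicknames, or changed addresses.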
Another major challenge in clinical data normalization is that a vast amount of the information stored in healthcare systems is in text format.
The factors above are outlined to give an idea of why clinical data normalization is unique and differs from traditional data normalization. They also show the importance of building a generalized data normalization pipeline that takes into account not only the traditional variations in transactions and numbers but also textual data, which holds key information for unveiling the “real” clinical intelligence.