My two 'bits' on technology!

Importance of Data Modeling in Data Architecture

This is a article contributed by guest author: Suresh Yaram, Data Architect.

Here, let’s see why data modeling is important in the implementation of data architecture for an enterprise application, be it a CRM application or a Trading application. Since, the data is an important asset in an enterprise, it is very important to understand the data usage and access across applications within the enterprise. To achieve this, it is mandatory to maintain data models throughout the life span of applications by following appropriate Data Model Development Methodology. However, the data model development cannot happen in isolation without looking at the other data models already in-place within the enterprise. The fragmented and siloed way of data model development can lead to redundant, inconsistent and overlapping models. Hence, it is very important to identify and catalog all data models across various projects and see any of them can be reused for the current requirement.

Next comes the data modeling methodology, we may adopt “top-down” or “bottom-up” or “re-engineering”. Let’s see various enterprise scenarios to apply each methodology.

The top-down modeling methodology is recommended while building data model for the entire enterprise as per the general information requirements, without considering business requirements. This methodology ensures all the data elements are captured without any gaps and reuse the part of model as per data requirements for an application.

The bottom-up modeling methodology is recommended while building data model for a new application module by as per the defined business/functional requirements.

Finally, the re-engineering methodology works in a scenario where an existing application is to be enhanced as per new business/functional requirements.

Next comes the data model development life cycle to be followed. First, it starts with building a Conceptual Data Model (CDM), followed by a Logical Data Model (LDM) and finally ends with the implementation of Physical Data Model (PDM). Of course, the entire process happens in an iterative manner.

The Conceptual Data Model (CDM) is developed as a first step in case of a new development project that represents only business entities at high level without specifying attributes.  The CDM is mainly intended to explain the model to non-technical senior management people.

The Logical Data Model (LDM) is developed as a next step that identifies Entities, Attributes along with the keys (PK, FK,) and its relationships (1:1, 1: M, M: M). The entities and attributes will have business friendly names and M: M relationships are not resolved in LDM. The intended audience for this model is Business Analysts and Application Users to make them understand the data model and get the sign-off.

Finally, the Physical Data Model (PDM) is developed containing database specific implementation of database objects (tables, indexes, constraints). In PDM, the entity and attribute names are shortened by applying glossary of abbreviations and naming conventions and the M: M relationships are resolved. The intended audience for this model is application developers and database administrators.

Next comes the modeling tools used in practice. The most popular data modeling tools being used by IT organizations are: CA’s Erwin Data Modeler, Embarcadero Technologies’ ER/Studio, Sybase’s Power Designer Studio and Quest Software’s Toad Data Modeler.

So far, we have seen how the data model development can happen under various scenarios. However, sometimes we encounter with a situation where we need to review an existing data model. So, it is very important to have a review checklist providing guidelines for critiquing a data model. The sample review checklist could be as follows.

  • Data modeling notation used. Btw, it is advisable to use Information Engineering (IE) notation out of available notations because of its simplicity
  • Check whether naming conversions and glossary are defined for the attributes used in PDM
  • Resolution of M:M relationships in PDM using associative entities
  • Usage of Generalization Hierarchies (super-types & sub-types) to reduce the usage of redundant attributes

Before I close, let me discuss a few best practices to be followed while developing a data model.

  • Division of entire model into various subject areas that improves manageability of model
  • Provide enough documentation for every entity, attribute and the rationale behind the relationship created that makes the model more meaningful to the intended audience.
  • Make sure to have alternate keys (natural keys) also defined while defining surrogate keys
  • Leverage Generic Structures as far as possible: use abstraction to increase the types of information a design can accommodate using generic concepts. For example, an entity by name “Party” can be defined that accommodates an Employee or a Customer or an Organization providing service etc.
  • Avoid circular and redundant relationships
  • Avoid empty entities (entities with no attributes) which are generally created as placeholders for next phase.

Appreciate your comments..

About Suresh Yaram

Suresh Yaram holds a Masters Degree in Technology from IIT, Kharagpur. Suresh has over 14 years of IT industry experience in data management space, providing consulting services in Data Architecture, Data Modeling, Database Design and Development of OLTP & ETL applications. Suresh has expertise in handling of large data models, providing strategy for Data Life Cycle Management, Data Migration, and Data Governance (DG) polices such as Master Data Management (MDM) and Metadata Management. Suresh is TOGAF certified Enterprise Data Architect and he can be reached at:

Powered by WordPress | Designed by