Author: Soumen Chakraborty, Director - Data Management



Written By Soumen Chakraborty and Vaibhav Sathe

In the fast-paced world of data-driven decision making, enterprises are constantly grappling with vast amounts of data scattered across diverse sources. Making sense of this data and ensuring its seamless integration is a challenge that many data teams face. Enter the hero of the hour: AI-Driven Auto Data Mapping Tools. 

Understanding the Need: 

Consider this scenario: Your enterprise relies on data from various departments – sales, marketing, finance, and more. Each department might use different terms, structures, and formats to store its data. Moreover, most enterprises also depend on a multitude of third-party data sources over which they have little to no control. Manual mapping of these diverse datasets is not only time-consuming but also resource-intensive, costly, and prone to errors. 

Traditional data mapping tools offer some automation, but they depend heavily on the skill set of the person using them. Modern auto data mapping tools take it a step further: they leverage advanced algorithms to analyze not just data fields but also the data itself, metadata, context, and semantics. This comprehensive approach ensures a deeper understanding of the data, resulting in more accurate and contextually relevant mappings. 

How does it help?

  • Precise Mapping:

Manual mapping carries a high risk of human error, especially when dealing with large datasets. Auto data mapping tools excel at recognizing intricate patterns within datasets. Whether it is identifying synonyms, acronyms, or variations in data representations, these tools analyze the nuances to provide precise mappings. As a result, they significantly reduce the risk of mistakes in data mapping, ensuring that your reports and analytics are based on accurate information. 

Practical Example: In a healthcare dataset, where “DOB” may represent both “Date of Birth” and “Date of Admission,” an auto data mapping tool can discern the semantics and map each instance accurately. 

It can also automate the process of linking data fields and relationships. For instance, your marketing team uses “CustomerID,” while the finance team refers to it as “ClientID” and another team identifies it as “Account Number.” An auto data mapping tool can recognize these connections, eliminating the need for tedious manual matching.
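
As a rough illustration of the idea (and only an illustration, not any specific tool’s algorithm), the sketch below matches field names using a small synonym list and string similarity from Python’s standard library. A real auto data mapping tool would also weigh the data values, metadata, and semantics.

```python
# Minimal field-name matching sketch; the synonym list and threshold are
# illustrative assumptions, not a real product's configuration.
from difflib import SequenceMatcher

SYNONYMS = {
    "customerid": {"clientid", "accountnumber", "custid"},
}

def normalize(name: str) -> str:
    """Lowercase and strip separators so 'Client_ID' and 'ClientID' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def match_score(source: str, target: str) -> float:
    """Score two field names, boosting pairs known to be synonyms."""
    s, t = normalize(source), normalize(target)
    if t in SYNONYMS.get(s, set()) or s in SYNONYMS.get(t, set()):
        return 1.0
    return SequenceMatcher(None, s, t).ratio()

def propose_mappings(source_fields, target_fields, threshold=0.6):
    """Propose the best-scoring target field for each source field."""
    proposals = {}
    for src in source_fields:
        best = max(target_fields, key=lambda tgt: match_score(src, tgt))
        if match_score(src, best) >= threshold:
            proposals[src] = best
    return proposals

print(propose_mappings(["ClientID", "Order_Date"], ["CustomerID", "OrderDate"]))
# {'ClientID': 'CustomerID', 'Order_Date': 'OrderDate'}
```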

  • Accelerated Data Modeling:

In a traditional data modeling approach, data analysts manually analyze each dataset, identify relevant fields, and establish relationships. This process is time-consuming and prone to errors, especially as datasets grow in complexity. 

With auto data mapping, advanced algorithms analyze datasets swiftly, recognizing patterns and relationships automatically. They can even anticipate the relationships and logical modeling required to integrate a new data source with the existing datasets. 

Practical Example: 

Consider a scenario where a retail company introduces a new dataset related to online customer reviews. Without auto data mapping, analysts would need to manually identify how this new dataset connects with existing datasets. However, with auto data mapping, the tool can predict relationships by recognizing common attributes such as customer IDs or product codes. This accelerates the data modeling process, allowing analysts to quickly integrate the new dataset into the existing data model without extensive manual intervention. 
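
One simple way to approximate such a prediction, sketched below with pandas, is to look for column pairs whose values overlap heavily, which often signals a join key. The column names, sample data, and overlap threshold are assumptions made for the example; a real tool would combine this signal with name similarity, data types, and metadata.

```python
# Candidate join-key detection by value overlap between a new dataset and an
# existing one; thresholds and data are illustrative.
import pandas as pd

def candidate_join_keys(new_df: pd.DataFrame, existing_df: pd.DataFrame,
                        min_overlap: float = 0.8):
    """Return (new_column, existing_column, overlap) pairs that look like join keys."""
    candidates = []
    for new_col in new_df.columns:
        new_vals = set(new_df[new_col].dropna().astype(str))
        if not new_vals:
            continue
        for old_col in existing_df.columns:
            old_vals = set(existing_df[old_col].dropna().astype(str))
            overlap = len(new_vals & old_vals) / len(new_vals)
            if overlap >= min_overlap:
                candidates.append((new_col, old_col, round(overlap, 2)))
    return candidates

reviews = pd.DataFrame({"cust_id": ["C1", "C2", "C3"], "rating": [5, 4, 3]})
customers = pd.DataFrame({"CustomerID": ["C1", "C2", "C3", "C4"],
                          "Region": ["East", "West", "North", "South"]})
print(candidate_join_keys(reviews, customers))   # [('cust_id', 'CustomerID', 1.0)]
```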

  • Adapting to Change:

In the dynamic business landscape, changes in data structures are inevitable. When a new department comes on board or an existing one modifies its data format, auto data mapping tools automatically adjust to these changes. It’s like having a flexible assistant that effortlessly keeps up with your evolving data needs. 

Practical Example: Imagine your company acquires a new software system with a different data format. A reliable auto data mapping tool can seamlessly integrate this new data source by predicting the new mappings dynamically, without requiring a complete overhaul of your existing mappings.
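
A minimal sketch of that adaptation, under the assumption that the tool keeps the last-seen schema for each source: detect which columns disappeared or appeared, then re-run the field matcher only for the changed columns instead of rebuilding every mapping. The function and column names are illustrative.

```python
# Schema drift detection sketch; column names are assumptions for the example.
def detect_schema_drift(expected_columns, incoming_columns):
    """Return columns that disappeared and columns that are new in this source."""
    expected, incoming = set(expected_columns), set(incoming_columns)
    return {"missing": sorted(expected - incoming),
            "new": sorted(incoming - expected)}

drift = detect_schema_drift(
    expected_columns=["CustomerID", "OrderDate", "Amount"],
    incoming_columns=["ClientID", "OrderDate", "Amount", "Currency"],
)
print(drift)  # {'missing': ['CustomerID'], 'new': ['ClientID', 'Currency']}
# The mapper can now propose that 'ClientID' replaces the missing 'CustomerID'
# instead of forcing a complete remapping of the source.
```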

  • Collaboration Made Easy:

Data teams often work in silos, each with its own set of terminology and structures. Auto data mapping tools create a common ground by providing a standardized approach to data mapping. This not only fosters collaboration but also ensures that everyone is on the same page, speaking the same data language. 

Practical Example: In a collaborative environment, such a tool enables data SMEs from different departments to share insights, collectively refine semantic mappings, and debate and define standards, promoting a shared understanding of data across the organization. 

  • Mapping Version Control:

Auto data mapping tools introduce mapping version control features, allowing data teams to track changes, revert to previous versions, and maintain a clear history of mapping modifications. This is invaluable in collaborative environments where multiple stakeholders contribute to data mapping. 

In a dynamic data environment, where frequent updates and changes occur, mapping version control becomes crucial. Auto data mapping tools can provide the necessary systematic approach to Source-To-Target mapping versioning, ensuring transparency and collaboration among data teams. 

Practical Example: 

Such a tool can precisely track mapping changes over time, offering a clear history of modifications along with the user responsible and the purpose behind each change. In scenarios where unintended changes occur, the ability to easily revert to previous versions ensures swift restoration of accurate data mappings, minimizing disruptions. Collaborative workflows are significantly enhanced, as multiple team members can concurrently work on different aspects of the mapping, with the tool seamlessly managing the merging of changes. Moreover, the audit trail provided by version control contributes to efficient compliance management, offering transparency and demonstrating adherence to data governance standards.  
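
The sketch below shows one possible shape of such version control for source-to-target mappings: every change is stored as an immutable version with author, timestamp, and reason, and reverting simply re-commits an earlier version. The data model is an assumption to illustrate the idea, not a particular product’s design.

```python
# Illustrative source-to-target mapping version history.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class MappingVersion:
    version: int
    mappings: dict          # e.g. {"ClientID": "CustomerID"}
    author: str
    reason: str
    created_at: datetime

@dataclass
class MappingHistory:
    versions: list = field(default_factory=list)

    def commit(self, mappings: dict, author: str, reason: str) -> MappingVersion:
        """Record a new version of the source-to-target mapping."""
        v = MappingVersion(len(self.versions) + 1, dict(mappings), author,
                           reason, datetime.now(timezone.utc))
        self.versions.append(v)
        return v

    def revert_to(self, version: int, author: str) -> MappingVersion:
        """Restore an earlier version by committing its mappings as the newest one."""
        old = self.versions[version - 1]
        return self.commit(old.mappings, author, f"revert to v{version}")

history = MappingHistory()
history.commit({"ClientID": "CustomerID"}, "soumen", "initial mapping")
history.commit({"ClientID": "CustomerID", "Acct_No": "CustomerID"}, "vaibhav", "add finance source")
history.revert_to(1, "soumen")          # unintended change? roll back safely
print([(v.version, v.reason) for v in history.versions])
# [(1, 'initial mapping'), (2, 'add finance source'), (3, 'revert to v1')]
```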

  • Compliance and Governance:

In an era of data regulations, ensuring compliance is non-negotiable. Auto data mapping tools contribute to data governance efforts by providing transparency into how data is mapped and transformed. This transparency is crucial for audits and compliance checks. 

Practical Example: Consider a scenario where your industry faces new data privacy regulations. An auto data mapping tool can help you quickly identify and update the mappings needed to comply with the new rules, ensuring your organization stays within legal boundaries. 
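
As a hedged illustration, if each mapping entry in the catalog carries classification tags, the fields affected by a new privacy rule can be found with a single query instead of by reading every pipeline. The tags and catalog layout below are assumptions for the example.

```python
# Querying mapping metadata for compliance; tags and entries are illustrative.
mapping_catalog = [
    {"source": "crm.ClientID", "target": "dw.CustomerID", "tags": {"identifier"}},
    {"source": "crm.Email",    "target": "dw.Email",      "tags": {"pii", "contact"}},
    {"source": "erp.Amount",   "target": "dw.Revenue",    "tags": {"financial"}},
]

def mappings_with_tag(catalog, tag):
    """Return every source-to-target mapping whose classification includes the tag."""
    return [m for m in catalog if tag in m["tags"]]

for m in mappings_with_tag(mapping_catalog, "pii"):
    print(m["source"], "->", m["target"])   # crm.Email -> dw.Email
```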

  • Cost Reduction:

Manual data mapping is resource-intensive. Auto data mapping tools streamline the integration process, saving time and resources. This efficiency translates to cost savings for your enterprise. 

Practical Example: Imagine the person-hours saved when your data team does not have to manually reconfigure mappings every time a new dataset is added. 

  • Improved Decision Making:

A clear understanding of data relationships is crucial for effective decision making, and so is the context in which data is used. Auto data mapping tools take into account the broader context of data fields, ensuring that mappings align with the intended use and purpose. This clarity empowers data analysts and scientists to work with well-organized and accurately mapped data. 

Practical Example: Consider a sales dataset where “Revenue” may be reported at both the product and regional levels. An auto data mapping tool can discern the context, mapping the data based on its relevance to specific reporting requirements.  
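
A tiny sketch of that context awareness, assuming the tool records the reporting grain alongside each mapping: the same source field resolves to different targets depending on the grain. The grains and target names are illustrative.

```python
# Context-aware mapping sketch: the target depends on (field, reporting grain).
CONTEXT_MAPPINGS = {
    ("Revenue", "product"): "fact_product_sales.revenue",
    ("Revenue", "region"):  "fact_regional_sales.revenue",
}

def map_field(field_name: str, grain: str) -> str:
    """Resolve a source field to a target based on its reporting context."""
    try:
        return CONTEXT_MAPPINGS[(field_name, grain)]
    except KeyError:
        raise KeyError(f"no mapping defined for {field_name!r} at grain {grain!r}")

print(map_field("Revenue", "region"))   # fact_regional_sales.revenue
```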

With accurate data mappings, your business intelligence team can confidently create reports and analyses that the leadership can trust, leading to more informed decisions. 

What tools to use? 

Despite the numerous benefits of auto data mapping, there is a notable shortage of effective tools in the industry. This is primarily due to a lack of awareness regarding the needs and implications of having or not having such a tool. Additionally, there is a prevailing notion that ETL tools/developers can adequately address these requirements, leading to a lack of interest in dedicated data mapping tools. However, this is not the optimal solution for today’s data-driven organizations.
Building data plumbing without proper data mapping is like constructing a house without a blueprint: it just doesn’t work! Data mapping, being both functional metadata and a prerequisite for creating accurate data integration pipelines, should be crafted and handled independently. Otherwise, there is a real risk of losing vital information concealed within diverse standalone data integration pipelines. Organizations often pay a hefty price for not maintaining a separate source-to-target mapping outside the code: it causes a lack of lineage awareness and makes real-time monitoring and modern needs like data observability almost impossible, because nobody knows what is happening in those pipelines without decoding each one end to end. 

With this consideration in mind, Fresh Gravity has crafted a tool named Penguin, a comprehensive AI-driven data matcher and mapper that helps enterprises define and create a uniform, consistent global schema from heterogeneous data sources. It is a clever data mapping tool that not only matches the abilities of auto data mapping tools but also brings a sharp industry focus, adaptive learning with industry smarts, and collaborative intelligence to supercharge data integration efforts. For companies handling intricate data and numerous data integration pipelines, leveraging a tool like Penguin alongside a metadata-driven data integration framework is crucial for maximizing the benefits of automated data integration. It makes creating mappings easy, helps teams work together smoothly, and keeps track of changes.  

In conclusion, auto data mapping tools are indispensable for modern enterprises seeking to navigate the complex landscape of data integration. By enhancing efficiency, accelerating data modeling, ensuring accuracy, fostering collaboration, and facilitating compliance, these tools pave the way for organizations to derive maximum value from their data. Fresh Gravity’s dedication to excellence in these areas makes our tool valuable for succeeding with data. So, embrace the power of automation, and watch your enterprise thrive in the era of data excellence. 

If you would like to know more about our auto data mapping tool, Penguin, please feel free to write to us at info@freshgravity.com. 


Written By Soumen Chakraborty and Vaibhav Sathe

In today’s data-driven world, organizations are relying more and more on data to make informed decisions. With the increasing volume, velocity, and variety of data, ensuring data quality has become a critical aspect of data management. However, as data pipelines become more complex and dynamic, traditional data quality practices are no longer enough. This is where data observability comes into play. In this blog post, we will explore what data observability is, why it is important, and how to implement it.

What is Data Observability?

Data observability is a set of practices that enable data teams to monitor and track the health and performance of their data pipelines in real time. This includes tracking metrics such as data completeness, accuracy, consistency, latency, throughput, and error rates. Data observability tools and platforms allow organizations to monitor and analyze data pipeline performance, identify and resolve issues quickly, and improve the reliability and usefulness of their data.

The concept of data observability comes from the field of software engineering, where it is used to monitor and debug complex software systems. In data management, data observability is an extension of traditional data quality practices, with a greater emphasis on real-time monitoring and alerting. It is a proactive approach to data quality that focuses on identifying and addressing issues as they occur, rather than waiting until data quality problems are discovered downstream.

Why is Data Observability important?

Data observability is becoming increasingly important as organizations rely more on data to make critical decisions. With data pipelines becoming more complex and dynamic, ensuring data quality can be a challenging task. Traditional data quality practices, such as data profiling and data cleansing, are still important, but they are no longer sufficient.

Let’s consider an example to understand why data observability is needed over traditional data quality practices. Imagine a company that relies on a data pipeline to process and analyze customer data. The data pipeline consists of multiple stages: extraction, transformation, and loading into a data warehouse. The company has implemented traditional data quality practices, such as data profiling and data cleansing, to ensure data quality.

However, one day the company’s marketing team notices that some of the customer data is missing in their analysis. The team investigates and discovers that the data pipeline had a connectivity issue, which caused some data to be dropped during the transformation stage. The traditional data quality practices did not catch this issue, as they only checked the data after it was loaded into the data warehouse.

With data observability, the company could have detected the connectivity issue in real time and fixed it before any data was lost. By monitoring data pipeline performance in real time, data observability helps organizations identify and resolve issues quickly, reducing the risk of data-related errors and improving overall data pipeline performance.

In this example, traditional data quality practices were not sufficient to detect the connectivity issue, highlighting the importance of implementing data observability to ensure the health and performance of data pipelines.
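
As a minimal sketch of the kind of in-pipeline check that would have caught this, the code below reconciles row counts between stages while the pipeline runs and raises an alert the moment records go missing. The alert function is a placeholder for whatever monitoring or alerting channel an organization actually uses, and the numbers are illustrative.

```python
# Row-count reconciliation between pipeline stages; the tolerance and the
# alert channel are illustrative assumptions.
def alert(message: str) -> None:
    """Placeholder: in practice, send to an alerting or observability platform."""
    print(f"[DATA OBSERVABILITY ALERT] {message}")

def check_row_reconciliation(stage: str, rows_in: int, rows_out: int,
                             allowed_loss_ratio: float = 0.0) -> bool:
    """Return True if the stage kept enough rows; otherwise raise an alert."""
    lost = rows_in - rows_out
    if rows_in > 0 and lost / rows_in > allowed_loss_ratio:
        alert(f"{stage}: lost {lost} of {rows_in} rows")
        return False
    return True

# The transformation stage received 10,000 customer records but emitted 9,200:
check_row_reconciliation("transform_customers", rows_in=10_000, rows_out=9_200)
```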

Data observability provides organizations with real-time insights into the health and performance of their data pipelines. This allows organizations to identify and resolve issues quickly, reducing the risk of data-related errors and improving the reliability and usefulness of their data. With data observability, organizations can make more informed decisions based on high-quality data.

How to Implement Data Observability?

Implementing data observability requires a combination of technology and process changes. Here are some key steps to follow:

Define Metrics: Start by defining the metrics that you want to track. This could include metrics related to data quality, such as completeness, accuracy, and consistency, as well as metrics related to data pipeline performance, such as throughput, latency, and error rates.

Choose Tools: Choose the right tools to help you monitor and track these metrics. This could include data quality tools, monitoring tools, or observability platforms.

Monitor Data: Use these tools to monitor the behavior and performance of data pipelines in real time. This will help you to identify and resolve issues quickly.

Analyze Data: Analyze the data that you are collecting to identify trends and patterns. This can help you to identify potential issues before they become problems.

Act: Finally, take action based on the insights that you have gained from your monitoring and analysis. This could include making changes to your data pipeline or addressing issues with specific data sources.
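
To make the “Define Metrics” and “Monitor Data” steps more concrete, here is a simplified sketch that computes a few of the metrics mentioned above for one batch and checks them against thresholds. The field names, thresholds, and sample records are assumptions for illustration, not a prescribed toolset.

```python
# Batch-level data quality and pipeline metrics with threshold checks.
from datetime import datetime, timezone

def completeness(records, field):
    """Share of records where the field is present and non-empty."""
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records) if records else 0.0

def batch_metrics(records, started_at, finished_at, errors):
    elapsed = (finished_at - started_at).total_seconds()
    return {
        "completeness_email": completeness(records, "email"),
        "throughput_rows_per_s": len(records) / elapsed if elapsed else 0.0,
        "latency_s": elapsed,
        "error_rate": errors / len(records) if records else 0.0,
    }

THRESHOLDS = {"completeness_email": 0.95, "error_rate": 0.01}

def evaluate(metrics):
    """Yield alerts for metrics that breach their thresholds."""
    if metrics["completeness_email"] < THRESHOLDS["completeness_email"]:
        yield "email completeness below 95%"
    if metrics["error_rate"] > THRESHOLDS["error_rate"]:
        yield "error rate above 1%"

records = [{"email": "a@x.com"}, {"email": ""}, {"email": "c@x.com"}]
m = batch_metrics(records, datetime(2023, 3, 1, tzinfo=timezone.utc),
                  datetime(2023, 3, 1, 0, 0, 30, tzinfo=timezone.utc), errors=0)
for issue in evaluate(m):
    print("ALERT:", issue)   # email completeness is 2/3, so this fires
```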

Benefits of Data Observability

Implementing data observability provides numerous benefits, including:

Improved Data Quality: By monitoring data pipeline performance in real time, organizations can quickly identify and address data quality issues, improving the reliability and usefulness of their data.

Faster Issue Resolution: With real-time monitoring and alerting, organizations can identify and resolve data pipeline issues quickly, reducing the risk of data-related errors and improving overall data pipeline performance.

Better Decision Making: With high-quality data, organizations can make more informed decisions, leading to improved business outcomes.

Increased Efficiency: By identifying and addressing data pipeline issues quickly, organizations can reduce the time and effort required to manage data pipelines, increasing overall efficiency.

Data observability is a new concept that is becoming increasingly important in the field of data management. By providing real-time monitoring and alerting of data pipelines, data observability can help to ensure the quality, reliability, and usefulness of data. Implementing data observability requires a combination of technology and process changes, but the benefits are significant and can help organizations to make better decisions based on high-quality data.


A Coder’s Legacy: 7 Guidelines if you work in the Data Management space

March 9th, 2023

Written By Soumen Chakraborty, Director, Data Management

In my opinion, a coder can be guilty of two things. Either we over-engineer, i.e., try to solve everything in one go instead of following an iterative approach, OR we under-engineer, i.e., just code without understanding the impact. What we need is to attain the ‘middle ground’.

Here are 7 guidelines to ensure we are always in the ‘middle ground’:

1) Don’t just code the requirement. You must understand the problem fully. You’re a Data Person; you should care about the problem from the data’s perspective. Building complex code is cool, but spend more time understanding and analyzing the requirements from the data’s point of view. That is more important than what tool, language, or technology you are using to process it.

2) Unit Testing is part of coding, not a separate exercise. Dedicate 25-30% of development time to Unit Testing. As an example, if it takes 8 hours to code, you should allocate a minimum of 2 hours to Unit Test that code. In my opinion, 75-85% of testing coverage should come from Unit Testing and the rest from Test Automation. Remember, SIT (System Integration Testing) is not for testing one piece of the puzzle (your code only) but the entire puzzle board (all integrated code). So don’t rely on your best friend on the QA (Quality Assurance) team to figure out what you did last summer. Spend time thinking about the test cases, for example, a data integrity check before and after processing, or code performance metrics (a small test sketch follows this list). If you are NOT clear on the unit test cases, then don’t start coding. Seek more clarity until you can visualize the output. Keep in mind that unit testing is not just checking if the code runs but checking if it generates the right output in the specified time.

3) Don’t use a hammer to crack a nut. You don’t need to consider all possible edge cases in the world while designing. Perfect code that sits in your machine has no impact, whereas merely “okay” code in production adds value. Keep your design simple but ensure the code is nimble; you can always increase complexity later if needed. Question the design. One of the most common reasons for poor design is NOT understanding the underlying technology enough and trying to solve every need with a custom approach. For example, if you have more supporting custom tables to hold your code processing information than actual data tables, then you are either not using the right tools for processing OR not using the out-of-the-box features efficiently. This design is neither sustainable nor scalable.

4) Don’t let your experience take over your imagination. Very often we refuse to see the problem with fresh eyes and always try to tie every new problem back to problems we have solved earlier. That’s the wrong approach. Keep in mind, we are living in an age where technological advancement occurs rapidly. Do your due diligence and see what’s new before dusting off your old toolbox.

5) Asking for help and using Google (now ChatGPT) is the most powerful skill. There is no point in spending days trying to solve a problem yourself when someone has already done it or can do it for you within minutes. However, before asking for help, document the logical steps you’ve followed with pseudo code and summarize why you think it’s not working. This logical breakdown not only helps an expert reach a resolution faster but also helps you search for the right content.

6) Reusability is the key. Make sure your code is well documented, comment your code clearly, make it modular (break your code into logical units that can be tested individually), and make it configuration-driven (a brief sketch follows this list). Anyone (including you) should be able to easily understand (remember) what you did a few months or even years ago.

7) GIT is your best friend, NOT some annoying Ex from your past. So, please stop treating GIT as an “extra” task! Once you make using GIT a habit you will realize how it makes your life easier. Follow some basic rules of thumb: take a feature branch approach, always pull before push, push daily (and encourage others to do the same), merge a feature with dev only after the feature is tested, and do not push to master. Trust me on this, you will thank me later. Code repos were invented to help developers, not the other way around.
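
To illustrate guideline 2, here is a small sketch of unit tests written alongside the code, using pytest-style asserts. The deduplicate_customers function is a hypothetical transformation; the point is that each test checks the output, not just that the code runs.

```python
# Hypothetical transformation plus the unit tests that ship with it.
def deduplicate_customers(rows):
    """Keep the first occurrence of each customer_id."""
    seen, result = set(), []
    for row in rows:
        if row["customer_id"] not in seen:
            seen.add(row["customer_id"])
            result.append(row)
    return result

def test_no_records_lost_or_invented():
    rows = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 1}]
    out = deduplicate_customers(rows)
    # Data integrity: every input ID survives, and nothing new appears.
    assert {r["customer_id"] for r in out} == {r["customer_id"] for r in rows}

def test_duplicates_are_removed():
    rows = [{"customer_id": 1}, {"customer_id": 1}]
    assert len(deduplicate_customers(rows)) == 1

def test_empty_input_returns_empty_output():
    assert deduplicate_customers([]) == []
```

And for guideline 6, a brief sketch of what modular, configuration-driven code can look like: behaviour is controlled by a config (in practice read from a file), and each step is a small function that can be tested on its own. The config keys and steps are illustrative assumptions.

```python
# Configuration-driven, modular loading sketch.
CONFIG = {
    "input_delimiter": ",",
    "required_columns": ["customer_id", "email"],
    "drop_blank_emails": True,
}

def parse_line(line: str, delimiter: str) -> list:
    """Split one raw line into fields using the configured delimiter."""
    return [part.strip() for part in line.split(delimiter)]

def validate_header(header: list, required: list) -> None:
    """Fail fast if the configured required columns are missing."""
    missing = [c for c in required if c not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")

def load(lines: list, config: dict) -> list:
    header = parse_line(lines[0], config["input_delimiter"])
    validate_header(header, config["required_columns"])
    rows = [dict(zip(header, parse_line(line, config["input_delimiter"])))
            for line in lines[1:]]
    if config["drop_blank_emails"]:
        rows = [r for r in rows if r.get("email")]
    return rows

print(load(["customer_id,email", "1,a@x.com", "2,"], CONFIG))
# [{'customer_id': '1', 'email': 'a@x.com'}]
```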

In the end, it’s all about having fun. Keep in mind that the code you write, whether small or big, easy or complex, is your unique creation. It’s your legacy, so treat it well. Otherwise, what’s the point?

If you have any thoughts, comments, ideas, or feedback, please reach out at soumen.chakraborty@freshgravity.com.
