
“Navigating the Future: A Leader’s Odyssey in Data and Analytics Engineering, Cloud Migration, and AI Innovation”

Pankaj Gupta, Fellow BCS, Fellow RSA

Vighnesh: Can you describe your role as Manager of Data and Analytics Engineering in your organization and the key responsibilities it entails?

Pankaj: As Manager of Data and Analytics Engineering, my main task is to lead and oversee the team responsible for data and analytics. I make sure we collect, process, and interpret data well so that we can surface useful insights for the business and support decision-making.

For us, data is crucial, especially since we’re a financial institution. You can imagine how vital it is to our operations. In today’s fast-paced world, where things happen in real time, the quicker we receive and process data, the sooner we can analyze it and extract valuable information.

Vighnesh: You’ve worked extensively with technologies like the AWS cloud, data engineering, microservices, and open-source software. Can you share a challenging project where you leveraged these technologies?

Pankaj: These are just a few examples, and technologies keep changing. I have been in the industry for over 16 years, starting with mainframes back in 2006 while working with Barclays Bank. I consider one of my most challenging projects to be my work with a major U.S. bank in 2017. I was tasked with creating a data ingestion platform to accelerate the bank’s adoption of artificial intelligence. The primary challenge was establishing a universal framework that could be applied globally to meet the data ingestion requirements of thousands of internal and external applications. The framework was configuration-driven, with automatic generation of deployable artifacts.
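To make the idea of a configuration-driven framework concrete, here is a minimal Python sketch, not the actual platform described in the interview: a per-source config is turned into two deployable artifacts, a landing-table DDL statement and a JSON job definition. The `SourceConfig` fields, table name, column types and cron schedule are all hypothetical.

```python
import json
from dataclasses import dataclass, field


@dataclass
class SourceConfig:
    """Hypothetical per-source ingestion config, normally loaded from a file."""
    name: str
    fmt: str                 # input format, e.g. "csv" or "json"
    schedule: str            # cron expression for the scheduler
    target_table: str
    columns: dict = field(default_factory=dict)  # column name -> SQL type


def generate_ddl(cfg: SourceConfig) -> str:
    """Render a CREATE TABLE statement for the landing table."""
    cols = ",\n  ".join(f"{col} {sql_type}" for col, sql_type in cfg.columns.items())
    return f"CREATE TABLE IF NOT EXISTS {cfg.target_table} (\n  {cols}\n);"


def generate_job(cfg: SourceConfig) -> str:
    """Render a JSON job definition that a scheduler could deploy."""
    job = {
        "job_name": f"ingest_{cfg.name}",
        "schedule": cfg.schedule,
        "input_format": cfg.fmt,
        "target_table": cfg.target_table,
    }
    return json.dumps(job, indent=2, sort_keys=True)


payments = SourceConfig(
    name="payments",
    fmt="csv",
    schedule="0 2 * * *",
    target_table="raw.payments",
    columns={"txn_id": "VARCHAR(36)", "amount": "NUMBER(18,2)"},
)
print(generate_ddl(payments))
print(generate_job(payments))
```

Onboarding a new feed then means writing a new config rather than new code. A real framework would likely generate more artifacts from the same config (schedules, data-quality rules, infrastructure templates); this sketch stops at two.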

Vighnesh: As a leader of a high-performing engineering team, how do you approach team management and foster a culture of continuous learning and growth?

Pankaj: I firmly believe that any team or team member can achieve high performance when empowered with autonomy and ownership over their projects. When team members comprehend the business value of their work and the impact they make on the organization and society, they thrive. Regular feedback, not just one-way but a two-way dialogue in which I also ask for suggestions on how I can improve, fosters a sense of belonging and teamwork. Recognizing their efforts and offering flexibility in work arrangements are equally pivotal. Maintaining a healthy work-life balance ensures that team members are motivated to consistently deliver their best.

Vighnesh: Given your experience in implementing AI and ML solutions, can you discuss a project where AI/ML was critical in solving a business problem?

Pankaj: Imagine data as the power source for smart computers. In the world of artificial intelligence (AI) and machine learning (ML), data is their lifeblood. It’s what teaches them, improves them, and enables them to do impressive things. We apply advanced ML and AI techniques to detect and stop fraud. A great example is creating ‘fraud scores’: ratings that help us identify and prevent fraud. The models use historical data to learn what normal behavior looks like and how to spot anything fishy. They learn from examples of both legitimate and fraudulent activity in the data.
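The interview does not say which models are used. As an illustration only, the sketch below trains a plain logistic regression in pure Python on a handful of made-up labeled transactions and returns a value between 0 and 1 that can be read as a fraud score. The three features (amount in thousands, a foreign-merchant flag, a night-time flag) and all data points are hypothetical; production systems would use far richer features and established ML libraries.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fraud_score(w, b, x) -> float:
    """Score in [0, 1]; higher means the transaction looks more like past fraud."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up history: [amount in thousands, foreign merchant, night time]
X = [
    [0.05, 0, 0], [0.2, 0, 0], [0.1, 0, 1], [0.5, 1, 0], [0.03, 0, 0],  # legitimate
    [4.0, 1, 1], [2.5, 1, 1], [3.0, 0, 1], [5.0, 1, 0],                 # fraudulent
]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic(X, y)
print(round(fraud_score(w, b, [4.5, 1, 1]), 3))  # unusual transaction
print(round(fraud_score(w, b, [0.1, 0, 0]), 3))  # routine transaction
```

The model learns from both legitimate and fraudulent examples, which mirrors the description above; the choice of logistic regression is only a simple stand-in.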

Vighnesh: How do you balance the technical aspects of AI/ML with the ethical considerations that come with these technologies?

Pankaj: Think of creating AI and ML like going on a journey. We want to make sure we’re doing it the right way. So, first, we focus on transparency, making sure everyone understands how our smart machines make decisions. It’s not just about having accurate algorithms; it’s about showing others how these digital brains reach their conclusions. Next, think of strict data protection as our guide on this journey. This guide makes sure that all the information we use in our smart machines is treated with care. We don’t let any data slip through the cracks while we train our models.

Once our model is trained, it’s like a well-prepared explorer with the right knowledge. But our journey doesn’t stop there. We then set out to find and fix any unfairness in the data and algorithms, so that we don’t end up with biased results. Every step of this journey is taken carefully to follow ethical rules, just like a responsible traveler in a new place. As our smart machines get better, we promise to keep doing things the right way: making good decisions and being fair and open about it.
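One common starting point for finding unfairness is to compare how often a model produces a positive outcome, such as flagging a transaction, across groups. The sketch below computes this demographic parity gap; it is only one of several fairness metrics, and the group labels and predictions are made up for illustration.

```python
from collections import defaultdict


def positive_rate_by_group(predictions, groups):
    """Share of positive predictions (e.g. 'flag as fraud') within each group."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred)
    return {g: positives[g] / totals[g] for g in totals}


def parity_gap(predictions, groups) -> float:
    """Demographic parity difference: max minus min positive rate across groups."""
    rates = positive_rate_by_group(predictions, groups)
    return max(rates.values()) - min(rates.values())


preds = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(positive_rate_by_group(preds, groups))  # {'A': 0.25, 'B': 0.75}
print(parity_gap(preds, groups))              # 0.5
```

A gap this large would prompt a closer look at the features and training data; what counts as an acceptable gap is a policy decision that the code cannot settle.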

Vighnesh: How do you stay updated with emerging technologies and industry best practices, especially in AI and machine learning?

Pankaj: I learn a lot by paying attention to others and reading what experts have to say. I regularly check out trusted publications, journals, and blogs in my field. This helps me stay in the loop about the latest research, real-life examples, and what’s happening in the industry.

These days, there are plenty of courses and articles online. It’s like having a wealth of resources at your fingertips for keeping up with everything going on in the industry. So, learning and staying informed have become more accessible for everyone.

Vighnesh: You have experience in cloud computing and AWS. Can you describe your involvement in an AWS cloud migration project in your career?

Pankaj: The migration to the cloud has been a fascinating journey for me. I played a crucial role in ensuring that all the application processes under my leadership were transitioned seamlessly. This involved meticulous planning and thorough functional and technical testing, with particular attention to performance. Imagine dealing with thousands of processes, loading hundreds of tables, and managing complex data pipelines. It was a substantial undertaking.

One of the key lessons I’ve learned is the importance of not rushing into cloud migration. Taking it slow, building robust frameworks, and initiating the migration with clear agreements between the technical team and stakeholders proved to be pivotal. This approach ensured a smooth transition and minimized potential challenges during the migration process.

Vighnesh: What were some of the key challenges and learnings from this cloud migration?

Pankaj: One of the most challenging aspects of the migration was convincing our business users to embrace a new platform. Picture this: these users had been running their processes on a legacy system for nearly two decades, and now, as part of our initiative, we had migrated the data to a new platform, Snowflake. It was a significant shift for them, and, understandably, building trust in this unfamiliar territory took some time.

Another hurdle we faced was ensuring that complex processes delivered the same results on the new platform as they did on the legacy one. Migrating petabytes of data is no walk in the park; it demands coordination and concerted effort from multiple teams. Trust me, aligning such large volumes of data seamlessly requires meticulous planning and the synchronized dedication of every team involved in the migration.
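A common way to check that migrated data matches the legacy system is to compare aggregate fingerprints of both copies rather than diffing rows one by one. The sketch below compares the row count, the total of an `amount` column and an order-independent checksum; the column names are hypothetical, and it assumes both sides render values identically (for example `10.50` rather than `10.5`), since the checksum is computed over text.

```python
import hashlib
from decimal import Decimal


def table_fingerprint(rows, amount_col="amount"):
    """Return (row count, amount total, order-independent checksum)."""
    count, total, checksum = 0, Decimal("0"), 0
    for row in rows:
        count += 1
        total += Decimal(str(row[amount_col]))
        canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        row_hash = int.from_bytes(digest[:8], "big")
        checksum = (checksum + row_hash) % 2**64  # addition ignores row order
    return count, total, checksum


def reconcile(legacy_rows, migrated_rows):
    """List the checks that differ between the legacy and migrated copies."""
    names = ("row_count", "amount_total", "checksum")
    legacy = table_fingerprint(legacy_rows)
    migrated = table_fingerprint(migrated_rows)
    return [name for name, a, b in zip(names, legacy, migrated) if a != b]


legacy = [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "3.25"}]
print(reconcile(legacy, list(reversed(legacy))))  # [] (same rows, different order)
print(reconcile(legacy, [{"id": 1, "amount": "10.50"}, {"id": 2, "amount": "3.30"}]))
```

Summing row hashes, rather than XOR-ing them, keeps duplicate rows from cancelling each other out.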

Vighnesh: With your extensive experience in data warehousing and ETL development, can you share insights into your approach to designing efficient data pipelines?

Pankaj: Absolutely. Designing data pipelines in banking and finance is both challenging and exciting. The sheer amount of data created by digitalization is mind-blowing, and handling it effectively is something of an art form. Here is what I think.

The first and foremost consideration is having an in-depth understanding of the data. This includes knowing its volume, frequency of updates, and identifying any Personally Identifiable Information (PII) that needs special handling—either through tokenization or masking. Data privacy is non-negotiable, and designing efficient pipelines starts with securing sensitive information.
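The two techniques mentioned, tokenization and masking, can be sketched as follows. As a stand-in for a tokenization service, this example uses keyed hashing (HMAC-SHA256), which produces stable, non-reversible tokens so records can still be joined; vault-based tokenization, which allows authorized detokenization, is a different design. The field names, the policy dictionary and the inline key are hypothetical; a real key would come from a secrets manager.

```python
import hashlib
import hmac


def tokenize(value: str, key: bytes) -> str:
    """Keyed hash: the same value and key always give the same token, so joins still work."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Hide all but the last `visible` characters (fully masked if too short)."""
    if len(value) <= visible:
        return char * len(value)
    return char * (len(value) - visible) + value[-visible:]


def protect_record(record: dict, policy: dict, key: bytes) -> dict:
    """Apply a per-field PII policy such as {"card_no": "mask"}."""
    out = dict(record)
    for field, action in policy.items():
        if field not in out:
            continue
        if action == "tokenize":
            out[field] = tokenize(out[field], key)
        elif action == "mask":
            out[field] = mask(out[field])
        else:
            raise ValueError(f"unknown PII action: {action}")
    return out


KEY = b"demo-only-key"  # hypothetical; never hard-code real keys
record = {"customer": "C-1001", "card_no": "4111111111111111", "account_no": "12345678"}
print(protect_record(record, {"card_no": "mask", "account_no": "tokenize"}, KEY))
```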

I firmly believe that the foundation of any data pipeline should be robust data security. No matter how fast a pipeline can process data, if it compromises on securing PII or gets hacked at either the database layer or the processing layer, it’s essentially useless. So, the primary check is always on data security.

Understanding the nature of the data—whether it’s streaming events or batch ingestion—becomes crucial. Additionally, considering factors like the frequency of data updates and ensuring that the pipeline is adaptable to handle both real-time and batch processing scenarios is key.
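One way to keep a pipeline adaptable to both streaming and batch is to write the business logic once, as a per-record function, and wrap it in thin batch and streaming drivers. This sketch assumes records are plain dictionaries with an `amount` field; the large-transaction rule and its threshold are hypothetical.

```python
from typing import Iterable, Iterator

LARGE_TXN_THRESHOLD = 10_000  # hypothetical business rule


def enrich(record: dict) -> dict:
    """Business logic written once, applied identically in batch and streaming."""
    out = dict(record)
    out["is_large"] = record["amount"] >= LARGE_TXN_THRESHOLD
    return out


def process_batch(records: list) -> list:
    """Batch driver: the whole dataset is available up front."""
    return [enrich(r) for r in records]


def process_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Streaming driver: handle each event as it arrives."""
    for event in events:
        yield enrich(event)


data = [{"id": 1, "amount": 250}, {"id": 2, "amount": 15_000}]
print(process_batch(data))
for enriched in process_stream(iter(data)):  # iter() stands in for a live event source
    print(enriched)
```

Because both drivers call the same `enrich` function, a rule change lands in batch and streaming paths at once.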

Designing pipelines that are generic and developer-friendly is a priority. Making it easy for developers to adapt by changing configurations rather than rewriting code ensures efficiency and reduces development time. This is where microservices and loosely coupled systems shine.
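A minimal sketch of changing configuration rather than code: transformation steps are registered by name, and a pipeline is just an ordered list of step names with arguments. The step names, arguments and record layout are hypothetical; in practice the config would usually live in a YAML or JSON file.

```python
from typing import Callable, Dict, List

STEPS: Dict[str, Callable] = {}


def step(name: str):
    """Decorator that registers a transformation under a config-friendly name."""
    def register(fn: Callable) -> Callable:
        STEPS[name] = fn
        return fn
    return register


@step("drop_nulls")
def drop_nulls(rows: List[dict], column: str) -> List[dict]:
    return [r for r in rows if r.get(column) is not None]


@step("rename")
def rename(rows: List[dict], old: str, new: str) -> List[dict]:
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]


def run_pipeline(rows: List[dict], config: List[dict]) -> List[dict]:
    """Run the steps listed in config, in order, with their arguments."""
    for stage in config:
        rows = STEPS[stage["step"]](rows, **stage.get("args", {}))
    return rows


# Changing the pipeline means editing this config, not the code above.
config = [
    {"step": "drop_nulls", "args": {"column": "amt"}},
    {"step": "rename", "args": {"old": "amt", "new": "amount"}},
]
print(run_pipeline([{"id": 1, "amt": 5}, {"id": 2, "amt": None}], config))
# [{'id': 1, 'amount': 5}]
```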

Efficiency often comes from parallel processing. By distributing tasks across multiple processors, parallel processing not only speeds up the pipeline but also allows for better scalability as data volumes grow.
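A small illustration of spreading work across workers with Python's standard `concurrent.futures` module. `load_partition` is a hypothetical stand-in for loading one partition of a table. A thread pool suits I/O-bound work such as reading from storage or a database; for CPU-bound work in CPython, `ProcessPoolExecutor` offers the same interface and runs tasks in separate processes, so they are not serialized by the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def load_partition(partition_id: int) -> int:
    """Hypothetical stand-in for loading one partition; returns the rows loaded."""
    rows = range(partition_id * 1_000, (partition_id + 1) * 1_000)
    return sum(1 for _ in rows)


def load_all(partition_ids, max_workers: int = 4) -> dict:
    """Load partitions concurrently; map() returns results in input order."""
    ids = list(partition_ids)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(load_partition, ids))
    return dict(zip(ids, counts))


result = load_all(range(8))
print(sum(result.values()))  # 8000
```

Adding partitions or raising `max_workers` scales the same code to larger volumes, which is the scalability point made above.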

Ensuring that the pipeline can restart from the point of failure is crucial. This not only aids fault tolerance but also removes the need to reprocess the entire dataset after fixes or updates.
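Restartability is often implemented with a checkpoint that records progress after each unit of work. In this sketch the unit is a batch and the checkpoint is a small JSON file (the path and layout are assumptions). It is written to a temporary file and then swapped in with `os.replace`, which is atomic on POSIX filesystems, so a crash mid-write does not leave a half-written checkpoint. It also assumes batches can be processed idempotently, since a failure after processing but before the checkpoint write means that batch is redone.

```python
import json
import os


def run_with_checkpoint(batches, process, checkpoint_path: str) -> int:
    """Process batches in order, resuming after the last completed batch.

    Returns the index the run started from (0 on a fresh run).
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = json.load(f)["next_batch"]
    for i in range(start, len(batches)):
        process(batches[i])
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_batch": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)  # swap in the new checkpoint in one step
    return start


# Simulate a rerun after a failure: the checkpoint says batches 0 and 1 are done.
path = "example_checkpoint.json"
with open(path, "w") as f:
    json.dump({"next_batch": 2}, f)
done = []
print(run_with_checkpoint(["b0", "b1", "b2", "b3"], done.append, path))  # 2
print(done)  # ['b2', 'b3']
os.remove(path)
```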

In simple terms, creating efficient data pipelines for banks and financial organizations is like finding the perfect balance. It involves really understanding the data, making sure it’s secure, thinking about how it’s processed, keeping things easy for developers, using multiple processors at the same time, and making sure the system can restart smoothly if there’s a problem. It’s a bit like being a skilled chef – you need the right ingredients (data), the right techniques (security, processing), and the ability to adapt (generic designs, restartability) to create something that works perfectly. It’s a mix of technical know-how and understanding the unique challenges in the finance world.

Vighnesh: How do you ensure the scalability and performance of these data solutions?

Pankaj: In the context of modern infrastructure, where the cloud offers remarkable capabilities, our systems are strategically designed for horizontal scalability. This means that as demand increases, we have the flexibility to scale our resources horizontally, ensuring that our applications can seamlessly handle a higher load without compromising performance.

As part of our proactive approach, we closely monitor long-running jobs. If a process takes longer than expected, it triggers an alert, allowing us to address potential bottlenecks or inefficiencies in real time. Where we do identify bottlenecks, we conduct a thorough root cause analysis. This involves delving deep into the system to understand why a particular process is taking longer than expected. Once we pinpoint the root cause, our focus shifts to making the process more efficient, optimizing it for better performance.
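The alerting rule described above could look something like this sketch, which flags a running job when its elapsed time exceeds a multiple of its median historical runtime. The 1.5x factor, the five-run minimum history, the job names and the `alert` callback are all assumptions; in production the callback would post to a paging or chat system.

```python
import statistics


def is_overrunning(elapsed_s: float, history_s: list, factor: float = 1.5, min_history: int = 5) -> bool:
    """True if elapsed time exceeds `factor` x the job's median historical runtime."""
    if len(history_s) < min_history:
        return False  # too little history to judge what "expected" means
    return elapsed_s > factor * statistics.median(history_s)


def check_jobs(running: dict, history: dict, alert) -> None:
    """Call alert() for each running job that is taking longer than expected."""
    for job, elapsed in running.items():
        past = history.get(job, [])
        if is_overrunning(elapsed, past):
            alert(f"{job} running {elapsed:.0f}s, median {statistics.median(past):.0f}s")


alerts = []
check_jobs(
    running={"load_gl": 200, "load_fx": 50},
    history={"load_gl": [100, 110, 95, 105, 120], "load_fx": [40, 45, 50, 42, 48]},
    alert=alerts.append,
)
print(alerts)  # ['load_gl running 200s, median 105s']
```

Using the median rather than the mean keeps a single unusually slow past run from inflating the baseline.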

In essence, our monitoring practices are geared towards early detection, proactive response, and a continuous cycle of improvement. The overarching goal is to ensure that our systems not only meet the demands placed on them but also operate at peak efficiency, ultimately delivering maximum value to the business.

Vighnesh: How do you align your data and analytics strategies with broader business objectives?

Pankaj: Certainly, working together closely with different teams is crucial to make sure our plans for handling data and analytics fit well with the bigger goals of the business. We regularly have meetings with the folks in various parts of the business to understand what’s most important to them. This helps us shape our strategies in a way that matches their changing requirements. It’s like staying in sync with the heartbeat of the business, adjusting our plans to keep up with what matters most to them as things evolve.

Our agile approach plays a pivotal role in this alignment. Unlike traditional waterfall models, agile methodologies enable us to define and achieve smaller, incremental goals. This proves invaluable in responding swiftly to changing business dynamics. We can pivot our strategies based on immediate business requirements rather than waiting for the completion of lengthy projects.

In simple terms, we see data as a powerful tool that drives our whole organization towards its big business goals. It’s like the engine that pushes us forward. To make sure our use of data is always in sync with what the business really needs, we stay closely connected with what’s happening in the business world. This way, we can be flexible and quick to adapt to the ever-changing needs of the business.

Vighnesh: Outside of your professional work, what do you do to further your skills and knowledge in the field of data and analytics?

Pankaj: I’m quite engaged on social media, particularly on LinkedIn. I make it a habit to share my insights through various articles. I’m proud to be recognized as a top voice in the data engineering field. Writing journal articles for different research forums is a significant part of my routine, and it involves a good deal of reading. I firmly believe that staying updated involves not just observing other leaders but also keeping an eye on the latest trends in the field.
