#244: Navigating Data Quality: Insights from the Chief Operator of Data Quality Camp

This week on the Data Futurology podcast, we host Chad Sanderson, the Chief Operator of Data Quality Camp.

Over the ten years Sanderson has been involved in data, he has held key roles at companies including Convoy, a late-stage freight technology company, and Microsoft, where he worked on the AI platform team.

Sanderson’s experience with these companies made him realise that there was a need for a platform where data specialists could come together and discuss strategies for maintaining high-quality data in their organisations.

His group, Data Quality Camp, has since attracted nearly 8,000 members and has become a genuine meeting place for everything from the technical implementation of a data strategy through to helping members find work in an increasingly dynamic and disrupted workplace environment.

On the podcast, Sanderson highlights the strategies he has seen deliver high-quality data environments, some of the traps and pitfalls to avoid, and how data specialists can better engage with and gain buy-in from the other lines of business within the organisation.

For insights direct from someone at the heart of the data quality conversation, don’t miss this in-depth conversation with Chad Sanderson.

Join the Data Quality Camp on Slack (https://dataquality.camp/slack)

Connect with Chad: https://www.linkedin.com/in/chad-sanderson/

Thank you to our sponsor, Talent Insights Group!

Join us for our next events: Advancing AI and Data Engineering Sydney (5-7 September) and OpsWorld: Deploying Data & ML Products (Melbourne, 24-25 October): https://www.datafuturology.com/events 

Join our Slack Community: https://join.slack.com/t/datafuturologycircle/shared_invite/zt-z19cq4eq-ET6O49o2uySgvQWjM6a5ng

“If the business has a really strong financial incentive to focus on data quality, and they can say something like ‘hey, if we get our data in shape, we're going to make $100 million or a billion dollars,’ then it might be an easier sell. But most companies are not like this. Most companies do not see their data model as a revenue generator. Usually, only the data team sees it that way. And that's going to make selling to any leader in the company very challenging.”

Chad Sanderson, Chief Operator of Data Quality Camp

 

WHAT WE DISCUSSED

00:00 Introduction

02:36 Sanderson introduces himself and the background to Data Quality Camp.

03:26 Sanderson explains what inspired him to start the Data Quality Camp community.

04:27 Some of the challenges Sanderson has seen affect data scientists in ensuring data quality at scale.

07:48 Sanderson shares his views on best practices for fixing and maintaining data.

09:49 Sanderson shares his views on how data scientists and organisations should think about improving data quality.

13:49 Why the “big bang” approach to data quality might not be the most effective.

19:32 Sanderson shares his idea of a “maturity scale” for data projects.

20:21 Sanderson explains how data scientists should set a project up for success from the start.

22:12 Sanderson describes some of the most common problems and issues when rolling out data contracts.

29:03 Once data quality has been assured at the top, Sanderson describes his vision for what happens further downstream with regard to alerts, monitoring, and exceptions.

32:13 Data scientists can struggle to get support and investment in their initiatives. Sanderson describes his ideal approach to this challenge.

35:33 Sanderson describes what he finds particularly rewarding about building his Data Quality Camp community.


EPISODE HIGHLIGHTS

  • “I was trying to roll out a data quality infrastructure project at Convoy, and there wasn't much good information about exactly how to do that. There were a lot of vendors involved, and ultimately not being able to find objective information from experts made things really difficult to deal with.”

  • “One of the big issues that often happens is with regard to change management, where you've got various people in a company who might own different pieces of data, and they're leveraging that data for their particular use case, and the changes that they make to their datasets may not be beneficial to downstream teams that have taken dependencies on those datasets.”

  • “The analogy that I like to use is if you don't have any type of preventative system in the operational layer, then it's a bit like having a fire alarm but no firemen. You're going to get the notification, and you're going to see the loud beeping sound, and it's probably going to wake you up in the middle of the night. But if there really is a fire there, then that's a big problem. You need someone to come and actually put it out for you, or else your house is going to burn up.”

  • “What I've tried to do in every new data role that I start is to first account for where the data is being used. That's very important… If you're not quantifying it, then it's really hard to ever lobby for better data quality solutions.”

  • “If you just have a bunch of alerts that are firing off all the time and tests that are failing all the time and you're blocking PRs all the time, then ultimately the question arises: ‘Does this stuff actually matter? Is there a good reason why I'm being bombarded with these alerts?’ And if the answer is no, and if 90% of the alerts are not useful, I'm just going to start tuning them out altogether. Then you've shot yourself in the foot.”

  • “I always recommend having a meeting with those producers because oftentimes they are shocked. Every time that I've done this, they're very surprised that there are so many important use cases that have taken a dependency on their data that they didn't even know existed.”

  • “What you want to happen is that when an engineer makes a change, you can take the brand-new schema from the output of this migration, compare it against the contract, and see if there are any incompatibilities. If there are, then you try to surface some communication in the PR, which is kind of the perfect time to alert someone because it's the moment that you're kind of being forced to take accountability for the situation.”

  • “One thing that I’ve seen not enough attention paid to is the communication and visibility aspect of the contract. It gets used like a bludgeon to force the data producers to do things the way that the data consumers want. While that can work in some organisations, in a lot of companies it can create organisational friction.”

  • “It really needs to be a collaborative effort. I always recommend that the consumer goes to the producer and says, 'Here's the contract I propose. That's going to work for my downstream data assets.’ And that's something that can evolve over time.”

  • “If the monitor isn't directly actionable, it's not useful. It must be actionable in order to be valuable. What I recommend is including the monitors and the tests as part of the data contract workflow.”
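
The schema-versus-contract check Sanderson describes, comparing a migration's output schema against the agreed contract and surfacing incompatibilities in the PR, can be sketched in a few lines of Python. This is a minimal illustration, not any particular tool's API; the contract format and field names are hypothetical.

```python
# Minimal sketch of a data contract check: compare the schema produced by a
# migration against an agreed contract and report any breaking changes, so a
# CI job could surface them as a comment on the pull request.
# The contract format and field names are illustrative, not a real tool's API.

def check_contract(contract: dict, new_schema: dict) -> list[str]:
    """Return a list of incompatibilities between the contract and the new schema."""
    problems = []
    for field, expected_type in contract.items():
        if field not in new_schema:
            problems.append(f"removed field: {field}")
        elif new_schema[field] != expected_type:
            problems.append(
                f"type change on {field}: {expected_type} -> {new_schema[field]}"
            )
    return problems  # an empty list means the change is compatible

# Example: a consumer-proposed contract vs. the schema after a migration.
contract = {"shipment_id": "string", "weight_kg": "float"}
new_schema = {"shipment_id": "string", "weight_kg": "int", "carrier": "string"}

for problem in check_contract(contract, new_schema):
    print(problem)
```

Note that the check only flags removed fields and type changes; purely additive changes (like the new `carrier` field above) pass, which matches the collaborative, non-blocking spirit Sanderson advocates.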


At Data Futurology, we are always working to bring you use cases, new approaches and everything related to the most relevant topics in data science to help you get the most value out of these technologies! Check out our upcoming events for more amazing content. And as always, we appreciate your Reviews, Follows, Likes, Shares and Ratings. It really helps new data scientists find us.