#240: Overcoming the challenges facing modern data engineering teams

This week on the Data Futurology podcast we host Paul Milinkovic, APAC Regional Director for StreamSets, the leading data integration platform. Milinkovic joins us to share his insights into the challenges data engineers face and the pipelines they manage and maintain.

One statistic highlights just how challenging work environments have become for data engineers: 76 per cent of organisations have a pipeline break at least monthly, and for 36 per cent it's weekly. Rather than contributing strategically to their organisations, engineers split their time between diagnosing and repairing broken pipelines and building new ones. That costs the organisation, because roughly half of each engineer's time isn't being used strategically. It also leads to a culture of overwork, burnout, and high churn within the data engineering team.

Another challenge data teams struggle with is competing priorities. When multiple lines of business need pipelines developed, teams are forced to triage to accommodate the highest-priority tasks, and that affects overall company outcomes. Delivering a low-code or no-code environment that is highly visual and accessible to non-specialists has been a critical benefit for organisations that have adopted StreamSets.

Milinkovic then shares two case studies where StreamSets helped overcome these challenges. In the first, a bank achieved a seemingly impossible task: becoming compliant with looming Consumer Data Act requirements within four months. In the second, a bank used StreamSets and its data to detect and thwart $9 million in fraudulent activity in a single month.

For more deep insights into overcoming the challenges facing modern data engineering teams, tune into the podcast!

Website: https://streamsets.com

Follow on LinkedIn: https://www.linkedin.com/company/streamsets/

Whitepapers: 

https://go.streamsets.com/Whitepaper-Dollars_and_Sense_UGLP.html?utm_medium=website&utm_source=DataFuturology&utm_campaign=eg_dollars_and_sense_of_dataops

https://go.streamsets.com/230214-lifting-the-lid-on-data-integration-UGLP.html?utm_me[…]turology&utm_campaign=eg_lifting_the_lid_on_data_integration

“If someone leaves the organisation and it’s going to take ten weeks to replace them, and your pipeline breaks in that time, what impact does that have on the business? And this is all because you’ve put too much pressure on that person and they’re not getting fulfillment in their job.”

Paul Milinkovic, APAC Regional Director, StreamSets

WHAT WE DISCUSSED

00:00: Introduction

02:22: Felipe introduces Paul Milinkovic. 

03:38: Milinkovic shares his background and his history with data at various levels and applications. 

06:04: Milinkovic overviews StreamSets – when and why the company was founded, and what its core capabilities are. 

09:04: What are the main issues that StreamSets helps data engineering teams solve? 

12:57: How does StreamSets address traditional data pipeline design and build challenges? 

12:33: What are the benefits of having a solution that is visual and accessible to non-technical users? 

22:51: One of the common questions with the self-service approach to data is governance. How can that be handled while still allowing full flexibility? 

26:46: Data engineers care a great deal about the quality and accuracy of data and the platforms that it sits on. Milinkovic explains why it is so important that they have the tools to be able to deliver that to the organisation. 

31:24: What is the financial impact of data engineering teams spending so much of their time fixing pipelines?

33:49: Milinkovic shares some case studies and use cases to highlight the value of StreamSets’ approach to data engineering.

EPISODE HIGHLIGHTS

  • “I like the story about how StreamSets started. One of the founders worked at a bank in America and that bank merged with another bank. To understand which bank the people's accounts came from, they prepended a “00” or a “01” and turned the account number from an eight-digit number to a ten-digit number. And then what happened? All the data stuff broke, and then they had to work out where that was as part of that integration process. The people on the integration side, even on the IT side, never gave this any thought and so the founder thought ‘there's just got to be a better way for this.’”

  • “Anyone that works in the data space knows that data integration is extremely different to application integration.”

  • “When you're looking at traditional methods of creating data pipelines and creating that data integration, it can take an inordinate amount of time. There have been some legacy products that have tried to do something, but they started in the late '90s and early 2000s. They’ve tried to move with the times, but 25 years later they haven’t necessarily done that very well.”

  • “A BI project that I was working on for a large telco promised that you would get yesterday's sales figures by 09:00 a.m. the next day. But that doesn't cut it anymore. When you want your sales figures, you want them up to the minute. You don't want to see sales figures that were 24 hours old.”

  • “We are a no-code or low-code solution, but some coders love to code and we make that available. You can code the entire pipeline if you want to. But one of the challenges is when a person leaves, how do you maintain the code? The way we address that is to offer a coding solution where, once the coding is done, it is then available on the canvas. Then, somebody that understands code can see the visual pipeline in a modern canvas.”

  • “Someone in the business will say ‘I need this billing data, from this point, and so on,’ and the data leaders or sprint managers will say ‘I’ve got eight weeks of sprints already full. It’s June 15 so we can start that on September 1,’ and the person will say ‘Well, no, I can’t wait ten weeks for the data.’ So then what do you do when your data team can’t deliver everything that’s needed? Because we provide an easy user interface, we empower those teams to be able to effectively self-serve.”

  • “The democratisation of data is a win-win for the organisation because the data engineers start to do more interesting work, the stuff that they want to do, and the business gets what they need from their data because they are self-enabled.”

  • “We’ve seen research that 76 per cent of organisations have a pipeline break at least monthly and 36 per cent say their pipelines break at least once a week. The time and effort spent fixing that means teams are spending up to 50 per cent of their time finding and fixing where those breakages are.”

  • “I heard of a case where a pipeline was broken for a month before somebody realised, because there were no alerts in it. Somebody changed something and they were making business decisions off month-old data that they thought was up to date.”

  • “Too often we see data engineers and teams put in these massive hours. It’s one thing to do things that are outside the norm, but if it ends up becoming the norm, people just go ‘I'm not going to do this anymore,’ because they've become burnt out. 20 years ago things might have been different, but I think the pendulum is certainly swinging in that direction where life and work need to be balanced, and if organisations are putting too much pressure on their data teams to deliver and it’s just not possible, those people are simply going to go and work somewhere else.”

  • “If someone leaves the organisation and it’s going to take ten weeks to replace them, and your pipeline breaks in that time, what impact does that have on the business? And this is all because you’ve put too much pressure on that person and they’re not getting fulfillment in their job.”

  • “Let’s say that you’ve got a data engineering team where each engineer costs you $150,000 a year in salary, and the team comes to $1.5 million in salaries alone. If you're spending half the time fixing pipelines, you're spending $750,000 a year doing nothing but finding and fixing where these issues are. Just think about what that money could do to improve where the business is.”

  • “We worked with a tier-2 bank that had given themselves four months to comply with the Consumer Data Act in the banking sector. If you were told you had four months to get this done and be in compliance because otherwise there are governmental fines, are you going to bet your house on that? Well, they were able to get it done when they used StreamSets, and get it done in the time frame.”


At Data Futurology, we are always working to bring you use cases, new approaches and everything related to the most relevant topics in data science to help you get the most value out of these technologies! Check out our upcoming events for more amazing content. And as always, we appreciate your Reviews, Follows, Likes, Shares and Ratings. It really helps new data scientists find us.