#135 How AI is Transforming Retail with Khalifeh Al Jadda - Director of Data Science


Our guest is Khalifeh Al Jadda, Director of Data Science at The Home Depot. Khalifeh has solid knowledge of large-scale machine learning and data mining techniques.

He tells us how in retail some businesses are still running manually, without the use of automation tools, so the job of data science leaders is to educate business partners and show them by example and with data the value that data science and artificial intelligence can deliver to their organizations. 

Khalifeh tells us how they are using visual search to recommend products that are visually similar to the ones people are searching for. As an example, he talks about the Color App created by The Home Depot, which helps customers visualize different paint colors in their houses: they just take a picture of the wall and preview different colors on it.

Later on, he explains that their workflow is managed by product managers and starts with the data scientist taking responsibility for building the machine learning model, training it, validating it, and even going all the way to testing it. Then machine learning engineers take over: they scale up the code, clean it, perform unit testing, and do everything needed to make it ready for production.

Stay tuned to learn how he is applying semantics, query understanding and machine learning to understand search and improve recommendations. 

Enjoy the show!

We speak about:

  • [1:20] What was your journey leading up to being The Home Depot’s Director of Data Science?

  • [9:50] What are your overall responsibilities [at Home Depot]?

  • [15:25] Can you tell us about the work that you’ve been doing in the search space?

  • [21:20] Do you think the semantic approach will be used less over time and we’ll use the vector and embedding approaches more, or will they be complementary?

  • [23:57] What are the ways you work with the domain experts to get their knowledge into the semantic layer?

  • [25:15] When it comes to the search, how do you measure success?

  • [27:34] To do the A/B testing, do you have a centralized platform that helps with experimentation or is it up to each individual project to track the improvements of their work?

  • [29:40] Tell me what you have been doing on the visual search side.

  • [34:55] How do you get that functionality out to people?

  • [37:25] How is the recommendation different and can you tell us a little bit more about it?

  • [44:10] Do other customer interactions, besides purchasing, feed into the recommendation algorithms?

  • [45:35] How do you choose between all the recommendation algorithms when displaying recommendations to customers?

  • [47:40] How do you pick what to prioritize or what projects to focus on?

  • [49:50] Is there tension with other parts of the business wanting to buy data science products from third parties instead of using your team, and if so, how do you manage it?

  • [54:17] With everything that you’ve done in your career, what are you most proud of?

Resources:

Khalifeh’s LinkedIn: https://www.linkedin.com/in/khalifeh-al-jadda-ph-d-929a5020/  

The Home Depot on LinkedIn: https://www.linkedin.com/company/the-home-depot/

Quotes:

  • "I started my journey in the industry from that point, which was the first internship I got in 2015. That’s the first advice I give to anyone, if you are in graduate school, if you are a student, make sure to pursue an internship before graduation. It’ll make your life much easier after graduation."

  • "It’s been an interesting journey, a great journey. I hope that everyone actually goes through those challenges in their careers, because it makes you a better and stronger data scientist."

  • "You cannot build a team with only computer vision people, or a team with just statistical people. You need to bring people from different backgrounds. That’s what my organization includes. It includes people from all backgrounds."

  • "This is the next generation of search that we are interested in at this point. This summarizes where search is headed in the industry: from semantic, using query parsing, query understanding, query rewriting and expansion, towards deep learning and vector-based search, which is the next generation that we are working on."

  • "From experience, when you have domain knowledge in house, domain knowledge is not easy to integrate with deep learning models. Besides providing labeled data from the domain experts, that’s something, but having the knowledge that those merchants or the people who have been doing business. Having that knowledge represented somewhere it is easier to represent that knowledge using the semantic search capabilities that we talked about, how to parse the query, how to tag those entities in the query and what to show and not show for specific queries. I think if you want to leverage the domain knowledge that you have in house, then you need to come up with a hybrid technique or a hybrid strategy where vector search is going to be good absolutely for queries and search. So when I have things that I’ve never seen before, when I have long questions that people put in the search box, parsing and tokenization and all those things are most likely going to fail miserably. The best thing in this case is going back to deep learning and let deep learning do the magic of generalization by actually trying to understand what is the meaning behind this question or the context behind this question in the search box. But for the queries that you are pretty much sure what they are about, they are very popular in your website and you know exactly how to handle them with the parsing and tokenization, I think you don’t need to retire that system for these types of queries. My vision is it’s going to go, at least for the near future, hand in hand, where you’ll still have semantic search somehow for the parsing and popular queries. Then for the tail queries you can rely on deep learning to answer those questions."

  • "It's about closing a feedback loop. You need to have a system where those people with domain knowledge actually can look at the results that your system generates and then they can provide the feedback. Then this feedback can be fed into your algorithms to make them smarter and better over time."

  • "It is a centralized organization or team that orchestrates the A/B test. The reason for that is that if you let each team run their own A/B test you have two risks here. The first risk is that some tests are going to overlap. So if you are testing a feature and there is another test going on at the same time for the same kind of module or the same page on the website then there will, most likely, be a conflict there. You don’t know when you get the results if this is because of your feature or because of the other feature that was tested at the same time. The second risk that you take, if you do the A/B test separately, is that who is going to interpret the results and be objective about it. We know that when we build something we become attached to it, it’s your baby, and then when you test it you are going to try to find any way to justify that you are doing a good job. That’s dangerous. I think we’ve all been through that at some point in our careers where we built something and got so attached to it that we don’t want to make any changes to it and we believe it’s the best thing in the world and we just want to get it out."

  • "Usually we work with product managers and engineers, so the data scientist takes the responsibility of building the model, the machine learning model, training the model, validating and even going all the way towards testing. Once that model proved that it works, then the data scientist works with the engineers. At that point the engineers take that work and refactor the code, because you know data scientists are not good software engineers, let’s face it. Many people in data science, they did not actually come from computer science backgrounds. They do know and are very smart, they know how to design very sophisticated machine learning models in Python or TensorFlow, but the code they write is good for validating and testing but it’s not good to be serving in a production environment. The code needs to be scaled up, the code needs to be clean, unit tested and all those things you need to have for production, so their code just won't do it. The engineers take care of that, so the data scientists hand over their models to the engineers, we call them machine learning engineers in the company, they take that and do all the requirements and required jobs to make it production-ready, then they deploy it to production. It’s a workflow from the data scientists to the engineers and there are product managers that manage this workflow, they orchestrate the work between the data scientists and engineers and this is how we work."

  • "It starts with search, after search they start engaging with recommendations. Those are like the two core functionalities that any ecommerce website needs absolutely to focus on."

  • "We align our product managers to identify the business opportunities. According to the business opportunities this is what drives the prioritization, plus a customer pain point. What really is the client or customer pain point that we should solve as soon as possible. Those are the two factors that usually impact our decision of prioritization, the business opportunities plus customer pain points. How many customers are we going to help if we solve this problem and then also on the business side what is the opportunity from the business perspective, the dollar value that we are going to gain if we implement that feature or this algorithm."

  • "We are not against using vendors whenever there is a need and this is a message for all the data science leaders and data scientists. I can tell you at the beginning of my career I was so sensitive to that topic that when someone said they were going to get a vendor to do something, I always said we should do it or we could do it, but it turns out sometimes you have a limited capacity in house. You don't have 200-300 people doing data science in your team so you have limited capacity and the things that you can do are a lot, the business opportunities are a lot. So if there is something that a vendor can provide or offer for now, which your team cannot get into, then why not? They can come in, they can do it, the business will get the benefit of it, by the time we get to that work or that problem to solve we have already that vendor’s algorithm. That vendor’s algorithm has been powering, we have the metrics that we collected, the performance of that algorithm so when we develop something we can test it against it."
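
The hand-in-hand approach described in the search quote above (parsing-based semantic search for popular queries, deep learning and vector search for everything else) can be sketched as a simple router. This is a minimal illustration only, not The Home Depot's implementation: the query log, the `POPULARITY_THRESHOLD` cutoff, and the function names `normalize`, `build_popular_queries`, and `route_query` are all hypothetical.

```python
from collections import Counter

# Hypothetical query log; a real system would read this from search analytics.
QUERY_LOG = [
    "cordless drill", "cordless drill", "cordless drill",
    "white paint", "white paint",
    "how do i fix a leaking faucet under my kitchen sink",
]

# Assumed cutoff: a query seen at least this often counts as "popular".
POPULARITY_THRESHOLD = 2


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so near-identical queries match."""
    return " ".join(query.lower().split())


def build_popular_queries(query_log: list[str], threshold: int) -> set[str]:
    """Return the set of normalized queries seen at least `threshold` times."""
    counts = Counter(normalize(q) for q in query_log)
    return {q for q, n in counts.items() if n >= threshold}


def route_query(query: str, popular_queries: set[str]) -> str:
    """Pick a search path: 'semantic' for popular queries, 'vector' otherwise."""
    if normalize(query) in popular_queries:
        return "semantic"
    return "vector"


POPULAR_QUERIES = build_popular_queries(QUERY_LOG, POPULARITY_THRESHOLD)
```

With this sample data, `route_query("Cordless Drill", POPULAR_QUERIES)` selects the semantic path, while a long question that was typed only once, or a query never seen before, falls through to the vector path.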

We are now on YouTube! Watch the episode here: https://youtu.be/2FuvFLQvGGQ

And as always, we appreciate your Reviews, Follows, Likes, Shares and Ratings. It really helps new data scientists find us.

Thank you so much, and enjoy the show!