#241: Building AI systems with quality, holistic data

At the recent Advancing AI event in Melbourne, we were privileged to have a presentation by Vinay Joseph, the Pre-Sales Lead for IDOL at OpenText in APAC.

Vinay gives an overview of IDOL's features and how they can help data science teams bring automation and AI to unstructured data. He presents a wide range of case studies and use cases, from law enforcement and the military through to news organisations and political campaigns, showing how each might use unstructured data to draw real-time, in-depth insights that would otherwise be inaccessible.

OpenText offers cloud-native solutions through an integrated and flexible Information Management platform, and IDOL is its solution to unstructured data challenges. In his presentation, Vinay summarises where those challenges come from. First, there are massive amounts of data to collect, categorise and manage. Second, there are the security and compliance considerations that come with holding unstructured data at all. An increasing number of organisations are finding themselves out of compliance simply because they do not know what personal data is sitting within their unstructured data environments.

For more information on overcoming unstructured data challenges, tune in to this presentation.

Thank you to our sponsor, Talent Insights Group, for supporting these important industry conversations.

Join us for our next events, Data Engineering and Advancing AI Sydney (5-7 September), where OpenText will be joining us again as a Silver Partner: https://www.datafuturology.com/events

More about OpenText: https://www.opentext.com 

Join our Slack Community: https://join.slack.com/t/datafuturologycircle/shared_invite/zt-z19cq4eq-ET6O49o2uySgvQWjM6a5ng


“We have heard about PII violations in Australia. That often happens because people have unstructured information sitting in file shares or sitting on a web server, and so on and so forth. And they don't know that it contains a name, they don't know that it contains an Australian address, or it contains a driver's license. So being able to detect that using analytics on unstructured text is really where we're able to provide solutions.”

- Vinay Joseph, Pre-Sales Lead for IDOL at OpenText in APAC

WHAT WE DISCUSSED

0:00: Felipe introduces Vinay Joseph and the theme of his presentation.

1:09: Human-generated data is growing exponentially, largely in the form of unstructured data. How can data scientists approach this analytics challenge?

2:38: The security and regulatory challenges of unstructured data – why compliance violations happen and how they can be prevented.

3:54: What is the IDOL platform, and what does it do?

8:18: Vinay explains whether the IDOL platform is pitched more towards business or technical users.

9:10: Vinay outlines what the consumption layer of the IDOL platform looks like for non-technical users.

10:31: Vinay describes the scale at which the IDOL platform is designed to operate.

11:31: Vinay presents a case study of how IDOL helps law enforcement monitor stolen information that is being traded on the dark web.

13:59: Vinay shares another case study of how the IDOL platform has assisted the US military with surveillance.

14:31: IDOL can also assist with text mining and entity extraction, and Vinay explains how.

16:36: Sentiment analysis is a complex problem within unstructured data. Vinay explains how IDOL has been built to handle that challenge.

19:42: Once an organisation has collected all its unstructured data, what comes next? Vinay explains the personalisation features of IDOL.

21:04: Graph analysis is another major challenge in unstructured data. Vinay explains how IDOL can help with it.

21:33: IDOL can also assist with video and image analytics, which Vinay explains through several use cases.

EPISODE HIGHLIGHTS

  • “If you have an expert who can do a particular job, how can you scale that across, say, nine million documents or news articles, so on and so forth?”

  • “We have heard about PII violations in Australia. That often happens because people have unstructured information sitting in file shares or sitting on a web server, and so on and so forth. And they don't know that it contains a name, they don't know that it contains an Australian address, or it contains a driver's license. So being able to detect that using analytics on unstructured text is really where we're able to provide solutions.”

  • “With IDOL, business users could layer in some sort of dashboard using Tableau, or it could even be an open-source application such as Find, which we ship out of the box. You could even have your own ECM systems and your own ingestion pipeline where you might take our analytics and then you might ingest it into something else from there, like a workflow platform.”

  • “We have this thing called the dark web, and there's a lot of private information that's being stolen and sold over there. So, when we talk to our law enforcement customers, what they say is, can you go into this particular location and find out if the following addresses or names or driver's licenses are available? Now, that's a very complex problem because the dark web is huge and it's not like your normal Internet where you can click one link and go to the next link. It's completely mangled up.”

  • “Sentiment analysis is a nuanced field. You can perform sentiment analysis on a tweet or you can have financial sentiment analysis, as examples. So, for example, Forbes might publish an article that will talk about the financial impact of an event on an organisation. That information might be of interest to a hedge fund that is analysing this information on a regular basis. That's the kind of stuff that we can pull out: we can see whether that share price is going to go up or down, and so on.”

  • “Here is an example of our Topic Map, built out of nine million news articles. Without any training whatsoever, we can pull out important facts mathematically out of the document corpus. Our customers find this very interesting because it helps them with things such as anomaly detection. I wasn't looking for that, but I found it. And that information is useful to my organisation because now I need to act on it.”

  • “When a user searches for the word ‘Java’, for example, it could mean a programming language. So, you could return articles on object-oriented programming, etcetera. Or it could be an island in Indonesia. It depends on who's doing the search, and we can intelligently map out who the user is and present results accordingly.”

  • “Fact extraction is something that's extremely useful. We need to look at documents, identify the fact, extract it, validate it, and then store it so that you can query it.”

  • “Now with speech-to-text, you have to take into account people speak English differently in different parts of the world. That's why we've gotten language packs for the Australian accent, for the Singapore accent, for the British accent, and we can build your own accent language pack based on audio ML models.”


At Data Futurology, we are always working to bring you use cases, new approaches and everything related to the most relevant topics in data science to help you get the most value out of these technologies! Check out our upcoming events for more amazing content. And as always, we appreciate your Reviews, Follows, Likes, Shares and Ratings. It really helps new data scientists find us.