How To Best Implement a Data Catalog

George Firican
2 min read · Mar 20, 2023

Data catalogs are here to stay, and they are a great asset to data-driven organizations. But how do you best implement one?

In this episode, Rupal Sumaria joins us to explore how to implement a data catalog. Rupal is the Head of Data Governance at Penguin Random House in the UK, which successfully implemented a data catalog recently.

In this session, Rupal walks us through the steps to follow when implementing a catalog, the must-have features, and how to approach importing data sets. She also dives deep into the pitfalls to avoid, the best practices to follow, and some of the best lessons she has learned.

You will want to hear this episode if you are interested in:

  • [00:16] About Rupal Sumaria
  • [01:43] What a data catalog means and its purpose
  • [02:50] What drove Penguin Random House to implement a data catalog
  • [04:27] Why we need a data catalog
  • [05:30] The impact of having the right team in place and evaluating all the data catalog providers
  • [06:40] The must-haves in a data catalog
  • [07:16] Choosing a vendor and having a good user interface
  • [07:51] What made the data catalog a big success for Penguin Random House
  • [08:48] Double checking and verifying automation steps
  • [09:15] Focusing on the main content while importing data sets and ensuring it’s curated
  • [13:11] Pitfalls in implementing a data catalog
  • [15:30] How her business intelligence knowledge impacted her journey in the data catalog
  • [17:11] What happens when new data sets get changed
  • [18:25] Types of metadata being surfaced in the data catalog
  • [19:44] Penguin’s top consumers
  • [21:44] Handling sensitive data sets
  • [23:00] Best lessons learned from using a data catalog
  • [25:39] Success metrics in the data catalog

Notable Quotes

  • A good provider will do lots of demos for you and not just yourself as a data governor or expert but bring your users into that journey.
  • We don’t want to spend a lot of time faffing around with the toolset. That’s not what a good provider is. They are actually there to help you, not make your life harder.
  • If you try to boil the ocean and try to import everything, you’re not getting any value out of it. Focus on the main data first.
  • The average information professional works with data, consumes data, and uses 80% of their time in the week searching for what they need to work with.

George Firican

Data governance & BI professional, ranked among Top 5 Global Thought Leaders on Big Data, founder of LightsOnData.com and Co-Host of the Lights On Data Show.