The industry is tackling growing challenges in enabling productive data science in data-driven enterprises while maintaining data governance and compliance. Several popular open-source and commercial projects have emerged over the last few years, employing graph-based practices for managing and leveraging metadata. This event brings together these projects along with a larger community of metadata experts to help chart our common ground and work ahead.
Practitioners in this space have been invited to participate in an “unconference”-style workshop to compare notes and develop reports about both (1) the metadata use cases their organizations have and how their practices have evolved, and (2) the metadata challenges their organizations face and how they address them.
This meetup is the public forum where the reports from the Metaspace practitioners workshop will be presented, followed by live audience Q&A in which a panel of expert industry practitioners will answer metadata questions.
In addition, we’ll have a few selected lightning talks made available online ahead of time, to help prompt further audience discussions. We’ll also make an online survey form available for submitting questions for Q&A.
Metadata Organizing Committee / Principal Staff Software Engineer
Creator: DataHub
Modern enterprises not only have a myriad of data sources, including real-time events, transactional systems, Big Data platforms, and many others, but they also boast a rich ecosystem of thousands of APIs and a treasure trove of deep technical metadata. How do you organize all of this and gain insights from it? In addition, there is a wealth of metadata from sources such as data transformations, SQL queries, security scans, Slack chats, thousands of user hierarchies, orgs and locations, access controls, wiki pages, JIRA tickets, and more. Normally, these sources are all disconnected from each other, and valuable insights are missed. Enterprise metadata is central to data management and is key to effective data governance. In this context, we believe enterprise metadata transcends the traditional data catalog. We envision the Graph of Enterprise Metadata as a comprehensive knowledge graph that connects all the critical metadata under one umbrella, which will eventually lead us to effective data governance.
Speakers: Daniel Rincon Silva & Deepak Chandramouli
As data becomes core to every product, data operations become critical. The OpenLineage API enables data pipeline observability.
Speaker: Julien Le Dem is the CTO and Co-Founder of Datakin. He co-created Apache Parquet and is involved in several open source projects, including Marquez (LF AI), Apache Pig, Apache Arrow, Apache Iceberg, and a few others. Previously, he was a senior principal at WeWork; principal architect at Dremio; tech lead for Twitter’s data processing tools, where he also obtained a two-character Twitter handle (@J_); and a principal engineer and tech lead working on content platforms at Yahoo, where he received his Hadoop initiation. His French accent makes his talks particularly attractive.
Most people think that a catalog is only good for search and discovery.
But what could you do if you had an amazing metadata platform sitting behind your search and discovery application?
Shirshanka explores just that by talking about the different use-cases for metadata that are powered by DataHub’s metadata platform at LinkedIn.
Speaker: Shirshanka Das is a Principal Staff Software Engineer in the Data team at LinkedIn. He is responsible for creating and driving the vision for the LinkedIn DataHub and Apache Gobblin projects which power metadata and big data management at the company.
Improving metadata for Earth observations is a long-term goal of mine. Assessing metadata completeness against community recommendations helps us understand past decisions by researchers and repositories and learn lessons for improving metadata today. Examples of assessments from several large repositories demonstrate how the assessment process can motivate and facilitate continuous metadata improvement efforts.
Speaker: Ted Habermann created Metadata Game Changers to focus on helping organizations improve metadata for data discovery, access, and understanding. Previously he was the Director of Earth Science at The HDF Group and worked for many years to improve data management, access, interoperability and documentation at NOAA’s National Centers for Environmental Information.
This lightning talk takes a brief look at the surprisingly social life of metadata. In research, the academic citation is the most important type of metadata, but why can the way a citation looks affect the way an entire industry behaves? We will take a quick look!
Speaker: Ian Mulvany is CTO at BMJ. Previously he was head of transformation at SAGE Publishing, where he helped set up SAGE Ocean, SAGE’s methods innovation incubator, following a lean product development approach. He ran technology operations for eLife, was head of product for Mendeley, and ran a number of early Web 2.0 products for Nature Publishing Group.
He is passionate about creating digital tools that support the research enterprise, and he is interested in the interplay between different stakeholders that can lead to the sustainability of these kinds of tools.
4:00pm - 4:15pm Welcome address
Topic 1: Use-Cases for Leveraging Metadata
4:15pm - 4:30pm Presentation of outcomes from the Metaspace practitioners workshop
4:30pm - 5:00pm Panel discussion with audience Q&A
Topic 2: Hurdles in Practice for “Excellent” Metadata
5:00pm - 5:15pm Presentation of outcomes from the Metaspace practitioners workshop
5:15pm - 5:45pm Panel discussion with audience Q&A
5:45pm - 6:00pm Closing remarks