W3C Graph Data Workshop Trip Report

This week, March 4-6, 2019, was the W3C Graph Data Workshop – Creating Bridges: RDF, Property Graph and SQL.

When I come to meetings/workshops like this, I always ask myself what does success look like: “IF X THEN this will have been a successful meeting”. So, I told myself:

 

IF there is a consensus within the community that we need to standardize mappings between Property Graphs and RDF Graphs THEN this will have been a successful meeting.  

 

I can report that, per my definition, this was a successful meeting! It actually surpassed my expectations.

 

In order to keep track of the main outcomes of each talk/session I attend, I follow a technique: summarize the takeaways immediately, in a crisp and succinct manner (if I can’t, that means I didn’t understand). What better way of doing that than in a tweet (or two or three)? Therefore the majority of this trip report consists of pointers to my tweets 🙂. In a nutshell, the tl;dr:
– There is a unified and vibrant graph community.
– A W3C Business Group will be formed and serve as a liaison between different interested parties.
– There is a push for RDF*/SPARQL* to be a W3C Member submission.
– There is interest in standardizing a Property Graph data model with a schema.
– There is interest in standardizing mappings between Property Graphs and RDF.

 

Kudos to Dave Raggett and Alastair Green for chairing this event. The organization was fantastic. Additionally, the website has all the position papers, lightning talk slides and minutes in Google Docs for every single session. Please go there to get all the detailed information directly from the source.

Brad Bebee’s Keynote

The workshop started with a keynote by Brad Bebee from Amazon Neptune. The main takeaway of his talk was:

 

We all know that the common use cases for graphs are: social networks, recommendations, fraud detection, life science and network & IT operations. In addition to the common use cases, Brad said something that highly resonated with me specifically w.r.t. Knowledge Graphs (paraphrasing):

 

“Use graphs to link information together to transform the business. Link things that were never connected before. This is really exciting.”

 

Some other important takeaways

Coexistence or Competition

After discussions about how standardization works within W3C and ISO, there was a mini panel session on “Coexistence or Competition” with Olaf Hartig, Alastair Green and Peter Eisentraut. The takeaways:

Lightning talks

The day ended with over 25 lightning talks. The moderators were excellent time keepers. The two main themes that caught my attention were the following:

Many independent bridges are already being formed: Many approaches are being presented that build bridges between Property Graphs, RDF Graphs and SQL. A few of the lightning talks:

 

However, as Olaf Hartig alluded, we should not focus on creating ad-hoc implementations of bridges. We need to clearly understand what that bridge means (i.e. what are the semantics!). Olaf’s RDF*/SPARQL* proposal, which annotates statements in RDF and can serve as a bridge between Property Graphs and RDF, has been very well received in the community. As a matter of fact, this approach has already been implemented in commercial systems such as those from Cambridge Semantics and Blazegraph.
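To make the idea of annotating statements concrete, here is a minimal sketch in Python that turns one Property Graph edge into RDF*-style text: the edge becomes a triple, and each edge property becomes a statement about that triple, written with the `<< ... >>` embedded-triple notation from the RDF* proposal. The `Edge` class, the `ex:` prefix and the naming scheme are my own assumptions for illustration; they do not come from any talk or implementation mentioned here.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    """A property graph edge: source, label, target, plus key/value properties."""
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


def literal(value):
    """Render a Python value as a Turtle-style literal (strings quoted, numbers bare)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def edge_to_rdf_star(edge, prefix="ex"):
    """Emit one triple for the edge and one annotation line per edge property.

    The prefix and the way names are minted are assumptions made for this sketch.
    """
    triple = f"{prefix}:{edge.source} {prefix}:{edge.label} {prefix}:{edge.target}"
    lines = [f"{triple} ."]
    for key, value in sorted(edge.properties.items()):
        # The embedded triple << s p o >> is the subject of the annotation.
        lines.append(f"<< {triple} >> {prefix}:{key} {literal(value)} .")
    return lines


if __name__ == "__main__":
    for line in edge_to_rdf_star(Edge("alice", "knows", "bob", {"since": 2015})):
        print(line)
```

The sketch only shows the shape of the data. What such an annotation means (the semantics) is exactly the part that needs to be agreed on.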

 

Personally, I avoid (and actually stop) discussion on syntax. In my opinion, that should not be the first topic of discussion. We first need to agree on the meaning.
Note: I think there may be interesting science in here.

 

GraphQL is popular: I was surprised to see GraphQL being a constant topic of discussion. It was presented as the global layer over heterogeneous data sources (i.e., OBDA), as an interface to RDF graphs, and also as a schema language for Property Graphs. You could hear a lot of GraphQL discussions in the hallway.
Note: I think this is engineering. Not clear if there is science here.

 

The second day consisted of three simultaneous tracks: Interoperation, Problems & Opportunities and Standards Evolution, for a total of 12 sessions. By coincidence (?), all the sessions I was interested in were in the Interoperation track.

Graph Data Interchange

Graph Query Interoperation

Specifying a Standard

I find it very cool that Filip Murlak and colleagues defined a formal, readable, and executable semantics of Cypher in Prolog, which is based on the formal semantics defined by the folks from U. of Edinburgh. This reminds me of when I took a course with JC Browne on Verification and Validation of Software Systems and learned about Tony Hoare’s and Jay Misra’s Verification Grand Challenge.

Finally, Andy Seaborne made a very important point:

Graph Schema

I was glad to have the opportunity to moderate this session because this is a topic very dear to me (Hello Gra.fo!) and I am chairing an informal Property Graph Schema Working Group (PGSWG).

 

Inspired by our work in G-CORE, which was a very nice mix of industry and academia members, and which influenced the GQL manifesto which led to the GQL standardization effort, I was asked to chair this informal working group. I was able to share what we have accomplished up to now.

George Fletcher provided a quick overview of the lessons learned in the academic survey. He condensed the lessons learned into: 1) start small, 2) start from foundations and 3) start with flexibility in mind. Oskar van Rest presented an overview of what the existing industry-based Graph databases support. This is still work in progress. I presented the use case and requirements document, which is the starting point to drive the discussions towards features that address concrete use cases. Olaf presented how GraphQL could be a schema language for PG; in other words, there is a syntax that could be reused. This sparked the discussion of syntax, syntax, syntax. As I previously mentioned, I avoid discussions that jump immediately into syntax because we should first focus on the understanding/semantics.

 

The top desirable feature was … KEEP IT SIMPLE! Other top features were: enable future extensibility, allow for permissive vs restrictive, allow for open world vs closed world, have a simple clean formalization, and again… keep it simple (don’t make mistakes like XML Schema). Josh Shinavier mentioned remotely: “Historically, property graphs were somewhat of a reaction to the complexity of RDF. A complex standard will not be accepted by the developer community.”

 

To summarize our 1.5 hour discussion:

 Finally, 1.5 hours is not enough to discuss graph schemas so a group of us stayed the next day and kept working on it.

Up to now, the PGSWG has been informal. There was a consensus that it should gain some sort of formality by becoming a task force within the Linked Data Benchmark Council (LDBC). More info soon!

What are the next steps?

The goal of the third and final day was to first offer a summary of each session and then to discuss the concrete next steps.
My concrete proposal for next steps:

I was also proposing to standardize mappings from Relational Databases to Property Graphs, and I was happy to learn that this work is already underway within ISO.

 

Following the building bridges analogy, we need to have aligned piers in order to know how to build the bridge. RDF is standardized and formalized. Property Graphs are not. Therefore the first task is to lift the Property Graph pier so it can be aligned to RDF. Subsequently, we will be in a position to start addressing interoperability needs between Property Graphs and RDF Graphs by means of establishing direct and customizable mappings.
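As a rough illustration of what "direct and customizable" could mean, here is a minimal Python sketch. By default it mints an IRI for every node, label, property key and edge label under a base IRI (the direct part), and it lets the caller override any of those names with an IRI of their choice (the customizable part). The function name, the input shapes and the `http://example.org/` base are assumptions of mine and not a proposed standard. Edge properties are deliberately left out, since representing them is where something like reification or RDF* comes in.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_mapping(nodes, edges, base="http://example.org/", overrides=None):
    """Map a tiny property graph to a list of (subject, predicate, object) triples.

    nodes: {node_id: {"labels": [...], "properties": {...}}}
    edges: [(source_id, edge_label, target_id), ...]
    overrides: optional {name: IRI} that customizes the default IRIs.
    All of these shapes are hypothetical and chosen for this sketch.
    """
    overrides = overrides or {}

    def iri(name):
        # Default rule: mint an IRI under the base; a custom mapping wins.
        return overrides.get(name, base + name)

    triples = []
    for node_id, node in nodes.items():
        subject = base + "node/" + node_id
        for label in node.get("labels", []):
            triples.append((subject, RDF_TYPE, iri(label)))
        for key, value in node.get("properties", {}).items():
            triples.append((subject, iri(key), value))
    for source, label, target in edges:
        triples.append((base + "node/" + source, iri(label), base + "node/" + target))
    return triples


if __name__ == "__main__":
    nodes = {
        "n1": {"labels": ["Person"], "properties": {"name": "Ann"}},
        "n2": {"labels": ["Person"], "properties": {}},
    }
    edges = [("n1", "knows", "n2")]
    custom = {"knows": "http://xmlns.com/foaf/0.1/knows"}
    for triple in direct_mapping(nodes, edges, overrides=custom):
        print(triple)
```

Without `overrides` the output is fully determined by the input graph; with it, the same graph can be aligned to an existing vocabulary such as FOAF.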

 

Furthermore, given the commercial uptake of and interest in RDF*/SPARQL*, this will drive discussions towards a new version of RDF in the very near future.

 

The official outcome (I believe) is that a W3C Business Group will be created in order to coordinate with all the interested parties, existing W3C community groups and be a liaison with ISO (where the GQL and SQL/PG work is going on). An official report will come soon.

Lack of Diversity

We have a vibrant graph community. However, this community lacks diversity, as was observed on Twitter by Margaret, who wasn’t even at the event:

There were definitely over 100 people attending this meeting. 96+ were men. I believe there were only 5 4 females attending (thanks to Christophe for the clarification). I had the chance to meet with them.

 

– Dörthe Arndt: A moderator for the Rules and Reasoning session and a researcher at Ghent University who believes that rules should be part of data. Unfortunately I did not have the opportunity to speak more with Dörthe.

– Marlène Hildebrand: This is the first time I met Marlène. She is at EPFL working on data integration using RDF, so we talked a lot about converting different sources to RDF, mappings, and methodologies for creating ontologies and mappings.

– Petra Selmer: A member of the Query Languages Standards and Research Group at Neo4j, with vast experience in graph databases.

– Monika Solanki: Well known in the Semantic Web community, and always a pleasure to interact with at conferences.

– Natasa Varytimou: It was great to finally meet Natasa in real life after interacting a lot via email. She is an Information Architect at Refinitiv (Finance company of Thomson Reuters) and is one of the brains behind the large scale Refinitiv Knowledge Graph.

The lack of diversity worries me and I strongly urge that we, as a community, take action on this matter.

Final quick notes

– We seem to be converging into a unified graph community! Not individual RDF and PG communities. I didn’t hear any RDF vs PG conversations.
– However, Gremlin was underrepresented. If it weren’t for Josh Shinavier, who was constantly providing his input remotely, we would have missed valuable input.
– Thank you Josh and Uber for offering a virtual connection. I believe everything has been recorded and you can find the details in the minutes.
– BMW is starting to get onboard the Knowledge Graph bandwagon. After chatting with Daniel Alvarez, it seems that they are still in an early innovator phase. Nevertheless, very exciting.
– It was a great idea to have a two-day event across three days. That way you could technically arrive on the first day of the event and leave on the last day of the event.
– The W3C RDB2RDF Standard editors meet again! I was one of the editors of the Direct Mapping while Richard Cyganiak was one of the editors of R2RML.

– Adrian Gschwend has his summary in a twitter thread:

– Gautier Poupeau has his summary in a Twitter thread in French:

– Find a lot more tweets by searching for the #W3CGraphWorkshop hashtag.

5 Replies to “W3C Graph Data Workshop Trip Report”

  1. Hi Juan,
    thank you very much for this trip report!
    I have a very stupid question: I consider, e.g., the presentation “SQL extension for property graphs” and I wonder what makes a property graph different from a core RDF graph (throwing away reification, rdf:Resource being instance of itself and that stuff) aside of syntax (which you correctly say is not such a terribly important aspect)?
    Cheers,
    Steffen

    1. Not a stupid question! My hypothesis is that all data models are equally “expressive”, meaning that what can be modeled in (graph) data model M1 can also be modeled in (graph) data model M2. However, the issue, in my opinion, is that we need to understand how “natural” the resulting modeled data is. Olaf and I have been tinkering on this topic for a while. If you are interested, let us know! What is important is to define 1) a PG<->RDF direct mapping and 2) a PG<->RDF mapping language.
