Smart Data and Graphorum Conference Trip Report

I attended the Smart Data-Graphorum Conference (January 30 – February 1) in the Bay Area (Redwood City, to be precise). This conference series was originally called the Semantic Technology (SemTech) Conference, and I have been presenting at it since 2010.

This year, the conference had a cozy feeling with ~250 attendees. I gave two talks:

  • Graph Query Languages: Similar to my Graph Data Texas talk, I gave an update from the Graph Query Language task force at the LDBC, incorporating the latest discussions. We have been discussing the idea of treating paths as a datatype with their own table (that is, separate tables for Nodes, Edges and Paths). Additionally, there are two notions of projection: relational vs. graph. The slides provide some examples. This is still ongoing work.

  • Virtualizing Relational Databases as Graphs: a multi-model approach: In this talk I discussed how relational databases can be virtualized as RDF graphs by using the W3C RDB2RDF standards: Direct Mapping and R2RML. I argue that graphs are cool, and ask whether relational databases are cool too. If you are deciding whether to move from a relational database to a graph database, you should understand the tipping point. I believe virtualization is a viable option for keeping your data in a relational database while still taking advantage of graph features. However, that may not always be the case.
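
To make the virtualization idea a bit more concrete, here is a minimal sketch of what a Direct Mapping style translation produces for a single table. It is a simplification, not the full W3C algorithm: it assumes a single-column primary key, skips foreign-key reference triples and IRI percent-encoding, and materializes the triples rather than rewriting queries as a real virtualization layer would. The base IRI, the `Person` table and the `direct_map` helper are all made up for illustration.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base IRI, not from the talk
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map(conn, table, pk):
    """Yield (subject, predicate, object) triples for every row of `table`.

    Simplified: single-column primary key, no foreign keys, no IRI escaping.
    """
    cur = conn.execute(f'SELECT * FROM "{table}"')
    cols = [d[0] for d in cur.description]
    for row in cur:
        rec = dict(zip(cols, row))
        # Row IRI: <base>Table/pk=value
        subject = f"{BASE}{table}/{pk}={rec[pk]}"
        # Each row is typed with an IRI derived from its table name.
        yield (subject, RDF_TYPE, f"{BASE}{table}")
        for col, val in rec.items():
            if val is not None:  # NULL values produce no triple
                yield (subject, f"{BASE}{table}#{col}", val)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO Person VALUES (1, 'Ada', NULL)")

triples = list(direct_map(conn, "Person", "id"))
for t in triples:
    print(t)
```

The data never leaves the relational database; the graph is just a view over it. R2RML plays the same role when you want to control the shape of that view instead of accepting the default one.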


Additional highlights of the conference

  • I was glad to see a lot of friendly faces. I feel very lucky that I can always have a chat with Deborah McGuinness and Michael Uschold, two legends in ontologies. It’s always great to see Souri Das from Oracle (and all the Oracle folks from the semantic technology group) and discuss how the W3C RDB2RDF standards are doing. We both agree that we did a good job with that standard and gave ourselves a pat on the back 🙂 Also great to see Peter Haase, Dean Allemang, Atanas Kiryakov, Bart van Leeuwen, Jans Aasman, Dave McComb and many more.
  • Michael Uschold and I discussed the pragmatics of part-of and has-label semantics. In some situations you want to be generic. For example, it’s easier for a user to just use “has label” for anything, instead of having to know the exact type of “has label” for a specific thing. Now I understand many of the modeling decisions made in gist. I argue that, from a database point of view, query performance is better if you have more specific properties, unless you have some sort of semantic query optimization.
  • Cambridge Semantics gave a presentation on their in-memory analytics graph database. They presented results using the LUBM benchmark, where they claim to have blown Oracle away. It is important to note that they used 4x the hardware. Atanas Kiryakov, Ontotext’s CEO, was in the audience and rightfully asked why they didn’t use a more up-to-date benchmark, given that LUBM is from 2007. It seems that everybody has been using LUBM (since 2007), so in order to compare to others, they continue to use LUBM. Hopefully they will start using the LDBC benchmarks!
  • I have been aware that MarkLogic markets itself as a document and graph database. I now understand how they represent things under the hood. Each entity, with its corresponding attributes and values, is represented in a document (key-values). The relationships between the entities are represented as RDF triples. This makes a lot of sense to me and I can imagine how this can improve query performance to a certain degree.
  • Brian Sletten gave a great talk on JSON-LD. I wish all web developers could see this presentation in order to understand the value of Linked Data. Even though Brian was not able to give his talk on SHACL, the Shapes Constraint Language and an upcoming W3C standard, his slides left a lasting impression. This is the best definition I have ever seen for the Open World Assumption!

  • It was great to see Emil Eifrem, Neo Technology’s CEO, again. We discussed the history of RDF and the Semantic Web (I didn’t know he was a very early user of Jena!). We seem to be in agreement that RDF is a great technology for data integration. For anything else graph related, he argues that you should use Neo4j. Not surprising 😛 I was also glad to see that Neo4j is starting to work on formalizing the semantics of Cypher, including making it a closed query language.
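
Coming back to the “has label” discussion with Michael Uschold, here is a toy sketch of why I think specific properties can be cheaper to query. It assumes a store that indexes triples by predicate and does no semantic query optimization; the entities, property names and helper functions are invented for illustration and have nothing to do with how gist or any particular triplestore is implemented.

```python
from collections import defaultdict


def index_by_predicate(triples):
    """Group (subject, object) pairs under their predicate."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


entities = [
    ("p1", "Person", "Ada"),
    ("p2", "Person", "Grace"),
    ("c1", "City", "Austin"),
    ("o1", "Organization", "LDBC"),
]

generic, specific = [], []
for eid, etype, label in entities:
    # Generic modeling: one "hasLabel" property for everything.
    generic += [(eid, "type", etype), (eid, "hasLabel", label)]
    # Specific modeling: one name property per type (e.g. "personName").
    specific += [(eid, "type", etype), (eid, f"{etype.lower()}Name", label)]


def person_labels_generic(index):
    """Scan every label, then filter by joining against the type triples."""
    persons = {s for s, o in index["type"] if o == "Person"}
    candidates = index["hasLabel"]
    return sorted(o for s, o in candidates if s in persons), len(candidates)


def person_labels_specific(index):
    """The predicate alone already selects the right rows; no join needed."""
    candidates = index["personName"]
    return sorted(o for s, o in candidates), len(candidates)


generic_result = person_labels_generic(index_by_predicate(generic))
specific_result = person_labels_specific(index_by_predicate(specific))
print(generic_result)   # labels found, label triples scanned
print(specific_result)
```

Both versions return the same names, but the generic one has to look at every label in the dataset and join against the types, while the specific one only touches the person names. The price is that the user now has to know which property to ask for, which is exactly the usability trade-off we were discussing.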

This was a great couple of days and hopefully next year we will have more people!