If libraries were like relational databases…

I was inspired by XKCD to draw this cartoon for a recent presentation on the Semantic Web. We have this habit of dismembering data when we use relational modeling. Consequently, we spend a lot of our development time figuring out how to reassemble entities to use them in our applications, particularly with large, heavily-normalized databases. It’s occasionally good to remind ourselves that relational modeling is an optimized form of data storage. But it’s not the only one, and it isn’t always the right one for a given problem.

Using Neo4j Graph Databases With ColdFusion

After last week, I decided to put off picking a new frontend platform for my Semantic Web rubric project and focus a bit on the server backend.

Since this is just a proof-of-concept project at this point, I can afford to take some risks in choosing technologies. I’ve been following the developments around using graph databases for storing data, especially for Semantic Web applications. One project that kept coming up was Neo4j, a graph database engine built in Java. I figured now was a good time to try it out. My server-side logic is built in ColdFusion, and integrating open source Java projects like Neo4j into CF applications is generally a snap.

Aside from one hiccup, porting Neo4j’s 1-minute Java “Hello World” example to CFML proved to be fairly straightforward. The process I used to get this working is detailed below. I’d suggest that you skim over the Java example before continuing – I’m sure I left out some of the exposition.

First, add the Neo4j JAR files to the ColdFusion server:

  • Download the Neo4j “Apoc” distribution and unpack it somewhere convenient. I’m using Mac OS X, so I put things like this in ~/lib/neo4j-apoc-1.0.
  • Add the Neo4j JAR files to the ColdFusion classpath. Log into your ColdFusion Administrator, select Server Settings -> Java and JVM, and enter the path to the lib folder of your Neo4j distribution in the ColdFusion Class Path field.
  • Restart your ColdFusion server. If you’re at all nervous, log back in to the ColdFusion Administrator and verify that the Neo4j JARs are indeed listed on your classpath (or use the quick check below).
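
If you’d rather check from code, a quick sanity test (nothing more than a sketch) is to ask for one of the Neo4j classes from a CFML page; if the JARs aren’t visible, CreateObject() throws a “class not found” error:

<!--- Throws an error if the Neo4j JARs are not on the classpath --->
<cfdump var="#CreateObject('java', 'org.neo4j.kernel.EmbeddedGraphDatabase')#" />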

Once this is complete, you can initialize a new database for your ColdFusion app. Decide where you want the CF server to create the Neo4j data files and pass that to the object’s init() method. I put mine in a folder under /tmp on Mac OS X.

<cfset dbroot = "/tmp/neo4jtest/" />

<cfset graphDb = createObject('java',
                  "org.neo4j.kernel.EmbeddedGraphDatabase") />
<cfset graphDb.init(dbroot & "var/graphdb") />

[Aside for non-ColdFusion folks: CF doesn't instantiate Java objects quite how you'd expect. The call to CreateObject() just gets a handle on the class itself. Calling init() on the resulting handle actually instantiates the class via the appropriate constructor.]

Just as in the Java example, it’s a good idea to surround your transaction work with a try/catch block that shuts the database down if an error is thrown. As I was working with Neo4j I would periodically lock up my database and couldn’t reconnect without restarting CF. Adding a CFTRY/CFCATCH block cleared this right up.

<cftry>
   <!--- Work against the graph happens inside a transaction --->
   <cfset tx = graphDb.beginTx() />

   <cfscript>
     // Mark the transaction as successful so finish() will commit it
     tx.success();
     WriteOutput("Success.");
   </cfscript>

   <cfset tx.finish() />

   <cfcatch type="any">
      <!--- Shut the database down on error so it doesn't stay locked --->
      <cfset graphDb.shutdown() />
      <cfdump var="#cfcatch#">
   </cfcatch>
</cftry>

<cfset graphDb.shutdown() />

Where things got really sticky was the use of Java enumerations to declare the available relationship types for the graph:

 /* Java code */
 public enum  MyRelationshipTypes implements RelationshipType
 {
    KNOWS
 }

To my knowledge there’s no way to declare something like this in standard CFML. I likely could have wrapped this in a Java class of some sort and loaded it through CreateObject(), but that wouldn’t have been true to the spirit of ColdFusion. So I dug around in the Neo4j docs and found an answer: relationships can be created dynamically at runtime from a static method on the class org.neo4j.graphdb.DynamicRelationshipType. I created an instance of DynamicRelationshipType for the “KNOWS” relationship and loaded it into a Struct, anticipating caching them in Application scope for a real application.

 // Get a handle on the DynamicRelationshipType class; withName() is a
 // static factory method, so there's no need to call init()
 relationship = CreateObject("java",
                             "org.neo4j.graphdb.DynamicRelationshipType");
 MyRelationshipTypes = structNew();
 MyRelationshipTypes.KNOWS = relationship.withName( "KNOWS" );

It might be interesting to see if these relationship enumerations could be generated and compiled by something like JavaLoader. I’m not yet aware of any downsides with dynamic relationships besides the obvious lack of compile-time checking.
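
For a real application, the Application-scope caching mentioned above might look something like the following. This is just a minimal sketch, assuming an Application.cfc with a script-style onApplicationStart() handler (CF9 or later):

 // Application.cfc -- sketch of caching the dynamic relationship types
 // once per application
 public boolean function onApplicationStart() {
     var relationship = CreateObject( "java",
                            "org.neo4j.graphdb.DynamicRelationshipType" );
     application.MyRelationshipTypes = structNew();
     application.MyRelationshipTypes.KNOWS = relationship.withName( "KNOWS" );
     return true;
 }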

The rest of the exercise follows without any real surprises:

 // Note: these write operations need to happen inside the transaction
 // block shown earlier
 firstNode = graphDb.createNode();
 secondNode = graphDb.createNode();
 relationship = firstNode.createRelationshipTo( secondNode,
                                         MyRelationshipTypes.KNOWS );

 firstNode.setProperty( "message", "Hello, " );
 secondNode.setProperty( "message", "world!" );
 relationship.setProperty( "message", "brave Neo4j " );

 // Prints "Hello, brave Neo4j world!"
 WriteOutput( firstNode.getProperty( "message" ) );
 WriteOutput( relationship.getProperty( "message" ) );
 WriteOutput( secondNode.getProperty( "message" ) );

And there you have it! A quick and dirty Neo4j application built with CFML.

I’ve put a little work into developing a Neo4j helper class that hides some of these warts in a nice clean CFC. As soon as I can get eGit to behave I’ll post the files on GitHub.
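
In the meantime, here’s a rough sketch of the shape that helper might take. The component and method names below are hypothetical (this isn’t the actual CFC), and it assumes CF9-style script components:

 // Neo4jService.cfc -- hypothetical sketch, not the actual helper
 component {

     public any function init( required string dataDir ) {
         // Open (or create) the embedded database at dataDir
         variables.graphDb = CreateObject( "java",
             "org.neo4j.kernel.EmbeddedGraphDatabase" ).init( arguments.dataDir );
         variables.relationshipTypes = structNew();
         return this;
     }

     public any function beginTx() {
         return variables.graphDb.beginTx();
     }

     public any function createNode() {
         return variables.graphDb.createNode();
     }

     // Lazily create and cache DynamicRelationshipType instances by name
     public any function relationshipType( required string name ) {
         if ( NOT structKeyExists( variables.relationshipTypes, arguments.name ) ) {
             variables.relationshipTypes[ arguments.name ] = CreateObject( "java",
                 "org.neo4j.graphdb.DynamicRelationshipType" ).withName( arguments.name );
         }
         return variables.relationshipTypes[ arguments.name ];
     }

     public void function shutdown() {
         variables.graphDb.shutdown();
     }
 }

Usage from a page would then look something like graph = CreateObject("component", "Neo4jService").init("/tmp/neo4jtest/var/graphdb").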

WordPress Plugins for the Personal Web

The first thing I looked for after successfully migrating my old blog content to WordPress was some new plugins. The plugin ecosystem for WordPress has always impressed me with its diversity and range. The first two plugins I installed have actually changed, a little bit, the way I think about what it means to have a personal identity on the Web.

My first task was to find a plugin to embed my “Friend of a Friend” (FOAF) profile into my blog pages. FOAF is a Semantic Web standard for describing personal information and social networking links in a way that is open and distributed.

Facebook is a fine system, but they’ve made it clear that the information you post about yourself and your friends is their intellectual property. I don’t begrudge them this – they’ve spent millions of dollars building a system that (for the most part) just works. But the real pain comes when Yet Another Social Networking Site comes on the scene. You sign up to join in on the fun, and immediately start building your social network up again in a different silo, a different walled garden.

The designers of the FOAF standard aimed to provide an open way of defining your social network using the tools of the Semantic Web: URIs and RDF. I hope that some future Facebook or Twitter will import and export a social network graph in this form. In the meantime, we can still build our own individual applications using the FOAF vocabulary. Admittedly a stupid name, but a very powerful idea.

The wp-rdfa plugin adds support for FOAF to WordPress. It generates a very basic FOAF profile for the blog owner based on your already-defined user profile. This is a case where the Semantic Web shines. RDF and OWL are not going to replace (X)HTML overnight – they’re too complex and arcane to be readily adopted by Web designers and developers who code by hand. But a Content Management System, for example, can be modified to generate Semantic Web representations of the data it manages. Drupal 7 is a great example of this – it makes RDFa a standard part of the system.

With a bit of hacking to the plugin, I now include a reference to my (rudimentary) FOAF profile in the pages of this blog. It’s nothing fancy, but I can add additional information as I go, eventually building up a description of my interests, projects, friends, etc. that is independent of Facebook and LinkedIn, and is wholly mine. As more blogging packages add support for FOAF, we can begin to build a semantic distributed social network, with blog posts and comments replacing Newsfeeds and Wall posts.

I’m working on some modifications to the wp-rdfa plugin to make it a little more flexible. The first mod makes it possible to link to an external FOAF file; I wanted to be able to put any information the FOAF vocabulary allows into that file, without being limited to the profile boxes in WordPress. The second mod will make sure all additional FOAF data generated by the plugin (such as comments on posts) links back to the appropriate parts of the FOAF file.

Want to be FOAF? Use the FOAF-a-matic profile generator and follow its instructions for linking the profile into your blog theme – it’s no more difficult than linking in a stylesheet or JavaScript library.

Why Open Rubrics?

In my past few posts, I tried to shed a little light on my interest in an open data model for educational rubrics. If you’re new to the general concept of a rubric, there’s a fine summary on Wikipedia. So what do I mean by an “open data model”? Let’s break that down.

Again from our friend Wikipedia:

A data model in software engineering is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest.

The gist is that we need a way to describe rubrics, in whole or in part, for use in a software system.

Most of the online rubric generator tools produce a rubric document – usually HTML, possibly PDF or Excel – that lends itself well to printing and other pre-Internet use cases. But document rubrics are not easily integrated into any sort of information system: they are merely presentational forms of a rubric, and contain little or no semantic information about the meaning of the various parts of the document. The world of computerized rubrics is thus similar to the state of Web development in 1999 – lots of non-semantic, presentation-laden documents that are hard for software to process.

So why an open data model? My thoughts on this tend to group into two arguments:

  1. transportability – a rubric is a document that should be able to move from one technological system to another. There are a few existing rubric tools that do create a computer-readable rubric document, but the file format is proprietary – rubrics created in such a system can only be used in that system, and can’t be exchanged with other systems that might be able to use them, except in some presentational form like PDF.
  2. continuity – relying on any sort of proprietary system as the sole means of reading and storing important data is no longer an option. Even de facto standard formats like Microsoft’s Word DOC and other Office file formats are deemed too risky by many governments, leading to the creation of the OpenDocument Format Alliance.

So what type of format should we use? HTML and XML are great at describing the structure and content of documents, but less so the meaning implied by the information.

The Semantic Web provides some exciting possibilities for open data in all forms. So why not rubrics?

Next: Semantic Rubrics

Why many Microformats begin with ‘h’

I’ve been spending some quality time with several Microformats as part of my work for DealerPeak. We’ve been adding semantic markup, including the hCard, hProduct, and hListing microformats, to the pages generated by the DealerPeak Automotive Dealership CRM/CMS system.

During a recent redesign of the car listing page, I was adding hCard microformat information to the dealer contact information block. As I was reviewing the hCard specification, I came across the following text:

The root class name for an hCard is “vcard”. An element with a class name of “vcard” is itself called an hCard.[1]

This distinction struck me as a bit odd, but I didn’t think too much of it because I had a deadline.

Over the past few days I’ve been working with one of the other developers again on some Microformat ideas, this time adding some of the hProduct and hListing elements to a similar page. Then sudden inspiration struck: the ‘h’ is an ASCII-safe stand-in for a lowercase mu (μ) – the SI prefix for “micro”!

[1] http://microformats.org/wiki/hcard#Root_Class_Name

Towards an Open Rubric – Part One

Though it seems like only a short time ago, almost three years have passed since my old workgroup at Penn State set out to do something crazy: help our faculty deal with an overnight tripling of class sizes in our college.

The College of Information Sciences and Technology had been created by University President Graham Spanier in 1999 under a protectionist model. Class sizes were capped and admission to the major was restricted in an effort to create something different: a program built from the ground up around Problem-Based Learning. At the same time, the administration recognized that the college couldn’t be self-sufficient with these restrictions, and provided the startup funding necessary to allow it to prosper.

When this additional funding came to an end, the college administration discovered a sobering fact: class sizes would have to nearly triple for the college to become self-sufficient. The artificial environment under which the college had prospered was coming to an end.

At the time of this inflection point, I was the Senior Technologist of the now-defunct IST Solutions Institute. SI was the education R&D arm of the college: an eLearning and technology “Skunk Works” made up of Instructional and Multimedia Designers and Software Developers. A few months earlier, Stevie Rocco, one of our Instructional Designers and my partner in crime at SI, had come across an interesting project: a JavaScript-based rubric tool for evaluating student course work [I'm trying to find a reference to this project - BP]. The prototype had a number of technical limitations, but the idea was sound: make the rubric the UI metaphor through which faculty interact with a system that enables higher-quality, higher-speed grading by simultaneously:

  1. handling the accounting operations behind grading and giving feedback
  2. fostering the sharing of grading standards across a diverse faculty

We set about designing and developing a rubric-centric application, one that would complement Penn State’s ANGEL LMS and SI’s existing Edison Services suite of eLearning tools.

In my mind, an absolute imperative in developing such an application was the separation of the definition of rubric documents (or data objects) from the application code of the system. Many of the existing rubric tools (including that first JavaScript implementation) had no clear separation of data from behavior; at best, this makes them inseparable from their single, embedded rubric. In any case, the result is effectively a closed system with little hope of sharing data with open systems in the education enterprise.

Still other rubric-based systems decomposed a rubric into multiple Relational Database tables, shattering the coherence of the rubric as a first-class part of the system. One can hardly fault such projects: this was the prime application design pattern of Web 1.0 and even Web 2.0 applications then coming into common use.

As we developed our prototype rubric tool (which we jokingly called “The Rubricator”), I made sure the design was built around the rubric as a document, at the time marked up in XML, that could be separated from the application, shared, remixed, etc. The UI was built in Adobe Flex with a server layer in ColdFusion, two technologies the SI gang was already very familiar with from previous projects. “The Rubricator” would load the rubric document payloads at runtime, ensuring a strong separation between logic and data representation.

The design process at SI was one we took very seriously. To date, this project remains the best example of team collaboration and iterative design and development I have experienced in my professional career. After two iterations of prototyping and design meetings, we had a clear design and application flow:

[Figure: Rubricator Application States]

After the ensuing six months of back-burner and after-hours hacking, we approached the end of our third iteration and a magical “1.0” release. Then the unthinkable happened: SI was dissolved and the team was scattered across other units in the college. While that was disappointing to all of us personally and professionally, it also left a big stakeholder in a really awkward position.

Next: Part Two – Finishing what we started

Speaking tonight on the Semantic Web

The Semantic Web has been a strong interest of mine over the last two years. I first came across RDF and OWL through a research project at IST back in 2008 and realized I’d somehow been completely oblivious to their existence – Web Standards, no less.

If you’ve never heard of the Semantic Web, here’s a quick intro video. I’ll wait here.

Everybody back? Okay! The concepts behind OWL seemed to solve a few thorny design issues I’d come across in a decade of building relational-database-backed Web 1.0 apps, and to do so in a really elegant way. Working with OWL fuses aspects of relational database modeling, information architecture, and object-oriented design into a new set of technologies and techniques.

As I started talking to members of the developer community at Penn State about the Semantic Web, I got a lot of blank stares and misunderstandings (“Isn’t that just XML?”). And yet every graduate student in IST was exposed to ontologies and semantic modeling as a routine part of the curriculum, and the research community had been working with ontologies for years. Clearly there was a large academic-practitioner gap to be bridged.

So as I’ve done many times in the past with a new technology or concept, I started talking about the Semantic Web at user group meetings and conferences, and looking for ways to apply these technologies in low-risk venues.

Tonight is the latest in this series of speaking engagements, and possibly the most challenging thus far. I’ll be presenting my talk “An Argument For Semantics” at the Portland Java User Group. I’ve been really impressed by the quality of the homegrown presenters at PJUG since I started attending. My talk will be very different from the usual PJUG fare – less code, more conceptual – but I’m hoping the technical experience in the room can generate a good discussion on how and when it makes sense to employ Semantic Web technologies in real-world applications.