If libraries were like relational databases…

I was inspired by XKCD to draw this cartoon for a recent presentation on the Semantic Web. We have this habit of dismembering data when we use relational modeling. Consequently, we spend a lot of our development time figuring out how to reassemble entities to use them in our applications, particularly with large, heavily-normalized databases. It’s occasionally good to remind ourselves that relational modeling is an optimized form of data storage. But it’s not the only one, and it isn’t always the right one for a given problem.

Using Neo4j Graph Databases With ColdFusion

After last week, I decided to put off picking a new frontend platform for my Semantic Web rubric project and focus a bit on the server backend.

Since this is just a proof-of-concept project at this point I can afford to take some risks in choosing technologies. I’ve been following the developments around using graph databases for storing data, especially for Semantic Web applications. One project that kept coming up was Neo4j, a graph database engine built in Java. I figured now was a good time to try it out. My server-side logic is built in ColdFusion, and integrating open source Java projects like Neo4j into CF applications is generally a snap.

Aside from one hiccup, porting Neo4j’s 1-minute Java “Hello World” example to CFML proved to be fairly straightforward. The process I used to get this working is detailed below. I’d suggest that you skim over the Java example before continuing – I’m sure I left out some of the exposition.

First add the Neo4j Jar files to the ColdFusion server:

  • Download the Neo4j “Apoc” distribution and unpack it somewhere convenient. I’m using Mac OS X, so I put things like this in ~/lib/neo4j-apoc-1.0
  • Add the Neo4j JAR files to the ColdFusion classpath. Log into your ColdFusion Administrator, and select Server Settings -> Java and JVM. Enter the path to the lib folder in your Neo4j distribution in ColdFusion Class Path
  • Restart your ColdFusion server. If you’re at all nervous, log back in to the ColdFusion Administrator and verify that the Neo4j jars are indeed listed on your classpath.

Once this is complete, you can initialize a new database for your ColdFusion app. Decide where you want the CF server to create the Neo4j data files and pass that to the object’s init() method. I put mine in a folder under /tmp on Mac OS X.

<cfset dbroot = "/tmp/neo4jtest/" />

<cfset graphDb = createObject('java',
                  "org.neo4j.kernel.EmbeddedGraphDatabase") />
<cfset graphDb.init(dbroot & "var/graphdb") />

[Aside for non-ColdFusion folks: CF doesn't instantiate Java objects quite how you'd expect. The call to CreateObject() just gets a handle on the class itself. Calling init() on the resulting handle actually instantiates the class via the appropriate constructor.]

Just as in the Java example, it’s good to surround your connection with a try/catch block that will close your database connection if you throw an error. As I was working with Neo4j I would periodically lock up my database and not be able to connect without restarting CF. Adding a CFTRY/CFCATCH block cleared this right up.

<cftry>
   <cfset tx = graphDb.beginTx() />

   <cfscript>
     tx.success();
     WriteOutput("Success.");
   </cfscript>

   <cfset tx.finish() />

  <cfcatch type="any">
     <cfset graphDb.shutdown() />
     <cfdump var="#cfcatch#">
   </cfcatch>
</cftry>

<cfset graphDb.shutdown() />

Where things got really sticky was the use of Java enumerations to declare the available relationship types for the graph:

 /* Java code */
 public enum  MyRelationshipTypes implements RelationshipType
 {
    KNOWS
 }

To my knowledge there’s no way to declare something like this in standard CFML. I likely could have wrapped this in a Java class of some sort and loaded it through CreateObject(), but that wouldn’t have been true to the spirit of ColdFusion. So I dug around in the Neo4j docs and found an answer: relationships can be created dynamically at runtime from a static method on the class org.neo4j.graphdb.DynamicRelationshipType. I created an instance of DynamicRelationshipType for the “KNOWS” relationship and loaded it into a Struct, anticipating caching them in Application scope for a real application.

 relationship = CreateObject("java",
                             "org.neo4j.graphdb.DynamicRelationshipType");
 MyRelationshipTypes = structNew();
 MyRelationshipTypes.KNOWS = relationship.withName( "KNOWS" );

It might be interesting to see if these relationship enumerations could be generated and compiled by something like JavaLoader. I’m not yet aware of any downsides with dynamic relationships besides the obvious lack of compile-time checking.

The rest of the exercise follows without any real suprises:

 firstNode = graphDb.createNode();
 secondNode = graphDb.createNode();
 relationship = firstNode.createRelationshipTo( secondNode,
                                         MyRelationshipTypes.KNOWS );

 firstNode.setProperty( "message", "Hello, " );
 secondNode.setProperty( "message", "world!" );
 relationship.setProperty( "message", "brave Neo4j " );

 WriteOutput( firstNode.getProperty( "message" ) );
 WriteOutput( relationship.getProperty( "message" ) );
 WriteOutput( secondNode.getProperty( "message" ) );

And there you have it! A quick and dirty Neo4j application built with CFML.

I’ve put a little work into developing a Neo4j helper class that hides some of these warts in a nice clean CFC. As soon as I can get eGit to behave I’ll post the files on GitHub.

CFQUERY and autonumber primary keys

After reading a post by Ben Nadel, it occurred to me that I should write up my technique of getting new autogenerated primary key values back from the database.

It’s very common for database designers to use columns that generate a unique, artificial primary key value for every row inserted into a database table. Different platforms have different names for this type of column:

  • Microsoft SQL Server: identity column property
  • Microsoft Access: AutoNumber field type
  • MySQL: AUTO_INCREMENT column property

PostgreSQL and Oracle use an alternative method called a sequence. Sequences are easier to use in some ways, and the technique I’ll describe below doesn’t really apply.

If you have child tables in your database related to your main table with Foreign Key constraints, you’re going to need to retrieve this autogenerated value before you can insert data into these child tables.

Let’s take a typical INSERT statement that pulls data from an HTML form POST operation:



INSERT INTO tblPeople (
nameLast,
nameFirst,
emailAddress
) VALUES (



)

Although INSERT CFQUERY tags don’t require a name attribute, I make sure to add one here. You’ll see why in a moment.

One handy property of the CFQUERY tag is that it can contain multiple SQL queries if you separate them with semicolons. You can use this trick to fetch the newly-created ID in this first query. This example works with MS SQL Server 2000 or higher:



INSERT INTO tblPeople (
nameLast,
nameFirst,
emailAddress
) VALUES (



);

SELECT SCOPE_IDENTITY() AS peopleID;

Because I named the CFQUERY instance, I can now reference the autogenerated ID value in my succeeding queries as:


#insertNewRecord.peopleID#

Don’t forget to wrap your entire batch of related CFQUERY tags inside a CFTRANSACTION tag to make them one complete operation.

One final note: The SCOPE_IDENTITY() function used above only works with SQL Server 2000 and higher. Anyone on SQL Server 7 (??) and before must use the special @@IDENTITY variable to accomplish the same thing. @@IDENTITY has some limitations with regards to databases with triggers, and may not retrieve the right value in all cases. Here’s a more thorough explanation of the problem.

People using MySQL can substitute the function LAST_INSERT_ID() to accomplish the same thing.