Using Neo4j Graph Databases With ColdFusion

After last week, I decided to put off picking a new frontend platform for my Semantic Web rubric project and focus a bit on the server backend.

Since this is just a proof-of-concept project at this point, I can afford to take some risks in choosing technologies. I’ve been following the developments around using graph databases for storing data, especially for Semantic Web applications. One project that kept coming up was Neo4j, a graph database engine built in Java. I figured now was a good time to try it out. My server-side logic is built in ColdFusion, and integrating open source Java projects like Neo4j into CF applications is generally a snap.

Aside from one hiccup, porting Neo4j’s 1-minute Java “Hello World” example to CFML proved to be fairly straightforward. The process I used to get this working is detailed below. I’d suggest that you skim over the Java example before continuing – I’m sure I left out some of the exposition.

First, add the Neo4j JAR files to the ColdFusion server:

  • Download the Neo4j “Apoc” distribution and unpack it somewhere convenient. I’m using Mac OS X, so I put things like this in ~/lib/neo4j-apoc-1.0.
  • Add the Neo4j JAR files to the ColdFusion classpath. Log into your ColdFusion Administrator and select Server Settings -> Java and JVM. In the ColdFusion Class Path field, enter the path to the lib folder of your Neo4j distribution.
  • Restart your ColdFusion server. If you’re at all nervous, log back in to the ColdFusion Administrator and verify that the Neo4j jars are indeed listed on your classpath.
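If you would rather check from code than from the Administrator, a small test page along these lines should work. This is only a sketch: it assumes that createObject() throws when the class cannot be found, and the variable name neoClass is just for illustration.

```cfml
<!--- Sketch: confirm the Neo4j classes are visible to ColdFusion --->
<cftry>
  <!--- createObject() only loads the class; nothing is instantiated here --->
  <cfset neoClass = createObject("java", "org.neo4j.kernel.EmbeddedGraphDatabase") />
  <cfoutput>Neo4j classes found on the classpath.</cfoutput>

  <cfcatch type="any">
    <!--- Most likely cause: the lib folder is missing from the class path --->
    <cfoutput>Could not load Neo4j: #cfcatch.message#</cfoutput>
  </cfcatch>
</cftry>
```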

Once this is complete, you can initialize a new database for your ColdFusion app. Decide where you want the CF server to create the Neo4j data files and pass that to the object’s init() method. I put mine in a folder under /tmp on Mac OS X.

<cfset dbroot = "/tmp/neo4jtest/" />

<cfset graphDb = createObject('java',
                  "org.neo4j.kernel.EmbeddedGraphDatabase") />
<cfset graphDb.init(dbroot & "var/graphdb") />

[Aside for non-ColdFusion folks: CF doesn't instantiate Java objects quite how you'd expect. The call to CreateObject() just gets a handle on the class itself. Calling init() on the resulting handle actually instantiates the class via the appropriate constructor.]

Just as in the Java example, it’s good to surround your connection with a try/catch block that will close your database connection if you throw an error. As I was working with Neo4j I would periodically lock up my database and not be able to connect without restarting CF. Adding a CFTRY/CFCATCH block cleared this right up.

<cftry>
   <cfset tx = graphDb.beginTx() />

   <cfscript>
     tx.success();
     WriteOutput("Success.");
   </cfscript>

   <cfset tx.finish() />

   <cfcatch type="any">
     <cfset graphDb.shutdown() />
     <cfdump var="#cfcatch#">
   </cfcatch>
</cftry>

<cfset graphDb.shutdown() />

Where things got really sticky was the use of Java enumerations to declare the available relationship types for the graph:

 /* Java code */
 public enum MyRelationshipTypes implements RelationshipType
 {
    KNOWS
 }

To my knowledge, there’s no way to declare something like this in standard CFML. I likely could have wrapped this in a Java class of some sort and loaded it through CreateObject(), but that wouldn’t have been true to the spirit of ColdFusion. So I dug around in the Neo4j docs and found an answer: relationship types can be created dynamically at runtime from a static method on the class org.neo4j.graphdb.DynamicRelationshipType. I created an instance of DynamicRelationshipType for the “KNOWS” relationship and loaded it into a Struct, anticipating caching them in Application scope for a real application.

 relationship = CreateObject("java",
                             "org.neo4j.graphdb.DynamicRelationshipType");
 MyRelationshipTypes = structNew();
 MyRelationshipTypes.KNOWS = relationship.withName( "KNOWS" );

It might be interesting to see if these relationship enumerations could be generated and compiled by something like JavaLoader. I’m not yet aware of any downsides with dynamic relationships besides the obvious lack of compile-time checking.
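Caching the relationship types in Application scope, as mentioned earlier, might look roughly like this. Treat it as a sketch rather than tested code: the application name neo4jDemo and the key application.relTypes are made up for illustration, and only the DynamicRelationshipType.withName() call comes from the example above.

```cfml
<!--- Application.cfc (sketch; application name and scope keys are hypothetical) --->
<cfcomponent output="false">
  <cfset this.name = "neo4jDemo" />

  <cffunction name="onApplicationStart" returntype="boolean" output="false">
    <!--- Handle on the class; withName() is a static factory method --->
    <cfset var factory = createObject("java",
                           "org.neo4j.graphdb.DynamicRelationshipType") />

    <!--- Build each relationship type once and share it across requests --->
    <cfset application.relTypes = structNew() />
    <cfset application.relTypes.KNOWS = factory.withName("KNOWS") />

    <cfreturn true />
  </cffunction>
</cfcomponent>
```

Page code would then pass application.relTypes.KNOWS to createRelationshipTo() instead of rebuilding the struct on every request.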

The rest of the exercise follows without any real surprises:

 firstNode = graphDb.createNode();
 secondNode = graphDb.createNode();
 relationship = firstNode.createRelationshipTo( secondNode,
                                         MyRelationshipTypes.KNOWS );

 firstNode.setProperty( "message", "Hello, " );
 secondNode.setProperty( "message", "world!" );
 relationship.setProperty( "message", "brave Neo4j " );

 WriteOutput( firstNode.getProperty( "message" ) );
 WriteOutput( relationship.getProperty( "message" ) );
 WriteOutput( secondNode.getProperty( "message" ) );

And there you have it! A quick and dirty Neo4j application built with CFML.
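For reference, here is one way the pieces above could be assembled into a single page. It is a sketch under a few assumptions: it chains init() onto createObject() rather than calling it separately, it keeps the relationship type in a plain variable named KNOWS, and it relies on finish() rolling the transaction back when success() was never called.

```cfml
<cfscript>
  // Same path and class names as the snippets above
  dbroot  = "/tmp/neo4jtest/";
  graphDb = createObject("java", "org.neo4j.kernel.EmbeddedGraphDatabase").init(dbroot & "var/graphdb");
  KNOWS   = createObject("java", "org.neo4j.graphdb.DynamicRelationshipType").withName("KNOWS");

  // Note: if beginTx() itself fails, the shutdown() below is never reached
  tx = graphDb.beginTx();
  try {
    firstNode    = graphDb.createNode();
    secondNode   = graphDb.createNode();
    relationship = firstNode.createRelationshipTo(secondNode, KNOWS);

    firstNode.setProperty("message", "Hello, ");
    secondNode.setProperty("message", "world!");
    relationship.setProperty("message", "brave Neo4j ");

    WriteOutput(firstNode.getProperty("message"));
    WriteOutput(relationship.getProperty("message"));
    WriteOutput(secondNode.getProperty("message"));

    // Mark the work as successful so finish() commits it
    tx.success();
  } catch (any e) {
    WriteOutput("Neo4j error: " & e.message);
  }

  // Runs on both paths: end the transaction, then release the store
  tx.finish();
  graphDb.shutdown();
</cfscript>
```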

I’ve put a little work into developing a Neo4j helper class that hides some of these warts in a nice clean CFC. As soon as I can get eGit to behave I’ll post the files on GitHub.

Synchronized Web development workflow

It’s been six years since I switched from HomeSite to Eclipse+CFEclipse as my primary ColdFusion development environment. At the time, the change was driven primarily by my move to the Mac for development, but the desire for integrated version control support (e.g. Subversion) directly within the IDE helped as well.

One of the things that has long bugged me about developing ColdFusion apps locally on a dev box (i.e. not on a shared network server) is the need to place project files directly in a Web root somewhere – for example, C:\InetPub\wwwroot on Windows/IIS or in C:\ColdFusion9\wwwroot if you use ColdFusion’s built-in Web server. This throws off my game in two ways:

  1. Browsing to where your files live (in Finder/Explorer, or via the command line) inevitably adds an extra step to every task
  2. Placing the project home outside your user space on the OS makes it more likely you’ll lose the files when upgrading/uninstalling/migrating.

Pain point one could be tackled by sprinkling my system with aliases, shortcuts and/or symbolic links. This reeks of configuration, and would be something I’d need to duplicate on any system I use.

Pain point two there is really the kicker. Sure, using version control keeps you from losing your work, but rebuilding the workspace after a system migration can take a long time. I rarely migrate applications when I get a new system; I prefer to just move the user files and reinstall the apps manually. This periodically cleanses the system and keeps me up to date on patches, even on apps I don’t use often.

I tried keeping my actual project files in a workspace somewhere convenient, such as in my home directory or in the root of the drive (e.g. C:\workspace\MyColdFusionApp), and then copying the files to the server root to test them. I tried both manual copying and even Subversion commands, but I couldn’t keep that up for long. Part of the benefit of developing anything locally on your system is removing the step of uploading your code to a server to test it, and I was basically backsliding towards that kind of process.

But the idea was sound, and I looked around for something that would painlessly synchronize two folders in different parts of the drive – the project files in my workspace, and the files in a folder under the Web root. My last resort would be to use something like rsync, but I looked around for some sort of plugin for Eclipse – something that could keep the preferences as part of the Eclipse project and/or workspace and be easy to migrate and hard to lose.

With a little digging I found FileSync – a plugin which really fit the bill. It’s open source, and if you can look past the, ah, unpolished Web site of its creator the plugin works pretty well. When I save a file in my workspace, it painlessly gets pushed out to the Web server root for testing.

The plugin also appears to work with network drive targets, so you may be able to use it to publish changes out to a preview or QA server automatically, but you should probably be using some sort of version control for that. :)

April 14, 2010

CFQUERY and autonumber primary keys

After reading a post by Ben Nadel, it occurred to me that I should write up my technique of getting new autogenerated primary key values back from the database.

It’s very common for database designers to use columns that generate a unique, artificial primary key value for every row inserted into a database table. Different platforms have different names for this type of column:

  • Microsoft SQL Server: identity column property
  • Microsoft Access: AutoNumber field type
  • MySQL: AUTO_INCREMENT column property

PostgreSQL and Oracle use an alternative method called a sequence. Sequences are easier to use in some ways, and the technique I’ll describe below doesn’t really apply.

If you have child tables in your database related to your main table with Foreign Key constraints, you’re going to need to retrieve this autogenerated value before you can insert data into these child tables.

Let’s take a typical INSERT statement that pulls data from an HTML form POST operation:



INSERT INTO tblPeople (
nameLast,
nameFirst,
emailAddress
) VALUES (



)

Although INSERT CFQUERY tags don’t require a name attribute, I make sure to add one here. You’ll see why in a moment.

One handy property of the CFQUERY tag is that it can contain multiple SQL queries if you separate them with semicolons. You can use this trick to fetch the newly-created ID in this first query. This example works with MS SQL Server 2000 or higher:



INSERT INTO tblPeople (
nameLast,
nameFirst,
emailAddress
) VALUES (



);

SELECT SCOPE_IDENTITY() AS peopleID;

Because I named the CFQUERY instance, I can now reference the autogenerated ID value in my succeeding queries as:


#insertNewRecord.peopleID#

Don’t forget to wrap your entire batch of related CFQUERY tags inside a CFTRANSACTION tag to make them one complete operation.
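Put together, the whole pattern might look something like the sketch below. Only the table tblPeople, its three columns, the query name insertNewRecord and the peopleID alias come from the examples above. The datasource myDsn, the form field names, and the child table tblPhones are placeholders I made up for illustration.

```cfml
<cftransaction>
  <!--- Insert the parent row and fetch its new ID in the same batch --->
  <cfquery name="insertNewRecord" datasource="myDsn">
    INSERT INTO tblPeople (
      nameLast,
      nameFirst,
      emailAddress
    ) VALUES (
      <cfqueryparam value="#form.nameLast#" cfsqltype="cf_sql_varchar">,
      <cfqueryparam value="#form.nameFirst#" cfsqltype="cf_sql_varchar">,
      <cfqueryparam value="#form.emailAddress#" cfsqltype="cf_sql_varchar">
    );

    SELECT SCOPE_IDENTITY() AS peopleID;
  </cfquery>

  <!--- Hypothetical child table that references the new parent row --->
  <cfquery datasource="myDsn">
    INSERT INTO tblPhones (
      peopleID,
      phoneNumber
    ) VALUES (
      <cfqueryparam value="#insertNewRecord.peopleID#" cfsqltype="cf_sql_integer">,
      <cfqueryparam value="#form.phoneNumber#" cfsqltype="cf_sql_varchar">
    )
  </cfquery>
</cftransaction>
```

If either query fails, the CFTRANSACTION tag rolls back both inserts, so you never end up with a parent row that is missing its child rows.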

One final note: The SCOPE_IDENTITY() function used above only works with SQL Server 2000 and higher. Anyone on SQL Server 7 or earlier must use the special @@IDENTITY variable to accomplish the same thing. @@IDENTITY has some limitations with regard to databases with triggers: it returns the last identity value generated on the connection in any scope, so an insert performed by a trigger can replace the value you wanted. Here’s a more thorough explanation of the problem.

People using MySQL can substitute the function LAST_INSERT_ID() to accomplish the same thing.
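A MySQL version might look like the sketch below. It makes two assumptions worth stating: that the MySQL JDBC driver is left at its default of rejecting several statements in one call (so the SELECT goes in its own CFQUERY), and that CFTRANSACTION keeps both queries on the same connection, which matters because LAST_INSERT_ID() is tracked per connection. The datasource myDsn, the form fields, and the query name getNewID are placeholders.

```cfml
<cftransaction>
  <cfquery datasource="myDsn">
    INSERT INTO tblPeople (
      nameLast,
      nameFirst,
      emailAddress
    ) VALUES (
      <cfqueryparam value="#form.nameLast#" cfsqltype="cf_sql_varchar">,
      <cfqueryparam value="#form.nameFirst#" cfsqltype="cf_sql_varchar">,
      <cfqueryparam value="#form.emailAddress#" cfsqltype="cf_sql_varchar">
    )
  </cfquery>

  <!--- Separate query, same transaction: reads the ID generated just above --->
  <cfquery name="getNewID" datasource="myDsn">
    SELECT LAST_INSERT_ID() AS peopleID
  </cfquery>
</cftransaction>

<cfoutput>New ID: #getNewID.peopleID#</cfoutput>
```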

Which JVM for ColdFusion development on Mac?

I’ve been having a lot of UI freezes lately on my MacBook Pro. The system will lock up for 20-30 seconds, with mouse movement, but no responses to clicks or the keyboard.

I suspect this has something to do with CF 8 and/or Eclipse on Apple’s Java 1.5 JVM. The freezes happen most often when one or the other of these applications is running.

Things seem a whole lot better when I roll back to Java 1.4, but CFEclipse stops working.

I’ve also found references to running CF 8 on Java 1.6. Maybe that’s the next thing to try.

Holy Xeons!

Wow… ColdFusion MX on the MacPro is fastfastfast. It took a while to get it set up since it isn’t technically supported, but wow. WOW I say.

I skimped on at least one step: I didn’t recompile mod_jrun on this box. Since I’d gone through that process on the MacBook Pro, I just copied my existing Intel binary .so file to the new machine, and everything seems to be running just fine.

I had to mangle several lines in the config files I copied over from my G5, as the CF files now live in /Applications/JRun4 rather than /Applications/ColdFusionMX due to the recommended multi-server install.

The big drag again was adding our homebrew SSL CA certificate to the system keychain to let our development applications authenticate against Active Directory. I think we should just pony up the cash and get a verified certificate this year.

Trying out BlogCFC

I’ve been meaning to set up a personal blog for some time, but I never found quite the right package for my tastes. I wanted something I could tinker with and extend. There were several PHP packages that would have fit the bill, but as a huge fan of Macromedia ColdFusion, I thought I’d look there first.

As it turned out, most of the Macromedia-inspired blogs I read happen to use the same package, Ray Camden’s BlogCFC. This package looked like just what I wanted, but with one exception — the database support was somewhat limited. Only Access, SQL Server, and MySQL were supported out-of-the-box. While I have no qualms with SQL Server, all of my servers run Linux of one version or another so that wasn’t an option. This also disqualified Access, but honestly it hadn’t even gotten into the game.

So now to MySQL. Anyone who knows me has heard my cries of “Use a real database!” every time someone suggests MySQL for a new project. As someone raised on enterprise-level RDBMSs, MySQL always felt amateurish and limited to me.

I really like PostgreSQL, so I set about hacking BlogCFC to support it. This turned out to be messier than I had hoped; I got bogged down in the details, and never got around to starting a blog.

When I saw that a new version of BlogCFC was available, I figured I needed to check it out again.

I liked what I saw, so I finally put my reservations aside and installed it with a MySQL backend — in the interest of actually getting something going now.

So, here I am. I’m going to try to get my feet wet with short tips at first to try to build some momentum. I’m still not crazy about committing the RAM and processor cycles to a second database backend on my server, but at least now I can work on that PostgreSQL port gradually while getting some practical experience with the package.