Realtime ETL (sort of) Using JRuby

1 05 2007

I thought I’d share some of my experiences with embedding JRuby in a
Java application. This is revised from an email I sent to the JRuby Users mailing list.

At work, we have just completed a project to embed JRuby in a JBoss app server.
The purpose was to extract data from a transaction database and project it
into an operational database: a simple, real-time ETL, if you will. This
release replaced a complex XML mapping model with a more robust scripting
solution.

Why do this at all?

We needed:
1) a testing framework around data projections
2) the ability to write projections without the entire technology stack running
3) the ability to send data almost anywhere in almost any form.
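
To make the first two points concrete, here is a minimal sketch of a projection written as a plain Ruby method, so it can be tested with nothing else running. The method name, the field names, and the row shape are all hypothetical; they are not taken from our actual code.

```ruby
# Hypothetical projection: maps one transaction record (a plain Hash)
# to the row shape an operational table might expect.
# All field names here are illustrative assumptions.
def project_order(txn)
  {
    order_id:    txn.fetch(:id),
    customer:    txn.fetch(:customer_name).strip,
    total_cents: (txn.fetch(:amount) * 100).round,
    status:      txn[:cancelled] ? "cancelled" : "open"
  }
end

# No app server, queue, or database is needed to exercise the projection.
txn = { id: 42, customer_name: " Acme Corp ", amount: 19.99, cancelled: false }
row = project_order(txn)
```

Because the projection only takes a Hash and returns a Hash, a unit test is just an assertion on the return value.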

Why JRuby?

Active Record. Once we were able to get AR to retrieve DB connections from a
JNDI datasource, it was the hands-down winner. The pre-1.0 release of JRuby
was less of a concern than you might expect, because we were considering this
a long-term solution.

What went well

  • prototyping was awesome
  • putting stuff on and taking stuff off JMS queues worked well. This
    eventually became part of the Java service, but the proof-of-concept was
    written in JRuby.
  • agile development:
    writing libraries as users needed features was fast and painless
    unit testing worked well
  • bundling and versioning with gems
  • using Java libraries for speed (XOM)
  • active-record using JNDI to retrieve datasources
  • spinning off multiple processes was dead simple:
    this was necessary to overcome performance problems for large numbers of
    transactions (where large equals 6 months of transactions in a weekend)
  • users (other developers) grokked Ruby syntax quickly.
  • active_record
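
As an illustration of the multiple-process point above, one way to divide a large backlog among workers is to slice the date range into contiguous chunks and hand each chunk to its own process. This is only a sketch under that assumption, not a record of how our scripts did it; `date_slices` and the `project.rb` script named in the comment are hypothetical.

```ruby
require "date"

# Split an inclusive date range into at most `workers` contiguous slices.
# Returns an array of [first_day, last_day] pairs.
def date_slices(first, last, workers)
  days = (first..last).to_a
  return [] if days.empty?

  per_slice = (days.size.to_f / workers).ceil
  days.each_slice(per_slice).map { |chunk| [chunk.first, chunk.last] }
end

slices = date_slices(Date.new(2007, 1, 1), Date.new(2007, 1, 10), 3)

# Each slice could then become the arguments of one worker process, e.g.
#   Process.spawn("jruby", "project.rb", from.to_s, to.to_s)
# where project.rb is a hypothetical script that projects one date range.
```

Slicing by date keeps the workers independent of each other, which is what makes running them side by side straightforward.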

What didn’t go so well

  • JRuby performance was bad (worse than I expected).
  • ‘stack level too deep’ errors are brutal in an app server, especially when
    active_support is involved
  • swallowed errors made debugging a nightmare
  • had to hunt down some memory leaks

Overall, I was pretty happy with the results. Even with the frustrations of the
‘didn’t go so well’ list, JRuby provides us with more flexibility in our
architecture than we ever had before.

Shortly after I sent that email, we added a simple DSL for republishing portions of the data when ETL rules change.
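
For readers who have not built one, a DSL of this kind can be quite small in Ruby. The sketch below is not our DSL; the class name, the `republish` keyword, and the record fields are invented for illustration. It shows the general technique of collecting named rules with `instance_eval`.

```ruby
# Hypothetical DSL: each `republish` rule names a data set and supplies
# a block that decides which records belong to it.
class RepublishRules
  Rule = Struct.new(:name, :filter)

  attr_reader :rules

  def initialize(&block)
    @rules = []
    # Evaluate the block against this object so `republish` reads like a keyword.
    instance_eval(&block) if block
  end

  # DSL keyword: register a named rule with its filter block.
  def republish(name, &filter)
    @rules << Rule.new(name, filter)
  end

  # Return the records that the named rule selects for republishing.
  def records_for(name, records)
    rule = @rules.find { |r| r.name == name }
    raise ArgumentError, "no rule named #{name}" unless rule

    records.select(&rule.filter)
  end
end

rules = RepublishRules.new do
  republish(:open_orders) { |rec| rec[:status] == "open" }
end

records  = [{ id: 1, status: "open" }, { id: 2, status: "closed" }]
selected = rules.records_for(:open_orders, records)
```

Here `selected` contains only the record with `id: 1`; a real version would pass the selected records back through the projection rather than just returning them.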

That project was done with JRuby 0.9.1. Since that time, each of the ‘didn’t go so well’ items has been addressed in JRuby.

Despite some of the problems we had with our first implementation of JRuby, many of our developers have made JRuby an important part of their toolset. There are currently three more JRuby-based projects under construction.

2 responses

3 05 2007
Stijn

Hi,

I have no experience with (J)Ruby but am intrigued by your post.

You mentioned that you also used it from a JMS queue, i.e. you had a (XML?) payload and pushed it right into the DB. Is that right?

If so, then I must investigate this further! :)

I’ll start reading up more on JRuby anyway just to see what role it fulfills in this kind of a setup.

Regards,

Stijn.

3 05 2007
kofno

In our JMS scenario, the payload was actually a serialized object, but it could just as easily have been XML, or YAML, or anything else that Java or Ruby speak. Once the payload is processed, it is pushed into the DB, but it’s not limited to just DBs. We could, for example, send the data to a Web Service if such a requirement existed.
