May 30, 2017

Merge Empty HBase Regions

If you have ascending row keys and cells with TTL set on them, you can end up with a lot of empty regions that need to be merged.


February 06, 2017

Writing Data To HDFS From Java

Some notes on writing data to HDFS from a Java application


October 05, 2015

Setup Open LDAP on Centos 6

A short guide to get Open LDAP up and running on Centos 6, allowing other machines to authenticate with the Open LDAP server.


September 23, 2015

Loading Stack Exchange Data Dumps to Hadoop and Hive

A simple tutorial showing how to load Stack Exchange data dumps to Hive


September 24, 2015

Map Reduce with XML Input and Multiple Avro Outputs

A map reduce job to convert the Stack Exchange data dumps into Avro format from their original XML format.


July 06, 2015

Create a RPM from a Ruby gem

A simple example to create an RPM from a Ruby gem using rpmbuild


May 05, 2015

Map Reduce Counters

A short post illustrating how to create custom counters in map reduce jobs


April 27, 2015

Map Reduce Multiple Outputs

Another simple map reduce example demonstrating how to use MultipleOutputs to write to more than one file.


April 21, 2015

Comparing Sequence Files, ORC Files and Parquet Files

A quick look at the performance characteristics of the Hadoop Sequence, ORC and Parquet file formats


March 02, 2015

Experimenting with Flume Performance

A summary of some coarse tests I performed against Flume to see how different settings affect performance


January 25, 2015

Creating Centos or Redhat init scripts

A description of how to create SysV init scripts for Redhat / Centos systems, including a script template.


January 06, 2015

Unit Testing Map Reduce Programs With MRUnit

Using MRUnit to test a simple Hadoop map reduce program, including how to run the job without a Hadoop cluster


December 15, 2014

Creating a Simple Map Reduce Program for Cloudera Hadoop

An example of a very simple Java Map Reduce program that counts the number of unique words in CSV delimited data.


December 10, 2014

Maven Config For Cloudera Map Reduce Programs

Steps to set up a Maven project to develop Map Reduce applications against Cloudera Hadoop.


October 23, 2014

Creating a RPM From the Java JDK Tar File

Steps describing how to convert a tar file of precompiled software, in this case the Java JDK, into an installable RPM.


October 02, 2014

Speeding up Ember CLI build times on OS X

Using a RAM disk for an Ember CLI project's tmp directory to speed up compile times


September 04, 2014

Emacs Setup for Version 24.3

A more modern and improved Emacs setup, including ECB, Enhanced Ruby Mode, Javascript mode and more.


August 11, 2014

Check Apache or Nginx Compression Is Working

A quick way to check your webserver is compressing HTML, JS and CSS.


August 07, 2014

Ruby Performance Analysis Tools

A selection of tools that can be used to analyze the performance of Ruby programs.


October 15, 2013

Profiling a Mysql Query

How to get Mysql to tell you how many reads it performs when running a query.


October 14, 2013

Statically Compiling Git

If you don't have access to a C compiler or the ability to install an RPM on a machine, but you still want to use git, you need to build a statically linked binary on another machine and copy it over.


September 20, 2013

Masking data in Hive

A simple UDF to mask data using a keyed HMAC


August 29, 2013

Creating a Basic Hive UDF

A summary of the steps required to create a User Defined Function (UDF) in Hive.


May 21, 2013

Building a Ruby 2.0.0 RPM

Using Mock to build a Ruby 2.0.0 RPM on Centos


April 10, 2013

Test Hard Disk Speed With dd

It is pretty simple to test the speed of writing a large file with the dd command, but you need to know about the subtle options available when writing a file to actually test the speed of a storage device.


March 07, 2013

PLSQL Unit Test

Continuing on my track of creating Ruby Gems to help with database interactions, I created PLSQL Unit Test.


February 28, 2013

Data Factory

When I was writing PLSQL unit tests with Ruby, I came across a need to stage test data into tables. I didn't really want to involve an ORM like Active Record, but working with raw insert statements quickly became a chore, so I created the Data Factory gem.


February 12, 2013

Simple Oracle JDBC

The first, and hopefully not the last, Ruby Gem I have released. This one is a wrapper around Oracle JDBC connections to make it easier to make database calls.


January 23, 2013

One Large Redis or Many Smaller Shards?

A critical look at Redis, outlining some things to watch out for when running large Redis instances, making the case for many smaller instances instead.


January 18, 2013

Calculating Velocity Scores at the Speed of Redis

To really get a feel for a technology, you have to at least build a proof of concept application using it. Building a simple application that uses Redis as an in memory hash table isn't very interesting, so I was searching for something that would test some of the more advanced features of Redis.


January 17, 2013

A Quick Overview Of Redis

Redis has gotten a lot of positive hype over the last few years, so I decided to make it the first NoSQL database I would investigate.


January 16, 2013

Nosql isn't just Hype

A very brief and incomplete introduction to NoSQL


January 10, 2013

Connecting to Sybase from Java with the JTDS drivers

A short piece of Java code to connect to Sybase from Java


December 19, 2012

Installing the libv8 Ruby gem on Centos 5.8

How to install the Ruby libv8 gem on Centos 5.8 without (much) pain


December 17, 2012

Ruby 1.9.3 libyaml centos 5.6

How to get Ruby installed so that gem, irb etc do not complain about missing psych and libyaml.


October 07, 2012

DBGeni - Better database installs

My first complete side-project, the Database Generic Installer - DBGeni. It is a Ruby gem that applies changes to databases using migration scripts.


May 22, 2012

Oracle JDBC connections slow to connect /dev/urandom

This little workaround can solve a problem where JDBC connections to Oracle are slow to establish.


November 16, 2011

JDBC, JRuby and Oracle

How to connect to Oracle using JDBC and JRuby


November 16, 2011

Connecting to Sybase with JRuby using the jtds drivers

A simple method to make JDBC connections to Sybase using JRuby


October 19, 2011

Installing ruby-oci8 on 64 bit Windows

Solving the error OCI.DLL: 193 is not a valid Win32 application when installing ruby-oci8


September 20, 2011

On getting stuff done in companies

Interesting Hacker News comment on the topic of how to hire people who get stuff done


December 23, 2010

Achievements in 2010

A quick list of what I achieved in 2010. Probably not of interest to many people, but it was worth writing for my own benefit.


December 08, 2010

A few general emacs tips

Finishing up the series on Emacs, this post points out a few tips and tricks I have learned working in it for the past few years.


November 30, 2010

Rails options for Emacs

How to configure Emacs with some Rails-specific IDE modes, although the simple solution may be good enough for most people.


November 29, 2010

Emacs Ruby Foo

Before worrying about Rails, you need to get Emacs to behave sanely with Ruby code. That means it should indent it automatically and syntax highlight the code. This post will guide you on getting ruby-mode working nicely.


November 28, 2010

Setting up the emacs code browser

This post continues the series on setting up emacs, this time focusing on installing, configuring and using the emacs code browser (ecb).


November 28, 2010

Installing Emacs on Windows and OS X

The second post in a series about setting Emacs up for Ruby on Rails development. This post explains how to get Emacs installed on Windows and OS X.


November 28, 2010

Choosing a text editor for Rails development

The first post in a series about setting up Emacs, primarily for Ruby on Rails development. This post discusses the reasons for choosing Emacs and the goals of the tutorial.


November 24, 2010

Sell Yourself

If your 10th App has failed, and your money is running out, and you need to go get a job, then it would be good to have something to point to that shows off the skills you have learned...


November 23, 2010

The new blog is ready

Well, another new blog. Hopefully I will stick with it longer than the last two, where I pretty much ran out of steam after only a few posts ...