29 August 2013

Creating a Basic Hive UDF

Creating and using a basic Hive UDF is pretty simple.

First, locate the hive-exec and hadoop-core jars on your system and add them to the class path. The paths below are from a CDH 4.2.1 install and will differ on other systems. The variable is exported so that javac can see it:

export CLASSPATH=$CLASSPATH:/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hive/lib/hive-exec-0.10.0-cdh4.2.1.jar:/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.2.1.jar:.

Next, create a directory structure for the Java source files and the compiled classes:

mkdir -p udf_test/src/com/sodonnel/udf
mkdir -p udf_test/classes

Create the most basic hello world UDF in udf_test/src/com/sodonnel/udf/HelloWorld.java:

package com.sodonnel.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Hive finds evaluate() by reflection and calls it for each row.
public class HelloWorld extends UDF
{
  // The argument is ignored; every call returns the same constant string.
  public String evaluate(String v) {
    return "Hello World!";
  }
}

From the udf_test/src directory, compile the Java class:

javac -d ../classes com/sodonnel/udf/HelloWorld.java

This creates the package directories and the class file under the classes directory. Next, package the class file into a JAR. From the udf_test/classes directory, run:

jar cf HelloWorld.jar com

The final step is to load this JAR into Hive, using the full path to wherever it was built, and register the class as a temporary function:

hive> add jar /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar;
Added /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar to class path
Added resource: /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar

hive> create temporary function hello_world as 'com.sodonnel.udf.HelloWorld';
OK
Time taken: 0.0040 seconds

Now call the function when selecting some rows from a table:

hive> select hello_world('any string') from my_table limit 10;
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!

Not a very useful UDF, but it opens the door for more interesting things.
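
As one small step towards something more interesting, here is a rough sketch of a UDF that actually uses its argument. The class name Greet and the greeting format are made up for illustration. To keep the sketch compilable without the Hive jars, the empty UDF class at the top is a hypothetical stand-in for org.apache.hadoop.hive.ql.exec.UDF; in a real build you would delete it, import the Hive class instead, and add a package line as in HelloWorld.java.

```java
// Hypothetical stand-in so this sketch compiles without the Hive jars.
// In a real build, delete this class and add:
//   import org.apache.hadoop.hive.ql.exec.UDF;
class UDF {}

public class Greet extends UDF {
  // Builds a greeting from the argument. Null in, null out, so a NULL
  // column value stays NULL in the query result.
  public String evaluate(String name) {
    if (name == null) {
      return null;
    }
    return "Hello, " + name + "!";
  }

  // evaluate() is an ordinary method, so it can be exercised without Hive.
  public static void main(String[] args) {
    System.out.println(new Greet().evaluate("Hive")); // prints "Hello, Hive!"
  }
}
```

Because evaluate is a plain Java method, it can be checked locally like this before the JAR is built. Once compiled against the real Hive class and packaged, it should register with create temporary function in the same way as hello_world.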
