29 August 2013
Creating and using a basic Hive UDF is pretty simple.
First, locate the hive-exec and hadoop-core jars on your system and add them to the class path. The paths below come from a CDH 4.2.1 parcel install and will differ on other systems. Exporting the variable makes sure javac sees it:
export CLASSPATH=$CLASSPATH:/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hive/lib/hive-exec-0.10.0-cdh4.2.1.jar:/opt/cloudera/parcels/CDH-4.2.1-1.cdh4.2.1.p0.5/lib/hadoop/client-0.20/hadoop-core-2.0.0-mr1-cdh4.2.1.jar:.
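If you want to confirm that the jars are really visible before going further, a small check like the one below can help. It is only a sketch: the class name ClasspathCheck is made up for illustration, and all it does is ask the JVM to locate org.apache.hadoop.hive.ql.exec.UDF, the base class used later in this post.

```java
// ClasspathCheck.java: hypothetical helper, not part of Hive.
final class ClasspathCheck {

    // Returns true if the named class can be located on the current class path.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String udfBase = "org.apache.hadoop.hive.ql.exec.UDF";
        System.out.println(udfBase + (isOnClasspath(udfBase) ? " found" : " NOT found"));
    }
}
```

Compile it with javac ClasspathCheck.java and run java ClasspathCheck in the same shell where CLASSPATH was set. The trailing "." in the class path is what lets the JVM find ClasspathCheck itself in the current directory.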
Next, create a directory structure for the Java source and compiled classes:
mkdir -p udf_test/src/com/sodonnel/udf
mkdir -p udf_test/classes
Create the most basic hello world UDF in udf_test/src/com/sodonnel/udf/HelloWorld.java:
package com.sodonnel.udf;

import org.apache.hadoop.hive.ql.exec.UDF;

// Minimal UDF: Hive calls evaluate() once for each row.
public class HelloWorld extends UDF {
    // The argument is ignored; every row gets the same constant string.
    public String evaluate(String v) {
        return "Hello World!";
    }
}
From the udf_test/src directory, compile the Java class:
javac -d ../classes com/sodonnel/udf/HelloWorld.java
This creates the package directories and HelloWorld.class under the classes folder. Next, package the class file into a jar. From the udf_test/classes directory, run:
jar cf HelloWorld.jar com
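Before loading the jar into Hive, it can be worth checking that the class ended up at the path matching its package. Running jar tf HelloWorld.jar lists the contents. The same check can be sketched in Java with the standard java.util.jar API. JarCheck is a hypothetical helper name, and the default jar path assumes you run it from the classes directory.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// JarCheck.java: hypothetical helper for inspecting the jar built above.
final class JarCheck {

    // Returns true if the jar at jarPath contains an entry with the given name.
    static boolean hasEntry(String jarPath, String entryName) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the jar is in the current directory unless a path is given.
        String jarPath = args.length > 0 ? args[0] : "HelloWorld.jar";
        String entry = "com/sodonnel/udf/HelloWorld.class";
        System.out.println(entry + (hasEntry(jarPath, entry) ? " present" : " MISSING"));
    }
}
```

If the entry is missing, the usual cause is running the jar command from the wrong directory, so the com/sodonnel/udf prefix no longer matches the package declared in the source file.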
The final step is to load this jar file into Hive, using the full path to wherever you built it, and register the class as a temporary function:
hive> add jar /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar;
Added /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar to class path
Added resource: /export/home/sodonnel/udf/src/com/sodonnel/udf_test/classes/HelloWorld.jar
hive> create temporary function hello_world as 'com.sodonnel.udf.HelloWorld';
OK
Time taken: 0.0040 seconds
Now call the function when selecting some rows from a table:
hive> select hello_world('any string') from my_table limit 10;
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Hello World!
Not a very useful UDF, since it ignores its argument, but it opens the door to more interesting things.
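As a first step in that direction, evaluate can actually use its argument. The sketch below keeps the logic in a plain Java class so it can be run without Hive on the class path. The class name Greeting is invented here, and returning null for a null input is an assumption about how you would want NULL column values handled. To use it as a real UDF, the class would need to be public, live in a package such as com.sodonnel.udf, and extend UDF the same way HelloWorld does.

```java
// Greeting.java: plain-Java sketch of UDF logic (hypothetical class, no Hive dependency).
final class Greeting {

    // Same evaluate(String) signature as HelloWorld,
    // but the result is built from the input value.
    public String evaluate(String name) {
        if (name == null) {
            return null; // assumption: pass NULL through unchanged
        }
        return "Hello " + name.trim() + "!";
    }

    public static void main(String[] args) {
        Greeting g = new Greeting();
        System.out.println(g.evaluate(" Hive "));
        System.out.println(g.evaluate(null));
    }
}
```

Keeping the logic free of Hive types like this also makes it easy to try out with a plain main method before going through the compile, jar and add jar cycle.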