博客大数据Hive篇--UDF函数

大数据Hive篇--UDF函数

数栈君发表于 2023-08-14 09:49 1133 0

什么是UDF:

它是User defined Function的简写，意思是用户自定义方法

为什么要用UDF？

hive自带了一些函数，比如：max、min 等，但是自带的函数数量有限，所以hive提供给用户自定义函数的功能。
udf 函数可以直接应用于select 语句，对查询结构做格式化处理之后，然后再输出内容。

hive 编写udf函数的时候需要注意的地方：

1.自定义udf函数需要继承org.apache.hadoop.hive.ql.UDF
2.需要实现evaluate 函数，evaluate 函数支持重载。
3.udf 必须要有返回类型，可以返回null，但是返回类型不能为void；
4.udf 常用Text/LongWrite 等类型，不推荐使用java类型。

如何编写UDF

创建project 打jar包

<dependency>

<groupId>org.apache.hadoop</groupId>

<artifactId>hadoop-common</artifactId>

<version>2.6.0</version>

</dependency>

<dependency>

<groupId>org.apache.hive</groupId>

<artifactId>hive-exec</artifactId>

<version>1.1.0</version>

</dependency>

需要hadoop的common包和hive的hive-exec

编写一个简单的函数

在输出的字段前加上’Hello’

package com.njbdqn;

import org.apache.hadoop.hive.ql.exec.UDF;

import org.apache.hadoop.io.Text;

public class MyFunc extends UDF {

public Text evaluate(Text txt){

return new Text("Hello,"+txt.toString());

}

怎么使用函数

使用之前，先把包打包，我打的是瘦包，亲测可使用，如不行，可打胖包。

http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/0d143ae2e5f75f76769acedf775073a4..png

然后通过xftp传输到linux上

http://dtstack-static.oss-cn-hangzhou.aliyuncs.com/2021bbs/files_user1/article/3fd3c4dfb7132bb52c3db8df762349f1..png

在linux启动了hadoop的前提下上传至hdfs端

将包上传hdfs：

hdfs dfs -mkdir -p /func

hdfs dfs -put /opt/jar/myfun-1.0-SNAPSHOT.jar /func

进入hive界面开始添加函数：

hive> add jar hdfs://192.168.56.100:9000/func/myfun-1.0-SNAPSHOT.jar；

创建新函数

create function mytest as "com.njbdqn.MyFunc";

选择表进行试验：

hive> select mytest(username) from userinfos limit 3;

OK

Hello,lhqye

Hello,gaqhq

Hello,thfqn

Time taken: 0.305 seconds, Fetched: 3 row(s)

此为原表内容：

hive> select username from userinfos limit 3;

OK

lhqye

gaqhq

thfqn

Time taken: 0.096 seconds, Fetched: 3 row(s)

免责申明：

本文系转载，版权归原作者所有，如若侵权请联系我们进行删除！

《数据治理行业实践白皮书》下载地址：https://fs80.cn/4w2atu

《数栈V6.0产品白皮书》下载地址：https://fs80.cn/cw0iw1

想了解或咨询更多有关袋鼠云大数据产品、行业解决方案、客户案例的朋友，浏览袋鼠云官网：https://www.dtstack.com/?src=bbs

同时，欢迎对大数据开源项目有兴趣的同学加入「袋鼠云开源框架钉钉技术群」，交流最新开源技术信息，群号码：30537511，项目地址：https://github.com/DTStack