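Configuring the IK Analyzer in Elasticsearch

Posted by 数栈君 on 2023-09-25 10:54

In Elasticsearch, IK is a Chinese analysis plugin that provides two analyzers (and tokenizers of the same names): `ik_max_word`, which performs the finest-grained segmentation and emits every plausible word combination, and `ik_smart`, which performs the coarsest-grained segmentation and emits fewer, longer tokens. This article describes how to install the plugin and configure it in Elasticsearch.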
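To see how the two analyzers differ on a given text, the plugin can be exercised through Elasticsearch's `_analyze` API. The following is a minimal Python sketch rather than a definitive client: it only builds the HTTP requests with the standard library, and the base URL `http://localhost:9200`, the helper names `build_analyze_request` and `tokens_from_response`, and the sample text are assumptions made for illustration.

```python
import json
import urllib.request

# Assumed address of a local, unsecured Elasticsearch node (hypothetical).
ES_URL = "http://localhost:9200"


def build_analyze_request(analyzer, text, base_url=ES_URL):
    """Build (but do not send) a POST /_analyze request for one analyzer."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def tokens_from_response(response_body):
    """Extract the token strings from an _analyze JSON response."""
    return [t["token"] for t in json.loads(response_body)["tokens"]]


# One request per analyzer, so the two token lists can be compared.
analyze_requests = [
    build_analyze_request(name, "中华人民共和国国歌")
    for name in ("ik_max_word", "ik_smart")
]
```

Sending each request with `urllib.request.urlopen` against a node that has the plugin installed, and passing the response body to `tokens_from_response`, should show `ik_max_word` producing more, overlapping tokens than `ik_smart`.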

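I. Installing the IK plugin

1. Download the IK plugin

First, download the IK plugin, choosing the release that matches your Elasticsearch version (Elasticsearch refuses to load a plugin built for a different version). Releases are published on GitHub: https://github.com/medcl/elasticsearch-analysis-ik

2. Unpack the plugin

After downloading, unpack the archive into its own subdirectory under Elasticsearch's `plugins` directory, for example `plugins/ik`. If the `plugins` directory does not exist, create it manually.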
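The unpacking step can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `unpack_plugin` and the default subdirectory name `ik` are hypothetical, and the sketch assumes the release archive contains the plugin files at its top level.

```python
import zipfile
from pathlib import Path


def unpack_plugin(zip_path, es_home, plugin_name="ik"):
    """Extract a plugin archive into <es_home>/plugins/<plugin_name>."""
    target = Path(es_home) / "plugins" / plugin_name
    # Creates the plugins directory as well if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, `unpack_plugin("ik.zip", "/usr/share/elasticsearch")` would place the files under `/usr/share/elasticsearch/plugins/ik`; both the archive name and the installation path are placeholders.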

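3. Restart Elasticsearch

Elasticsearch loads plugins only at startup, so it must be restarted for the IK plugin to take effect. The following commands restart Elasticsearch:

```bash
# Linux (systemd)
sudo systemctl restart elasticsearch

# Windows (assumes a Windows service named "elasticsearch")
net stop elasticsearch
net start elasticsearch
```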

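II. Configuring the IK analyzer

1. Specifying the analyzer when creating an index

When creating an index, define a custom analyzer under `settings.analysis` and reference it from the field mapping. The following request creates an index named `my_index` whose `title` field uses a custom analyzer built on the `ik_max_word` tokenizer, followed by the built-in `lowercase` token filter:

```json
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_analyzer"
      }
    }
  }
}
```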
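An index definition of this kind can also be assembled programmatically. The sketch below builds the request body as a Python dictionary; it is illustrative rather than definitive. The helper name `build_index_body` is hypothetical, and the optional `search_analyzer` argument reflects a pattern shown in the plugin's README (indexing with `ik_max_word`, searching with `ik_smart`), which goes beyond the request shown in this section.

```python
import json


def build_index_body(field, tokenizer="ik_max_word", search_analyzer=None):
    """Return a create-index body with a custom IK-based analyzer on one text field."""
    field_mapping = {"type": "text", "analyzer": "ik_analyzer"}
    if search_analyzer is not None:
        # Optional: analyze query text differently from indexed text.
        field_mapping["search_analyzer"] = search_analyzer
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "ik_analyzer": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        # "lowercase" refers to the built-in token filter.
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {"properties": {field: field_mapping}},
    }


# Body for PUT /my_index, printed as JSON for pasting into a console.
print(json.dumps(build_index_body("title", search_analyzer="ik_smart"), indent=2))
```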

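2. Changing the analyzer configuration of an existing index

Analysis settings are static, so they can only be changed while the index is closed: close the index, update the settings, then reopen it. Mappings are not part of the `_settings` body, and the analyzer of an existing field such as `title` cannot be changed in place; to apply a new analyzer to existing data, create a new index with the desired mapping and reindex into it. The following requests define `ik_analyzer` on `my_index`:

```json
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "analyzer": {
      "ik_analyzer": {
        "type": "custom",
        "tokenizer": "ik_max_word",
        "filter": [
          "lowercase"
        ]
      }
    }
  }
}

POST /my_index/_open
```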
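Analysis settings are static index settings, so Elasticsearch only accepts changes to them while the index is closed, and the order of operations matters. The following Python sketch only lists the requests in the order they would be sent (close, update settings, reopen); the helper name `analysis_update_steps` is hypothetical and no request is actually issued.

```python
import json


def analysis_update_steps(index, analysis):
    """Return (method, path, body) tuples for updating static analysis settings."""
    return [
        # Static settings can only be changed on a closed index.
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {"analysis": analysis}),
        # Reopen the index so it can be searched again.
        ("POST", f"/{index}/_open", None),
    ]


analysis = {
    "analyzer": {
        "ik_analyzer": {
            "type": "custom",
            "tokenizer": "ik_max_word",
            "filter": ["lowercase"],
        }
    }
}

for method, path, body in analysis_update_steps("my_index", analysis):
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Keeping the steps as plain data makes it easy to review the sequence before sending it with whichever HTTP client is in use.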

III. Querying with the IK analyzer

1. Matching all query terms

A `match` query analyzes the query text with the analyzer of the target field (here `ik_analyzer`), and setting `operator` to `and` requires every resulting token to be present in the document. If the tokens must also appear in their original order, use a `match_phrase` query instead. For example, to find documents whose `title` contains both "Elasticsearch" and "中文":

```json
GET /my_index/_search
{
  "size": 10,
  "timeout": "30s",
  "query": {
    "match": {
      "title": {
        "query": "Elasticsearch 中文",
        "operator": "and"
      }
    }
  }
}
```
