DolphinScheduler是一个分布式易用的大数据工作流调度系统,提供了可视化的web操作界面,帮助用户快速、高效地构建和调度大数据任务;支持分布式部署和单机部署两种方式。单机部署适用于小规模使用场景,可以在一台机器上快速搭建并运行。
本文将介绍如何在单机上部署DolphinScheduler
一、准备工作
在开始部署之前,确保你已经安装了以下软件和环境:
Java 8或更高版本
MySQL数据库
Hadoop和Hive(可选,用于执行Hadoop和Hive任务)
Centos服务器 (获取方法) (使用的链接工具是XShell)
安装包在下面链接中,这里给出了两个,按方便的来下,也可以自行去官网下载
盼兮网盘 提取码:GuMNm (可直接下载)
百度网盘 提取码:55k9
如果没有安装,就按顺序操作,安装了可以直接到DolphinScheduler
二、安装JDK和Mysql
由于之前已经写过了,这里就直接放链接了
安装JDK:云服务器上安装JDK
安装Mysql:云服务器上配置Mysql
三、安装DolphinScheduler
wget --no-check-certificate https://archive.apache.org/dist/dolphinscheduler/3.1.4/apache-dolphinscheduler-3.1.4-bin.tar.gz
cd /tmp
mkdir -p /opt/soft/dolphinscheduler
tar -zxvf /tmp/apache-dolphinscheduler-3.1.4-bin.tar.gz -C /opt/soft/dolphinscheduler
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
授予操作 dolphinscheduler 数据库的权限并刷新权限
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler';
flush privileges;
mysql -u dolphinscheduler -p
vim /opt/soft/dolphinscheduler/apache-dolphinscheduler-3.1.4-bin/bin/env/dolphinscheduler_env.sh
内容如下:
# JAVA_HOME, will use it to start DolphinScheduler server
export JAVA_HOME=${JAVA_HOME:-/root/jdk1.8.0_11} # 修改为自己的jdk安装目录
# 修改MySQL配置
# Database related configuration, set database type, username and password
export DATABASE=${DATABASE:-mysql}
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_URL="jdbc:mysql://localhost:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false"
export SPRING_DATASOURCE_USERNAME="dolphinscheduler"
export SPRING_DATASOURCE_PASSWORD="dolphinscheduler"
# DolphinScheduler server related configuration
export SPRING_CACHE_TYPE=${SPRING_CACHE_TYPE:-none}
export SPRING_JACKSON_TIME_ZONE=${SPRING_JACKSON_TIME_ZONE:-UTC}
export MASTER_FETCH_COMMAND_NUM=${MASTER_FETCH_COMMAND_NUM:-10}
# Registry center configuration, determines the type and link of the registry center
export REGISTRY_TYPE=${REGISTRY_TYPE:-zookeeper}
export REGISTRY_ZOOKEEPER_CONNECT_STRING=${REGISTRY_ZOOKEEPER_CONNECT_STRING:-localhost:2181}
# Tasks related configurations, need to change the configuration if you use the related tasks.
export HADOOP_HOME=${HADOOP_HOME:-/opt/soft/hadoop}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/soft/hadoop/etc/hadoop}
export SPARK_HOME1=${SPARK_HOME1:-/opt/soft/spark1}
export SPARK_HOME2=${SPARK_HOME2:-/opt/soft/spark2}
export PYTHON_HOME=${PYTHON_HOME:-/opt/soft/python}
export HIVE_HOME=${HIVE_HOME:-/opt/soft/hive}
export FLINK_HOME=${FLINK_HOME:-/opt/soft/flink}
export DATAX_HOME=${DATAX_HOME:-/opt/soft/datax}
export SEATUNNEL_HOME=${SEATUNNEL_HOME:-/opt/soft/seatunnel}
export CHUNJUN_HOME=${CHUNJUN_HOME:-/opt/soft/chunjun}
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME/bin:$JAVA_HOME/bin:$HIVE_HOME/bin:$FLINK_HOME/bin:$DATAX_HOME/bin:$SEATUNNEL_HOME/bin:$CHUNJUN_HOME/bin:$PATH
vim /opt/soft/dolphinscheduler/apache-dolphinscheduler-3.1.4-bin/standalone-server/conf/application.yaml
主要修改的位置有两处:前面的datasource和后面datasource,这两处都改成如下即可
datasource:
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
username: dolphinscheduler
password: dolphinscheduler
可以在vim的命令下输入 /datasource ,回车便会跳转,按 n 可以查看下一个
修改后的内容如下所示
spring:
jackson:
time-zone: UTC
date-format: "yyyy-MM-dd HH:mm:ss"
banner:
charset: UTF-8
cache:
# default enable cache, you can disable by `type: none`
type: none
cache-names:
- tenant
- user
- processDefinition
- processTaskRelation
- taskDefinition
caffeine:
spec: maximumSize=100,expireAfterWrite=300s,recordStats
sql:
init:
schema-locations: classpath:sql/dolphinscheduler_h2.sql
datasource:
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
username: dolphinscheduler
password: dolphinscheduler
quartz:
job-store-type: jdbc
jdbc:
initialize-schema: never
properties:
org.quartz.threadPool:threadPriority: 5
org.quartz.jobStore.isClustered: true
org.quartz.jobStore.class: org.springframework.scheduling.quartz.LocalDataSourceJobStore
org.quartz.scheduler.instanceId: AUTO
org.quartz.jobStore.tablePrefix: QRTZ_
org.quartz.jobStore.acquireTriggersWithinLock: true
org.quartz.scheduler.instanceName: DolphinScheduler
org.quartz.threadPool.class: org.quartz.simpl.SimpleThreadPool
org.quartz.jobStore.useProperties: false
org.quartz.threadPool.makeThreadsDaemons: true
org.quartz.threadPool.threadCount: 25
org.quartz.jobStore.misfireThreshold: 60000
org.quartz.scheduler.makeSchedulerThreadDaemon: true
org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.StdJDBCDelegate
org.quartz.jobStore.clusterCheckinInterval: 5000
servlet:
multipart:
max-file-size: 1024MB
max-request-size: 1024MB
messages:
basename: i18n/messages
jpa:
hibernate:
ddl-auto: none
mvc:
pathmatch:
matching-strategy: ANT_PATH_MATCHER
registry:
type: zookeeper
zookeeper:
namespace: dolphinscheduler
connect-string: localhost:2181
retry-policy:
base-sleep-time: 60ms
max-sleep: 300ms
max-retries: 5
session-timeout: 30s
connection-timeout: 9s
block-until-connected: 600ms
digest: ~
security:
authentication:
# Authentication types (supported types: PASSWORD,LDAP)
type: PASSWORD
# IF you set type `LDAP`, below config will be effective
ldap:
# ldap server config
urls: ldap://ldap.forumsys.com:389/
base-dn: dc=example,dc=com
username: cn=read-only-admin,dc=example,dc=com
password: password
user:
# admin userId when you use LDAP login
admin: read-only-admin
identity-attribute: uid
email-attribute: mail
# action when ldap user is not exist (supported types: CREATE,DENY)
not-exist-action: CREATE
# Traffic control, if you turn on this config, the maximum number of request/s will be limited.
# global max request number per second
# default tenant-level max request number
traffic:
control:
global-switch: false
max-global-qps-rate: 300
tenant-switch: false
default-tenant-qps-rate: 10
#customize-tenant-qps-rate:
# eg.
#tenant1: 11
#tenant2: 20
master:
listen-port: 5678
# master fetch command num
fetch-command-num: 10
# master prepare execute thread number to limit handle commands in parallel
pre-exec-threads: 10
# master execute thread number to limit process instances in parallel
exec-threads: 10
# master dispatch task number per batch
dispatch-task-number: 3
# master host selector to select a suitable worker, default value: LowerWeight. Optional values include random, round_robin, lower_weight
host-selector: lower_weight
# master heartbeat interval
heartbeat-interval: 10s
# Master heart beat task error threshold, if the continuous error count exceed this count, the master will close.
heartbeat-error-threshold: 5
# master commit task retry times
task-commit-retry-times: 5
# master commit task interval
task-commit-interval: 1s
state-wheel-interval: 5s
# master max cpuload avg, only higher than the system cpu load average, master server can schedule. default value -1: the number of cpu cores * 2
max-cpu-load-avg: -1
# master reserved memory, only lower than system available memory, master server can schedule. default value 0.3, the unit is G
reserved-memory: 0.3
# failover interval
failover-interval: 10m
# kill yarn jon when failover taskInstance, default true
kill-yarn-job-when-task-failover: true
worker-group-refresh-interval: 10s
worker:
# worker listener port
listen-port: 1234
# worker execute thread number to limit task instances in parallel
exec-threads: 10
# worker heartbeat interval
heartbeat-interval: 10s
# Worker heart beat task error threshold, if the continuous error count exceed this count, the worker will close.
heartbeat-error-threshold: 5
# worker host weight to dispatch tasks, default value 100
host-weight: 100
# tenant corresponds to the user of the system, which is used by the worker to submit the job. If system does not have this user, it will be automatically created after the parameter worker.tenant.auto.create is true.
tenant-auto-create: true
#Scenes to be used for distributed users.For example,users created by FreeIpa are stored in LDAP.This parameter only applies to Linux, When this parameter is true, worker.tenant.auto.create has no effect and will not automatically create tenants.
tenant-distributed-user: false
# worker max cpuload avg, only higher than the system cpu load average, worker server can be dispatched tasks. default value -1: the number of cpu cores * 2
max-cpu-load-avg: -1
# worker reserved memory, only lower than system available memory, worker server can be dispatched tasks. default value 0.3, the unit is G
reserved-memory: 0.3
# default worker groups separated by comma, like 'worker.groups=default,test'
groups:
- default
# alert server listen host
alert-listen-host: localhost
alert-listen-port: 50052
task-execute-threads-full-policy: REJECT
alert:
port: 50052
# Mark each alert of alert server if late after x milliseconds as failed.
# Define value is (0 = infinite), and alert server would be waiting alert result.
wait-timeout: 0
python-gateway:
# Weather enable python gateway server or not. The default value is true.
enabled: true
# Authentication token for connection from python api to python gateway server. Should be changed the default value
# when you deploy in public network.
auth-token: jwUDzpLsNKEFER4*a8gruBH_GsAurNxU7A@Xc
# The address of Python gateway server start. Set its value to `0.0.0.0` if your Python API run in different
# between Python gateway server. It could be be specific to other address like `127.0.0.1` or `localhost`
gateway-server-address: 0.0.0.0
# The port of Python gateway server start. Define which port you could connect to Python gateway server from
# Python API side.
gateway-server-port: 25333
# The address of Python callback client.
python-address: 127.0.0.1
# The port of Python callback client.
python-port: 25334
# Close connection of socket server if no other request accept after x milliseconds. Define value is (0 = infinite),
# and socket server would never close even though no requests accept
connect-timeout: 0
# Close each active connection of socket server if python program not active after x milliseconds. Define value is
# (0 = infinite), and socket server would never close even though no requests accept
read-timeout: 0
server:
port: 12345
servlet:
session:
timeout: 120m
context-path: /dolphinscheduler/
compression:
enabled: true
mime-types: text/html,text/xml,text/plain,text/css,text/javascript,application/javascript,application/json,application/xml
jetty:
max-http-form-post-size: 5000000
management:
endpoints:
web:
exposure:
include: '*'
endpoint:
health:
enabled: true
show-details: always
health:
db:
enabled: true
defaults:
enabled: false
metrics:
tags:
application: ${spring.application.name}
audit:
enabled: true
metrics:
enabled: true
# Override by profile
---
spring:
config:
activate:
on-profile: postgresql
quartz:
properties:
org.quartz.jobStore.driverDelegateClass: org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
datasource:
driver-class-name: org.postgresql.Driver
url: jdbc:postgresql://127.0.0.1:5432/dolphinscheduler
username: root
password: root
---
spring:
config:
activate:
on-profile: mysql
sql:
init:
schema-locations: classpath:sql/dolphinscheduler_mysql.sql
datasource:
driver-class-name: com.mysql.cj.jdbc.Driver
url: jdbc:mysql://127.0.0.1:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8
username: dolphinscheduler
password: dolphinscheduler
10.配置MySQL驱动
cd /opt/soft/dolphinscheduler
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.16/mysql-connector-java-8.0.16.jar -P /tmp
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/worker-server/libs
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/api-server/libs
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/alert-server/libs
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/master-server/libs
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/tools/libs
cp /tmp/mysql-connector-java-8.0.16.jar apache-dolphinscheduler-3.1.4-bin/standalone-server/libs/standalone-server
11.初始化Dolphinscheduler数据库
cd /opt/soft/dolphinscheduler/apache-dolphinscheduler-3.1.4-bin
bash tools/bin/upgrade-schema.sh
出现如下表示成功
cd /opt/soft/dolphinscheduler/apache-dolphinscheduler-3.1.4-bin/bin
./dolphinscheduler-daemon.sh start standalone-server
停止dolphinscheduler
./dolphinscheduler-daemon.sh stop standalone-server
至此,你已经成功部署并启动了DolphinScheduler的单机版。
参考博客
dolphinscheduler单机版部署教程
总结
通过上述步骤,你可以在一台机器上快速部署DolphinScheduler的单机版。如果你需要更高的性能和可靠性,可以考虑使用分布式部署方式。
希望本教程对您有所帮助!如果在部署过程中遇到问题,请随时在评论区留言或参考DolphinScheduler官方文档获取更详细的部署指南。感谢阅读!
https://blog.csdn.net/m0_70405779/article/details/140721238
本文系转载,版权归原作者所有,如若侵权请联系我们进行删除!
《数据资产管理白皮书》下载地址:
《行业指标体系白皮书》下载地址:
《数据治理行业实践白皮书》下载地址:
《数栈V6.0产品白皮书》下载地址:
想了解或咨询更多有关袋鼠云大数据产品、行业解决方案、客户案例的朋友,浏览袋鼠云官网:
同时,欢迎对大数据开源项目有兴趣的同学加入「袋鼠云开源框架钉钉技术群」,交流最新开源技术信息,群号码:30537511,项目地址: