hpkaiq · filed under Programming · about 200 words · estimated reading time: 1 minute

Script
```bash
#!/bin/bash
source /etc/profile
set +o posix  # disable POSIX mode so that process substitution is available

scala_file=$1
shift 1
arguments="$@"

# Note: append sys.exit at the end of the scala file, otherwise
# spark-shell stays in the interactive REPL after the script finishes.
spark-shell --master yarn \
    --executor-cores 5 \
    --num-executors 4 \
    --executor-memory 8g \
    --driver-memory 3g \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.default.parallelism=400 \
    --conf spark.sql.shuffle.partitions=400 \
    --conf spark.shuffle.io.maxRetries=10 \
    --conf spark.shuffle.io.retryWait=20s \
    --conf spark.yarn.executor.memoryOverhead=4096 \
    --conf spark.network.timeout=300s \
    --name sparkshell_scala \
    -i <(echo 'val args = "'"$arguments"'".split("\\s+")'; cat "$scala_file")
```
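The `set +o posix` line matters because process substitution (`<(...)`) is a bash feature that bash disables while in POSIX mode. A minimal sketch (assuming bash, with plain `cat` standing in for `spark-shell -i`) of the construct the wrapper relies on:

```shell
# Process substitution exposes a command's output as a /dev/fd path that
# can be read like a file; bash turns it off in POSIX mode, hence the
# wrapper's `set +o posix`.
set +o posix
out=$(cat <(echo hello))  # cat reads from the substituted process
echo "$out"
```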
A simple Scala file example
```scala
val path = args(0)
spark.read.parquet(path)
  .where("app_id = 77701")
  .repartition(1)
  .write.parquet(s"${path}_new")
sys.exit
```
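To see how the arguments reach `args(0)`, the text the wrapper feeds to `spark-shell -i` can be reproduced without Spark. This sketch (hypothetical file contents and path) builds the same composed script: an `args` definition prepended to the user's Scala file.

```shell
# Hypothetical stand-in for the wrapper's -i <(...) input: prepend the
# args definition, then append the user's Scala file.
scala_file=$(mktemp)
echo 'val path = args(0)' > "$scala_file"
arguments="/tmp/in.parquet"
composed=$(cat <(echo 'val args = "'"$arguments"'".split("\\s+")'; cat "$scala_file"))
echo "$composed"
rm -f "$scala_file"
```

The first line of the composed script defines `args` by splitting the argument string on whitespace, so the user's file can index into it exactly as the example above does.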
If no arguments need to be passed, you can simply use:
```bash
spark-shell < test.scala
```