hpkaiq · filed under Programming · about 200 words · estimated reading time: 1 minute

Script
```bash
#!/bin/bash
source /etc/profile
set +o posix  # disable POSIX mode so that process substitution is available

scala_file=$1
shift 1
arguments="$@"

# Note: append sys.exit at the end of the scala file, otherwise
# spark-shell stays in the interactive REPL after the script finishes.
spark-shell --master yarn \
    --executor-cores 5 \
    --num-executors 4 \
    --executor-memory 8g \
    --driver-memory 3g \
    --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
    --conf spark.default.parallelism=400 \
    --conf spark.sql.shuffle.partitions=400 \
    --conf spark.shuffle.io.maxRetries=10 \
    --conf spark.shuffle.io.retryWait=20s \
    --conf spark.yarn.executor.memoryOverhead=4096 \
    --conf spark.network.timeout=300s \
    --name sparkshell_scala \
    -i <(echo 'val args = "'"$arguments"'".split("\\s+")'; cat "$scala_file")
```
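The `set +o posix` line matters because process substitution (`<(...)`) is a bash feature that bash disables while in POSIX mode. A minimal sketch (assuming bash, with plain `cat` standing in for `spark-shell -i`) of the construct the wrapper relies on:

```shell
# Process substitution exposes a command's output as a /dev/fd path that
# can be read like a file; bash turns it off in POSIX mode, hence the
# wrapper's `set +o posix`.
set +o posix
out=$(cat <(echo hello))  # cat reads from the substituted process
echo "$out"
```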
A simple Scala file example
```scala
val path = args(0)
spark.read.parquet(path)
  .where("app_id = 77701")
  .repartition(1)
  .write.parquet(s"${path}_new")
sys.exit
```
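To see how the arguments reach `args(0)`, the text the wrapper feeds to `spark-shell -i` can be reproduced without Spark. This sketch (hypothetical file contents and path) builds the same composed script: an `args` definition prepended to the user's Scala file.

```shell
# Hypothetical stand-in for the wrapper's -i <(...) input: prepend the
# args definition, then append the user's Scala file.
scala_file=$(mktemp)
echo 'val path = args(0)' > "$scala_file"
arguments="/tmp/in.parquet"
composed=$(cat <(echo 'val args = "'"$arguments"'".split("\\s+")'; cat "$scala_file"))
echo "$composed"
rm -f "$scala_file"
```

The first line of the composed script defines `args` by splitting the argument string on whitespace, so the user's file can index into it exactly as the example above does.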
If no arguments need to be passed, you can simply use:
```bash
spark-shell < test.scala
```