Spark2
- academic nicety: academic detail
- How are stages split into tasks in Spark?
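On the stages-into-tasks question, the short answer is: Spark cuts a job's lineage into stages at shuffle boundaries (wide dependencies), and each stage runs one task per partition of its output RDD. A minimal conceptual sketch in plain Python (an illustrative model of that rule, not Spark's actual scheduler code; `plan_job` and its input format are invented for this sketch):

```python
# Illustrative model only: Spark cuts a job into stages at shuffle
# boundaries (wide dependencies); each stage runs one task per partition.

def plan_job(ops):
    """ops: list of ("narrow" | "wide", num_partitions) transformations.

    Returns the task count of each stage, i.e. the partition count of
    the last RDD in that stage."""
    stages = []
    current_partitions = None
    for kind, parts in ops:
        if kind == "wide" and current_partitions is not None:
            # A wide dependency (shuffle) closes the current stage.
            stages.append(current_partitions)
        current_partitions = parts
    if current_partitions is not None:
        stages.append(current_partitions)
    return stages

# e.g. textFile (4 partitions) -> map (narrow) -> reduceByKey into 2
# partitions (wide): two stages, 4 map tasks then 2 reduce tasks.
print(plan_job([("narrow", 4), ("narrow", 4), ("wide", 2)]))  # [4, 2]
```

The same rule explains what the Web UI stage view shows: the task count of a stage equals the partition count of the data it processes.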
- Apache Arrow
Spark shell startup commands:
- Local mode: ./bin/spark-shell --master local[3]
- Standalone cluster: ./bin/spark-shell --master spark://IP:PORT
- Big data applications are good candidates for Kubernetes because of the scalability and extensibility of Kubernetes clusters.
References
- How to Install Apache Spark on Multi-Node Cluster (Standalone)
- Spark on YARN cluster installation and deployment
- Understanding Your Apache Spark Application Through Visualization
- Tuning Apache Spark Jobs the Easy Way: Web UI Stage Detail View
- Remote Spark Jobs on YARN (great)
- A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets
- How-to: Tune Your Spark Jobs (Part II)
- Spark testing base
- Distributed SQL query engine: Presto
- cleanframes: a data-cleansing library for Apache Spark