Spark2

  1. academic nicety: a fine scholarly detail
  2. How are stages split into tasks in Spark?
  3. Apache Arrow

Starting the Spark shell:

  1. Local mode: ./bin/spark-shell --master local[3]
  2. Standalone cluster: ./bin/spark-shell --master spark://IP:PORT
  3. Kubernetes: big data applications are good candidates for Kubernetes because its clusters are scalable and extensible.
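The two `--master` forms listed above can be assembled programmatically. The following is only a minimal sketch: the helper names (`local_master`, `standalone_master`, `spark_shell_command`), the `spark_home` parameter, and the example address `192.0.2.10` are hypothetical, and port 7077 is assumed as the standalone master's port. It builds the command line without launching anything.

```python
import shlex


def local_master(threads):
    """Master URL for local mode with a fixed number of worker threads."""
    return f"local[{threads}]"


def standalone_master(host, port=7077):
    """Master URL for a standalone cluster (7077 assumed as the master port)."""
    return f"spark://{host}:{port}"


def spark_shell_command(master, spark_home="."):
    """Build the argv list for spark-shell; spark_home is a hypothetical install path."""
    # This sketch only covers the two master forms shown in the list above.
    if not (master.startswith("local") or master.startswith("spark://")):
        raise ValueError(f"unsupported master URL in this sketch: {master!r}")
    return [f"{spark_home}/bin/spark-shell", "--master", master]


if __name__ == "__main__":
    # shlex.join quotes arguments such as local[3] so the line is safe to paste into a shell.
    print(shlex.join(spark_shell_command(local_master(3))))
    print(shlex.join(spark_shell_command(standalone_master("192.0.2.10"))))
```

Quoting matters here because some shells treat the brackets in `local[3]` as a glob pattern.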

References

  1. How to Install Apache Spark on Multi-Node Cluster (Standalone)
  2. Spark on YARN: cluster installation and deployment
  3. Spark optimization: preventing applications from uploading dependency JARs to HDFS (过往记忆 blog)
  4. Spark basic workflow and how YARN cluster mode works
  5. Understanding your Apache Spark Application Through Visualization
  6. Tuning Apache Spark Jobs the Easy Way: Web UI Stage Detail View
  7. An in-depth look at how Spark runs: job, stage, task
  8. Remote Spark Jobs on YARN (highly recommended)
  9. Spark project: Spark + Scala + Gradle + IntelliJ
  10. Remote debugging of Spark applications
  11. How to print the contents of RDD?
  12. A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets
  13. How-to: Tune Your Spark Jobs (Part II)
  14. Spark testing base
  15. Distributed SQL Query Engine: Presto
  16. cleanframes: data cleansing library for Apache Spark