Spark ML

High-level API


What is Spark ML?

Spark ML is a package introduced in Spark 1.2 that aims to provide a uniform set of high-level APIs to help users create and tune practical machine learning pipelines. It standardizes the APIs for machine learning algorithms, making it easier to combine multiple algorithms into a single pipeline, or workflow.



Maintained by: Apache
License: Apache 2.0
Examples: spark.apache.org/mllib
Developer skills: Scala, Java, Python, R
Often compared to: Python, TensorFlow
Repository: github.com/apache/spark



  Pros:
  • Ease of use: the Spark API requires minimal boilerplate, and applications can be written in several languages, including Python, Scala, and Java.
  • Flexibility: the framework supports streaming, batch processing, SQL queries, machine learning, and more, so it can be used in a wide range of applications without integrating many other distributed processing technologies.
  Cons:
  • Running applications on a cluster is not well documented, and some applications are hard to debug.
  • Debugging and testing can be time-consuming.