Spark ML

High-level API

What is Spark ML?

Spark ML is a package introduced in Spark 1.2 that aims to provide a uniform set of high-level APIs to help users create and tune practical machine learning pipelines. It standardizes the APIs for machine learning algorithms, which makes it easier to combine multiple algorithms into a single pipeline, or workflow.
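The pipeline idea can be illustrated with a toy, dependency-free Python sketch. The names `Pipeline`, `PipelineModel`, estimator (`fit`) and transformer (`transform`) mirror Spark ML's concepts, but this is not Spark's API. The stages `Center` and `Scale` are hypothetical, and the "dataset" is a plain list of floats, an assumption made only to keep the example self-contained. The real classes live in PySpark's `pyspark.ml` package (for example `pyspark.ml.Pipeline`) and operate on DataFrames.

```python
from typing import List


class Scale:
    """Hypothetical stateless transformer: multiplies every value by a constant."""

    def __init__(self, factor: float) -> None:
        self.factor = factor

    def transform(self, data: List[float]) -> List[float]:
        return [x * self.factor for x in data]


class CenterModel:
    """Fitted transformer produced by Center.fit(): subtracts the learned mean."""

    def __init__(self, mean: float) -> None:
        self.mean = mean

    def transform(self, data: List[float]) -> List[float]:
        return [x - self.mean for x in data]


class Center:
    """Hypothetical estimator: learns the mean of the training data."""

    def fit(self, data: List[float]) -> CenterModel:
        return CenterModel(sum(data) / len(data))


class PipelineModel:
    """A chain of fitted transformers applied in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, data: List[float]) -> List[float]:
        for stage in self.stages:
            data = stage.transform(data)
        return data


class Pipeline:
    """Chains stages. fit() fits each estimator on the output of the previous
    stage, keeps plain transformers as they are, and returns a PipelineModel.
    For simplicity, every stage's output is computed during fitting."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, data: List[float]) -> PipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators expose fit(); transformers are used unchanged.
            model = stage.fit(data) if hasattr(stage, "fit") else stage
            data = model.transform(data)
            fitted.append(model)
        return PipelineModel(fitted)


if __name__ == "__main__":
    pipeline = Pipeline([Center(), Scale(2.0)])
    model = pipeline.fit([1.0, 2.0, 3.0])  # Center learns mean = 2.0
    print(model.transform([4.0, 5.0]))     # [4.0, 6.0]
```

Because every stage exposes the same `fit`/`transform` interface, the whole chain can be fitted once and reused as a single model. This uniformity is what the standardized Spark ML APIs are meant to provide.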

Maintained by: Apache
License type: Apache 2.0
Developer skills: Scala, Java, Python, R
Often compared to: Python, TensorFlow

  • Ease of use: the Spark API requires minimal boilerplate, and applications can be written in several languages, including Python, Scala, and Java.
  • Flexibility: the framework supports streaming, batch processing, SQL queries, machine learning, and more, so it can serve a variety of applications without integrating many other distributed processing technologies.
  • Running applications on a cluster is poorly documented, and some applications are hard to debug.
  • Debugging and testing can be time-consuming.