What is Spark ML?
Spark ML is a package introduced in Spark 1.2 that aims to provide a uniform set of high-level APIs for creating and tuning practical machine learning pipelines. It standardizes the APIs of machine learning algorithms, making it easier to combine multiple algorithms into a single pipeline, or workflow.
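The pipeline idea described above can be sketched without Spark itself: a Pipeline chains stages, where an Estimator is fit on data to produce a Transformer, and the fitted model applies each transformer in order. The classes below (`Tokenizer`, `Counter`, etc.) are simplified stand-ins for illustration, not Spark's actual implementations.

```python
# Minimal sketch of the Spark ML Pipeline concept (no pyspark needed).
# Estimators learn from data and yield Transformers; a fitted
# PipelineModel applies its transformers in sequence.

class Transformer:
    def transform(self, data):
        raise NotImplementedError

class Estimator:
    def fit(self, data):
        raise NotImplementedError

class Tokenizer(Transformer):
    # Hypothetical stand-in for a feature transformer: splits text into tokens.
    def transform(self, data):
        return [text.lower().split() for text in data]

class TokenCountModel(Transformer):
    # The "model" produced by fitting; here it just counts tokens per document.
    def transform(self, data):
        return [len(tokens) for tokens in data]

class TokenCounter(Estimator):
    # Hypothetical estimator: fitting returns the transformer above.
    def fit(self, data):
        return TokenCountModel()

class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages
    def transform(self, data):
        for stage in self.stages:
            data = stage.transform(data)
        return data

class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages
    def fit(self, data):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(data)  # estimator -> fitted transformer
            fitted.append(stage)
            data = stage.transform(data)  # feed transformed data forward
        return PipelineModel(fitted)

docs = ["Spark ML standardizes APIs", "Pipelines chain stages"]
model = Pipeline([Tokenizer(), TokenCounter()]).fit(docs)
print(model.transform(docs))  # token counts per document
```

In real Spark ML the same shape appears as `pyspark.ml.Pipeline(stages=[...])`, whose `fit()` returns a `PipelineModel` that can `transform()` new DataFrames.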
|Maintained by|License Type|Popular Examples|Support|Updates|Developer Skills|
|---|---|---|---|---|---|
|Apache|Apache 2.0|–|spark.apache.org/mllib|–|Scala, Java, Python, R|

|Often Compared to|Testing|Accessibility|Maintained by|Repository|
|---|---|---|---|---|
|Python, TensorFlow|–|–|Apache 2.0|github.com/apache/spark|
- Ease of use: the Spark API requires minimal boilerplate and can be used from several languages, including Python, Scala, and Java.
- Flexibility: the framework ships with support for streaming, batch processing, SQL queries, machine learning, and more, so it can serve a variety of applications without integrating many other distributed processing technologies.
- Running applications on a cluster is poorly documented, and some applications are hard to debug.
- Debugging and testing can be time-consuming.