What is Spark ML?
Spark ML is a package introduced in Spark 1.2 that aims to provide a uniform set of high-level APIs to help users create and tune practical machine learning pipelines. It standardizes the APIs for machine learning algorithms, making it easier to combine multiple algorithms into a single pipeline, or workflow.
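To make the pipeline idea concrete, here is a minimal, dependency-free sketch of the pattern in plain Python. It is not Spark's API: the classes below (`Tokenizer`, `MajorityWordClassifier`, `WordVoteModel`, `Pipeline`, `PipelineModel`) and the toy data are hypothetical stand-ins written for illustration. They only mimic the general shape of the real `pyspark.ml` pipeline, where transformer stages expose `transform` and estimator stages expose `fit`, which returns a fitted model.

```python
from collections import Counter


class Tokenizer:
    """Transformer stage (illustrative): adds a 'words' field to each row."""

    def transform(self, rows):
        return [{**r, "words": r["text"].lower().split()} for r in rows]


class WordVoteModel:
    """Fitted model (illustrative): predicts by majority vote over known words."""

    def __init__(self, word_label):
        self.word_label = word_label

    def transform(self, rows):
        out = []
        for r in rows:
            votes = Counter(
                self.word_label[w] for w in r["words"] if w in self.word_label
            )
            prediction = votes.most_common(1)[0][0] if votes else None
            out.append({**r, "prediction": prediction})
        return out


class MajorityWordClassifier:
    """Estimator stage (illustrative): learns the most frequent label per word."""

    def fit(self, rows):
        counts = {}
        for r in rows:
            for w in r["words"]:
                counts.setdefault(w, Counter())[r["label"]] += 1
        word_label = {w: c.most_common(1)[0][0] for w, c in counts.items()}
        return WordVoteModel(word_label)


class PipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline:
    """Chains stages; estimators are fitted, transformers are used as-is."""

    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):
                stage = stage.fit(rows)  # estimator -> fitted model
            rows = stage.transform(rows)  # feed the output to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)


# Toy training data (made up for this sketch).
train = [
    {"text": "spark is fast", "label": 1},
    {"text": "slow batch job", "label": 0},
    {"text": "spark streaming", "label": 1},
]

pipeline = Pipeline(stages=[Tokenizer(), MajorityWordClassifier()])
model = pipeline.fit(train)
predictions = model.transform([{"text": "Spark job is fast"}])
print(predictions[0]["prediction"])
```

The point of the uniform `fit`/`transform` interface is that the whole chain can be trained and applied as a single object, so swapping one stage for another does not change the surrounding code.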
Maintained by | License Type | Popular Examples | Support | Updates | Developer Skills |
---|---|---|---|---|---|
Apache | Apache 2.0 | – | spark.apache.org/mllib | – | Scala, Java, Python, R |
Often Compared to | Testing | Accessibility | License | Repository |
---|---|---|---|---|
Python, TensorFlow | – | – | Apache 2.0 | github.com/apache/spark |
Pros:
- Ease of use: the Spark API requires minimal boilerplate and can be used from several languages, including Python, Scala, and Java.
- Flexibility: the framework comes with support for streaming, batch processing, SQL queries, machine learning, and more. It can be used in a variety of applications without needing to integrate many other distributed processing technologies.
Cons:
- Running applications on a cluster is not well documented, and some applications are hard to debug.
- Debugging and testing are sometimes time-consuming.