Alessandra Bilardi - From your laptop to all resources you need by your Jupyter notebook | PyData

Learn how to use Dask, a Python library for parallelism, to parallelize your machine learning tasks in the cloud. Discover how to set up a Dask cluster on AWS and use Dask Cloud Provider to scale your Python tools.

Key takeaways
  • Dask is a general-purpose Python library for parallelism that can use all of your CPUs, whereas scikit-learn on its own only uses one.
  • Dask Cloud Provider supports four cloud providers: AWS, Azure, Google Cloud Platform, and OpenStack.
  • Dask Distributed is the package we will use to create the cluster, along with some system libraries.
  • The Dask cluster is the layer we follow next: a client, a scheduler, and workers.
  • The client is our notebook or our code.
  • The scheduler tracks the state of the workers and assigns tasks to them.
  • The workers are the processes that compute the tasks, i.e. the statements we want to parallelize.
  • We can use a library, or a combination of libraries, to manage resources in the cloud.
  • We need to create an AWS account and configure it.
  • We need to install the AWS command line interface and all the packages we need.
  • We need to create a cluster on AWS (see the first sketch after this list).
  • We need to configure the credentials that we just created.
  • We can now use Dask Cloud Provider to parallelize our scikit-learn statements (see the second sketch after this list).
  • We can use Dask cloud provider to scale our Python tools.
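Below is a minimal sketch of creating a cluster on AWS with dask-cloudprovider's FargateCluster and connecting to it from a notebook. It assumes AWS credentials are already configured (for example with `aws configure`) and that `dask[distributed]` and `dask-cloudprovider[aws]` are installed; the worker count is illustrative, not taken from the talk.

```python
from dask.distributed import Client
from dask_cloudprovider.aws import FargateCluster

# Start a scheduler and workers as Fargate tasks on AWS.
# n_workers is illustrative; scale it to the resources you need.
cluster = FargateCluster(n_workers=2)

# The client (our notebook or code) connects to the scheduler,
# which tracks worker state and assigns tasks to the workers.
client = Client(cluster)
print(client.dashboard_link)  # follow the computation in the Dask dashboard
```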
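And a minimal sketch of parallelizing scikit-learn on that cluster, using joblib's Dask backend; the dataset, estimator, and parameter grid here are placeholders chosen for illustration.

```python
import joblib
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
search = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10], "gamma": [1e-3, 1e-4]},
    cv=3,
    n_jobs=-1,
)

# With the "dask" joblib backend, each cross-validation fit
# runs as a task on the Dask workers instead of in local processes.
with joblib.parallel_backend("dask"):
    search.fit(X, y)

print(search.best_params_)
```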