Wednesday, August 14, 2019

Docker for Data Scientists

Introduction

Docker can be a very powerful tool and you can learn how to use it without going all the way down the rabbit hole. In this guide, you will get just enough Docker knowledge to improve your data science workflow and avoid common pitfalls.
This guide is based on my experiences as an independent consultant, helping data science teams to introduce reproducible and automated machine learning workflows.

What is Docker?

Docker is a container platform for deploying isolated applications on Linux and Windows. It includes a toolchain for creating, sharing, and building upon layered application stacks. Docker also forms the basis for more advanced services such as Docker Swarm from Docker Inc. and Kubernetes, originally developed at Google.

As a Data Scientist, Why Should You Care?

Automating deployment with Docker containers lets you focus on your work instead of on maintaining complex software dependencies. Because an image freezes the exact state of a deployed system, your work also becomes easier to reproduce and to share with colleagues. Finally, you can use resources like Docker Hub to find pre-built recipes (Dockerfiles) from others that you can copy and build on.
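As a minimal sketch of what building on someone else's recipe can look like, the shell snippet below writes a small Dockerfile that starts from the jupyter/datascience-notebook image and installs extra Python packages on top of it, then defines a helper that builds it. The names build_ds_image, my-ds-image, requirements.txt, and the BASE_TAG build argument are assumptions made for illustration, not part of any official recipe.

```shell
# Create a scratch build directory (illustrative; any project folder works).
build_dir="$(mktemp -d)"

# requirements.txt is a hypothetical file. List exact package versions in it
# so that every rebuild installs the same software.
: > "$build_dir/requirements.txt"

# Write a Dockerfile that builds on a pre-built image from Docker Hub.
cat > "$build_dir/Dockerfile" <<'EOF'
# BASE_TAG is an assumed build argument; pass a fixed tag for reproducible builds.
ARG BASE_TAG=latest
FROM jupyter/datascience-notebook:${BASE_TAG}

# Install the pinned packages on top of the base image.
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt
EOF

# build_ds_image is a hypothetical helper; "my-ds-image" is an assumed image name.
build_ds_image() {
  docker build -t my-ds-image "$build_dir"
}
```

Calling build_ds_image requires a running Docker daemon; once it finishes, the new image should show up in the output of docker images.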


Useful Docker Commands

docker run -it --rm --name ds -p 8888:8888 jupyter/datascience-notebook

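The command above pulls the pre-built jupyter/datascience-notebook image from Docker Hub (if it is not already present locally) and runs it as a container named "ds", publishing the notebook server on port 8888 of the host.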
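To make the individual flags explicit, here is the same command wrapped in a small shell function with one comment per flag. The function name run_ds is a hypothetical choice for illustration; the flags and image are exactly those of the command above.

```shell
# run_ds is a hypothetical wrapper around the command shown above.
run_ds() {
  # -i -t   : keep stdin open and allocate a terminal (interactive session)
  # --rm    : delete the container automatically when it exits
  # --name  : name the container "ds" so later commands can refer to it
  # -p      : publish container port 8888 on host port 8888
  docker run -i -t --rm --name ds -p 8888:8888 jupyter/datascience-notebook
}
```

Because of --rm, nothing written inside the container survives after it exits, so treat this as a throwaway session unless you save your work elsewhere.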

docker ps  ---- list running containers

docker build  ---- build an image from a Dockerfile

docker images  ---- list all images stored locally

docker rm  ---- remove one or more stopped containers

docker rmi  ---- remove one or more images

docker rmi -f image-name  ---- force the removal of an image

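docker run -it -p 1234:80 --name hello-world image-name  ---- run an image as a container named "hello-world", mapping host port 1234 to container port 80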

docker --version  ---- show the installed Docker version

Press Ctrl+C to exit and stop a container that is running in the foreground; a container started with --rm is also removed once it stops.
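Put together, the stop and remove commands above form a short cleanup routine. This is only a sketch, assuming the container is named "ds" as in the first example; cleanup_ds is a hypothetical helper name.

```shell
# cleanup_ds is a hypothetical helper that tidies up after the "ds" example.
cleanup_ds() {
  # Stop the container if it is still running (ignore the error if it is not).
  docker stop ds || true
  # Remove the stopped container; not needed if it was started with --rm.
  docker rm ds || true
  # Remove the image to free disk space; it is pulled again on the next run.
  docker rmi jupyter/datascience-notebook
}
```

Removing the image is optional: keeping it avoids a fresh download the next time you start the notebook.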

Advantages:

  • Return on investment & cost savings
  • Standardization & productivity
  • Compatibility & maintainability
  • Simplicity & faster configurations
  • Rapid deployment
  • Continuous deployment & testing
  • Isolation
  • Security


