Introduction
Docker is a powerful tool, and you can learn how to use it without going all the way down the rabbit hole. In this guide, you will get just enough Docker knowledge to improve your data science workflow and avoid common pitfalls.
This guide is based on my experience as an independent consultant, helping data science teams introduce reproducible and automated machine learning workflows.
What is Docker?
Docker is a container platform for deploying isolated applications on Linux and Windows. It includes a toolchain for creating, sharing, and building upon layered application stacks. Docker also forms the basis for more advanced services such as Docker Swarm from Docker Inc. and Kubernetes from Google.
As a Data Scientist, Why Should You Care?
Automating deployment with Docker containers lets you focus on your work rather than on maintaining complex software dependencies. By freezing the exact state of a deployed system inside an image, you also make it easier to reproduce your work and collaborate with your colleagues. Finally, you can use resources like Docker Hub to find pre-built recipes (Dockerfiles) from others that you can copy and build on.
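The "frozen state" idea above can be sketched as a minimal Dockerfile. The base image tag, file names, and paths here are assumptions for illustration, not part of any particular project:

```dockerfile
# Start from a community-maintained data science image
FROM jupyter/datascience-notebook:latest

# Pin your project's extra dependencies so every rebuild is reproducible
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy your notebooks into the image's default working directory
COPY notebooks/ /home/jovyan/work/
```

Anyone who builds this Dockerfile gets the same environment, which is the core of the reproducibility benefit.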
Useful Docker Commands
docker run -it --rm --name ds -p 8888:8888 jupyter/datascience-notebook
The command above pulls the pre-built jupyter/datascience-notebook image from Docker Hub (if it is not already local) and runs it as a container named "ds", publishing port 8888.
docker ps ---- lists running containers
docker build . ---- builds an image from the Dockerfile in the current directory
docker images ---- lists all downloaded images
docker rm <container> ---- removes a container
docker rmi <image> ---- removes an image
docker rmi -f <image> ---- forces removal of an image
docker run -it -p 1234:80 --name hello-world <image> ---- runs <image> interactively as a container named "hello-world", mapping host port 1234 to container port 80
docker --version ---- shows the installed Docker version
Press Ctrl-C to stop a container running in the foreground; use docker stop <container> for containers running in the background.
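Put together, the commands above form a typical build-run-clean-up cycle. The sketch below assumes Docker is installed; the image and container names are placeholders chosen for illustration:

```shell
# Build an image from the Dockerfile in the current directory,
# tagging it so it is easy to refer to later
docker build -t my-analysis .

# Run it interactively, removing the container on exit and
# mapping host port 8888 to the container's port 8888
docker run -it --rm --name analysis -p 8888:8888 my-analysis

# In another terminal: inspect, stop, and clean up as needed
docker ps                # show running containers
docker stop analysis     # stop the running container
docker rmi my-analysis   # remove the image when you are done
```

The `--rm` flag keeps stopped containers from piling up, which is an easy habit to adopt early.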
Advantages:
- Return on investment & cost savings
- Standardization & productivity
- Compatibility & maintainability
- Simplicity & faster configuration
- Rapid deployment
- Continuous deployment & testing
- Isolation
- Security