
Is sharing GPU to multiple containers feasible? #52757

Open
tianshapjq opened this issue Sep 20, 2017 · 76 comments

@tianshapjq (Contributor) commented Sep 20, 2017

Is this a BUG REPORT or FEATURE REQUEST?: feature request
/kind feature

What happened:
So far we do not support sharing a GPU among multiple containers; one GPU can only be assigned to one container at a time. But we do have some requirements for this. Is it feasible to manage GPUs just like CPU or memory?

What you expected to happen:
Sharing a GPU among multiple containers, just like CPU and memory.

@tianshapjq (Contributor, Author) commented Sep 20, 2017

@vishh @cmluciano is it workable?

@tbchj commented Sep 20, 2017

+1

@jianzhangbjz (Contributor) commented Sep 20, 2017

/cc

1 similar comment
@huzhengchuan (Contributor) commented Sep 20, 2017

/cc

@RenaudWasTaken (Member) commented Sep 20, 2017

/sig node
until we have a wg-resource-management label

From @flx42:

By default, kernels from different processes can't run on one GPU simultaneously (concurrency but not parallelism); they are time-sliced. The Pascal architecture brings instruction-level preemption instead of block-level preemption, but context switches are not free.

Also, there is no way of partitioning GPU resources (SMs, memory), or even assigning priorities when sharing a card.

You also have MPS, which is another problem :D

But I suppose you only mean sharing NVIDIA devices between multiple containers?

Currently we are focusing on making sure GPU enablement through the Device Plugin is done right in 1.8, but it could be a goal for 1.9.

@tianshapjq (Contributor, Author) commented Sep 20, 2017

@RenaudWasTaken thanks! But another question: where should the GPU enablement code live if we separate GPU support from the kubelet? It seems it's no longer appropriate to place the GPU code in the vendor pkg; do we have to create a new repo related to Kubernetes?

@RenaudWasTaken (Member) commented Sep 20, 2017

@tianshapjq see the device plugin design document for 1.8 which is how we plan to support GPUs in the future: kubernetes/community#695

@linyouchong (Member) commented Sep 20, 2017

@RenaudWasTaken Do you mean that sharing NVIDIA devices between multiple containers could be a goal for 1.9?

@dixudx (Member) commented Sep 27, 2017

/cc

@vishh (Member) commented Oct 5, 2017

Sharing GPUs is out of scope for the foreseeable future (at least until v1.11). Our current focus is to get GPUs per container working in production.
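
For context, the model that does work today is requesting whole GPUs per container through the nvidia.com/gpu extended resource, along the lines of this minimal sketch (the image name is a placeholder):

apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: gpu-test
    image: nvidia/cuda:10.2-base  # placeholder image
    resources:
      limits:
        nvidia.com/gpu: 1  # whole GPUs only; fractional values are not accepted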

@vishh vishh self-assigned this Oct 5, 2017
@reverson commented Oct 20, 2017

/cc

@ScorpioCPH (Member) commented Dec 6, 2017

/cc

@flx42 commented Feb 14, 2018

FWIW, I'm seeing more and more users/customers asking for a way to share a single GPU across a pod.

@mindprince (Member) commented Feb 28, 2018

@flx42 Did you mean sharing a single GPU between different containers belonging to the same pod? What isolation do your users/customers expect in such scenarios?

@tianshapjq (Contributor, Author) commented Feb 28, 2018

@flx42 yes, it seems isolation is the blocker at this moment. GPUs don't support production-grade secure isolation, which could cause serious damage if we simply assign one GPU to multiple containers, IMO. If there is any news about GPU isolation, please let me know :)

@WIZARD-CXY (Contributor) commented Feb 28, 2018

FWIW, I'm seeing more and more users/customers asking for a way to share a single GPU across a pod.

I'm one of the many users.

@tianshapjq (Contributor, Author) commented Feb 28, 2018

@flx42 @mindprince BTW, since device plugins are now surfaced as extended resources, does that mean sharing would not be acceptable at present?

@flx42 commented Feb 28, 2018

What isolation do your users/customers expect in such scenarios?

Isolation doesn't matter in this case.

@allxone commented Mar 1, 2018

+1

2 similar comments
@brucechou1983 commented Mar 26, 2018

+1

@YuxiJin-tobeyjin (Contributor) commented Apr 13, 2018

+1

@rafmonteiro commented Mar 13, 2019

@cheyang Thank you for open-sourcing this solution! However, I'm a bit confused... Looking at the Installation Guide, Step 1 is to download https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml, then in Step 2 you mention kube-scheduler.yaml, so I'm not sure whether I should modify something in Step 1 or not.
My second question is: would your solution work on Amazon EKS?

@wsxiaozhang commented Mar 19, 2019

@cheyang Thank you for open-sourcing this solution! However, I'm a bit confused... Looking at the Installation Guide, Step 1 is to download https://raw.githubusercontent.com/AliyunContainerService/gpushare-scheduler-extender/master/config/gpushare-schd-extender.yaml, then in Step 2 you mention kube-scheduler.yaml, so I'm not sure whether I should modify something in Step 1 or not.
My second question is: would your solution work on Amazon EKS?

@rafmonteiro Basically, the solution has two parts: one extends the K8s scheduling logic, the other extends the NVIDIA GPU device plugin. The first part extends the K8s default scheduler with a gpushare-scheduler-extender, a separately running service that executes GPU-sharing-specific scheduling logic. Install guide Step 1 creates this gpushare-scheduler-extender service. Step 2 modifies the default scheduler's startup configuration by adding scheduler-policy-config.json. This config JSON describes the scheduler extenders and points the default scheduler to our gpushare-scheduler-extender service. That way, the default scheduler can delegate GPU-sharing-related filtering and prioritization to the extender. This is the standard K8s scheduler extender design.

Although we have not had a chance to test the solution on EKS, both extensions should work on any standard K8s. Any tests and feedback are welcome.
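
For reference, a scheduler extender policy file of this shape generally looks like the sketch below; the extender URL and the extended resource name here are assumptions for illustration, so take the exact values from the gpushare project's own config:

{
  "kind": "Policy",
  "apiVersion": "v1",
  "extenders": [
    {
      "urlPrefix": "http://127.0.0.1:32766/gpushare-scheduler",
      "filterVerb": "filter",
      "bindVerb": "bind",
      "enableHttps": false,
      "nodeCacheCapable": true,
      "managedResources": [
        {
          "name": "aliyun.com/gpu-mem",
          "ignoredByScheduler": false
        }
      ],
      "ignorable": false
    }
  ]
}

The default scheduler is then started with its policy config flag pointing at this file, so filter and bind calls for the managed resource are forwarded to the extender service.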

@eloyekunle (Member) commented Mar 23, 2019

@wsxiaozhang @resouer @cheyang @RenaudWasTaken

Taken together, do these projects (GPU Sharing Scheduler Extender, GPU Sharing Device Plugin) solve the "Run GPU sharing workloads with Kubernetes + Kubeflow" GSoC project?

If it does not:

  • What else is currently required for a complete solution?
  • Is the solution supposed to be a core part of Kubernetes, or provided as a separate framework?
  • How deeply will this solution be integrated into Kubeflow?
  • Can these two projects serve as a building block for this implementation?

Thanks, and I look forward to your responses.

@gaocegege (Contributor) commented Mar 25, 2019

@cheyang @wsxiaozhang

Thanks for your awesome work! I also have some questions about it. I think we are trying to use GPU sharing for DL/ML model serving workloads, so I am a little confused about why we need basic model training experience in your GSoC project.

Are you trying to share GPUs during DL training?

I'd appreciate it if you could answer me, thanks.

@resouer (Member) commented Mar 28, 2019

@eloyekunle

Taken together, do these projects (GPU Sharing Scheduler Extender, GPU Sharing Device Plugin) solve the "Run GPU sharing workloads with Kubernetes + Kubeflow" GSoC project?

Nope, the GSoC project aims at making these pieces work end to end in the Kubeflow stack, which is missing from the current solution.

Also, the GSoC project does not fix, or intend to fix, #52757, as GPU sharing is another story entirely and a non-goal. Any open-source GPU sharing mechanism should be acceptable during your GSoC project.

@gaocegege

Mostly serving, though there are also some training cases, like low-cost debugging & tuning phases.

@pbarker (Contributor) commented May 3, 2019

@xuemzhan commented Jun 11, 2019

It's a useful and fantastic feature; marking this so I can track it.

@paulhodson commented Jun 21, 2019

I too am interested in sharing devices across pods. I have a device plugin which currently advertises copies of the device for different containers in a pod (e.g. mydomain.com/mydevice-cont1, mydomain.com/mydevice-cont2), but sharing would definitely be more elegant.
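
To make that workaround concrete, a pod under such a scheme might look roughly like the following sketch; the resource names are the hypothetical ones from the comment above and the image is a placeholder:

apiVersion: v1
kind: Pod
metadata:
  name: shared-device-pod
spec:
  containers:
  - name: cont1
    image: myrepo.com/worker:latest  # placeholder image
    resources:
      limits:
        mydomain.com/mydevice-cont1: 1  # first "copy" of the device
  - name: cont2
    image: myrepo.com/worker:latest  # placeholder image
    resources:
      limits:
        mydomain.com/mydevice-cont2: 1  # second resource name backed by the same physical device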

@sakjain92 commented Aug 27, 2019

This is not specifically related to Kubernetes, but regarding a comment by @jiayingz:

"Certain requirements, like resource isolation guarantees, still need to be driven by vendors. I feel many devices we have today may not provide as strong isolation guarantees as cpu and memory do. "

A year ago, I worked on a research project on exactly this problem: how to provide strong resource isolation guarantees on GPUs (taking NVIDIA's latest GPUs as examples) without any hardware modifications. I was able to get good resource isolation between parallel processes running on the same NVIDIA GPU. You can look at the results in the paper (https://ieeexplore.ieee.org/document/8743200; open-source link: http://www.andrew.cmu.edu/user/sakshamj/papers/FGPU_RTAS_2019_Fractional_GPUs_Software_based_Compute_and_Memory_Bandwidth_Reservation_for_GPUs.pdf).

If anyone is interested, the code base for this work is present at https://github.com/sakjain92/Fractional-GPUs (Note: This codebase was more oriented towards proof-of-concept and is not production-ready. It might aid in understanding the paper though)

Let me know if this looks interesting. I always welcome all feedback.

@eero-t commented Sep 12, 2019

Certain requirements, like resource isolation guarantees, still need to be driven by vendors. I feel many devices we have today may not provide as strong isolation guarantees as cpu and memory do.

AMD's latest cgroups GPU proposal: https://lists.freedesktop.org/archives/dri-devel/2019-August/233463.html

Intel's latest proposal: https://lists.freedesktop.org/archives/intel-gfx/2019-May/197206.html

There have also been earlier proposals, listed in the above mails.

Once cgroups support is in the kernel, some updates are also needed to the OCI & CRI specs (and their container implementations) so that users are able to configure GPU limits in Kubernetes.

@pan87232494 commented Oct 10, 2019

https://github.com/Deepomatic/shared-gpu-nvidia-k8s-device-plugin

I tried it on k8s 1.14 installed via kubespray 2.10, but this plugin is not working:

2019/10/10 05:37:54 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2019/10/10 05:37:54 Registered device plugin with Kubelet
2019/10/10 05:41:36 ! Try to allocate devices [GPU-def4c886-13cb-53ef-9517-f8d07cfac6d3 GPU-21f11e51-117e-243f-388f-626e217f9b7d]
2019/10/10 05:41:36 ! Try to allocate devices [GPU-def4c886-13cb-53ef-9517-f8d07cfac6d3 GPU-21f11e51-117e-243f-388f-626e217f9b7d]
2019/10/10 05:54:17 ! Try to allocate devices [GPU-21f11e51-117e-243f-388f-626e217f9b7d GPU-def4c886-13cb-53ef-9517-f8d07cfac6d3]

@ide8 commented Oct 16, 2019

Although we have not got a chance to test the solution on EKS, the both extension should be workable on any standard K8S. Any test and feedback is welcomed.

@wsxiaozhang, we are trying to run it on EKS, but we can't edit the default scheduler there. Did you have a chance to try it on EKS?

@k82cn (Member) commented Jan 5, 2020

/wg machine-learning

@achimnol commented Jan 6, 2020

As @sakjain92 noted, container-level fractional GPU sharing must be done at a lower level (e.g., in accelerator drivers), but the orchestration framework (k8s) must also be able to handle fractional values for resource types.

I've been developing an open-source framework called Backend.AI to support such fractional sharing scenarios for containers. It comes with a "practical" implementation as a proprietary plugin that achieves container-level CUDA GPU virtualization by hooking the driver API. (The open-source version of the CUDA plugin works like the standard k8s CUDA plugin, assigning GPUs to containers device by device.) As a commercialized product, it also supports a "virtualized view" of in-container nvidia-smi results, because customers often ask about pid/resource mismatches if it's not virtualized. The framework's batch-mode scheduler is plugin-customizable and provides a DRF scheduler by default, which can handle GPU fractions natively.

Currently Backend.AI's CUDA plugin only imposes maximum limits on the GPU memory amount and the number of streaming multiprocessors for each container, and I am very interested in achieving better isolation of CUDA kernels from multiple tenants by rewriting/augmenting user-provided CUDA PTX binaries. This is unavoidable because CUDA's internal scheduling mechanism is not customizable by applications. From this perspective, the key ideas in the RTAS paper look promising, but I think there is a lot of work to do. For instance, to make it a viable product usable by arbitrary customers, CUDA kernel augmentation must be done transparently and automatically for any kernel.

I hope my experience with Backend.AI above helps the k8s community concretize the requirements for fractional GPU scaling in k8s. (Backend.AI also has plans to support k8s as a replaceable backend in future versions!)

@Eric-918 commented Mar 7, 2020

Here is a solution for sharing one GPU among multiple containers by wrapping the interfaces of libcuda.so.
You can create a pod A with 0.2 GPU and a pod B with 0.4 GPU; A only uses 20% of the SMs and B only uses 40%.
https://github.com/tkestack/gpu-manager
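
As a rough sketch of how such fractional requests are expressed (the resource names and units below are my reading of the gpu-manager docs, so treat them as assumptions and verify against the project README):

apiVersion: v1
kind: Pod
metadata:
  name: vcuda-pod-a
spec:
  containers:
  - name: worker
    image: myrepo.com/cuda-app:latest  # placeholder image
    resources:
      limits:
        tencent.com/vcuda-core: 20    # assumed unit: 1/100 of a GPU, i.e. 0.2 GPU / 20% of SMs
        tencent.com/vcuda-memory: 10  # assumed unit: 256MiB of GPU memory per unit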

@samedguener commented Mar 7, 2020

@Eric-918 Thank you very much! I will look into this! It looks like a very different approach from what I have seen before.

@Eric-918 commented Mar 16, 2020

@Eric-918 Thank you very much! I will look into this! It looks like a very different approach from what I have seen before.

@samedguener If you run into any problems with this solution, let me know; always happy to help.

@circum1 commented Apr 29, 2020

Just don't specify the GPU in resource limits/requests and all containers will have full access to all GPUs. The only downside is that you cannot enforce any limits.

@adampl I tried it out, but in that case nvidia-docker does not configure my container to use the GPU. How do you make nvidia-docker kick in when you do not specify nvidia.com/gpu as a resource limit?

@adampl commented Apr 29, 2020

@circum1 In order to use Nvidia GPUs from your pods, you will need:

  1. cuda-drivers and nvidia-container-runtime installed on the host
  2. nvidia-container-runtime configured as the default runtime in /etc/docker/daemon.json (see the sketch below)
  3. a CUDA-enabled docker image for your containers (if you want to use CUDA)
  4. NVIDIA_VISIBLE_DEVICES=all (all or selected GPU IDs) env variable set in the container
  5. CUDA_VISIBLE_DEVICES env variable may also be necessary (if you want to use CUDA)

Unfortunately, I don't know how it is with AMD and other GPUs.
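
For item 2, the /etc/docker/daemon.json that makes nvidia-container-runtime the default runtime typically looks like the sketch below (the runtime binary path may differ depending on your distro and install method):

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

After editing this file, restart the Docker daemon so the change takes effect.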

@circum1 commented Apr 30, 2020

@adampl I do not manage the k8s cluster, but I guess points 1-2 are OK, since GPU-using pods work if I add nvidia.com/gpu: 1 to the limits.

I am using tensorflow/tensorflow:1.12.0-gpu-py3 image.

Deploying with the following deploy.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-test
  template:
    metadata:
      labels:
        app: tf-test
    spec:
      containers:
      - name: tf-test
        image: myrepo.com/tf-test:latest
        imagePullPolicy: Always
        command: ["sleep"]
        workingDir: /app
        args: ["100d"]
        resources:
          limits:
            nvidia.com/gpu: 1
      tolerations:
      - key: "compute"
        operator: "Equal"
        value: "gpu"
        effect: "NoSchedule"

nvidia is configured:

root@tf-test-59cdcc5cb7-8pjhc:/app# nvidia-smi 
Thu Apr 30 07:53:55 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P6000        On   | 00000000:00:05.0 Off |                    0 |
| 26%   44C    P8    10W / 250W |      0MiB / 22916MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@tf-test-59cdcc5cb7-8pjhc:/app# env
...
NVIDIA_VISIBLE_DEVICES=GPU-173e35e2-09e2-773e-c0d3-a0a0cc38bc3e
...

If I remove the resource part, and add env, then seemingly nvidia-docker does not configure my pod:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: tf-test
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tf-test
  template:
    metadata:
      labels:
        app: tf-test
    spec:
      containers:
      - name: tf-test
        image: myrepo.com/tf-test:latest
        imagePullPolicy: Always
        command: ["sleep"]
        workingDir: /app
        args: ["100d"]
        env:
        - name: NVIDIA_VISIBLE_DEVICES
          value: "all"

Result:

root@tf-test-6cf67655d7-774v5:/app# nvidia-smi
bash: nvidia-smi: command not found

Checking the mounts also indicates that nvidia-docker did not configure the container.

Also it does not make a difference if I set the env var in the Dockerfile instead of deploy.yaml.

Do you have an idea what I am doing wrong?

@circum1 commented Apr 30, 2020

Reply to myself: the "tolerations:" part is needed to put the pod on the right node... How wonderful it would be if I had learned k8s instead of just copy-pasting SO answers from here and there :)

@adampl commented Apr 30, 2020

@circum1 I guess nvidia-container-runtime is not set as the default Docker runtime in your cluster.

@jasonliu747 commented May 15, 2020
