Clearing Tensorflow GPU memory after model execution

Question:

I’ve trained 3 models and am now running code that loads each of the 3 checkpoints in sequence and runs predictions using them. I’m using the GPU.

When the first model is loaded, it pre-allocates the entire GPU memory (which I want for working through the first batch of data), but it doesn't release that memory when it's finished. When the second model is loaded, even after using both tf.reset_default_graph() and with tf.Graph().as_default(), the GPU memory is still fully consumed by the first model, and the second model is starved of memory.

Is there a way to resolve this other than using Python subprocesses or multiprocessing to work around the problem (the only solution I’ve found via Google searches)?

Answer #1:

A GitHub issue from June 2016 (https://github.com/tensorflow/tensorflow/issues/1727) indicates the following problem:

currently the Allocator in the GPUDevice belongs to the ProcessState,
which is essentially a global singleton. The first session using GPU
initializes it, and frees itself when the process shuts down.

Thus the only workaround would be to use processes and shut them down after the computation.

Example Code:

import tensorflow as tf  # uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import multiprocessing
import numpy as np

def run_tensorflow():

    n_input = 10000
    n_classes = 1000

    # Create model
    def multilayer_perceptron(x, weight):
        # Single linear layer (no ReLU activation is applied)
        layer_1 = tf.matmul(x, weight)
        return layer_1

    # Store layer weights (this model uses no bias)
    weights = tf.Variable(tf.random_normal([n_input, n_classes]))


    x = tf.placeholder("float", [None, n_input])
    y = tf.placeholder("float", [None, n_classes])
    pred = multilayer_perceptron(x, weights)

    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)

    init = tf.global_variables_initializer()

    with tf.Session() as sess:
        sess.run(init)

        for i in range(100):
            batch_x = np.random.rand(10, 10000)
            batch_y = np.random.rand(10, 1000)
            sess.run([optimizer, cost], feed_dict={x: batch_x, y: batch_y})

    print("finished doing stuff with tensorflow!")


if __name__ == "__main__":

    # option 1: execute code with extra process
    p = multiprocessing.Process(target=run_tensorflow)
    p.start()
    p.join()  # once the child process has exited, its GPU memory is freed

    # wait until user presses enter key
    input()

    # option 2: just execute the function
    run_tensorflow()

    # wait until user presses enter key
    input()

So if you call run_tensorflow() within a process you created and then shut that process down (option 1), the memory is freed. If you just call run_tensorflow() directly (option 2), the memory is not freed after the function returns.
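Applied to the scenario in the question (three checkpoints loaded one after another), option 1 means giving each checkpoint its own short-lived process. The sketch below is a minimal illustration under stated assumptions: `predict_with_checkpoint` is a hypothetical placeholder for the code that would restore one checkpoint and run predictions, and the checkpoint paths are made up. It swaps in `concurrent.futures.ProcessPoolExecutor` with a single worker per checkpoint, so the worker exits (releasing whatever it held on the GPU) when the `with` block ends, and exceptions from the child are re-raised in the parent. The 'spawn' start method is the default here because a spawned child starts from a fresh interpreter and inherits no CUDA state from the parent.

```python
import concurrent.futures
import multiprocessing
import os


def predict_with_checkpoint(checkpoint_path):
    # Hypothetical placeholder: real code would import tensorflow here, build
    # the graph, restore `checkpoint_path` and run predictions on the GPU.
    return {"checkpoint": checkpoint_path, "pid": os.getpid()}


def run_in_fresh_process(func, arg, start_method="spawn"):
    # One single-worker pool per call: the worker process is joined when the
    # `with` block exits, so nothing it allocated survives into the next call.
    ctx = multiprocessing.get_context(start_method)
    with concurrent.futures.ProcessPoolExecutor(max_workers=1, mp_context=ctx) as executor:
        # .result() re-raises any exception raised in the child process
        return executor.submit(func, arg).result()


def predict_all(checkpoint_paths, start_method="spawn"):
    # Run the models strictly one after another, each in its own process
    return [run_in_fresh_process(predict_with_checkpoint, path, start_method)
            for path in checkpoint_paths]
```

Because 'spawn' re-imports the main module in every child, call something like `predict_all(["model_1.ckpt", "model_2.ckpt", "model_3.ckpt"])` from under an `if __name__ == "__main__":` guard, like the one in the code above. The 'fork' method also works, but only if the parent process has not initialized the GPU before forking.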

Answered By: Oliver Wilken

Answer #2:

You can use the numba library to release all the GPU memory. Install it with:

pip install numba

and then run:

from numba import cuda
device = cuda.get_current_device()
device.reset()

This resets the device, releasing all of its memory.

Answered By: hitesh kumar

Answer #3:

I use numba to release the GPU; with TensorFlow alone I could not find an effective method.

import tensorflow as tf
from numba import cuda

a = tf.constant([1.0, 2.0, 3.0], shape=[3], name='a')
b = tf.constant([1.0, 2.0, 3.0], shape=[3], name='b')
with tf.device('/gpu:1'):
    c = a + b

TF_CONFIG = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.1),
    allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
for _ in range(999):
    print(sess.run(c))

sess.close()  # without numba, closing the session does not release the GPU
cuda.select_device(1)
cuda.close()

with tf.device('/gpu:1'):
    c = a + b

TF_CONFIG = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.5),
    allow_soft_placement=True)

sess = tf.Session(config=TF_CONFIG)
sess.run(tf.global_variables_initializer())
while True:  # runs until interrupted
    print(sess.run(c))
Answered By: TanLingxiao

Answer #4:

There seem to be two ways to handle iterative model training, or the case where you use a futures-based multiprocessing pool to serve model training, in which a pool process is not killed when its future finishes. Either method below can be applied in the training process to release GPU memory while keeping the main process alive.

  1. Call a subprocess to run the model training. When one phase of training completes, the subprocess exits and frees the memory. It’s easy to get the return value.
  2. Call multiprocessing.Process (p) to run the model training (p.start()); p.join() waits for the process to exit, which frees the memory.
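Method 1 can be sketched with the standard library's subprocess module. This is a minimal sketch, assuming the training code lives in a separate script that prints its result as JSON on the last line of its output; the `run_training_script` helper and that output convention are illustrative assumptions, not part of the original answer. Because the work runs in a separate Python interpreter, all of its GPU memory is released when that interpreter exits, and the return value is recovered by parsing its output.

```python
import json
import subprocess
import sys


def run_training_script(script_args):
    # Run the (hypothetical) training script in a separate Python interpreter.
    # When that interpreter exits, all of its GPU memory is released.
    completed = subprocess.run(
        [sys.executable, *script_args],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits with an error
    )
    # Assumed convention: the script prints its result as JSON on its last line
    last_line = completed.stdout.strip().splitlines()[-1]
    return json.loads(last_line)
```

For example, `run_training_script(["train.py", "--phase", "1"])` would run a hypothetical `train.py` for one phase and return its parsed result; `check=True` turns a crash in the child into a `CalledProcessError` in the parent instead of a silent failure.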

Here is a helper function using multiprocessing.Process that opens a new process to run your Python function and returns its value, instead of using subprocess:

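import logging
import os
import sys
import traceback
from multiprocessing import Process, Queue

logger = logging.getLogger(__name__)


def _wrapper_func(queue, func, *args):
    # runs in the child process and sends (result, error) back to the parent;
    # defined at module level so the 'spawn' start method can pickle it
    try:
        logger.info('run with process id: {}'.format(os.getpid()))
        result = func(*args)
        error = None
    except Exception:
        result = None
        ex_type, ex_value, tb = sys.exc_info()
        error = ex_type, ex_value, ''.join(traceback.format_tb(tb))
    queue.put((result, error))


# open a new process to run function
def process_run(func, *args):
    queue = Queue()
    p = Process(target=_wrapper_func, args=[queue, func] + list(args))
    p.start()
    result, error = queue.get()  # read before join() so a large result cannot block the child
    p.join()  # the child has exited, so its GPU memory is released
    return result, error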
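For the pool case mentioned at the start of this answer, where pool workers stay alive after a future finishes, one alternative (swapped in here, not part of the original answer) is the standard library's `multiprocessing.Pool` with `maxtasksperchild=1`, which replaces each worker process after it completes a single task, so memory held by that worker does not carry over. This is a sketch under assumptions: `train_phase` is a hypothetical placeholder for one phase of training, and 'spawn' is the default start method for the same reason as with any CUDA work in a child process.

```python
import multiprocessing
import os


def train_phase(phase):
    # Hypothetical placeholder for one phase of training; real code would
    # import tensorflow and build/run the model here, inside the worker.
    return phase, os.getpid()


def run_phases(phases, start_method="spawn"):
    ctx = multiprocessing.get_context(start_method)
    # maxtasksperchild=1: every worker exits after one task and is replaced,
    # so GPU memory from one phase is never kept alive for the next.
    with ctx.Pool(processes=1, maxtasksperchild=1) as pool:
        # chunksize=1 makes each phase its own task, hence its own process
        return pool.map(train_phase, phases, chunksize=1)
```

As with `Process`, calls made with 'spawn' should sit under an `if __name__ == "__main__":` guard.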
Answered By: liviaerxin

Answer #5:

GPU memory allocated by tensors is released (back into the TensorFlow memory pool) as soon as the tensor is no longer needed (before the .run call terminates). GPU memory allocated for variables is released when the variable containers are destroyed. In the case of DirectSession (i.e., sess = tf.Session("")), this happens when the session is closed or explicitly reset (added in 62c159ff).

Answered By: Yaroslav Bulatov

Answer #6:

I am trying to figure out which option is better in Jupyter Notebook. Jupyter Notebook holds on to the GPU memory permanently, even after a deep learning application has completed. This often leads to the GPU Fan ERROR, which is a big headache. When it happens, I have to reset nvidia_uvm and reboot the Linux system regularly. I have concluded that the following two options both avoid the GPU Fan Error, but I want to know which is better.

Environment:

  • CUDA 11.0
  • cuDNN 8.0.1
  • TensorFlow 2.2
  • Keras 2.4.3
  • Jupyter Notebook 6.0.3
  • Miniconda 4.8.3
  • Ubuntu 18.04 LTS

First Option

Put the following code at the end of the cell. The kernel is killed as soon as the application finishes running. But it is not very elegant: Jupyter pops up a message saying the kernel has died.

import os
 
pid = os.getpid()
!kill -9 $pid

Second Option

The following code also works within Jupyter Notebook. I do not know whether numba is safe to use this way. Device “0” is the default GPU and the one most personal developers (as opposed to server users) use. However, both Neil G and mradul dubey have responded: “This leaves the GPU in a bad state.”

from numba import cuda

cuda.select_device(0)
cuda.close()

The second option seems more elegant. Can someone confirm which is the better choice?

Notes:

Releasing GPU memory automatically is not a problem in an Anaconda environment when running a script directly with “$ python abc.py”, since the memory is freed when the script's process exits. However, I sometimes need to use Jupyter Notebook to work with .ipynb files.

Answered By: Mike Chen
