
TensorFlow GPU Epoch Optimization?

So this code works, and it gives me a 2x speedup over running on the CPU alone, but I think it's possible to get it faster. I think the issue boils down to this area... for i in tqdm(range(epochs),

Solution 1:

The overhead of a session.run call is around 100 usec, so if you do 10k steps, this overhead adds about 1 second. If that is significant for you, it means you are doing many small iterations and incurring extra overhead in other places as well; e.g., GPU kernel launch overhead is about 5x larger than on CPU (5 usec vs. 1 usec).
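A minimal sketch for measuring this per-call overhead on your own machine, assuming the TF 1.x Session API used in this answer (the trivial constant op is a stand-in so the cost measured is the call itself, not real compute):

import time
import tensorflow as tf  # assumes TF 1.x graph-mode API

x = tf.constant(1.0)  # trivial op: runtime is dominated by call overhead

with tf.Session() as sess:
    sess.run(x)  # warm-up: the first run pays one-time setup costs
    n = 10000
    start = time.time()
    for _ in range(n):
        sess.run(x)
    per_call_us = (time.time() - start) / n * 1e6
    print("session.run overhead: ~%.0f usec per call" % per_call_us)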

Using feed_dict is probably a bigger problem, and you could speed things up by using queues/input pipelines instead; see the sketch below.
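A minimal sketch of such an input pipeline, assuming a later TF 1.x release with tf.data (the older queue-based API works along the same lines); the arrays here are hypothetical stand-ins for your data:

import numpy as np
import tensorflow as tf  # assumes TF 1.x with tf.data available

# Hypothetical training arrays standing in for your real data
features = np.random.rand(60000, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=60000).astype(np.int64)

# Build the pipeline once; TF then feeds batches from its own threads,
# so Python never has to marshal a feed_dict on every step
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(10000)
           .batch(128)
           .repeat()      # loop over epochs inside the pipeline
           .prefetch(1))  # overlap input prep with GPU compute
x_batch, y_batch = dataset.make_one_shot_iterator().get_next()

# Build the model and train_op directly on x_batch/y_batch instead of
# placeholders, then the loop body is just sess.run(train_op).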

Also, a robust way to figure out where you are spending time is to profile. E.g., to figure out what fraction of time is spent in your for loop, you can use cProfile as follows.

python -m cProfile -o timing.prof myscript.py
snakeviz timing.prof

To figure out where the time goes inside the TensorFlow run call itself, you can do timeline profiling as described here.
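A minimal sketch of that timeline profiling, assuming the TF 1.x tf.RunOptions/tf.RunMetadata API; the matmul is a hypothetical stand-in for your real train_op:

import tensorflow as tf  # assumes TF 1.x graph-mode API
from tensorflow.python.client import timeline

x = tf.random_normal([1000, 1000])
y = tf.matmul(x, x)  # stand-in for your train_op

with tf.Session() as sess:
    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(y, options=run_options, run_metadata=run_metadata)

    # Dump a Chrome trace; open chrome://tracing and load the file
    # to see per-op timing on CPU and GPU for this step
    tl = timeline.Timeline(run_metadata.step_stats)
    with open("timeline.json", "w") as f:
        f.write(tl.generate_chrome_trace_format())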
