Tensorflow Gpu Epoch Optimization?
So this code works, and it gives me a 2x boost over CPU only, but I think it's possible to get it faster. I think the issue boils down to this area: `for i in tqdm(range(epochs),`
Solution 1:
The overhead of session.run is around 100 usec, so if you do 10k steps, this overhead alone adds around 1 second. If that is significant for you, then you are doing many small iterations and incurring extra overhead in other places too; for example, GPU kernel launch overhead is about 5x larger than on CPU (roughly 5 usec vs 1 usec).
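If you want to check that per-call overhead on your own machine, here is a minimal sketch (it assumes the graph-mode tf.Session API from TF 1.x):

```python
import time
import tensorflow as tf

# A trivial op, so the measurement is dominated by session.run overhead,
# not by actual computation.
x = tf.constant(1.0)

with tf.Session() as sess:
    sess.run(x)  # warm up; the first call includes one-time graph setup
    n = 10000
    start = time.time()
    for _ in range(n):
        sess.run(x)
    elapsed = time.time() - start
    print("per-call overhead: %.1f usec" % (elapsed / n * 1e6))
```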
Using feed_dict is probably a bigger problem, and you could speed things up by using queues or input pipelines instead (see the sketch below).
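As an illustration of what that change looks like, here is a minimal sketch using the tf.data input-pipeline API (available in TF 1.4+). The features and labels arrays are hypothetical stand-ins for whatever your script currently pushes through feed_dict:

```python
import numpy as np
import tensorflow as tf

# Hypothetical training data; replace with your own arrays.
features = np.random.rand(10000, 32).astype(np.float32)
labels = np.random.randint(0, 2, size=(10000,)).astype(np.int64)

# Build the input pipeline inside the graph. session.run then pulls
# batches directly, avoiding the per-step Python -> runtime copy that
# feed_dict performs.
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(buffer_size=10000)
           .batch(128)
           .repeat())
next_features, next_labels = dataset.make_one_shot_iterator().get_next()

# Wire next_features/next_labels into the model in place of placeholders,
# then run training steps with no feed_dict at all:
#   sess.run(train_op)
```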
Also, a robust way to figure out where you are spending time is to profile. For example, to figure out what fraction of time is due to your for loop, you can run cProfile as follows:
```
python -m cProfile -o timing.prof myscript.py
snakeviz timing.prof
```
To figure out where the time goes inside of TensorFlow run, you can do timeline profiling as described here.
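The link target isn't preserved above, but the usual TF 1.x technique is to pass a RunOptions with full tracing to session.run and dump a Chrome trace. A minimal sketch (train_op here is just a stand-in for your real training op):

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Stand-in op; replace with your actual training op.
x = tf.Variable(0.0)
train_op = tf.assign_add(x, 1.0)

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # Trace one training step.
    sess.run(train_op, options=run_options, run_metadata=run_metadata)

    # Write a Chrome trace; load timeline.json at chrome://tracing to
    # see per-op timing on CPU and GPU.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```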