How To Reshape 3 Channel Dataset For Input To Neural Network
Solution 1:
A couple of things:
The TimeDistributed layer in Keras needs a time dimension, so for video image processing this could be 75 here (the frames).
The wrapped layer expects each frame in shape (120, 160, 3), so the input_shape of the TimeDistributed layer should be (75, 120, 160, 3). The 3 stands for the RGB channels. If you have greyscale images, 1 as the last dimension should work.
The input_shape always omits the first ("row") dimension of your data, i.e. the number of examples, which is 99 in your case.
To check the output shape produced by each layer of the model, call model.summary() after compiling it.
See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed
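As a small illustration of the shape rules above, the sketch below derives input_shape from the shape of a full dataset. It assumes the data is laid out as (examples, frames, height, width, channels). The helper name input_shape_for and the small greyscale array are made up for this example, and only NumPy is used.

```python
import numpy as np

def input_shape_for(data_shape):
    """Drop the leading examples (batch) dimension, which input_shape omits."""
    return tuple(data_shape[1:])

# Shape of the dataset discussed above: 99 clips of 75 RGB frames, 120x160 each.
dataset_shape = (99, 75, 120, 160, 3)
print(input_shape_for(dataset_shape))

# Greyscale frames often come without a channel axis, so add one of size 1.
# Hypothetical small array: 2 clips of 5 frames.
grey = np.zeros((2, 5, 120, 160), dtype=np.float32)
grey = np.expand_dims(grey, axis=-1)
print(grey.shape, input_shape_for(grey.shape))
```

The tuple returned by the helper is what would be passed as input_shape to the first TimeDistributed layer.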
You can convert images into NumPy arrays of shape (height, width, 3) using keras.preprocessing.image:
from keras.preprocessing import image
# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)
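Once each frame is an array like x above, the frames still have to be grouped into clips to get the 5D input that TimeDistributed expects. The following is only a sketch of one way to do that. The function frames_to_clips is a hypothetical helper, and random arrays stand in for frames loaded from disk.

```python
import numpy as np

def frames_to_clips(frames, frames_per_clip):
    """Group a list of (H, W, C) frames into an array of shape
    (clips, frames_per_clip, H, W, C). Leftover frames are dropped."""
    n_clips = len(frames) // frames_per_clip
    if n_clips == 0:
        raise ValueError("not enough frames for a single clip")
    used = np.stack(frames[: n_clips * frames_per_clip])  # (N, H, W, C)
    return used.reshape((n_clips, frames_per_clip) + used.shape[1:])

# Stand-in for arrays returned by image.img_to_array: 12 random RGB frames.
rng = np.random.default_rng(0)
frames = [rng.random((120, 160, 3), dtype=np.float32) for _ in range(12)]

clips = frames_to_clips(frames, frames_per_clip=5)
print(clips.shape)  # (2, 5, 120, 160, 3)
```

Frames keep their original order, so clip 0 holds frames 0 to 4 and clip 1 holds frames 5 to 9. The last two frames are discarded here because they do not fill a clip.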
Update: It seems the reason you had to make all images square (128, 128, 1) is that in model.fit() the training examples (x_train) and the labels (normally y_train) are the same set. If you look at the model summary below, the output of the LSTM layer is reshaped into a square (8, 8, 1), and every layer after that stays square. The model therefore expects the labels to be square. It makes sense: using this model for prediction would transform a (120, 160, 1) image into something of shape (128, 128, 1). Changing the training code to the following should therefore work:
from numpy import random  # random.random below is NumPy's, not the standard library's
x_train = random.random((90, 5, 120, 160, 1)) # training data
y_train = random.random((90, 5, 128, 128, 1)) # labels
model.fit(x_train, y_train)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64) 320
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64) 0
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32) 18464
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32) 0
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16) 4624
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16) 0
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800) 0
_________________________________________________________________
lstm_1 (LSTM) (None, 5, 64) 1245440
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1) 0
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1) 0
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16) 160
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16) 0
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32) 4640
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32) 0
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64) 18496
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64) 0
_________________________________________________________________
time_distributed_16 (TimeDis (None, 5, 128, 128, 1) 577
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0
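The shapes in the summary above can be reproduced by hand, which makes it easier to see where the square output comes from. The sketch below is plain Python, not Keras. The layer types (Conv2D with padding='same', 2x2 MaxPooling2D, 2x2 UpSampling2D, and a Reshape to (8, 8, 1) after the LSTM) are inferred from the output shapes, since the summary only shows "TimeDist" for each row, and the helper names are made up.

```python
def conv_same(shape, filters):
    """Conv2D with padding='same': height and width unchanged, channels = filters."""
    h, w, _ = shape
    return (h, w, filters)

def pool(shape, factor=2):
    """MaxPooling2D: divide height and width by the pool size."""
    h, w, c = shape
    return (h // factor, w // factor, c)

def upsample(shape, factor=2):
    """UpSampling2D: multiply height and width by the factor."""
    h, w, c = shape
    return (h * factor, w * factor, c)

# Encoder: three Conv2D + MaxPooling2D pairs on one (120, 160, 1) frame.
enc = (120, 160, 1)
for filters in (64, 32, 16):
    enc = pool(conv_same(enc, filters))
flat = enc[0] * enc[1] * enc[2]
print(enc, flat)  # (15, 20, 16) 4800

# Decoder: the 64 LSTM outputs are assumed to be reshaped to (8, 8, 1).
dec = (8, 8, 1)
for filters in (16, 32, 64):
    dec = conv_same(upsample(dec), filters)
dec = conv_same(upsample(dec), 1)
print(dec)  # (128, 128, 1)
```

Four doublings starting from 8 always give 128, regardless of the input size, which is why the labels have to be (128, 128, 1) for this architecture.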
Update 2: To make it work with non-square images without changing y, use LSTM(300) and Reshape((15, 20, 1)), and remove one of the Conv2D + UpSampling2D pairs after them. Then you can use pictures of shape (120, 160) even in an autoencoder. The trick is to look at the model summary and make sure that the shape right after the LSTM is chosen so that, after all the remaining layers, the end result has shape (120, 160).
from numpy import random
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, LSTM, MaxPooling2D, Reshape, TimeDistributed, UpSampling2D

model = Sequential()
model.add(
    TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"), input_shape=(5, 120, 160, 1)))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))
model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))
model.compile(optimizer='adam', loss='mse')
model.summary()
x_train = random.random((90, 5, 120, 160, 1))
y_train = random.random((90, 5, 120, 160, 1))
model.fit(x_train, y_train)
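The bookkeeping behind Update 2 can be written down as a short calculation: work backwards from the target image size through the upsampling layers to find the Reshape shape and the number of LSTM units. This is a minimal sketch in plain Python, assuming every UpSampling2D doubles height and width and that the reshaped tensor has a single channel. The function decoder_start is a hypothetical helper, not part of Keras.

```python
def decoder_start(target_hw, n_upsample, factor=2):
    """Return (reshape_shape, lstm_units) such that n_upsample upsampling
    steps starting from reshape_shape end at target_hw."""
    scale = factor ** n_upsample
    h, w = target_hw
    if h % scale or w % scale:
        raise ValueError(f"{target_hw} is not divisible by {scale}")
    reshape_shape = (h // scale, w // scale, 1)
    # The LSTM must emit exactly as many values as the Reshape consumes.
    return reshape_shape, reshape_shape[0] * reshape_shape[1]

print(decoder_start((120, 160), n_upsample=3))  # ((15, 20, 1), 300)
print(decoder_start((128, 128), n_upsample=4))  # ((8, 8, 1), 64)
```

With four upsampling layers the scale is 16, and 120 is not divisible by 16, so the helper raises ValueError for a (120, 160) target. That is consistent with dropping one Conv2D + UpSampling2D pair in the model above.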
Solution 2:
Thanks to Mr. Kai Aeberli for his assistance. I was able to run the model after resizing the images to 128x128. The size of the dataset may cause the system to crash in the absence of a GPU, so reduce it as necessary. Please refer to the whole comment section if you have doubts. You can find the code on GitHub.