How To Reshape 3 Channel Dataset For Input To Neural Network

January 03, 2024 Post a Comment

I am trying to feed kth action dataset to a cnn. I am having difficulty with reshaping the data. I have created this array (99,75,120,160) type=uint8 ie, 99 videos belonging to a c

Solution 1:

A couple of things:

The TimeDistributed layer in Keras needs a time dimension, so for video image processing this could be 75 here (the frames).

It also expects images to be sent in shape (120, 60, 3). So the TimeDistributed layer input_shape should be (75, 120, 160, 3). 3 stands for the RGB channels. If you have greyscale images, 1 as the last dimension should work.

The input_shape always ignores the "row" dimension of your examples, in your case 99.

To check the output shapes created by each layer of the model, put model.summary() after compiling it.

See: https://www.tensorflow.org/api_docs/python/tf/keras/layers/TimeDistributed

You can convert images into numpy arrays with shape (X, Y, 3) using Keras.preprocessing.image.

from keras.preprocessing import image

# loads RGB image as PIL.Image.Image type
img = image.load_img(img_file_path, target_size=(120, 160))
# convert PIL.Image.Image type to 3D tensor with shape (120, 160, 3)
x = image.img_to_array(img)

Update: It seems the reason you had to make all images squared (128,128,1) is that in model.fit(), training examples (x_train) and labels (normally y_train) are the same set. If you look at the model summary below, after the Flatten layer everything becomes a square. It is therefore expecting labels to be squares. It makes sense: using this model for prediction would transform a (120,160,1) image into something of the shape (128, 128, 1). Changing model training to below code should therefore work:

x_train = random.random((90, 5, 120, 160, 1)) # training data
y_train = random.random((90, 5, 128, 128, 1)) # labels
model.fit(x_train, y_train)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
time_distributed_1 (TimeDist (None, 5, 120, 160, 64)   320       
_________________________________________________________________
time_distributed_2 (TimeDist (None, 5, 60, 80, 64)     0         
_________________________________________________________________
time_distributed_3 (TimeDist (None, 5, 60, 80, 32)     18464     
_________________________________________________________________
time_distributed_4 (TimeDist (None, 5, 30, 40, 32)     0         
_________________________________________________________________
time_distributed_5 (TimeDist (None, 5, 30, 40, 16)     4624      
_________________________________________________________________
time_distributed_6 (TimeDist (None, 5, 15, 20, 16)     0         
_________________________________________________________________
time_distributed_7 (TimeDist (None, 5, 4800)           0         
_________________________________________________________________
lstm_1 (LSTM)                (None, 5, 64)             1245440   
_________________________________________________________________
time_distributed_8 (TimeDist (None, 5, 8, 8, 1)        0         
_________________________________________________________________
time_distributed_9 (TimeDist (None, 5, 16, 16, 1)      0         
_________________________________________________________________
time_distributed_10 (TimeDis (None, 5, 16, 16, 16)     160       
_________________________________________________________________
time_distributed_11 (TimeDis (None, 5, 32, 32, 16)     0         
_________________________________________________________________
time_distributed_12 (TimeDis (None, 5, 32, 32, 32)     4640      
_________________________________________________________________
time_distributed_13 (TimeDis (None, 5, 64, 64, 32)     0         
_________________________________________________________________
time_distributed_14 (TimeDis (None, 5, 64, 64, 64)     18496     
_________________________________________________________________
time_distributed_15 (TimeDis (None, 5, 128, 128, 64)   0         
_________________________________________________________________time_distributed_16 (TimeDis (None, 5, 128, 128, 1)    577       
=================================================================
Total params: 1,292,721
Trainable params: 1,292,721
Non-trainable params: 0

Update 2: To make it work with non-square images without changing y, set LSTM(300), Reshape(15, 20, 1), and you remove one of the Conv2D + Upsampling layers afterwards. Then you can use pictures with shape (120,160) even in an autoencoder. The trick is to look at the model summary, and make sure after the LSTM you start with the right shape so that after adding all the other layers, the end result is a shape of (120,160).

model = Sequential()
model.add(
    TimeDistributed(Conv2D(64, (2, 2), activation="relu", padding="same"), =(5, 120, 160, 1)))

model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))

model.add(TimeDistributed(Flatten()))
model.add(LSTM(units=300, return_sequences=True))

model.add(TimeDistributed(Reshape((15, 20, 1))))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(16, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(32, (3, 3), activation='relu', padding='same')))
model.add(TimeDistributed(UpSampling2D((2, 2))))
model.add(TimeDistributed(Conv2D(1, (3, 3), padding='same')))


model.compile(optimizer='adam', loss='mse')

model.summary()

x_train = random.random((90, 5, 120, 160, 1))
y_train = random.random((90, 5, 120, 160, 1))

model.fit(x_train, y_train)

Solution 2:

Thanks to Mr.Kai Aeberli for his assistance. I was able to run the model after resizing the image to 128x128 dimension.The size of dataset may cause system to crash in absence of gpu. Reduce size as necessary. Please refer to the whole comment section if you have doubts. You can find the code here in github

Python Tutorial for Beginners

How To Reshape 3 Channel Dataset For Input To Neural Network

Solution 1:

Solution 2:

Post a Comment for "How To Reshape 3 Channel Dataset For Input To Neural Network"