Why does Keras raise a shape error in the last Dense layer?

I built a simple NN to distinguish integers from decimals. My input data is a 1-dimensional array, and the final output should be the probability that the number is an integer. At first it worked when the last layer (named output) had 1 unit. But it raised a ValueError when I changed the last Dense layer to 2 units, because I wanted to output both probabilities for a number x: that it is an integer and that it is a decimal.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
import numpy as np
from sklearn.utils import shuffle
def train():
    t=[]
    a=[]
    for i in range (0,8000):    #generate some training data
        ran=np.random.randint(2)
        if(ran==0):
            y=np.random.uniform(-100,100)
            t.append(y)
            a.append(0)
        else:
            y=np.random.randint(1000)
            t.append(y)
            a.append(1)
    t=np.asarray(t)
    a=np.asarray(a)
    pt=t.reshape(-1,1)  #reshape for fit()
    pa=a.reshape(-1,1)
    pt,pa=shuffle(pt,pa)
   

    model=Sequential()
    dense=Dense(units=32,input_shape=(1,),activation='relu')
    dense2=Dense(units=64,activation='relu')
    output=Dense(units=2,activation='softmax')   # HERE is the problem
    
    model.add(dense)
    model.add(dense2)
    model.add(output)
    model.summary()
    
    model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['accuracy'])
    model.fit(pt,pa,validation_split=0.02,batch_size=10, epochs=50, verbose=2)
    model.save('integer_predictor.h5')

train()

ValueError: Error when checking target: expected dense_2 to have shape (2,) but got array with shape (1,)

Solution

Changing the loss should solve your problem:

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',metrics=['accuracy'])

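A 2-unit softmax output is a two-class categorical setup, so binary_crossentropy is the wrong loss here: it expects one target value per output unit, which is why Keras asks for targets of shape (2,) while your labels have shape (1,). When your labels are integer class indices (0 or 1) rather than one-hot vectors, you need sparse_categorical_crossentropy. If you one-hot encode the labels instead, categorical_crossentropy works with any number of output units greater than 1.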
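To make the difference between the two label formats concrete, here is a minimal NumPy sketch. It is not Keras's implementation: the helper functions and the probs array are hypothetical, written only to show that integer labels of shape (n, 1) and one-hot labels of shape (n, 2) describe the same targets and give the same cross-entropy.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer labels of shape (n,) or (n, 1) into one-hot rows of shape (n, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    return np.eye(num_classes)[labels]

def categorical_crossentropy(one_hot_targets, probs):
    """Mean cross-entropy for one-hot targets of shape (n, k). Illustrative helper only."""
    return float(np.mean(-np.sum(one_hot_targets * np.log(probs), axis=1)))

def sparse_categorical_crossentropy(int_targets, probs):
    """Mean cross-entropy for integer targets of shape (n,) or (n, 1). Illustrative helper only."""
    idx = np.asarray(int_targets).reshape(-1)
    picked = probs[np.arange(len(idx)), idx]  # probability assigned to the true class
    return float(np.mean(-np.log(picked)))

# Hypothetical softmax outputs of a 2-unit layer for three samples
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])

# Integer labels shaped like pa in the question: one class index per sample
pa = np.array([[0], [1], [1]])

pa_one_hot = one_hot(pa, num_classes=2)  # one row of length 2 per sample

sparse_loss = sparse_categorical_crossentropy(pa, probs)
dense_loss = categorical_crossentropy(pa_one_hot, probs)

print(pa.shape, pa_one_hot.shape)
print(sparse_loss, dense_loss)
```

In Keras itself, the one-hot conversion can be done with keras.utils.to_categorical(pa, num_classes=2), after which loss='categorical_crossentropy' matches the 2-unit output. Keeping pa as integers and using loss='sparse_categorical_crossentropy' avoids the conversion.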

Read this to get more insight into this.