I have built a 3-layer neural network to perform a binary mapping (2016 inputs, 288 outputs). I am getting decent results with mean squared error and stochastic gradient descent. My question is: is there a more appropriate loss function for regression when the output is binary?
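For reference, here is a rough sketch of the forward pass and the loss I am describing. Only the input and output sizes and the mean squared error come from my actual setup. The hidden size (`N_HIDDEN = 64`), the sigmoid activations, the weight initialisation, and the random example data are assumptions made purely for illustration, and the SGD update is left out.

```python
import math
import random

# 2016 inputs and 288 outputs as described above.
# N_HIDDEN is a hypothetical value chosen only for this sketch.
N_IN, N_HIDDEN, N_OUT = 2016, 64, 288


def sigmoid(x):
    # Assumed activation; squashes each unit into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    # Small uniform weights (an assumed initialisation) and zero biases.
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward_layer(weights, biases, inputs):
    # One fully connected layer followed by a sigmoid.
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mse(pred, target):
    # Mean squared error averaged over the output units.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


rng = random.Random(0)
hidden_layer = init_layer(N_IN, N_HIDDEN, rng)
output_layer = init_layer(N_HIDDEN, N_OUT, rng)

# Made-up binary input and target vectors, standing in for real data.
x = [float(rng.randint(0, 1)) for _ in range(N_IN)]
y = [float(rng.randint(0, 1)) for _ in range(N_OUT)]

hidden = forward_layer(*hidden_layer, x)
pred = forward_layer(*output_layer, hidden)
loss = mse(pred, y)
print(loss)
```

The `mse` function is the part I would be swapping out for a different loss.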