It is quite difficult to manually select a model order that will perform well at forecasting a dataset. This is why Rob has built the 'auto.arima' function in his R forecast package, to figure out the model that may perform best based on certain metrics.
When you see a pacf plot with significantly negative lags that usually means you have over differenced your data. Try removing the 1st order difference and keeping the 12 order difference. Then carry on making your best guess.
I'd recommend trying his auto.arima function and passing it a time series object with frequency = 12. He has a good writeup of seasonal arima models here:
https://www.otexts.org/fpp/8/9
If you would like more insight into manually selecting a SARIMA model order, this is a good read:
https://onlinecourses.science.psu.edu/stat510/node/67
In response to your Edit: I think it would be beneficial to this post if you clarify your objective. Which of the following are you trying to achieve?
Find a model where residuals satisfy Ljung Box Test
Produce the most accurate out of sample forecast
Manually select lag orders such that ACF and PACF plots show no significant lags remaining.
In my opinion, #2 is the most sought after objective so I'll assume that is your goal. From my experience, #3 produces poor results out of sample. In regards to #1, I am usually not concerned about correlations remaining in the residuals. We know we do not have the true model for this time-series, so I do not feel there's any reason to expect an approximate model that performs well out of sample to not have left something behind in the residuals that is more complex perhaps, or nonlinear etc.
To provide you another SARIMA result, I ran this data through some code I've developed and found the following equation produced the minimal error on a cross-validation period.
Final model is:
SARIMA [0,1,1] [1,1,1]12 with a constant using the log normal of the time-series.
The errors in the cross validation period are:
MAPE = 16%
MAE = 0.46
RSQR = 74%
Here is the Partial Autocorrelation plot of the residuals for your information.
This is roughly similar in methodology to selecting an equation based on AICc to my understanding, but is ultimately a different approach. Regardless, if your objective is out of sample accuracy, I'd recommend evaluating equations in terms of their out of sample accuracy versus in-sample fit, tests, or plots.