This study evaluated an artificial neural network (ANN) model that was trained on historical production data from an electric arc furnace producing stainless steel in order to predict the end-point electrical energy demand of future heats. The furnace used in this study is an AC electric arc furnace with a scrap preheater that uses furnace gas as the heating medium.

### Statistical model

- Artificial Neural Network (ANN)

An ANN consists of nodes arranged in multiple layers, where the number of weights of each node equals the number of nodes in the previous layer. The first layer is the input layer, whose size equals the number of input variables plus an optional bias term. The layers between the input layer and the last layer are the hidden layers, which enable the network to capture the complex relationships inherent in the data. The more hidden layers, the more complex the relationships that can be handled. However, the number of weights also grows rapidly with the number of layers and nodes, so if there are too few data points, an ANN with multiple hidden layers cannot converge to an optimal solution. It is therefore important to optimize the topology by creating multiple ANN models with different numbers of nodes and hidden layers. This is commonly referred to as grid search, a common method of searching for the best parameter values for any given machine learning model. The last layer of the network is the output layer, which performs the final calculation that produces the predicted value.
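The relationship between topology and the number of weights can be illustrated with a minimal sketch. It assumes a fully connected feed-forward network with one bias weight per node; the function name `count_weights` and the example sizes are illustrative and not taken from the study.

```python
def count_weights(n_inputs, hidden_layers, n_outputs=1, bias=True):
    """Count trainable weights in a fully connected feed-forward network.

    hidden_layers: list with the number of nodes in each hidden layer.
    Assumption: every node is connected to all nodes of the previous layer.
    """
    sizes = [n_inputs] + list(hidden_layers) + [n_outputs]
    total = 0
    for prev, curr in zip(sizes[:-1], sizes[1:]):
        # One weight per node in the previous layer, plus an optional bias.
        total += (prev + (1 if bias else 0)) * curr
    return total


# Hypothetical example: 5 input variables, 1 output, 10 nodes per hidden layer.
one_hidden = count_weights(5, [10])
two_hidden = count_weights(5, [10, 10])
print(one_hidden, two_hidden)
```

Adding a second hidden layer of the same width more than doubles the number of weights in this example, which is why small data sets limit the usable depth.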

Because an ANN can learn complex relationships, it also has a strong tendency to overfit the training data. Overfitting means that the model achieves very high accuracy on the training data but poor accuracy on the test data. To counter this, the training data is divided into a training batch and a validation batch. In the training phase, the model uses the training batch to update the weights. After each iteration (weight update), the validation data is passed through the network to calculate the validation error. The model continues to update the weights until the validation error starts to increase, or fails to decrease by more than a given tolerance relative to the previous iteration.
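The stopping rule described above can be sketched as follows. This is a simplified illustration that operates on a pre-computed list of validation errors; the function name, the tolerance value, and the error values are assumptions, not details from the study.

```python
def early_stopping_iteration(val_errors, tol=1e-4):
    """Return the index of the iteration at which training would stop.

    val_errors: validation error after each weight update (hypothetical values).
    Training stops at the first iteration whose validation error increases,
    or does not decrease by at least `tol`, relative to the previous iteration.
    """
    for i in range(1, len(val_errors)):
        improvement = val_errors[i - 1] - val_errors[i]
        if improvement < tol:
            return i
    # The criterion never triggered: training ran through all iterations.
    return len(val_errors) - 1


# Hypothetical validation errors: the error rises at iteration 4.
errors = [0.90, 0.60, 0.45, 0.40, 0.41, 0.39]
stop = early_stopping_iteration(errors)
print(stop)
```

In practice the weights from the iteration before the stop are usually the ones kept, since they gave the lowest validation error seen so far.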

- Data cleaning

Data cleaning can be based on statistics or on domain knowledge. Statistical data cleaning methods are generic and cannot incorporate any domain knowledge into the cleaning procedure. One example is to remove the top and bottom 2.5% of the data points for each input variable. However, applying this method to all input variables leaves too few data points. For this reason, domain-specific cleaning, in which data points are removed by comparing expected and actual values, is preferred. Two examples from process metallurgy are the removal of trial heats and of heats with abnormally long smelting times.
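The difference between the two cleaning approaches can be sketched with synthetic data. Everything in this example is hypothetical: the column names, the `is_trial` and `process_time_min` fields, the 100-minute limit, and the randomly generated values are placeholders chosen only to show how percentile trimming on several variables compounds.

```python
import random


def quantile_bounds(values, lower=0.025, upper=0.975):
    """Approximate lower/upper cut-offs taken from the sorted values."""
    s = sorted(values)
    return s[int(lower * (len(s) - 1))], s[int(upper * (len(s) - 1))]


def statistical_clean(rows, columns):
    """Keep rows inside the 2.5%..97.5% band of every listed column."""
    bounds = {c: quantile_bounds([r[c] for r in rows]) for c in columns}
    return [r for r in rows
            if all(bounds[c][0] <= r[c] <= bounds[c][1] for c in columns)]


def domain_clean(rows, max_minutes):
    """Drop trial heats and heats with abnormally long process times."""
    return [r for r in rows
            if not r["is_trial"] and r["process_time_min"] <= max_minutes]


# Synthetic, hypothetical data: five independent input variables.
random.seed(0)
columns = ["x1", "x2", "x3", "x4", "x5"]
rows = []
for i in range(10000):
    row = {c: random.gauss(0.0, 1.0) for c in columns}
    row["is_trial"] = (i % 200 == 0)
    row["process_time_min"] = random.gauss(60.0, 10.0)
    rows.append(row)

kept_one = statistical_clean(rows, columns[:1])
kept_all = statistical_clean(rows, columns)
kept_domain = domain_clean(rows, max_minutes=100.0)
print(len(rows), len(kept_one), len(kept_all), len(kept_domain))
```

Trimming a single variable removes about 5% of the rows, but trimming all five removes far more, whereas the domain-based rule removes only the rows that are known to be unrepresentative.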

- Training and verification

All statistical models must be trained and then tested to determine whether they can be used in practice. In the training phase, the coefficients of the model are adjusted by minimizing an error function that depends on the coefficients and the data. In the testing phase, the model predicts data it has not seen before, and its performance is evaluated. If the error deviates substantially during the testing phase, the model has likely overfitted: it has adapted too closely to the training data, lost its “plasticity”, and cannot predict new values well. Alternatively, the deviation may be caused by changes in the underlying process, so that the new data are generated differently. A metallurgical example is the installation of new burners or preheaters in the electric arc furnace process, which changes the energy dynamics of future heats. This is one of the reasons why domain expertise is important when validating statistical models.

- Model performance indicators

Common performance indicators for model evaluation include R2, the standard error, and the absolute error. R2 measures how well a model captures the variance in the data. An R2 value of 1 indicates a perfect model, and an R2 value of 0 or less indicates a model that cannot capture any of the variance in the data.

The error of the model can be measured with the standard error or the absolute error. The standard error, as the term is used here, is the signed difference obtained by subtracting the true value from the predicted value, so it distinguishes between overestimated and underestimated predictions. The absolute error is the absolute value of the standard error and does not distinguish between over- and under-predictions.

In process metallurgy, the consequences of overestimation are often very different from those of underestimation. This is why the standard error is preferable when statistical models are applied to process metallurgical problems.
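The three indicators above can be written out as a short sketch. The function names and the example values (in kWh) are hypothetical; the signed error follows the definition given in the text, predicted value minus true value.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def signed_errors(y_true, y_pred):
    """Predicted minus true: positive means overestimation."""
    return [p - t for t, p in zip(y_true, y_pred)]


def absolute_errors(y_true, y_pred):
    """Magnitude of the error, with the sign discarded."""
    return [abs(e) for e in signed_errors(y_true, y_pred)]


# Hypothetical true and predicted energy values in kWh.
y_true = [100.0, 200.0, 300.0]
y_pred = [110.0, 190.0, 300.0]

r2 = r2_score(y_true, y_pred)
signed = signed_errors(y_true, y_pred)
absolute = absolute_errors(y_true, y_pred)
print(r2, signed, absolute)
```

In this example the signed errors cancel out on average while the absolute errors do not, which shows why the two measures answer different questions.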

### Explainable machine learning

Making a machine learning model interpretable means making it possible for humans with knowledge of the application domain (domain experts) to understand how the model arrives at its predictions. This study focuses on two algorithms: permutation feature importance (PI) and Shapley additive explanations (SHAP).

- PI algorithm

PI permutes (randomly shuffles) each input variable of the model, one at a time, to break the relationship between that input variable and the output variable. In each iteration, the model error on the permuted data is compared with the model error on the unpermuted data to produce the PI value. Once all input variables have been permuted individually, the corresponding PI values are used to rank the variables from most to least important. The method is quite intuitive. For an input variable that does not improve the accuracy of the model, the errors on the unpermuted and permuted data are similar. In this case the PI value is close to zero, which indicates that the variable hardly affects the model's predictions. Small negative PI values can occur because of the randomness of the permutation when the true value is close to zero. Although PI is intuitive, it only explains the importance of each feature relative to the other features, averaged over all predictions. It is therefore regarded as a global interpretability algorithm.
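A minimal sketch of the permutation procedure is given below. It assumes that the PI value is the increase in mean absolute error after permutation (permuted error minus unpermuted error), which is one common formulation and is consistent with unimportant variables scoring close to zero; the study's exact formula may differ. The toy model and the synthetic data are hypothetical.

```python
import random


def mean_abs_error(model, X, y):
    """Mean absolute error of `model` over the rows of X."""
    return sum(abs(model(row) - t) for row, t in zip(X, y)) / len(y)


def permutation_importance(model, X, y, rng):
    """PI per feature: error on permuted data minus error on original data."""
    baseline = mean_abs_error(model, X, y)
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the output
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(mean_abs_error(model, X_perm, y) - baseline)
    return importances


# Hypothetical data: the output depends strongly on feature 0,
# weakly on feature 1, and not at all on feature 2.
rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(500)]
y = [3.0 * r[0] + 0.5 * r[1] + rng.gauss(0.0, 0.1) for r in X]


def model(row):
    """Stand-in for a trained model (hypothetical)."""
    return 3.0 * row[0] + 0.5 * row[1]


pi = permutation_importance(model, X, y, rng)
ranking = sorted(range(len(pi)), key=lambda j: pi[j], reverse=True)
print(pi, ranking)
```

Because the stand-in model ignores feature 2, permuting it leaves the error unchanged and its PI value is zero, while the strongly weighted feature 0 ranks first.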

- SHAP algorithm

SHAP is based on the Shapley value from game theory and assigns an importance value to every feature for each individual prediction. In contrast to PI, SHAP is therefore a local interpretability algorithm. SHAP starts from a base value, which is the average prediction (expected value); before any information about a specific instance is presented, it is natural to assume the expected value. When the value of an input variable is presented, the Shapley value of that variable for the specific instance is the difference between the new prediction and the previous prediction. Each Shapley value is added to the base value, and once the last Shapley value has been added, the result is the output value for the instance. In this way, the specific contribution of every variable to the output of any single instance can be determined. By examining the positive or negative contribution of each variable, domain experts can assess whether the prediction is reasonable from the perspective of the domain. The positive and negative contributions of each variable are relative to the average of the data used in the SHAP algorithm. For example, if the amount of charged scrap is higher than average, the electrical energy should be higher. Conversely, if the propane input is higher, the electrical energy should be lower. Any result that deviates from established science should be treated with suspicion and investigated further. Because of the large number of samples required, calculating Shapley values is computationally very expensive, and several approximation methods have been proposed to reduce the cost. This study uses the Kernel SHAP algorithm.
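The additive structure described above (base value plus one Shapley value per feature) can be illustrated with an exact Shapley computation on a tiny example. Note that this is not Kernel SHAP, which approximates these values; it is a brute-force sketch that is only feasible for a handful of features. The two-feature linear model, its coefficients, and the background data are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, subset):
    """Average prediction with the features in `subset` fixed to x and the
    remaining features taken from each background row in turn."""
    total = 0.0
    for b in background:
        mixed = [x[j] if j in subset else b[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all feature coalitions."""
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                with_j = coalition_value(model, x, background, set(S) | {j})
                without_j = coalition_value(model, x, background, set(S))
                phi[j] += weight * (with_j - without_j)
    return phi


def model(row):
    """Hypothetical energy model: more scrap raises, more propane lowers."""
    scrap, propane = row
    return 300.0 * scrap - 5.0 * propane


# Hypothetical background data (scrap weight, propane volume).
background = [[100.0, 20.0], [120.0, 30.0], [80.0, 10.0]]
x = [110.0, 25.0]  # both inputs are above the background average

base_value = coalition_value(model, x, background, set())
phi = shapley_values(model, x, background)
print(base_value, phi, base_value + sum(phi), model(x))
```

The scrap input receives a positive contribution and the propane input a negative one, and the base value plus the contributions reproduces the model output for the instance.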

- Limitations

Both PI and SHAP are sensitive to correlated input features. For SHAP, samples are drawn from the marginal distribution of each feature. If a feature is strongly correlated with another feature, the sampled values may be unrealistic in practice, because the marginal distribution in SHAP only accounts for a single feature in each iteration. The same applies to PI when a feature is randomly permuted. In addition, the importance of correlated features can be shared between them, which lowers the PI value of each. To account for correlated features, one study included the p-value of each feature in the PI analysis. For SHAP, although some ideas have been proposed, no method for handling the unrealistic values that arise from correlated features has yet been studied.

### Method

Table 1 defines the variables used in the model. The total number of heats is 11,917, divided into 11,531 training heats and 386 test heats. The test heats are the heats produced during the 30 days following the training heats and account for 3.24% of the total number of data points. The number of test heats is the same for all models.
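A chronological split of this kind can be sketched as follows. The `day` field and the helper name are hypothetical; only the heat counts in the final lines come from the text.

```python
def chronological_split(heats, test_days=30):
    """Split heats so that the last `test_days` days form the test set.

    heats: list of dicts with a hypothetical key "day"
    (days elapsed since the first heat).
    """
    cutoff = max(h["day"] for h in heats) - test_days
    train = [h for h in heats if h["day"] <= cutoff]
    test = [h for h in heats if h["day"] > cutoff]
    return train, test


# Hypothetical example: one heat per day for 100 days.
example = [{"day": d} for d in range(100)]
train, test = chronological_split(example)
print(len(train), len(test))

# Heat counts reported in the text.
n_total, n_test = 11917, 386
n_train = n_total - n_test
test_share = round(100 * n_test / n_total, 2)
print(n_train, test_share)
```

Splitting by time rather than at random means the test set reflects how the model would be used in practice, on heats produced after training.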

The training data is divided into a training batch and a validation batch, accounting for 80% and 20% respectively. This means that 9,225 heats are used in the training batch and 2,306 heats in the validation batch. The topology of the ANN model is determined by a grid search over one and two hidden layers. The number of nodes is the same in each hidden layer and ranges from 1 to 24, which gives 48 topologies in the grid search. Each model is trained 10 times to account for the randomness of ANN training. The model with the highest mean R2 and the lowest R2 standard deviation is considered the best model. A low R2 standard deviation indicates that the model parameters are stable, because the accuracy does not vary significantly between training runs. All 11,917 heats are used in the PI and SHAP interpretability algorithms to provide as comprehensive an analysis of model interpretability and transparency as possible. PI and SHAP were implemented in Python.
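The grid and the batch sizes above can be reproduced with a short sketch. The selection rule shown here (highest mean R2, with ties broken by the lowest standard deviation) is an assumption, since the text does not state how the two criteria are combined, and the R2 values are hypothetical.

```python
from statistics import mean, stdev

# One or two hidden layers, each with the same number of nodes (1 to 24).
topologies = [(nodes,) * layers for layers in (1, 2) for nodes in range(1, 25)]

# 80/20 split of the 11,531 training heats.
n_training_heats = 11531
n_train_batch = round(0.8 * n_training_heats)
n_validation_batch = n_training_heats - n_train_batch


def select_best(results):
    """Pick the topology with the highest mean R2; ties go to the lowest
    standard deviation (assumed rule, not confirmed by the source)."""
    return max(results, key=lambda t: (mean(results[t]), -stdev(results[t])))


# Hypothetical R2 values from repeated training runs of three topologies.
results = {
    (5,): [0.5, 0.5625, 0.53125],
    (10, 10): [0.75, 0.71875, 0.78125],
    (24, 24): [0.75, 0.625, 0.875],
}
best = select_best(results)
print(len(topologies), n_train_batch, n_validation_batch, best)
```

The two larger topologies have the same mean R2 in this example, so the one whose accuracy varies less between runs is selected.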

### Results and discussion

Three ANN models (A1, A2, and A3) were evaluated using five input variables: tPON, Wtot, VPropane, Vo2_1, and Vo2_b. The variable tPON explains most of the accuracy of the model. Compared with the other four input variables, it predicts EEl better: the mean error is reduced by more than 300 kWh, and the standard deviation of the error is slightly reduced. R2 increases by 0.09, which means that tPON alone captures the variance in the data better than the five input variables combined. Using the other four input variables (excluding tPON) reduces the absolute mean error by 119 kWh, but the standard deviation of the error increases from 1334 kWh to 2791 kWh and the maximum error increases from 7057 kWh to 16576 kWh. With an R2 of 0.05, this model cannot capture the inherent variance of the data.

For model A1, tPON clearly has a large influence on the prediction, and the other input variables account for at most 1% of the tPON effect. This is not surprising, because the total power-on time has a linear relationship with electrical energy consumption. If the power output is similar for all heats, the accuracy of such a model is high; if the power output differs between heats, the accuracy is reduced. Since the steel mill aims for the highest possible productivity, that is, short smelting times, the power output of most heats can be assumed to be close to the maximum and therefore similar. For this reason, the power-on time should not be included as an input variable.

The PI values were calculated for the ANN models using all 11,917 heats. The values of each model are normalized by that model's highest PI value, so the PI value of the most important variable is equal to 1.
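The normalization of the PI values can be sketched in a few lines. The raw PI values below are hypothetical placeholders, chosen only so that the remaining variables amount to at most 1% of the tPON value, as described in the text.

```python
def normalize_pi(pi_values):
    """Divide every PI value by the largest one, so the top variable is 1."""
    top = max(pi_values.values())
    return {name: value / top for name, value in pi_values.items()}


# Hypothetical raw PI values for one model.
raw_pi = {
    "tPON": 400.0,
    "Wtot": 4.0,
    "VPropane": 2.0,
    "Vo2_1": 1.0,
    "Vo2_b": 0.5,
}
normalized = normalize_pi(raw_pi)
print(normalized)
```

Normalizing per model makes the rankings of different models comparable, but the normalized values no longer show how large the underlying error increase was.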

SHAP was applied to all 11,917 heats using model A1, and five heats (A, B, C, D, E) from the test set were selected for further analysis. The input and output variable values, the predictions, and the errors of the five selected test heats were calculated.

The model treats tPON as an important contributor to the EPred of heats A to E, which is consistent with the PI values. For heats A to D, tPON is higher than average and is correctly interpreted as contributing to a higher EPred. For heat E, tPON is lower than average and correctly contributes to a lower EPred. Since a larger absolute deviation from the average corresponds to a larger contribution to EPred, the magnitude of the contributions can be considered qualitatively correct. VPropane contributes to a lower EPred for heats A to D, because its value is higher than the average VPropane. More propane adds energy through exothermic chemical reactions, thereby reducing the need for electrical energy. For heat E the situation is the opposite, and the model interprets the lower propane input as contributing to higher electrical energy.

An increase in the amount of charged scrap leads to an increase in energy demand. The model explains this correctly: the EPreds of heats A and C receive positive contributions from Wtot, while the EPreds of heats B, D, and E receive negative contributions from Wtot.

Increasing the amount of injected oxygen, whether through the lance or through an excess oxygen/propane ratio in the burners, should intuitively reduce the demand for electrical energy, because the oxidation of elements in the molten steel releases additional energy. For heats B to E, the model interprets the lower Vo2_1 as contributing to higher electrical energy, but for heat A the higher Vo2_1 is erroneously interpreted as also contributing to higher electrical energy. This may be due to model error, since Vo2_1 is not strongly correlated with any other input feature.

For heats A to D, where the amount of Vo2_b is higher than average, Vo2_b contributes to a decrease in electrical energy. For heat E, Vo2_b is lower than average, yet it still erroneously contributes to a decrease in EPred. The reason may be that Vo2_b is highly correlated with VPropane, which introduces some uncertainty in SHAP. In addition, the average oxygen-to-propane ratio is about 4.7:1, whereas the ideal stoichiometric ratio is 5:1, so oxygen is not added in excess. The oxygen is therefore consumed entirely in burning the propane, and no significant amount of oxygen reaches the melt. This means that Vo2_b is a redundant variable that provides no additional useful information, so it should be removed.
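For reference, the 5:1 figure corresponds to the complete combustion of propane, in which five moles of oxygen are consumed per mole of propane. Assuming ideal gases at the same temperature and pressure, the volume ratio equals the mole ratio:

$$
\mathrm{C_3H_8} + 5\,\mathrm{O_2} \rightarrow 3\,\mathrm{CO_2} + 4\,\mathrm{H_2O}
$$

An average ratio of about 4.7:1 is therefore slightly below the stoichiometric requirement, which is consistent with the burner oxygen being consumed by the propane rather than reaching the melt.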