Interpretation of nonlinear relationships between process variables by use of random forests
Published in: | Minerals engineering Vol. 35; pp. 27 - 42 |
---|---|
Main Authors: | , |
Format: | Journal Article |
Language: | English |
Published: | Elsevier Ltd, 01-08-2012 |
Summary: | Variable importance measures derived from a random forest model of the throughput of a calcium carbide furnace depending on nine process variables. The dummy variable (No. 10) is shown in red, with the dashed red line indicating the upper 95% confidence limit of the significance of the process variables.
► Random forest models can be used to interpret complex process or plant data. ► With dummy variables, the significance of explanatory variables can be assessed. ► Reliable analysis is possible, despite significant additive noise in the data.
A better understanding of process phenomena depends on interpreting models that capture the relationships between the process variables. Although linear regression is used routinely in the mineral process industries for this purpose, it may not be useful where the relationships between variables are nonlinear or complex. Under these circumstances, nonlinear methods, such as neural networks or decision trees, can be used to develop reliable models, but they do not necessarily give any explicit insight into the relationships between the process variables and the target variables. This is a major drawback in situations where such information would be very important, such as in fault identification or in gaining a better understanding of the fundamentals of a process.
In this paper, the use of variable importance measures and partial dependency plots generated by random forest models is proposed as a practical tool to surmount this problem. In particular, it is shown that important variables can be flagged by an appropriate threshold generated by including dummy variables in the system. Moreover, the results of the study indicate that random forest models can reliably identify the influence of individual variables, even in the presence of high levels of additive noise. This would make them a useful tool in continuous process improvement and root cause analysis of abnormal process behaviour. |
---|---|
ISSN: | 0892-6875 1872-9444 |
DOI: | 10.1016/j.mineng.2012.05.008 |
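
The dummy-variable threshold and the partial dependency curves described in the summary can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes scikit-learn's `RandomForestRegressor` with its impurity-based importances (the importance measure used in the paper may differ), a Gaussian noise column as the dummy variable, the 95th percentile of the dummy's importance over repeated fits as the threshold, and a small synthetic data set in place of plant data. The function names `dummy_importance_threshold`, `partial_dependence_curve`, and `make_synthetic_data` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def dummy_importance_threshold(X, y, n_repeats=10, quantile=0.95, seed=0):
    """Return mean importance per real variable and a dummy-based threshold."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    real_importance = np.zeros((n_repeats, n_features))
    dummy_importance = np.zeros(n_repeats)
    for r in range(n_repeats):
        # Assumption: the dummy is fresh Gaussian noise, unrelated to y.
        dummy = rng.normal(size=(n_samples, 1))
        forest = RandomForestRegressor(n_estimators=50, random_state=r)
        forest.fit(np.hstack([X, dummy]), y)
        importance = forest.feature_importances_  # impurity-based, sums to 1
        real_importance[r] = importance[:n_features]
        dummy_importance[r] = importance[-1]
    # Assumption: upper limit taken as an empirical quantile over the repeats.
    threshold = np.quantile(dummy_importance, quantile)
    return real_importance.mean(axis=0), threshold


def partial_dependence_curve(model, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    curve = []
    for value in grid:
        X_fixed = X.copy()
        X_fixed[:, feature] = value
        curve.append(model.predict(X_fixed).mean())
    return np.array(curve)


def make_synthetic_data(n_samples=300, seed=42):
    """Synthetic stand-in for plant data: only columns 0 and 1 affect y."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_samples, 4))
    y = 2.0 * np.sin(3.0 * X[:, 0]) + 4.0 * X[:, 1] ** 2
    y += rng.normal(scale=0.1, size=n_samples)  # additive noise
    return X, y


X, y = make_synthetic_data()
mean_importance, threshold = dummy_importance_threshold(X, y)
flagged = np.flatnonzero(mean_importance > threshold)
print("mean importance:", np.round(mean_importance, 3))
print("dummy threshold:", round(float(threshold), 4))
print("variables above threshold:", flagged)

# Partial dependence of the prediction on variable 1 (the quadratic term).
forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
grid = np.array([-0.9, 0.0, 0.9])
print("partial dependence:", partial_dependence_curve(forest, X, 1, grid))
```

Variables whose mean importance exceeds the dummy-based threshold are treated as significant in this sketch; the partial dependence curve then shows the shape of the effect of one variable, averaged over the observed values of the others.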