The planetary gearbox is one of the most widely used core parts in heavy machinery. Once it breaks down, it can lead to serious accidents and economic losses. Induction motor current signal analysis (MCSA) is a noninvasive method that uses the current to detect faults. Currently, most MCSA-based fault diagnosis studies focus on the parallel shaft gearbox, whereas there is a paucity of studies on the planetary gearbox. Moreover, the effects of various signal processing methods on the motor current and the performance of different machine learning models are rarely compared. Therefore, fault diagnosis of the planetary gearbox based on MCSA is conducted in this study. First, the effects of various faults on the motor current are studied. Specifically, the characteristic frequencies of faults in the sun/planet/ring gears and supporting bearings of the planetary gearbox are derived. Then, a signal preprocessing method, namely, singular spectrum analysis (SSA), is proposed to remove the supply frequency component from the current signal. Subsequently, four classical machine learning models, namely, the support vector machine (SVM), decision tree (DT), random forest (RF), and AdaBoost, are used for fault classification based on the features extracted via principal component analysis (PCA). The convolutional neural network (CNN), which can extract features automatically, is also adopted. A dynamic experiment on the planetary gearbox with seven types of faults, including tooth chipping in the sun/planet/ring gears, an inner race spall in the planet bearing, and inner/outer race and ball spalls in the input support bearing, is conducted. The raw current signal in the time domain, the signal reconstructed by SSA, and the current spectra in the frequency domain are used as the inputs of the various models. The classification results show that PCA-SVM is the best model for learned data, while CNN is the best model for unlearned data on average.
Furthermore, SSA mainly increases the accuracy of CNN in the time domain and exhibits a positive effect on unlearned data in the time domain. The classification accuracy increases significantly after transforming the time domain current data to the frequency domain.

A planetary gearbox exhibits the characteristics of a compact structure with a large transmission ratio and high transmission efficiency [

A fault in rotating machinery equipment affects the motor current via torque transmission [

Yilmaz and Ayaz [

In terms of research methods, demodulation, wavelet transform [

The intelligent fault diagnosis methods mentioned above are time-consuming because representative features must be extracted from the signals. To a great extent, these methods depend on prior knowledge of signal processing technology and rotor dynamics. However, unsupervised learning and deep learning, represented by neural networks, reduce the requirement for prior knowledge because of their ability to automatically extract features, which allows direct diagnosis from the original or simply processed signal. Shen et al. [

From the above review, it is confirmed that existing research on MCSA-based fault diagnosis of planetary gearboxes is relatively scarce, and the performance of different feature extraction methods and classification models should be examined. Therefore, fault diagnosis of the planetary gearbox based on MCSA is conducted in this study. All the techniques adopted in this paper are common, and one aim of this study is to aid engineers and researchers in selecting an appropriate strategy for practical planetary gearbox fault detection by discussing preprocessing, feature extraction, and classification models. After the effects of various faults on the characteristic frequencies of the motor current are studied, a signal preprocessing method, singular spectrum analysis (SSA), is proposed to remove the supply frequency component and obtain a reconstructed signal. Subsequently, a convolutional neural network (CNN) and four classical machine learning models are selected to determine the type of fault. In this case, principal component analysis (PCA) is performed to extract features as the inputs of the machine learning models. Dynamic experiments on a planetary gearbox with seven types of faults, including tooth chipping in the sun/planet/ring gears, an inner race spall in the planet bearing, and inner/outer race and ball spalls in the input support bearing, are conducted. Finally, a few conclusions are summarized at the end.

A typical planetary gearbox structure is shown in Figure

Schematic diagram of the planetary gearbox.

Therefore, the load torque under fault condition can be expressed as the sum of a fixed torque and periodic oscillation torque due to fault. The periodic oscillation torque can be expanded via Fourier expansion. In the case of neglecting the higher harmonics,

In the planetary gearbox, the fault characteristic frequency of sun gear, planet gear, and ring gear can be expressed as follows [

In addition to gears, rolling element bearings are also used to support the rotary components of the planetary gearbox. By considering the rolling bearing that supports the sun gear, as an example, the fault characteristic frequencies of the outer and inner race and the ball are shown, respectively, as follows:
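The elided formulas are the standard kinematic bearing-defect frequencies; a minimal sketch in Python, using hypothetical bearing geometry (the ball count, ball and pitch diameters, contact angle, and shaft frequency are placeholders, not the test rig's actual values):

```python
import math

def bearing_fault_frequencies(fr, n_balls, d_ball, d_pitch, contact_deg=0.0):
    """Standard kinematic defect frequencies of a rolling bearing with a
    stationary outer race; fr is the shaft rotation frequency in Hz."""
    r = (d_ball / d_pitch) * math.cos(math.radians(contact_deg))
    bpfo = 0.5 * n_balls * fr * (1 - r)                 # outer race defect
    bpfi = 0.5 * n_balls * fr * (1 + r)                 # inner race defect
    bsf = 0.5 * (d_pitch / d_ball) * fr * (1 - r ** 2)  # ball spin (ball defect)
    return bpfo, bpfi, bsf

# hypothetical geometry: 9 balls, 7.9 mm ball diameter, 39 mm pitch diameter,
# shaft frequency 30 Hz
bpfo, bpfi, bsf = bearing_fault_frequencies(30.0, n_balls=9, d_ball=7.9, d_pitch=39.0)
```

Note that the inner and outer race frequencies always sum to the ball pass rate, n_balls × fr, which is a quick sanity check on any implementation.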

Under normal working conditions, the load torque of the motor is in balance with the output torque of the motor. As long as the fault appears, the resultant torque that acts on the mechanical system is

The rotor phase

By considering the slip between the synchronous speed of the stator

The magnetomotive force of the stator is not affected by the fault. Hence, it can be directly obtained as follows:

The airgap flux density

In practice,

It is shown that the stator current is composed of two parts after the occurrence of fault. The first part is from the stator and the second part is from the oscillation of the rotor. Thus, the fault frequency can be expressed as follows:

In the current signal, the supply frequency of the current is the dominant component, while the fault characteristic frequency components are usually weak. Hence, it is necessary to separate the different components of the current signal via preprocessing methods.

Singular spectrum analysis (SSA) is often used to deal with the nonlinear time series data [

The first step of SSA for reconstruction is to transform time series

It is assumed that a time series

The second step of SSA for reconstruction involves performing a singular value decomposition (SVD) on the trace matrix

Given that the elements on each inverse diagonal of

After the trace matrix of each component is obtained, different types of signal
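The three reconstruction steps above (embedding into the trace matrix, SVD, and diagonal averaging of each rank-one term) can be sketched as follows; this is a minimal illustration of the technique, not the exact implementation used in the paper:

```python
import numpy as np

def ssa_components(x, window):
    """Minimal SSA sketch: embed the series into a trace (Hankel) matrix,
    take the SVD, and diagonally average each rank-one term back into a
    time series; returns one component per row."""
    n, k = len(x), len(x) - window + 1
    # step 1: trace matrix built by sliding a window over the series
    X = np.column_stack([x[i:i + window] for i in range(k)])
    # step 2: singular value decomposition
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    comps = []
    for i in range(len(s)):
        Xi = s[i] * np.outer(U[:, i], Vt[i])
        # step 3: average each antidiagonal to recover a series of length n
        comps.append(np.array([Xi[::-1].diagonal(t - window + 1).mean()
                               for t in range(n)]))
    return np.array(comps)

# the dominant (largest singular value) components typically capture the
# supply frequency; subtracting them keeps the weaker residual content
x = np.sin(2 * np.pi * 50 * np.arange(640) / 640)
residual = x - ssa_components(x, 40)[:2].sum(axis=0)
```

Because the decomposition is exact, summing all components reproduces the original series, which makes it safe to remove only the supply-frequency components and keep the rest.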

Principal component analysis (PCA) is a classical dimension reduction method in data mining. When a group of data has more than one variable, the problem becomes complicated. Typically, there is a certain correlation between these variables, and the information reflected by the two (or more) variables may be redundant. In this case, PCA uses one variable to reflect multiple variables with high similarity and thereby reduces the number of variables.

In signal processing, it is considered that the principal component has a larger variance and the noise exhibits a smaller variance. A set of zero-mean signal samples

Furthermore,

With the Lagrange multiplier method, the optimization problem can be transformed into a matrix eigenvalue decomposition problem. The eigenvalues of the covariance matrix

Given that the covariance matrix is a real symmetric matrix, the unit eigenvectors are orthogonal to each other. Multiplying this eigenmatrix
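The derivation above amounts to eigendecomposing the covariance matrix and projecting the zero-mean samples onto the eigenvectors with the largest eigenvalues; a minimal NumPy sketch:

```python
import numpy as np

def pca_transform(X, n_components):
    """PCA sketch matching the derivation above: eigendecompose the
    covariance of zero-mean samples and project onto the eigenvectors
    with the largest eigenvalues (variances)."""
    Xc = X - X.mean(axis=0)                     # zero-mean samples
    C = np.cov(Xc, rowvar=False)                # real symmetric covariance
    w, V = np.linalg.eigh(C)                    # eigenvalues in ascending order
    order = np.argsort(w)[::-1][:n_components]  # keep the top variances
    return Xc @ V[:, order]
```

Because the covariance matrix is real and symmetric, `eigh` returns orthogonal unit eigenvectors, so the projection simply rotates the data onto uncorrelated axes ordered by variance.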

Four classical machine learning models, including support vector machine (SVM), decision tree (DT), random forest (RF), and AdaBoost, are adopted in this study. They are realized with Scikit-learn, an open-source machine learning library. The structure and principle of these models are briefly introduced as follows:

The SVM has been widely used in classification and regression problems due to its good robustness and strong generalization ability for unknown data. The core idea of SVM is to find a hyperplane that divides the training samples linearly in space. The samples closest to the hyperplane are called support vectors, and the model is optimized by maximizing the distance between the support vectors and the hyperplane. When the samples are difficult to divide linearly in the low-dimensional space, SVM can use a kernel function to map the data to a high-dimensional space in which a separating hyperplane can be found. The main parameters of SVM in programming are the kernel function, the kernel coefficient gamma, and the regularization parameter C. Usually, the kernel function is the Gaussian kernel. If gamma is too large, the variance of the Gaussian distribution is too small; the model then effectively operates only on the support vector samples, which may lead to overfitting. Conversely, if gamma is too small, the Gaussian distribution becomes too smooth, resulting in underfitting. As for C, if it is too large, the model increases the punishment for classification errors, which leads to an overfitted model.
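As an illustration of these parameters, the following sketch fits an RBF-kernel SVM with Scikit-learn on synthetic stand-in data (the dataset and the gamma/C values are placeholders, not the paper's actual settings):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# synthetic stand-in for PCA features of current samples (4 fault classes)
X, y = make_classification(n_samples=400, n_features=20, n_informative=10,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Gaussian (RBF) kernel; gamma and C trade off over- against underfitting
clf = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_tr, y_tr)
accuracy = clf.score(X_te, y_te)
```

In practice, gamma and C are usually tuned jointly, for example by grid search with cross-validation, rather than fixed at the defaults shown here.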

The structure of DT is similar to that of a tree. Each internal node represents a judgment on the attribute, each branch represents an output of the judgment result, and, finally, each leaf node represents a classification result. The priority among different attributes is determined by the contribution of entropy. The main parameters of the decision tree are the criteria function, maximum depth, minimum number of samples for node subdivision, and minimum number of samples that are required at a leaf node. Furthermore, criteria function is used for calculating the entropy. Other parameters are used to determine the structure of the tree.

RF and AdaBoost are ensemble models that are the integration of traditional machine learning algorithms. Their core idea involves combining multiple weak classifiers to obtain a strong classifier. When compared with a single classifier, it reduces the possibility of overfitting the model. In this study, the base classifier for both types of ensemble models is the decision tree.

RF establishes multiple weak classifiers that are independent and equal. When a new sample is input, the voting method is used to select its category. Each classifier is trained to obtain the combined model.

AdaBoost uses an adaptive method to iteratively learn each weak classifier when a new sample is input. Each iteration increases the sample weight of the last error classification, reduces the sample weight of the correct classification, combines multiple classifiers into a strong classifier linearly, and provides a larger weight to the weak classifier with a lower error rate.
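A minimal Scikit-learn sketch of the two ensemble strategies, with a decision tree as the base classifier and synthetic stand-in data (all hyperparameters here are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# RF: independent, equally weighted trees combined by majority voting
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# AdaBoost: each round reweights the samples misclassified in the last
# round and gives low-error trees a larger weight in the final combination
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=0).fit(X, y)
```

The contrast in the base learner is typical: RF uses deep, decorrelated trees and averages away their variance, while AdaBoost stacks many shallow "stumps" whose weighted sum forms the strong classifier.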

The neural network differs from the classical machine learning models in that it can automatically extract and select features from the data.

For a given sample

One-dimensional convolutional neural network.

In the convolutional layer, features are extracted from the input data:

The convolution kernel performs the convolution operation by sliding a window over the input signal. Each convolutional layer can use multiple convolution kernels to extract different features from the current input. The purpose of the activation function is to introduce nonlinearity into the output of a neuron; if no activation function is used, then all the neurons are linearly combined. The commonly used activation functions are the rectified linear unit (ReLU) and sigmoid:

Subsequently, the feature obtained via convolution enters the pooling layer. The pooling operation is also realized by sliding a certain length of window on feature maps. When the window slides, the average or maximum value in each window is calculated to form a new feature map. The essence of pooling is actually a resampling operation, which reduces the amount of data.
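The convolution, activation, and pooling operations described above can be illustrated with a minimal NumPy sketch (a single hand-picked kernel on a toy signal, not the learned kernels of the actual network):

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1-D convolution by window sliding (cross-correlation,
    as commonly implemented in deep learning frameworks)."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def relu(x):
    return np.maximum(x, 0.0)

def max_pool1d(x, size):
    """Non-overlapping pooling: keep the maximum of each window."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

x = np.sin(np.linspace(0, 8 * np.pi, 2560))            # toy input signal
feat = max_pool1d(relu(conv1d(x, np.array([1.0, 0.0, -1.0]))), 5)
```

Each pooling window keeps only its maximum, so the 2558-point convolution output shrinks to 511 points: the resampling that reduces the amount of data.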

Suppose that a signal sample input into the pooling layer is

By alternately stacking multiple convolution operations and pooling layers, the overall structure of convolutional layers can be obtained. The output feature map of the last convolutional layer is directly flattened into a one-dimensional vector and input into the fully connected layer. For the fully connected layer, the operation process of each layer can be expressed as follows:

The length of the last output vector from the fully connected layer is equal to the number of categories

With respect to the training process, the cross-entropy loss function is used to measure the error between the predicted and real fault type because this is a classification problem. The cross-entropy loss function is as follows:
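The elided loss can be written as softmax followed by the mean negative log-probability of the true class; a minimal NumPy sketch:

```python
import numpy as np

def cross_entropy(logits, labels):
    """Softmax followed by the mean negative log-probability of the true
    class, with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()
```

With uniform logits over K classes the loss equals log K, and it approaches zero as the network grows confident in the correct class, which is what gradient descent drives it toward.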

The test rig, as shown in Figure

Test bench and seven different types of faults.

Seven types of faults have been seeded in the test gearbox: tooth chipping in the sun/planet/ring gears (SG, PG, RG), an inner race spall in the planet bearing (PG-IR), and inner/outer race and ball spalls in the input support bearing (IR, OR, B). Pictures of the faulty components are shown in Figure

A current sensor was utilized to acquire the phase A current signal of the motor. The system runs under five torque loads for each type of fault, and the experiments are conducted four times under each operating condition (see Table

Five operating conditions.

| Torque load (N·m) | 0 | 2.6 | 5.25 | 7.8 | 10.5 |
|---|---|---|---|---|---|
| File number | 101, 102, 103, 104 | 201, 202, 203, 204 | 301, 302, 303, 304 | 401, 402, 403, 404 | 501, 502, 503, 504 |

In this study, 20 s of data from two files under torque loads of 0 N·m, 5.25 N·m, and 10.5 N·m are selected to build the model. The data are split by a window of 0.1 s, yielding a total of 9600 samples, which are divided into a training set (70%) and test set 1 (30%). In addition, 3200 samples (10 s of data from two files under torque loads of 2.6 N·m and 7.8 N·m), representing unlearned conditions, are used to evaluate the robustness of the models. This set of 3200 samples is termed test set 2.
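The windowing step can be sketched as follows; the sampling rate is an assumption chosen so that a 0.1 s window yields 2560 points, matching the CNN input size, since the text does not state it explicitly here:

```python
import numpy as np

def make_samples(signal, fs, window_sec=0.1):
    """Split a long current record into fixed-length, non-overlapping
    samples; fs is a placeholder sampling rate, not stated in the text."""
    win = int(fs * window_sec)
    n = len(signal) // win
    return signal[:n * win].reshape(n, win)

rng = np.random.default_rng(0)
record = rng.standard_normal(20 * 25600)        # stand-in for 20 s of current
samples = make_samples(record, fs=25600)        # 0.1 s windows of 2560 points
```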

The current signals of seven different types of fault and the normal condition are analyzed. As shown in Figure

Amplitude spectrum analysis of different types of faults. (a) Bearing inner race fault, (b) bearing outer race fault, (c) bearing ball fault, (d) planet gear fault, (e) planet bearing inner race fault, (f) sun gear fault. (g) ring gear fault, and (h) no fault.

From others’ research [

After processing the current signal with SSA, by considering the inner race fault bearing as an example (as shown in Figure

Different reconstructed signals from the components. (a) Supply frequency signal and (b) residual signal.

Therefore, the periodic components of the original signal can be removed, and the signal can be reconstructed with the residual signal. The periodic component is compared with the residual and original signals, as shown in Figure

Comparison of the periodic signal with the residual signal and the original signal.

Besides SSA, an approach based on WPD is also considered for comparison. After removing the nodes of the binary decomposition tree that represent frequencies near the supply frequency, the signal can be reconstructed.
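A sketch of this idea with PyWavelets: decompose to a fixed packet level, zero the leaf node whose band covers the supply frequency, and reconstruct (the sampling rate, the 50 Hz supply, and the wavelet and level choices are illustrative assumptions):

```python
import numpy as np
import pywt

# illustrative settings: 50 Hz supply plus a weaker tone on a toy signal
fs = 2560
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 50 * t) + 0.2 * np.sin(2 * np.pi * 410 * t)

wp = pywt.WaveletPacket(sig, wavelet="db4", maxlevel=6)
band = fs / 2 / 2 ** 6                          # each leaf node spans 20 Hz
for i, node in enumerate(wp.get_level(6, order="freq")):
    if i * band <= 50 < (i + 1) * band:         # node covering the supply
        node.data = np.zeros_like(node.data)    # remove it from the tree
reconstructed = wp.reconstruct(update=False)[:len(sig)]
```

Because wavelet filters are not brick-wall, some supply-frequency energy leaks into neighboring nodes, which is one reason the comparison with SSA in the next section is of interest.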

Figure

Comparison of amplitude spectrum after the usage of WPD and SSA. (a) WPD result. (b) SSA result.

Before the experiment, the signals are labeled according to the type of fault. Given that traditional machine learning models require feature engineering, PCA functions as a feature extractor. The resulting combinations (PCA-SVM, PCA-DT, PCA-RF, and PCA-AdaBoost), together with CNN, are used in this classification task. The signal processing methods can be divided into four groups according to whether they are preprocessed by SSA and whether they are transformed into the frequency domain via the fast Fourier transform (FFT). Given that SSA involves large-scale matrix operations and that resampling the original data can lead to information loss, the window size should not be extremely large; in this case, it is 640.

For the experiment, the models are implemented in Python. First, PCA compresses the raw signal to 20 dimensions in advance. By using equation (
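The PCA-plus-classifier combination can be sketched as a Scikit-learn pipeline on synthetic stand-in data (the scaler, dataset, and SVC settings are assumptions; only the 20-dimensional PCA compression follows the text):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# stand-in for 640-point current windows over 8 classes (7 faults + normal)
X, y = make_classification(n_samples=800, n_features=640, n_informative=40,
                           n_classes=8, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# PCA compresses each sample to 20 dimensions before classification
model = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
accuracy = model.score(X_te, y_te)
```

Wrapping PCA inside the pipeline ensures the projection is fitted only on the training windows and then reused unchanged on test data, avoiding leakage.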

However, these machine learning models, which should extract features in advance, are not suitable for the original long sequence signal. Thus, CNN is adopted. By considering the computational burden, the specific structure designed is shown in Table

CNN structure.

| Layer | Parameters |
|---|---|
| Input signal data (2560 × 1) | |
| Convolutional layer 1 | Input channels: 1, output channels: 4, kernel size: 3 |
| ReLU | |
| Max pooling | Filter size: 5 |
| Output size (512 × 4) | |
| Convolutional layer 2 | Input channels: 4, output channels: 8, kernel size: 3 |
| ReLU | |
| Max pooling | Filter size: 4 |
| Output size (128 × 8) | |
| Convolutional layer 3 | Input channels: 8, output channels: 16, kernel size: 3 |
| ReLU | |
| Max pooling | Filter size: 4 |
| Output size (32 × 16) | |
| Convolutional layer 4 | Input channels: 16, output channels: 32, kernel size: 3 |
| ReLU | |
| Max pooling | Filter size: 4 |
| Output size (8 × 32) | |
| Flatten | Feature size (256 × 1) |
| Fully connected layer 1 | Input size: 256, output size: 2560 |
| ReLU | |
| Fully connected layer 2 | Input size: 2560, output size: 8 |
| Final output | Classified by softmax |
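Assuming padding of 1 in each convolutional layer (which the table does not state but which reproduces the listed output sizes), the structure can be sketched in PyTorch:

```python
import torch
import torch.nn as nn

# padding=1 is an assumption that reproduces the table's output sizes
model = nn.Sequential(
    nn.Conv1d(1, 4, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(5),
    nn.Conv1d(4, 8, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(8, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(4),
    nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool1d(4),
    nn.Flatten(),                       # 32 channels x 8 positions = 256
    nn.Linear(256, 2560), nn.ReLU(),
    nn.Linear(2560, 8),                 # 8 classes; softmax sits in the loss
)

logits = model(torch.randn(16, 1, 2560))   # a batch of 16 one-channel signals
```

During training, the raw logits would be fed to a cross-entropy loss, which applies the softmax internally.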

After the training, the current signals of test set 1 and test set 2 are inputted into each model to obtain the classification results. The accuracy of test set 1 is shown in Table

Fault diagnosis accuracy of test set 1.

| Methods | Time, none (%) | Time, SSA (%) | Time, WPD (%) | Freq., none (%) | Freq., SSA (%) | Freq., WPD (%) | Average (%) |
|---|---|---|---|---|---|---|---|
| CNN | 96.88 | 99.13 | 77.78 | 98.37 | 98.16 | 78.12 | 91.41 |
| PCA-SVM | 99.83 | 99.20 | 83.33 | 99.19 | 99.34 | 82.64 | 93.92 |
| PCA-DT | 80.73 | 67.08 | 53.12 | 94.13 | 89.24 | 64.58 | 74.81 |
| PCA-RF | 95.62 | 91.49 | 69.79 | 98.75 | 97.47 | 77.78 | 88.48 |
| PCA-AdaBoost | 99.41 | 96.94 | 73.61 | 99.76 | 99.24 | 86.46 | 92.57 |
| Average | 94.49 | 90.77 | 71.53 | 98.04 | 96.69 | 77.92 | |

The comparison of the different models reveals that PCA-SVM exhibits the best performance in the time domain, whether preprocessing is used or not. It is also the best model in the frequency domain for reconstructed signals, with a maximum accuracy of 99.83% and an average accuracy of 93.92%. Furthermore, PCA-AdaBoost exhibits the second-highest average accuracy of 92.57% and is the best model for signals that are not preprocessed in the frequency domain. The average accuracy of CNN is slightly inferior to that of PCA-AdaBoost. These three models are very stable. PCA-RF ranks after these three models. The worst model among them is PCA-DT: its maximum accuracy is 94.13%, but its average accuracy is below 80%.

The preprocessing method, SSA, seems to have a weak impact on the result of test set 1. It only improves the performance of CNN in the time domain and that of PCA-SVM slightly in the frequency domain. The accuracy of PCA-DT and that of two ensemble models in two domains decrease, which suggests that components of the reconstructed signal may contain certain misleading features learned by the three models. The performance of WPD is even worse than that of SSA, which suggests that SSA may be a better signal reconstruction solution.

After the signals are transformed into the frequency domain via FFT, the average classification accuracy increases. As for CNN and PCA-SVM, their performance in the time domain is already extremely good, so the transformation yields no obvious improvement or even a slight decrease in accuracy. For the other three models, with data not preprocessed, the score of PCA-DT increases by 13.40%, that of PCA-RF by 3.13%, and that of PCA-AdaBoost by 0.35%. As for the reconstructed signal of SSA, the accuracy of PCA-DT increases by 22.16%, that of PCA-RF by 5.98%, and that of PCA-AdaBoost by 2.30%. Furthermore, WPD shows the same improving trend.

In general, PCA-SVM is the best model in the test set 1. The deep learning model may not be better than the classical machine learning model. SSA mainly works on CNN in the time domain but does not significantly improve the performance of other models. Thus, the preprocessing method should be designed carefully in case of adverse effects. Transforming the time domain data into the frequency domain can improve diagnosis accuracy.

Figure

Confusion matrix of classification result with the original signal and CNN. IR represents bearing inner race fault; OR represents bearing outer race fault; B represents bearing ball fault; PG represents planet gear fault; IR (PG) represents that bearing in the gearbox has an inner race fault; SG represents sun gear fault; RG represents ring gear fault; and N represents no fault.

For test set 2, the accuracy of different methods is shown in Table

Fault diagnosis accuracy of test set 2.

| Methods | Time, none (%) | Time, SSA (%) | Time, WPD (%) | Freq., none (%) | Freq., SSA (%) | Freq., WPD (%) | Average (%) |
|---|---|---|---|---|---|---|---|
| CNN | 63.52 | 81.24 | 36.88 | 76.41 | 74.88 | 61.88 | 65.80 |
| PCA-SVM | 57.09 | 67.56 | 48.12 | 80.91 | 79.71 | 65.00 | 66.40 |
| PCA-DT | 32.09 | 44.50 | 28.75 | 71.03 | 69.09 | 45.62 | 48.51 |
| PCA-RF | 44.72 | 64.22 | 41.25 | 78.37 | 78.53 | 63.44 | 61.76 |
| PCA-AdaBoost | 47.53 | 67.53 | 45.94 | 79.91 | 81.34 | 60.31 | 63.76 |
| Average | 48.99 | 65.01 | 40.19 | 77.61 | 76.71 | 59.25 | |

With respect to test set 2, the reconstruction of the original current signal by SSA plays a positive role, which is different from that for test set 1. The accuracy of all the models in the time domain increased significantly after SSA, especially that of CNN, which suggests that SSA increases the robustness in the time domain. Furthermore, it still outperforms WPD. In test set 2, the component that contains weak fault characteristics may be exposed via the decomposition method of SSA rather than WPD. However, the positive effect in the frequency domain is not significant.

The advantage of transforming to the frequency domain is even more evident here, whether the preprocessing method is used or not: the performance of every model in the frequency domain is better than that in the time domain, and even the accuracy of SVM increases.

Our study also uses vibration signals for comparison. For CNN, the classification accuracy on vibration signals is 99.0%. Although the best classification accuracy of the current signal in test set 1 exceeds that of the vibration signal, the training time consumed by the former is much longer: the vibration-signal model reached 99.0% accuracy in only five epochs, while the current-signal model required more than 100 epochs. Additionally, the vibration-signal model easily attains an accuracy of 95% on test set 2, while the current signal attains a maximum accuracy of only 81.34%. This shows that the model trained on vibration signals exhibits higher robustness. Therefore, the method based on the vibration signal is still better than the method based on the current signal.

In this paper, deep learning and different machine learning models are used for fault diagnosis of bearings and planetary gearboxes. The current signal of the motor is processed via different methods, and the results are compared.

From the perspective of models, PCA-SVM is the best model for learned data while CNN is the best model for unlearned data in the time domain. In the frequency domain, PCA-AdaBoost and PCA-SVM exhibit the best performance in both test sets. PCA-DT is the least recommended model. From the perspective of preprocessing, SSA mainly increases the accuracy of CNN in the time domain and exhibits a positive effect on unlearned data in the time domain. However, in other cases, the accuracy decreases due to SSA. This suggests that an effective preprocessing method should be designed for the target model in the future. Finally, by transforming the time domain data to the frequency domain data, the accuracy increases significantly, which in turn shows that fault features are more likely to be exposed in the frequency domain.

These conclusions can aid in the selection of methods for future studies related to the fault diagnosis of planetary gearbox based on MCSA.

The motor current data used to support the findings of this study are available from the corresponding author upon request.

The authors declare that there are no conflicts of interest regarding the publication of this paper.

The research work described in the paper was supported by the National Science Foundation of China under Grant no. 11872222 and the State Key Laboratory of Tribology under Grant no. SKLT2019B09.