CHEN Cheng, JIANG Ming, ZHANG Min
Chinese character recognition remains challenging in complex scenarios. Because characters may be occluded or deformed, traditional feature extraction and classification methods often fail to extract sufficient features, which leads to recognition errors. Furthermore, existing models that integrate visual and language information overlook the importance of hidden character features in the language model, resulting in inaccurate error correction. To address these issues, we propose a multi-dimensional representation recognition algorithm based on characters, radicals, and key pen strokes. The algorithm uses self-attention mechanisms to extract multi-level character features, which addresses the problem of insufficient feature extraction and reduces recognition errors. In addition, a multi-dimensional representation fusion mechanism connects the visual model with the language model and conveys hidden character features to the language model. The algorithm performs text recognition, character recognition, and error correction in three stages. Experimental results show that, compared with state-of-the-art Transformer-based models, the proposed algorithm improves performance by 1.81%, 1.11%, 0.25%, and 2.27% on the scene text, web, printed text, and handwritten datasets, respectively.
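The idea of fusing character-, radical-, and stroke-level representations before passing them to a language model can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_representations`, the single query vector, and the scaled dot-product weighting are all assumptions chosen for illustration, and the abstract does not specify how the fusion is actually computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse_representations(query, char_feat, radical_feat, stroke_feat):
    """Hypothetical fusion step (an assumption, not the paper's method).

    Each of the three feature vectors is scored against the query with a
    scaled dot product, the scores are normalized with softmax, and the
    fused vector is the weighted sum of the three inputs.
    """
    feats = [char_feat, radical_feat, stroke_feat]
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, f) / scale for f in feats])
    fused = [
        sum(w * f[i] for w, f in zip(weights, feats))
        for i in range(len(query))
    ]
    return fused, weights


# Toy example: the query aligns best with the character-level feature,
# so that feature receives the largest weight.
fused, weights = fuse_representations(
    query=[1.0, 0.0],
    char_feat=[1.0, 0.0],
    radical_feat=[0.0, 1.0],
    stroke_feat=[0.0, 0.0],
)
```

In this sketch the fused vector keeps the same dimensionality as its inputs, so it could in principle be handed to a downstream language model in place of a purely character-level embedding; whether the paper does this is not stated in the abstract.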