15 July 2025, Volume 45 Issue 4
    

    Special Issue on the Summit Discipline of Computer Science and Technology
  • XU Xiaoliang, ZHU Runkai, GENG Yuxia
    Journal of Hangzhou Dianzi University. 2025, 45(4): 1-11. https://doi.org/10.13954/j.cnki.hdu.2025.04.001
    Compositional Zero-Shot Learning (CZSL) defines categories as combinations of attributes and objects and requires models to recognize novel combinations not encountered during training. CZSL not only reduces dependence on labeled data but also probes the compositional generalization capabilities of deep learning models. Existing methods often neglect how the appearance of attributes and objects varies across compositional contexts, which limits generalization. To address this, we propose a Context-Aware Decision Mechanism for CZSL. Using the multimodal pretrained CLIP model as the encoder, our approach combines textual soft prompts, adapter fine-tuning, and feature denoising to extract attribute and object features independently. We further design a context-dependent gating network that adaptively fuses the predicted attribute and object scores during the decision phase. Experiments on four standard benchmark datasets show that our method significantly improves CZSL performance, validating the effectiveness of context-aware decision-making for attribute-object combinations.
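The abstract does not say how the gate is parameterized. As a minimal sketch, assume a scalar sigmoid gate computed from a context feature vector that splits trust between the attribute branch and the object branch; `gated_composition_score`, `gate_weights`, and `gate_bias` are hypothetical names, and in a trained model the gate parameters would be learned rather than fixed:

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_composition_score(
    attr_score: float,
    obj_score: float,
    context: Sequence[float],
    gate_weights: Sequence[float],
    gate_bias: float = 0.0,
) -> float:
    """Fuse an attribute score and an object score with a context-dependent gate.

    Hypothetical gate form: g = sigmoid(w . context + b) is the share given to
    the attribute branch; the object branch receives 1 - g.
    """
    if len(context) != len(gate_weights):
        raise ValueError("context and gate_weights must have the same length")
    g = sigmoid(sum(w * c for w, c in zip(gate_weights, context)) + gate_bias)
    return g * attr_score + (1.0 - g) * obj_score


# Usage: score every candidate (attribute, object) pair and keep the best one.
attr_scores = {"wet": 0.9, "dry": 0.2}
obj_scores = {"dog": 0.7, "street": 0.6}
context = [0.5, -1.0]      # hypothetical context features for one image
gate_weights = [1.0, 0.3]  # hypothetical (normally learned) gate parameters
best = max(
    ((a, o) for a in attr_scores for o in obj_scores),
    key=lambda pair: gated_composition_score(
        attr_scores[pair[0]], obj_scores[pair[1]], context, gate_weights
    ),
)
```

Because the gate is recomputed per input, the same attribute score can carry more or less weight depending on the context, which is the behavior the abstract attributes to its gating network.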
  • CHEN Xiaodiao, YANG Hao
    Journal of Hangzhou Dianzi University. 2025, 45(4): 12-21. https://doi.org/10.13954/j.cnki.hdu.2025.04.002
    Curve approximation is a fundamental problem in Computer Aided Geometric Design (CAGD), with wide applications in data compression, path planning, and trajectory generation. Approximating complex intersection curves often suffers from high computational complexity and poor approximation quality. To address these challenges, a new curve approximation algorithm is proposed that determines the control points of the approximating curve by minimizing the integral of the squared distance between the original curve and its approximation. Unlike traditional integral-minimization methods, the proposed algorithm uses Gauss-Kronrod quadrature to compute the complex integrals efficiently, improving both accuracy and computational efficiency. Experimental results show that the proposed algorithm achieves higher approximation accuracy with a smaller data volume than prevailing approximation methods.
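For reference, a minimal sketch of evaluating the squared-distance integral with adaptive 7-point Gauss / 15-point Kronrod (G7-K15) quadrature. The node and weight constants are the standard ones from QUADPACK's QK15 rule; the simple |K - G| error estimate, the bisection strategy, and the example curves are simplifying assumptions rather than the authors' implementation, and the step that optimizes the control points is omitted:

```python
import math
from typing import Callable, Tuple

Curve = Callable[[float], Tuple[float, float]]

# QUADPACK QK15 constants: non-negative Kronrod nodes on [-1, 1] (the rule is symmetric)
# and their weights. XGK[1], XGK[3], XGK[5] and the center 0.0 are the 7-point Gauss nodes.
XGK = (0.991455371120812639, 0.949107912342758525, 0.864864423359769073,
       0.741531185599394440, 0.586087235467691130, 0.405845151377397167,
       0.207784955007898468, 0.0)
WGK = (0.022935322010529225, 0.063092092629978553, 0.104790010322250184,
       0.140653259715525919, 0.169004726639267903, 0.190350578064785410,
       0.204432940075298892, 0.209482141084727828)
WG = (0.129484966168869693, 0.279705391489276668,   # Gauss weights for XGK[1], XGK[3],
      0.381830050505118945, 0.417959183673469388)   # XGK[5], and the center node


def gk15(f: Callable[[float], float], a: float, b: float) -> Tuple[float, float]:
    """Return (15-point Kronrod estimate, |Kronrod - Gauss| error estimate) on [a, b]."""
    center, half = 0.5 * (a + b), 0.5 * (b - a)
    fc = f(center)
    kronrod, gauss = WGK[7] * fc, WG[3] * fc
    for i in range(7):
        dx = half * XGK[i]
        pair = f(center - dx) + f(center + dx)
        kronrod += WGK[i] * pair
        if i % 2 == 1:  # node shared with the 7-point Gauss rule
            gauss += WG[i // 2] * pair
    return kronrod * half, abs(kronrod - gauss) * half


def adaptive_gk(f: Callable[[float], float], a: float, b: float,
                tol: float = 1e-10, depth: int = 50) -> float:
    """Bisect [a, b] until each piece's error estimate is below its share of tol."""
    value, err = gk15(f, a, b)
    if err <= tol or depth == 0:
        return value
    mid = 0.5 * (a + b)
    return adaptive_gk(f, a, mid, tol / 2, depth - 1) + adaptive_gk(f, mid, b, tol / 2, depth - 1)


def squared_distance_integral(curve: Curve, approx: Curve, a: float = 0.0, b: float = 1.0) -> float:
    """Integral over [a, b] of ||curve(t) - approx(t)||^2 for planar parametric curves."""
    def integrand(t: float) -> float:
        (x1, y1), (x2, y2) = curve(t), approx(t)
        return (x1 - x2) ** 2 + (y1 - y2) ** 2
    return adaptive_gk(integrand, a, b)


# Usage: a quarter circle versus a quadratic Bezier curve through the same end points.
def quarter_circle(t: float) -> Tuple[float, float]:
    return math.cos(math.pi * t / 2), math.sin(math.pi * t / 2)


def quadratic_bezier(p0, p1, p2) -> Curve:
    def curve(t: float) -> Tuple[float, float]:
        s = 1.0 - t
        return (s * s * p0[0] + 2 * s * t * p1[0] + t * t * p2[0],
                s * s * p0[1] + 2 * s * t * p1[1] + t * t * p2[1])
    return curve


error = squared_distance_integral(quarter_circle, quadratic_bezier((1.0, 0.0), (1.0, 1.0), (0.0, 1.0)))
```

In a full fitting loop, this integral would be the objective minimized over the control points; the quadrature only supplies accurate evaluations of it.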
  • CHEN Cheng, JIANG Ming, ZHANG Min
    Journal of Hangzhou Dianzi University. 2025, 45(4): 22-31. https://doi.org/10.13954/j.cnki.hdu.2025.04.003
    Chinese character recognition in complex scenarios remains challenging. Because characters may be occluded or deformed, traditional feature extraction and classification methods often fail to extract sufficient features, leading to recognition errors. Furthermore, existing models that integrate visual and language information overlook the importance of hidden character features in the language model, resulting in inaccurate error correction. To tackle these issues, we propose a multi-dimensional representation recognition algorithm based on characters, radicals, and key pen strokes. The algorithm uses self-attention to extract multi-level character features, addressing insufficient feature extraction and reducing recognition errors. In addition, a multi-dimensional representation fusion mechanism connects the visual model with the language model, effectively conveying hidden character features to the language model. The algorithm performs text recognition, character recognition, and error correction in three stages. Experimental results show that, compared with state-of-the-art Transformer-based models, the proposed algorithm improves performance by 1.81%, 1.11%, 0.25%, and 2.27% on the scene-text, web, printed-text, and handwritten datasets, respectively.
  • ZHANG Zhuoqun, WANG Rongbo, HUANG Xiaoxi
    Journal of Hangzhou Dianzi University. 2025, 45(4): 32-41. https://doi.org/10.13954/j.cnki.hdu.2025.04.004
    Mining information from massive volumes of medical text and extracting key named entities is crucial for improving the efficiency of medical information processing and dialogue understanding. Named entity recognition in the medical domain struggles with complex medical terms, which a single-feature method cannot capture comprehensively and accurately. To address this, a medical-text named entity recognition method is proposed that integrates multiple features: Pinyin, glyphs, and lexicon information. The method uses pre-trained models to extract semantic vectors, constructs glyph vectors from Chinese character strokes and radicals along with tonal Pinyin vectors, aligns medical dictionaries through bidirectional matching, dynamically fuses the multimodal features, and finally feeds them into a BiLSTM-CRF model to improve entity recognition accuracy. Experimental results show that the proposed method significantly outperforms the comparison methods on online doctor-patient conversation texts from the IMCS internet platform. Its highest F1 score on the test set reaches 92.5%, an improvement of up to 8.6 percentage points over the five baselines (Base-BERT, MC-BERT, RoBERTa-wwm-ext, Medbert-Kd-Chinese, and Albert-Base-Chinese). Ablation experiments further verify that the fused features improve recognition performance.
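The abstract does not detail its matching procedure. A common way to align text against a dictionary is bidirectional maximum matching: segment greedily from the left and from the right, then keep the result with fewer tokens, breaking ties by fewer single-character tokens. The sketch below assumes that heuristic and a toy lexicon; how the paper turns matches into lexicon features for the BiLSTM-CRF is not shown:

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:  # fall back to a single character
                tokens.append(piece)
                i += size
                break
    return tokens


def backward_max_match(text: str, lexicon: Set[str], max_len: int) -> List[str]:
    """Greedy right-to-left segmentation."""
    tokens, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                j -= size
                break
    tokens.reverse()
    return tokens


def _single_chars(tokens: List[str]) -> int:
    return sum(1 for t in tokens if len(t) == 1)


def bidirectional_max_match(text: str, lexicon: Set[str]) -> List[str]:
    """Prefer fewer tokens; on a tie prefer fewer single characters, then the backward result."""
    max_len = max((len(w) for w in lexicon), default=1)
    fwd = forward_max_match(text, lexicon, max_len)
    bwd = backward_max_match(text, lexicon, max_len)
    if len(fwd) != len(bwd):
        return fwd if len(fwd) < len(bwd) else bwd
    return bwd if _single_chars(bwd) <= _single_chars(fwd) else fwd


# Usage: mark which segments hit a (toy) medical lexicon.
medical_lexicon = {"糖尿病", "胰岛素", "注射"}
tokens = bidirectional_max_match("糖尿病需要注射胰岛素", medical_lexicon)
matched = [t for t in tokens if t in medical_lexicon]
```

Running both directions matters for ambiguous strings such as 研究生命起源, where forward matching greedily takes 研究生 while backward matching recovers 研究 / 生命 / 起源.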
  • WANG Weizhi, HU Haiyang, GU Pan, CUI Gaobin, LI Zhenghua, LI Yuan
    Journal of Hangzhou Dianzi University. 2025, 45(4): 42-50. https://doi.org/10.13954/j.cnki.hdu.2025.04.005
    Rivets are widely used in industry to connect metal parts or structures, and assembly errors or missing rivets can cause serious safety problems. To address this, a rivet detection model named LRD-DETR (Limited Ranged-Deformable-DETR) is proposed. A multi-scale fusion module, DSPANet, is added after the feature extraction network to enhance the semantic information of the feature maps. A range-limited deformable attention mechanism is designed in the encoder to limit the sampling range around each reference point. DAB-DETR is used as the decoder, and a new MLP module is designed on top of it to update the 4D anchor box information at each decoder layer. Experimental results show that the model achieves an mAP of 59.2% on a self-built rivet dataset, outperforming prevailing object detection algorithms.
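The abstract gives neither the exact form of the range limit nor the new MLP. A minimal sketch under two assumptions: (1) predicted sampling offsets are squashed with `tanh` and scaled by a hypothetical `radius`, so every sampling point stays inside a window around its reference point; (2) anchors are refined in the DAB-DETR style, by adding a predicted delta in inverse-sigmoid space (the delta would come from the MLP; here it is simply passed in):

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def limited_sampling_points(ref_point: Point, raw_offsets: Sequence[Point], radius: float) -> List[Point]:
    """Turn unbounded offset predictions into sampling points within `radius` of the reference point.

    tanh maps each offset coordinate into (-1, 1), so every point lies inside a
    square window of half-width `radius` centered on the reference point.
    """
    rx, ry = ref_point
    return [(rx + radius * math.tanh(dx), ry + radius * math.tanh(dy)) for dx, dy in raw_offsets]


def inverse_sigmoid(x: float, eps: float = 1e-5) -> float:
    x = min(max(x, eps), 1.0 - eps)  # clamp to avoid log(0)
    return math.log(x / (1.0 - x))


def refine_anchor(anchor: Sequence[float], delta: Sequence[float]) -> Tuple[float, ...]:
    """Update a normalized 4D anchor (cx, cy, w, h) with a predicted delta, layer by layer."""
    return tuple(1.0 / (1.0 + math.exp(-(inverse_sigmoid(a) + d))) for a, d in zip(anchor, delta))


# Usage: three raw offsets around a reference point, and one anchor update.
points = limited_sampling_points((0.5, 0.5), [(10.0, -10.0), (0.0, 0.0), (0.3, -2.0)], radius=0.05)
anchor = refine_anchor((0.5, 0.4, 0.2, 0.1), (1.0, 0.0, 0.0, 0.0))
```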
  • BAI Yizhuo, DOU Xiaolong, ZHOU Wenhui
    Journal of Hangzhou Dianzi University. 2025, 45(4): 51-58. https://doi.org/10.13954/j.cnki.hdu.2025.04.006
    Micro-expressions are brief and involve subtle facial muscle movements. To address these characteristics, a dual-stream Transformer-based micro-expression sequence recognition network (TSTMER) is proposed. First, the TV-L1 optical flow method computes the optical flow between each frame of the micro-expression sequence and the starting frame to capture the motion of the facial muscles. Second, a micro-expression feature extraction module is designed, consisting of two Transformer networks and a cross-attention mechanism: the two Transformers extract spatial features from the optical flow images and the micro-expression images, respectively, and the cross-attention mechanism fuses these spatial features. Then, a spatio-temporal attention mechanism is designed to increase the weights of local micro-expression regions and of sequence frames. Finally, a Bi-LSTM learns both forward and backward temporal dependencies of the micro-expression sequence. Experiments on multiple micro-expression datasets show that TSTMER has significant advantages in UAR and UF1.
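UAR (unweighted average recall) and UF1 (unweighted F1) are the macro-averaged recall and F1 scores commonly reported in micro-expression recognition because class frequencies are imbalanced. A minimal sketch for a single set of predictions; in leave-one-subject-out evaluation, per-class TP/FP/FN counts are typically accumulated over all folds before these averages are taken:

```python
from typing import Hashable, Sequence, Tuple


def uar_uf1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (UAR, UF1): per-class recall and F1, averaged with equal weight per class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    classes = sorted(set(y_true), key=str)
    recalls, f1s = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recalls.append(tp / (tp + fn))           # tp + fn > 0 because c occurs in y_true
        f1s.append(2 * tp / (2 * tp + fp + fn))  # denominator > 0 for the same reason
    return sum(recalls) / len(classes), sum(f1s) / len(classes)


# Usage with three emotion classes.
uar, uf1 = uar_uf1(["pos", "pos", "neg", "neg", "sur"], ["pos", "neg", "neg", "neg", "pos"])
```

Because every class counts equally, a model that never predicts the rare class (here "sur") is penalized even if its plain accuracy looks reasonable.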
  • ZHANG Yakai, WU Zizhao, YANG Ping
    Journal of Hangzhou Dianzi University. 2025, 45(4): 59-68. https://doi.org/10.13954/j.cnki.hdu.2025.04.007
    The development of information technology has created a tremendous demand for 3D assets. Recently, text-guided 3D generation models have attracted widespread attention because they are easy to use. To meet the growing demand for fast, diverse, high-quality 3D shape generation, we propose Re3Diffusion, a 3D shape generation method based on a multimodal retrieval system and a latent diffusion model. Given a text prompt, Re3Diffusion accesses an external multimodal knowledge base to retrieve relevant shapes and uses them as references for generating 3D models. To generate shapes that better match the text description, we first use contrastive pretraining to narrow the gap between modalities through parametric learning, and then incorporate the retrieval system at inference time to bring the generated shapes closer to the target distribution through non-parametric learning. To avoid misleading results caused by retrieved shapes with low relevance to the text condition, we further design an adaptive weighting scheme that selectively uses the retrieved shape information. The method also supports text-guided shape editing. Experimental results show that it significantly improves the semantic match between the generated 3D shapes and the text condition while maintaining fast generation, and that it also generalizes out of domain.
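The abstract does not specify its adaptive weighting scheme. As an illustration only, one simple option is to ignore retrieved shapes whose cosine similarity to the text embedding falls below a threshold and softmax-weight the rest; `threshold` and `temperature` are hypothetical hyperparameters, and an all-zero result means the generator would fall back to the text condition alone:

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieval_weights(text_emb: Sequence[float], shape_embs: Sequence[Sequence[float]],
                      threshold: float = 0.3, temperature: float = 0.1) -> List[float]:
    """Softmax over similarities of retrieved shapes that pass the threshold; others get 0."""
    sims = [cosine(text_emb, s) for s in shape_embs]
    kept = [s for s in sims if s >= threshold]
    if not kept:
        return [0.0] * len(sims)  # nothing relevant retrieved
    m = max(kept)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) if s >= threshold else 0.0 for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_reference(weights: Sequence[float], shape_latents: Sequence[Sequence[float]]) -> List[float]:
    """Combine retrieved shape latents into a single reference vector."""
    dim = len(shape_latents[0])
    return [sum(w * z[k] for w, z in zip(weights, shape_latents)) for k in range(dim)]
```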
  • ZHAO Jian, YING Na, SHU Qin
    Journal of Hangzhou Dianzi University. 2025, 45(4): 69-77. https://doi.org/10.13954/j.cnki.hdu.2025.04.008
    Current gait recognition methods suffer from limited accuracy under cross-view conditions and degraded performance under complex clothing conditions. To address these problems, this paper proposes a cross-view gait recognition network with global and local feature fusion (GLFF), which introduces neural network hyperparameters to enhance the network's capabilities. First, a segmentation module captures the movement information of the limbs and enhances the fine-grained spatial features of each block. Then, a set pooling (SP) layer extracts features that are highly robust to viewing angle and aggregates gait features across asynchronous time points; the resulting features are integrated, a micro-action template builder extracts micro-action features, and GLFF is obtained through training. Experiments on the CASIA-B dataset from the Institute of Automation, Chinese Academy of Sciences, show that the method reaches an average recognition accuracy of 88.9% under cross-view conditions and 79.5% under the clothing (CL) condition. These results indicate that fusing local-component features effectively reduces the influence of clothing on cross-view gait recognition.
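The abstract does not define its set pooling (SP) operator. In GaitSet-style models, SP is commonly an element-wise max over the unordered set of frame-level features, which makes the result independent of frame order, and local features are often obtained by splitting a feature map into horizontal strips and pooling each strip (mean plus max). A minimal sketch under those assumptions, with plain lists standing in for tensors:

```python
from typing import List, Sequence


def set_pooling_max(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over a set of per-frame feature vectors (order-invariant)."""
    if not frame_features:
        raise ValueError("need at least one frame")
    return [max(values) for values in zip(*frame_features)]


def horizontal_part_features(feature_map: Sequence[Sequence[float]], num_parts: int) -> List[float]:
    """Split an H x W map into `num_parts` horizontal strips; return mean + max for each strip."""
    height = len(feature_map)
    if num_parts <= 0 or height % num_parts != 0:
        raise ValueError("height must be a positive multiple of num_parts")
    rows_per_part = height // num_parts
    parts = []
    for p in range(num_parts):
        strip = [v for row in feature_map[p * rows_per_part:(p + 1) * rows_per_part] for v in row]
        parts.append(sum(strip) / len(strip) + max(strip))
    return parts
```

Order invariance is what lets SP aggregate frames that are not temporally aligned, while the strip features keep body-part (local) information that a single global vector would blur.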
  • GUO Yinjun, SHI Yifang
    Journal of Hangzhou Dianzi University. 2025, 45(4): 78-87. https://doi.org/10.13954/j.cnki.hdu.2025.04.009
    In target tracking with complementary field-of-view sensor networks, a target crossing the fields of view of multiple nodes is observed, and measurements are returned, by only a subset of nodes at each time step. As a result, different nodes estimate the state of the same target with disparate accuracy, which mismatches the gain weights used for adaptive consensus information fusion between nodes and degrades the tracking performance of the sensor network. In this paper, we propose a confidence-based adaptive consensus filter algorithm (CB-ACF) built on the existing ACFr algorithm. The algorithm uses each node's historical statistics of target observability to classify the target's state relative to each node into two situations: entering the field of view and exiting the field of view. It then designs an adaptive confidence rule that reflects how the accuracy of each node's target state estimate is changing. Using node confidence, it designs the consensus gain weights and performs adaptive consensus filtering fusion of the state estimates among nodes. Simulation results show that, compared with the prevailing adaptive consensus filter algorithm, the proposed algorithm significantly improves target state estimation accuracy in the sensor network and enhances the consistency of target state estimates among nodes.
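The abstract does not give the CB-ACF weight formulas. As a hedged illustration of the general idea only, the sketch below runs one consensus iteration on scalar estimates in which node i moves toward neighbor j with weight c_j / (c_i + c_j), so a node with low confidence defers more to confident neighbors. The weight rule and the step size `epsilon` are assumptions; for stability, `epsilon` should stay below 1 / (maximum node degree):

```python
from typing import List, Sequence


def confidence_consensus_step(
    estimates: Sequence[float],
    confidences: Sequence[float],
    neighbors: Sequence[Sequence[int]],
    epsilon: float = 0.2,
) -> List[float]:
    """One confidence-weighted consensus iteration on scalar state estimates (hypothetical rule)."""
    if any(c <= 0 for c in confidences):
        raise ValueError("confidences must be positive")
    updated = []
    for i, x_i in enumerate(estimates):
        correction = 0.0
        for j in neighbors[i]:
            w_ij = confidences[j] / (confidences[i] + confidences[j])  # trust confident neighbors more
            correction += w_ij * (estimates[j] - x_i)
        updated.append(x_i + epsilon * correction)
    return updated


# Usage: node 0 has low confidence (say, the target has just entered its view);
# node 1 has high confidence.
estimates = [0.0, 10.0]
for _ in range(60):
    estimates = confidence_consensus_step(estimates, [1.0, 3.0], [[1], [0]], epsilon=0.5)
# Both entries approach 7.5, i.e. closer to the more confident node's initial estimate.
```

With two nodes, the quantity w_10 * x_0 + w_01 * x_1 is preserved by every step, which is why the agreed value is the confidence-weighted average 0.25 * 0 + 0.75 * 10 = 7.5 rather than the plain mean.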
  • LIU Chunfeng, WANG Haiyan, DING Xianghai
    Journal of Hangzhou Dianzi University. 2025, 45(4): 88-96. https://doi.org/10.13954/j.cnki.hdu.2025.04.010
    Production capacity sharing platforms suffer from a high capacity idle rate and unbalanced interests among multiple parties: manufacturers are concerned with service fee input and order income, customers with product quality and delivery, and the platform with production coordination and cost control. To address this, a model is constructed with capacity utilization fairness and order value fairness as objectives, yielding strategies for order splitting, order allocation, and production scheduling. An improved discrete cuckoo search algorithm is proposed to solve the model. The random walk mechanism is revised to prevent the algorithm from falling into local optima. An adaptive discard probability is designed so that cuckoos perform more random walks in the early stage, increasing solution diversity. Meanwhile, a dedicated encoding-mapping method is developed to prevent Lévy flight from degrading into random search. Numerical experiments show that, within the same runtime, the improved discrete cuckoo search algorithm outperforms both the classic cuckoo search algorithm and the genetic algorithm.
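Cuckoo search typically draws Lévy-flight step lengths with Mantegna's algorithm, and the adaptive discard probability described above should be high early (more random walks, more diversity) and lower later. The sketch below shows Mantegna's step for β = 1.5 and a hypothetical linear schedule for the discard probability (`pa_max` and `pa_min` are assumed values); it works with continuous steps and omits the paper's discrete encoding mapping, which the abstract does not describe:

```python
import math
import random
from typing import Optional


def mantegna_sigma(beta: float = 1.5) -> float:
    """Standard deviation of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1.0 + beta) * math.sin(math.pi * beta / 2.0)
    den = math.gamma((1.0 + beta) / 2.0) * beta * 2.0 ** ((beta - 1.0) / 2.0)
    return (num / den) ** (1.0 / beta)


def levy_step(beta: float = 1.5, rng: Optional[random.Random] = None) -> float:
    """Draw one heavy-tailed Lévy-flight step length: u / |v|^(1/beta)."""
    rng = rng or random.Random()
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = 0.0
    while v == 0.0:  # guard against division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / beta)


def adaptive_discard_probability(iteration: int, max_iter: int,
                                 pa_max: float = 0.5, pa_min: float = 0.1) -> float:
    """Hypothetical linear schedule: many random walks early, fewer later."""
    frac = min(max(iteration / max_iter, 0.0), 1.0)
    return pa_max - (pa_max - pa_min) * frac
```

Most Lévy steps are small, with occasional long jumps; that mix is what a discrete encoding mapping has to preserve if the search is not to degrade into uniform random search.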
  • ZHU Jiabin, KONG Wanzeng
    Journal of Hangzhou Dianzi University. 2025, 45(4): 97-102. https://doi.org/10.13954/j.cnki.hdu.2025.04.011
    To address the low recognition rate and poor stability caused by the long-tailed distribution of electroencephalography (EEG) data in the brain-computer interface target detection paradigm, this paper proposes a brain-computer target detection model based on a decoupled representation network. First, EEG data are fed as triplets into the feature extraction module of the neural network model, which extracts their spatio-temporal features. These features are then mapped to a low-dimensional space by a projection layer, and the feature extraction module is trained with the triplet loss function. Next, with the feature extractor's parameters frozen, the EEG data are undersampled to balance the numbers of target and non-target samples, and the classifier is trained with the cross-entropy loss function. Experimental results show that, compared with the traditional one-stage training approach, decoupling the training of the feature extractor and the classifier achieves higher classification accuracy and stability.
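The two training stages can be illustrated with a minimal sketch: a triplet loss with Euclidean distance for the feature-extractor stage, and random undersampling of the majority (non-target) class for the classifier stage. The margin value, the distance, and the random undersampling strategy are assumptions; the abstract does not state them:

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """max(d(a, p) - d(a, n) + margin, 0): pull same-class pairs together, push others apart."""
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)


def undersample_balanced(samples: Sequence, labels: Sequence[Hashable],
                         seed: int = 0) -> List[Tuple[object, Hashable]]:
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class: Dict[Hashable, list] = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    n = min(len(items) for items in by_class.values())
    balanced = [(s, y) for y, items in by_class.items() for s in rng.sample(items, n)]
    rng.shuffle(balanced)
    return balanced


# Usage: 2 target (label 1) and 8 non-target (label 0) epochs become 2 + 2.
balanced = undersample_balanced(list(range(10)), [1, 1, 0, 0, 0, 0, 0, 0, 0, 0])
```

Undersampling only the classifier stage fits the decoupled design: the frozen feature extractor has already seen all the data, so discarding majority samples costs the classifier little representation quality.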