High-spatial-resolution remote sensing images contain rich information, so studying their semantic segmentation is of great importance. Traditional machine learning methods exhibit low accuracy and efficiency when segmenting high-resolution remote sensing images. In recent years, deep learning has developed rapidly and has become the mainstream approach to image semantic segmentation. Some scholars have introduced SegNet, Deeplabv3+, U-Net, and other neural networks into remote sensing image semantic segmentation, but these networks achieve only limited performance on this task. This paper improves the U-Net network for semantic segmentation of remote sensing images. Firstly, an improved convolutional attention module, the channel interaction and spatial group attention module (CISGAM), is embedded in the feature extraction stage of the U-Net network so that the network can obtain more effective features. Secondly, residual modules replace the ordinary convolutional layers in the decoder to avoid model degradation. In addition, an attention pyramid pooling module (APPM) equipped with CISGAM connects the encoder and decoder of U-Net to enhance the network's extraction of multi-scale features. Finally, experiments are carried out on the UC Merced dataset with 0.3 m resolution and the GID dataset with 1 m resolution. Compared with the original networks U-Net and Deeplabv3+, the mean intersection over union (MIoU) of our method on the UCM dataset increases by 14.56% and 8.72%, and the mean pixel accuracy (MPA) increases by 12.71% and 8.24%, respectively. In the classification results on the GID dataset, the accuracy for waters, buildings, and other objects also improves substantially. Compared with the original CBAM and PPM, CISGAM and APPM likewise achieve certain performance improvements.
The experimental results show that the model is more feasible and robust than traditional networks, and that its stronger feature extraction capability improves the accuracy of semantic segmentation of high-resolution remote sensing images, providing a new approach for the intelligent interpretation of such images.
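The channel-then-spatial attention ordering that CBAM uses, and that CISGAM builds on, can be sketched as follows. This is a minimal NumPy illustration of the general gating scheme, not the paper's implementation: real CBAM also combines max pooling with average pooling, a shared MLP for the channel branch, and a convolution for the spatial branch, and the function names here are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Squeeze the spatial dims into one weight per channel,
    # then rescale each channel by its sigmoid-gated weight.
    w = sigmoid(feat.mean(axis=(1, 2)))          # shape (C,)
    return feat * w[:, None, None]

def spatial_attention(feat):
    # Pool across channels into one weight per spatial location,
    # then rescale each location by its sigmoid-gated weight.
    w = sigmoid(feat.mean(axis=0))               # shape (H, W)
    return feat * w[None, :, :]

def cbam_like(feat):
    # CBAM applies channel attention first, then spatial attention.
    return spatial_attention(channel_attention(feat))

feat = np.random.rand(8, 16, 16).astype(np.float32)  # toy feature map
out = cbam_like(feat)
assert out.shape == feat.shape
```

Because both gates lie in (0, 1), the module only reweights the feature map; it never changes its shape, which is what lets such a block be dropped into each stage of the U-Net encoder without altering the skip connections.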