Existing traffic flow prediction models based on graph neural networks and attention mechanisms struggle to capture complex spatiotemporal dependencies, remain constrained by predefined graph structures, and model periodic patterns inadequately. To address these shortcomings, a multi-scale adaptive graph attention Transformer (MSAGAFormer) was proposed. Short-, medium-, and long-term historical traffic data were divided into low-, medium-, and high-scale temporal sequences, and a compression mechanism was employed to reduce redundant information and improve the efficiency of temporal feature representation. A spatiotemporal embedding method was designed to encode node positions and temporal attributes, thereby strengthening the model’s ability to interpret spatiotemporal data. In the spatial layer, a GAT-based multi-head attention mechanism modeled dynamic spatial correlations; in the temporal layer, a multi-scale temporal attention structure captured dynamic variations across different temporal granularities. Experimental results on the PEMS datasets demonstrated that MSAGAFormer outperformed state-of-the-art models such as Trendformer, ATST-GCN, and STTN in prediction accuracy.
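
To make the spatial layer concrete, the sketch below implements the standard multi-head GAT attention (Veličković et al., 2018) that such a layer builds on: each road-network node attends to its adjacent nodes, with per-head additive attention scores masked to the graph's edges. This is a minimal illustration under assumed shapes and names (class name, dense adjacency, self-loop convention), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadGATLayer(nn.Module):
    """Multi-head GAT-style spatial attention (illustrative sketch).

    x:   (batch, N, in_dim)  node features at one time step
    adj: (N, N) binary adjacency; assumed to include self-loops so
         every node has at least one neighbor to attend to
    """
    def __init__(self, in_dim: int, out_dim: int, heads: int = 4):
        super().__init__()
        self.heads, self.out_dim = heads, out_dim
        self.W = nn.Linear(in_dim, heads * out_dim, bias=False)
        # split attention vector a = [a_src || a_dst], one pair per head
        self.a_src = nn.Parameter(torch.empty(heads, out_dim))
        self.a_dst = nn.Parameter(torch.empty(heads, out_dim))
        nn.init.xavier_uniform_(self.a_src)
        nn.init.xavier_uniform_(self.a_dst)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        B, N, _ = x.shape
        h = self.W(x).view(B, N, self.heads, self.out_dim)        # (B,N,H,D)
        # e_ij = LeakyReLU(a_src . h_i + a_dst . h_j), per head
        e_src = (h * self.a_src).sum(-1)                          # (B,N,H)
        e_dst = (h * self.a_dst).sum(-1)                          # (B,N,H)
        e = F.leaky_relu(e_src.unsqueeze(2) + e_dst.unsqueeze(1), 0.2)
        # mask non-edges so softmax normalizes over neighbors only
        mask = (adj == 0).view(1, N, N, 1)
        e = e.masked_fill(mask, float("-inf"))
        alpha = torch.softmax(e, dim=2)                           # (B,N,N,H)
        out = torch.einsum("bijh,bjhd->bihd", alpha, h)           # aggregate
        return out.reshape(B, N, self.heads * self.out_dim)

# Smoke test on random data: 8 graphs of 50 nodes, self-loop-only adjacency.
layer = MultiHeadGATLayer(in_dim=3, out_dim=16, heads=4)
y = layer(torch.randn(8, 50, 3), torch.eye(50))
assert y.shape == (8, 50, 64)
```

The masked dense formulation keeps the example self-contained; an adaptive variant in the spirit of the abstract could replace the fixed `adj` with a learned adjacency, and the paper's multi-scale temporal attention would apply an analogous mechanism along the time axis at each granularity.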