Traffic signal control algorithms are developed using meta-reinforcement learning

In urban areas, traffic signal control has a significant impact on day-to-day life. Existing systems rely on theory-based controllers that switch the traffic lights according to traffic conditions. Under unsaturated conditions, the objective is to minimize vehicle delay; during congestion, it shifts to maximizing vehicle throughput.

However, existing traffic signal controllers cannot meet both objectives, and a human operator can manage only a limited number of intersections. Recent advances in artificial intelligence have therefore spurred the development of alternative methods for controlling traffic signals.

Recent research has explored reinforcement learning (RL) algorithms as one such method. However, RL algorithms can fail because traffic environments are dynamic: traffic conditions at an intersection are affected by conditions at nearby junctions. Multi-agent RL can handle this interference, but its dimensionality grows exponentially with the number of intersections.

A team of researchers led by Prof. Keemin Sohn from Chung Ang University in Korea recently proposed a meta-RL model to address this issue. Specifically, the team developed an extended deep Q-network (EDQN)-based meta-RL model that incorporates context to control traffic signals.
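The article does not reproduce the network details, but the core idea of a Q-network that incorporates context can be sketched as follows. The class name, layer sizes, and the simple concatenation of state and context below are illustrative assumptions, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ContextConditionedQNet(nn.Module):
        """Rough sketch of a context-conditioned Q-network.

        Assumption: a latent context vector encoding the current traffic
        regime is concatenated with the observed traffic state before the
        Q-value layers; the actual EDQN may couple them differently.
        """

        def __init__(self, state_dim: int, context_dim: int, n_phases: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(state_dim + context_dim, 256),
                nn.ReLU(),
                nn.Linear(256, 256),
                nn.ReLU(),
                nn.Linear(256, n_phases),  # one Q-value per candidate signal phase
            )

        def forward(self, state, context):
            return self.net(torch.cat([state, context], dim=-1))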

Several earlier studies devised meta-RL algorithms based on intersection geometry, traffic signal phases, or traffic conditions, but the present study addresses the non-stationary nature of traffic signal control across congestion levels. The study has been published in Computer-Aided Civil and Infrastructure Engineering. "The meta-RL detects traffic states autonomously, classifies traffic regimes, and assigns signal phases," Prof. Sohn explained.

Here is how the model works. An indicator of the overall environmental condition determines whether the traffic regime is saturated or unsaturated. Like a human operator, the model then maximizes throughput or minimizes delay depending on the traffic flow, and it acts by selecting traffic signal phases (actions).
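As a rough illustration of this regime switch (the article does not specify the indicator, so the occupancy-based threshold and function names below are assumptions):

    import numpy as np

    SATURATION_THRESHOLD = 0.8  # assumed cutoff; the paper's exact indicator is not given here

    def classify_regime(detector_occupancy: np.ndarray) -> str:
        # Label the network-wide regime from an aggregate congestion indicator.
        return "saturated" if detector_occupancy.mean() > SATURATION_THRESHOLD else "unsaturated"

    def performance(regime: str, throughput: float, delay: float) -> float:
        # The objective switches with the regime, as a human operator would:
        # maximize throughput under congestion, minimize delay otherwise.
        return throughput if regime == "saturated" else -delay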

As with other learning agents, these actions are guided by a reward function. Here, the reward is set to +1 or -1 depending on whether traffic-handling performance improved or worsened relative to the previous control interval. In addition, the EDQN functions as a decoder that assigns signal phases to several intersections at once.
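The sign-based reward can be written down directly; combining it with the regime-dependent performance measure sketched above is an assumed interpretation of how the two pieces fit together:

    def reward(current_perf: float, previous_perf: float) -> int:
        # +1 if traffic-handling performance improved over the previous
        # control interval, -1 if it worsened, as described in the article.
        return 1 if current_perf > previous_perf else -1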

After its theoretical development, the meta-RL algorithm was trained and tested with a commercial traffic simulator, Vissim v21.0, using a real-world testbed of 15 intersections in southwest Seoul. Thanks to meta-training, the model adapted to new tasks during meta-testing without further parameter adjustment.

The simulation experiments showed that the proposed model could switch between control tasks (via transitions) without explicit knowledge of the traffic conditions, and that it could differentiate rewards according to the saturation level. Moreover, the EDQN-based meta-RL model outperformed existing traffic signal control algorithms and could be extended to tasks with different transitions and rewards.

Even so, the researchers note that a more precise algorithm is needed to account for saturation levels that differ between intersections. Whereas existing research applies RL to traffic signal control with a single fixed objective, the present study developed a controller that selects its objective autonomously from the current traffic conditions. Prof. Sohn concludes that if traffic signal control agencies adopt the framework, they could realize travel benefits never experienced before.

Source: Chung Ang University