# Interaction and Decision Making in Autonomous Driving Workshop- RSS 2020 a Summary

*by Arjun Srinivasan Ambalam and Prasheel Renkuntla*

Robotics: Science and Systems-IDA2020

Interactive decision-making for Autonomous vehicles is a higher-level planning problem that takes into account the behavior of other vehicles/pedestrians/autonomous vehicles in the environment of the ego vehicle. Rule-based approaches mostly result in conflicts as the assumptions made by any model deployed do not factor all the possible situations. Some of the main approaches which researchers are trying to focus more on can broadly be classified under these categories

- Solving through
*POMCP**based solvers* *Game-Theoretic*Approaches- Learning-based methods-
*Reinforcement Learning*, Imitation learning, etc.

Most of these approaches have produced good results where the assumptions are relatively insignificant when compared to conventional methods that depend on pre-determined rules. In different scenarios like Lane Changing, Merging, Roundabouts, Intersections, these algorithms produced a better maneuver when compared to the rule-based approaches and reactive approaches that predict the trajectory of the surrounding objects and plan accordingly.

We selected these papers from the workshop such that they cover most of the recent approaches researches are focusing.

**Reinforcement Learning-based Control of Imitative Policies for Near-Accident Driving **paper, video

**Summary**

They have proposed a method to address driving in near-accident scenarios, namely the hierarchical reinforcement and imitation learning (H-REIL) approach which consists of low-level policies that learn discrete driving modes such as aggressive and timid modes through Imitation Learning which is best suited for learning from few expert demonstrations, and a high-level policy that maximizes the return based on rewards that contain a trade-off between various terms, such as efficiency and safety learned by Reinforcement Learning that switches between these modes.

In the aggressive mode, the ego car’s objective is efficiency so it always drives fast in order to reach the destination in minimal time. In the timid mode being safer is the objective, so it usually drives at a mild speed and pays attention to all potential threats.

Technical Contributions of the paper:

- H-REIL, approach which does the optimal switching between driving modes which are individually learned by IL
- Their policy outperforms imitation learning policies, the individual policies learned for each mode, and a policy based on random mixtures of modes, in terms of efficiency and safety which are tested through a simulation platform
- User study which shows the preference of the H-REIL driving policies compared to other methods in near-accident scenarios.

Our thoughts and future directions:

*They have hand-designed the near-accident scenarios for which they have learned a policy through IL which can be improved as cited in the **paper**.*

*A more elaborate number of **driving behaviors** can be considered based on different types of near-accident scenarios can be considered for a more extensive study through simulations and user study.*

**POMDP Autonomous Vehicle Visibility Reasoning **paper, video

This paper presents a solution for the decision-making problem for the autonomous vehicle on how to behave at a T-intersection under limited visibility. They propose a novel architectural implementation of a mathematical framework called MODIA (multiple online decision-components with interacting actions). It consists of POMDP decision problems (DPs) that are solved offline. DPs are instantiated as decision components (DCs) as they perceive vehicles on the go. DCs recommend an action (stop, edge, and go) along the arbitration points in the route, with conflicts resolved by an executor arbitration function that takes the safest action. Virtual vehicles imagined just outside of the field-of-view, are also created and instantiate DCs to allow for reasoning about possible imperceptible vehicles. Multiple collaborative AVs are also able to share information based on Vehicle to Vehicle communication, automatically affecting virtual vehicles and improving POMDP decision-making.

*Technical Contributions of the paper are*

*a general architecture for limited visibility POMDP decision-making in AVs**a novel POMDP solution for T-intersections**a real-world demonstration of POMDPs in MODIA*

Our thoughts and future directions:

The results obtained through experiments of their MODIA architecture on Autonomous Vehicle prove better than a rule-based architecture and a previous MODIA architecturesIgnorant and naive are promising. To test such algorithms a simulation environment can be developed or a method of formulating them in the simulation would be better as experimentation on actual vehicle platform seems difficult and progress of improvement of algorithms would be delayed due to practical implementation difficulties.

This architecture seems to be generalizable to other robot tasks which need to be explored

**Autonomous Driving with Interpretable Goal Recognition and Monte Carlo Tree Search ***paper**, **video*

**Summary**

Autonomous driving algorithms lack the ability to predict the intentions and driving trajectories of other vehicles i.e. to reason about “why and to what end, a vehicle performed its current and past maneuvers”. This paper approaches the problem with a unique combination of planning and prediction. An extended version of the paper can be found here.

The problem is formulated as a POMDP where each state has information about the vehicle’s pose, velocity, acceleration, an observation containing states of nearby vehicles, and action that controls vehicle’s latitude and longitudinal movements. The local observations and actions are independent. The goal here to find an optimal policy that selects actions for the ego vehicle to achieve a specified goal while optimizing the driving trajectory with a defined reward function.

Technical Contributions of the paper are:

- Propose a novel integration of rational inverse planning to recognize the goals of other vehicles.
- Plan an optimal maneuver for the ego vehicle with the information obtained from inverse planning and use it with Monte Carlo Tree Search
- Evaluate the system for four scenarios that include junctions, roundabout entry, and dense lane merging.

Their approach is to first, calculate the possible goals for each vehicle in the surroundings and make an inverse plan (with A*) to each goal thus giving the probabilities and predicted trajectories for their goals. Then, use this information at the simulation process of a Monte Carlo Tree Search that generates an optimal maneuver plan for the ego vehicle.

The figure below shows the unique goal recognition algorithm based on the Bayesian posterior distribution over each vehicle’s goal at a particular time.

With rational decision making, the algorithm was tested in four scenarios as stated above and compared to different versions of the MCTS as Full (full system with goal recognition), MAP (Most probable goal and trajectory from the algorithm), CVel (without their algorithm but based on velocity), CVel-Avg (same like before but predicts velocity to avg. velocity), Cons(conservative give-way maneuver). With time analysis, it was found that Full and MAP versions of MCTS were able to determine the other vehicle’s behavior in much lesser time for most of the scenarios.

Our thoughts and future directions:

*Overall, this is one of the best approaches to solve a behavior planning task in less time that is sub-optimal and converges as the computational power increases. Furthermore, this work can be extended with the inclusion of the behavior of other traffic participants such as cyclists, or can even be applied to other domains like safety robots indoor navigation, etc.*

**Altruistic Decision-Making for Autonomous Driving with Sparse Rewards-**paper, video

**Summary**

This paper is a solution for the decision making task-based on Game Theory. Here, the game is a two-player Stackelberg game. A short presentation about game theory can be found here: Game Theory.

Consider a lane merging task, where an autonomous vehicle has no interaction with another vehicle already in the lane. If the vehicles were to follow assumptions that were defined before deployment, it would result in conflicting conclusions.

Technical Contributions of the paper are:

- Propose a method for interactive decision making that does not require a priori information about the other vehicles on the road and be less susceptible to conflicts
- Combine insights from Pure altruism, Psychological and behavioral studies, to implement a version of Social Value Orientation (SVO) that is novel and unique.

This paper considers a state of the art approach based on a combination of Game Theory and Psychology and behavioral economics which swaps the role of leader-follower determined by the current reward. This decision making task is evaluated with a metric — Area of Conflict (AoC) to quantify the extent to which conflicts will arise based on the assumptions on who takes the role of the leader/follower.

A general reward matrix is shown above where C is the column player and R is the reward for choosing A, B actions. With their approach of computationally tractable **augmented altruism**, they consider that (A2, B1) is optimal for R and (A1, B2) is optimal for C and with no ambiguities in each of the agent’s rewards. In the theoretical analysis, this model is compared with the vanilla Baseline, Pure Altruism (from game theory), Social Value Orientation (from Psychology and behavior), Altruism (another version of pure altruism), and it was found that the AoC value is the least for Augmented altruism. It was observed that when B = 1, the range 0.33 < A < 3 results in minimal values. The lowest number of conflicts with 9 conflict cells and 16 non-conflict cells was achieved with their augmentation altruism model.

Our thoughts and future directions:

*The author’s contribution is in a way rare in solving the decision-making task particularly focused on lane merging. It combines different domains like game theory, psychology, and behavioral studies to determine a reward function that achieved minimal conflicts when compared to methods employed in the specific domains individually. One possible future task is that it can be expanded for more number of players with a new matrix and a reward computed for each agent.*

**Combined Learning and Optimization for Motion Trajectory Prediction with Environment Constraints **paper, video

In order to achieve interactive decision making for autonomous driving, the ever-existing motion planning problem is quite challenging as it has to take into consideration the surrounding environment. Generally, a learning-based model does not factor in all the constraints towards the feasibility of a prediction whereas trajectory optimization methods try to obtain a path that confines itself to a given cost without incorporating any collected data. This paper addresses such a problem of **including environmental constraints on a machine learning-based model** solved in conjunction with **trajectory optimization**.

Technical Contributions of the paper are:

- Propose a framework to incorporate both learning and optimization to generate obstacle-free trajectories.
- Predict future trajectories as distributions over continuous-time functions and develop a method to learn these distributions from the data.
- Get an optimized distribution of trajectories that complies with environmental constraints and minimizes divergence from the learned solution.

*The work is limited to predicting trajectories in 2D based on past coordinates of a dynamic object along with occupancy information. The motion is modeled as continuous-time trajectories in Reproducing Kernel Hilbert Space (**RKHS**).*

Once the neural network captures distributions of continuous-time trajectories, the optimization problem is set up with the KL divergence, where this factor determines the closeness of the optimal trajectory under the environmental constraints. These constraints are enforced using the cost operator which unlike the motion planning problems, takes a distribution of trajectories instead of a single trajectory as input. Combined with the occupancy grid map, we thus obtain the future near-optimal collision-free trajectories.

Our thoughts and future directions:

*This is an interesting approach where the data collected over the past is used to determine a distribution of trajectories which is then optimized based on the given environmental constraints. Such combined learning, though in its early stages can be useful in determining real-time trajectories with enough computational power and efficiency. There is room for future work where more complex factors can be encoded in the learning process, i.e, given as an input to the neural network. Overall, it has a good scope in interactive decision making.*

**Concluding Remarks:**

Interactive decision making for autonomous vehicles is a challenge. There is no method that has been awarded a clear win.

Game theory is a rational approach where it tries to maximize one’s own payoff and is useful to predict realistic behaviors. Every action, good or bad, can be rationalized in the name of self-interest. The difficulty with game theory-based approaches is defining, limiting, isolating, or accounting for every set of factors and variables that influence the strategy and outcome.

In a Reinforcement learning-based approach, the algorithm produces significant results in simulations but the bridge that connects this to a real scenario is still not robust i.e. RL faces a big challenge when it tries to make decisions in a real-world scenario. Though there are good approaches they are specifically tuned for a given application.

Out of all, the POMCP solver based on a novel goal recognition algorithm is a rare approach which not only produces a near-optimal solution with a given computational power but also, is more practical compared to the emerging game-theoretic approaches.

All these methods discussed have their own limitations and merits which need a lot of research and discussions from the community as a whole

Thank You!