Visual perception entails solving a wide set of tasks, such as
object detection and depth estimation. The predictions made
for multiple tasks from the same image are not independent
and are therefore expected to be ‘consistent’. We propose a
broadly applicable and fully computational method for augmenting learning with Cross-Task Consistency.
The proposed formulation is based on inference-path invariance
over a graph of arbitrary tasks. We observe that learning
with cross-task consistency leads to more accurate predictions and better generalization to out-of-distribution inputs.
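Inference-path invariance can be illustrated concretely: a target prediction should agree whether it is computed directly from the image or via an intermediate task. The following is a minimal sketch with toy stand-in predictors; all function names and the simple L1 disagreement are assumptions for illustration, not the paper's actual networks or loss.

```python
import numpy as np

# Hypothetical stand-ins for learned mappings (assumed names, not the paper's API):
# f_x_to_d : image -> depth (direct path)
# f_x_to_n : image -> surface normals (intermediate task)
# f_n_to_d : normals -> depth (cross-task transfer)
def f_x_to_d(x):
    return 0.5 * x            # toy direct predictor

def f_x_to_n(x):
    return x + 1.0            # toy intermediate predictor

def f_n_to_d(n):
    return 0.5 * (n - 1.0)    # toy transfer function

def cross_task_consistency_loss(x):
    """Penalize disagreement between the direct path x -> depth and the
    indirect path x -> normals -> depth (inference-path invariance)."""
    direct = f_x_to_d(x)
    indirect = f_n_to_d(f_x_to_n(x))
    return float(np.mean(np.abs(direct - indirect)))

x = np.array([1.0, 2.0, 3.0])
loss = cross_task_consistency_loss(x)  # zero here: the toy paths agree by construction
```

In training, such a penalty would be added to the usual supervised losses so that all inference paths over the task graph are pushed toward producing the same target prediction.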
This framework also leads to an informative unsupervised
quantity, called Consistency Energy, based on measuring
the intrinsic consistency of the system. Consistency Energy
correlates well with the supervised error (r=0.67), and can
therefore be employed as an unsupervised confidence metric
as well as for detecting out-of-distribution inputs
(ROC-AUC=0.95). The evaluations are performed on multiple datasets, including Taskonomy, Replica, CocoDoom, and
ApolloScape, and they benchmark cross-task consistency
versus various baselines including conventional multi-task
learning, cycle consistency, and analytical consistency.
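As a rough illustration of the Consistency Energy idea, one can measure the mean pairwise disagreement among predictions of the same target reached through different inference paths; low energy signals an internally consistent (and likely confident) system. This is a simplified sketch under assumed definitions, not the paper's exact quantity.

```python
import numpy as np

def consistency_energy(predictions):
    """Mean pairwise L1 disagreement among predictions of the same target
    obtained via different inference paths (a simplified, hypothetical
    version of the Consistency Energy)."""
    preds = [np.asarray(p, dtype=float) for p in predictions]
    total, pairs = 0.0, 0
    for i in range(len(preds)):
        for j in range(i + 1, len(preds)):
            total += float(np.mean(np.abs(preds[i] - preds[j])))
            pairs += 1
    return total / pairs

# Paths that agree closely yield low energy; a disagreeing path raises it,
# which is what makes the quantity usable as an unsupervised confidence signal.
consistent = consistency_energy([[1.0, 2.0], [1.1, 2.0], [0.9, 2.1]])
inconsistent = consistency_energy([[1.0, 2.0], [5.0, 7.0], [0.9, 2.1]])
```

Because the quantity requires no ground-truth labels, thresholding it is one plausible way to flag out-of-distribution inputs at test time.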