Seeing Tree Structure from Vibration


Figure 1: We want to infer the hierarchical structure of the tree in video (a). Inference based on a single frame has inherent ambiguities: panel (b) shows an example where it is hard to tell from appearance alone whether point P1 is connected to P2 (orange curve) or to P3 (blue curve). Time-domain motion signals do not help much, as these branches have almost identical movements (c). The difference is significant in the frequency domain (d), where the similar spectra of P1 and P2 suggest that P1 is more likely connected to P2. We therefore develop an algorithm that infers tree structure from both vibration spectra and appearance cues. The results are shown in (e).


Humans recognize the structure of objects from both appearance and motion; often, motion helps to resolve ambiguities in object structure that arise when we observe appearance only. There are particular scenarios, however, where neither appearance nor spatiotemporal motion signals are informative: occluding twigs may look connected and have almost identical movements, even though they belong to different, possibly disconnected branches. We propose to tackle this problem through spectral analysis of motion signals, because vibrations of disconnected branches, though visually similar, often have distinctive natural frequencies. We introduce a novel formulation of tree structure based on a physics-based link model, and validate its effectiveness through theoretical analysis, numerical simulation, and empirical experiments. With this formulation, we use nonparametric Bayesian inference to reconstruct tree structure from both spectral vibration signals and appearance cues. Our model performs well in recognizing hierarchical tree structure from real-world videos of trees and vessels.
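To make the core observation concrete, the following is a minimal sketch of how two motion signals that look nearly identical in the time domain can still be separated by their spectra. It is an illustration only, not the inference algorithm of this paper: the synthetic signals, the frame rate, the cutoff `f_min`, and the use of cosine similarity between power spectra are all assumptions chosen for the example.

```python
import numpy as np

def power_spectrum(signal, fs):
    """Return (frequencies in Hz, power) of a 1-D displacement signal."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                       # remove the constant offset
    window = np.hanning(len(x))            # taper to reduce spectral leakage
    power = np.abs(np.fft.rfft(x * window)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs, power

def spectral_similarity(a, b, fs, f_min=1.0):
    """Cosine similarity of two power spectra above f_min (hypothetical cutoff).

    The cutoff discards low-frequency sway shared by all branches, so the
    comparison focuses on each branch's own vibration frequencies.
    """
    freqs, pa = power_spectrum(a, fs)
    _, pb = power_spectrum(b, fs)
    band = freqs >= f_min
    pa, pb = pa[band], pb[band]
    denom = np.linalg.norm(pa) * np.linalg.norm(pb)
    return float(pa @ pb / denom) if denom > 0 else 0.0

# Synthetic point trajectories (assumed values, not data from the paper).
fs = 30.0                                  # assumed video frame rate in Hz
t = np.arange(300) / fs                    # 10 seconds of samples
sway = np.sin(2 * np.pi * 0.3 * t)         # large motion shared by all points

p1 = sway + 0.20 * np.sin(2 * np.pi * 2.0 * t)         # vibrates at 2.0 Hz
p2 = sway + 0.16 * np.sin(2 * np.pi * 2.0 * t + 0.3)   # same frequency as p1
p3 = sway + 0.20 * np.sin(2 * np.pi * 3.5 * t)         # vibrates at 3.5 Hz

time_corr_13 = float(np.corrcoef(p1, p3)[0, 1])
sim_12 = spectral_similarity(p1, p2, fs)
sim_13 = spectral_similarity(p1, p3, fs)

print(f"time-domain correlation P1-P3: {time_corr_13:.3f}")
print(f"spectral similarity P1-P2:     {sim_12:.3f}")
print(f"spectral similarity P1-P3:     {sim_13:.3f}")
```

In this toy setup P1 and P3 are strongly correlated in the time domain because the shared sway dominates, while their spectra above the cutoff have almost no overlap. P1 and P2 share a vibration frequency, so their spectra match closely. A real pipeline would first have to extract the point trajectories from video, which this sketch does not cover.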