a stream synchronization scheme based on event-time to pace the parsing of new data and reduce memory consumption,
a query planner which produces streaming join plans that support application updates, and
a stream time estimation scheme that handles the variations on the distribution of event-times observed in real-world streams and achieves high join accuracy.
Providing Streaming Joins as a Service at Facebook. Jacques-Silva, G., Lei, R., Cheng, L., et al. (2018). Proceedings of the VLDB Endowment, 11(12), 1809-1821.
Stream-stream joins are a hard problem to solve at scale. “Providing Streaming Joins as a Service at Facebook” provides us the overview of systems within Facebook to support stream-stream joins.
The key contributions of the paper are:
Trade-offs in Stream-Stream Joins
Stream-stream joins have a 3-way trade-off of output latency, join accuracy, and memory footprint. One extreme of this trade-off is to provide best-effort (in terms of join accuracy) processing-time joins. Another extreme is to persist metadata associated with every joinable event on a replicated distributed store to ensure that all joinable events get matched. This approach provides excellent guarantees on output latency & join accuracy, with memory footprint sky-rocketing for large, time-skewed streams.
The approach of the paper is in the middle: it is best-effort with a facility to improve join accuracy by pacing the consumption of the input streams based on dynamically estimated watermarks on event-time... Read More