Failure detection plays a central role in the engineering of distributed systems. Furthermore, many applications have timing constraints and require failure detectors that provide quality of service (QoS) with some quantitative timeliness guarantees. Therefore, they need failure detectors that are fast and accurate.
We introduce the Two-Windows Failure Detector (2W-FD), an algorithm able to react to sudden changes in network conditions, a property that existing algorithms do not satisfy. We ran tests on real traces and compared the 2W-FD with state-of-the-art algorithms. Our results show that the 2W-FD achieves the best performance in terms of speed and accuracy in unstable scenarios.