Degradable fault-tolerant systems can be evaluated using rewarded continuous-time Markov chain (CTMC) models. In that context, a useful measure to consider is the distribution of the cumulative reward over a time interval [0, t]. All currently available numerical methods for computing that measure tend to be very expensive when the product of the maximum output rate of the CTMC model and t is large and, in that case, their application is limited to CTMC models of moderate size. In this paper, we develop two methods to compute bounds for the cumulative reward distribution of CTMC models with reward rates associated with states: BT/RT (Bounding Transformation/Regenerative Transformation) and BT/BRT (Bounding Transformation/Bounding Regenerative Transformation). The methods require the selection of a regenerative state, are numerically stable and compute the bounds with well-controlled error. For a class of rewarded CTMC models, class C'''_1 , and a particular, natural selection for the regenerative state the BT/BRT method allows to trade off bounds tightness with computational cost and will provide bounds at a moderate computational cost in many cases of interest. For a class of models, class C''_1 , slightly wider than class C'''_1 , and a particular, natural selection for the regenerative state, the BT/RT method will yield tighter bounds at a higher computational cost. Under additional conditions, the bounds obtained by the less expensive version of BT/BRT and BT/RT seem to be tight for any value of t or not small values of t, depending on the initial probability distribution of the model. Class C''_1 and class C'''_1 models with those additional conditions include both exact and bounding typical failure/repair performability models of fault-tolerant systems with exponential failure and repair time distributions and repair in every state with failed components and a reward rate structure which is a non-increasing function of the collection of failed components. We illustrate both the applicability and the performance of the methods using a large CTMC performability example of a fault-tolerant multiprocessor system.
Degradable fault-tolerant systems can be evaluated using rewarded continuous-time Markov chain (CTMC) models. In that context, a useful measure to consider is the distribution of the cumulative reward over a time interval [0, t]. All currently available numerical methods for computing that measure tend to be very expensive when the product of the maximum output rate of the CTMC model and t is large and, in that case, their application is limited to CTMC
models of moderate size. In this paper, we develop two methods to compute bounds for the cumulative reward distribution of CTMC models with reward rates associated with states: BT/RT (Bounding Transformation/Regenerative Transformation) and BT/BRT (Bounding Transformation/
Bounding Regenerative Transformation). The methods require the selection of a regenerative state, are numerically stable and compute the bounds with well-controlled error. For a class of rewarded CTMC models, class C′′′_1 , and a particular, natural selection for the regenerative state the BT/BRT method allows to trade off bounds tightness with computational cost and
will provide bounds at a moderate computational cost in many cases of interest. For a class of models, class C′′_1, slightly wider than class C′′′_1 , and a particular, natural selection for the regenerative state, the BT/RT method will yield tighter bounds at a higher computational cost. Under additional conditions, the bounds obtained by the less expensive version of BT/BRT and BT/RT
seem to be tight for any value of t or not small values of t, depending on the initial probability distribution of the model. Class C′′_1 and class C′′′_1 models with those additional conditions include both exact and bounding typical failure/repair performability models of fault-tolerant
systems with exponential failure and repair time distributions and repair in every state with failed components and a reward rate structure which is a non-increasing function of the collection of failed components. We illustrate both the applicability and the performance of the methods using a large CTMC performability example of a fault-tolerant multiprocessor system.