Having analyzed the timing behaviour of DeX one-way and two-way interactions, the next step is to formally guarantee the correctness of the implemented solutions. This is particularly needed for IoT systems, where varied interaction patterns with timing constraints can be affected by incorrect parameter settings. To verify safety and reachability properties of these interactions, we make use of model checkers. Using the expressive nature of some of these models, we can include stochastic delays along with deterministic time bounds to formally model the interactions. This further allows us to guarantee the absence of deadlocks or livelocks within the system, which is difficult to establish via simulations or data analysis.
In this section, we build timed automata models which represent the typical behavior of the DeX connector model for performing the timed DeX interactions described in the previous section. A timed automaton [5] is essentially a finite automaton extended with real-valued clock variables. These variables model the logical clocks in the system, which are initialized with zero when the system is started, and then increase synchronously at the same rate. Clock constraints are used to restrict the behavior of the automaton. A transition represented by an edge can be taken only when the clock values satisfy the guard labeled on the edge. Clocks may be reset to zero when a transition is taken. Clock constraints are also used as invariants at locations, which are represented by vertices: they must be satisfied at all times when the location is reached or maintained.
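The semantics described above (clocks advancing at the same rate, guards on edges, invariants on locations, resets on transitions) can be illustrated with a minimal sketch. The class and field names below are hypothetical and are not part of UPPAAL or of the DeX models; the sketch also assumes that invariants are upper bounds on clocks, so checking the invariant at the end of a delay is sufficient.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Clocks = Dict[str, float]


@dataclass
class Edge:
    source: str
    target: str
    # Guard: the edge may be taken only if the clock values satisfy it.
    guard: Callable[[Clocks], bool] = lambda clocks: True
    # Clocks reset to zero when the edge is taken.
    resets: Tuple[str, ...] = ()


@dataclass
class TimedAutomaton:
    location: str
    clocks: Clocks
    invariants: Dict[str, Callable[[Clocks], bool]]
    edges: List[Edge]

    def _invariant_holds(self, location: str, clocks: Clocks) -> bool:
        return self.invariants.get(location, lambda c: True)(clocks)

    def delay(self, d: float) -> bool:
        """Let d time units pass; all clocks increase at the same rate."""
        advanced = {name: value + d for name, value in self.clocks.items()}
        # Assumption: invariants are upper bounds, so the end point suffices.
        if not self._invariant_holds(self.location, advanced):
            return False
        self.clocks = advanced
        return True

    def fire(self, edge: Edge) -> bool:
        """Take an edge if its guard and the target invariant are satisfied."""
        if edge.source != self.location or not edge.guard(self.clocks):
            return False
        updated = dict(self.clocks)
        for name in edge.resets:
            updated[name] = 0.0
        if not self._invariant_holds(edge.target, updated):
            return False
        self.location, self.clocks = edge.target, updated
        return True
```

For instance, with an invariant `x <= 5` at a location `on` and an edge guarded by `x >= 3`, a delay of 6 time units is rejected, while a delay of 4 followed by the edge is accepted and resets `x`.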
In order to study DeX interactions with timed automata, we make use of UPPAAL [6]. UPPAAL is an integrated tool environment for modeling, validation and verification of real-time systems modeled as networks of timed automata. In such networks, automata synchronize via binary synchronization channels. For instance, with a channel declared as chan c, a transition of an automaton labeled with c! (sending action) synchronizes with the transition of another automaton labeled with c? (receiving action). UPPAAL makes use of computation tree logic (CTL) [18] to specify and verify temporal logic properties. We employ the committed location qualifier (marked with a ‘C’) for some of the locations. In UPPAAL, time is not allowed to pass when the system is in a committed location; additionally, outgoing transitions from a committed location have absolute priority over normal transitions. The urgent location qualifier (marked with a ‘U’) is also used: time is not allowed to pass when the system is in an urgent location (without the priority clause of committed locations, though).
By relying on the expressive power of timed automata, we are able not only to model the timing conditions of DeX interactions, but also to introduce basic stochastic semantics regarding the behavior of peers. Using the UPPAAL model checker, we provide and verify essential properties of our timed automata model, including formal conditions for successful DeX interactions. Note that verifying the properties in UPPAAL takes on the order of one second at most.
5.1 Analysis of one-way interactions
We represent one-way DeX interactions with the connector roles DeX sender, DeX receiver, and with the corresponding DeX one-way glue. The two roles model the behavior expected from application components employing the connector, while the glue represents the internal logic of the connector coordinating the two roles. We detail in the following the modeling of these components.
Figure 7 shows the sender behavior. Typically, a sender entity repeatedly emits a post! action (message) to the glue without receiving any feedback about the end (successful or not) of the post operation. We have enhanced (and at the same time constrained) the sender’s behavior with a number of features. The committed locations post_event (post! sent to the glue) and post_end_event (post_end? received from the glue) have been introduced to detect the corresponding events. Upon these events, the automaton oscillates between the post_on and post_off locations, which correspond to the δpost-on and δpost-off intervals presented in Fig. 5. delta_post is a clock that controls the δpost interval between two successive post operations. delta_post is reset upon a new post operation and set to lifetime at the end of this operation (note that the post_init location and its outgoing transition serve to initialize delta_post at the beginning of the sender’s execution – this unifies verification also for the very first post operation). The invariant condition delta_post<=max_delta_post (where max_delta_post is a constant) at the post_off location ensures that a new post operation will be initiated before the identified boundary.
This setup results in at most one post operation being active at a time. This post remains active (δpost-on interval) for the full lifetime interval (after which it expires) or for less than lifetime (in case of a successful interaction). In both cases, we set delta_post to lifetime at the end of the post operation (this enables verification, since we cannot capture absolute times in UPPAAL). Hence, the immediately following δpost-off interval will last a stochastic time uniformly distributed in the interval [lifetime,max_delta_post]. With regard to the one-way timing model of Section 4, we opted here for restricting the concurrency of post operations in order to simplify the architecture of the glue. The present model (sender, receiver and one-way glue) can be compared to one of the infinite on-demand servers of the G/G/∞/∞ model of Section 4. Nevertheless, this model is sufficient for verifying Conditions (1) and (2) for successful DeX interactions. These conditions relate any post operation with an overlapping get operation; possible concurrency of post operations has no effect on this. Moreover, in the following sections we prove that these conditions are independent of the probability distributions characterizing the stochastic behavior of the sender and the receiver.
Figure 8 shows the receiver behavior. Typically, a receiver entity repeatedly emits a get! action to the glue, with at most one get operation active at a time. The duration of the get operation is controlled by the receiver with a local time_on parameter; when time_on elapses, a get_end! action is sent to the glue. Before time_on is reached, multiple messages (posted by senders) may be delivered to the receiver by the glue, each with a get_return? action. We have enhanced the receiver’s behavior with features similar to those of the sender. Hence, we capture the events and time intervals presented in Fig. 5 with the get_event, get_end_event, get_on, get_off locations, as well as with the delta_get clock and the invariant conditions delta_get<=time_on (at get_on) and delta_get<=max_delta_get (at get_off). This setup results in a succession of δget-on and δget-off intervals, with the former lasting time_on time and the latter lasting a stochastic time uniformly distributed in the interval [time_on,max_delta_get]. We have additionally introduced the committed location no_trans, which, together with the Boolean variable get_ret, helps detect whether the whole time_on period elapsed with no interaction performed or at least one message was received.
The glue one-way automaton is shown in Fig. 9. It determines the synchronization of the incoming post? and get? operations. A successful synchronization between such operations leads to a successful interaction, which is represented in the automaton by the trans_succ location. Note that the timing constraints specified in Section 4 regarding the lifetime of posted messages have been applied here with the additional clock delta_post_on employed to guard transitions dependent on the lifetime period. Two ways for reaching the trans_succ location are considered:
- If the get? operation occurs from the initial location (leading to location glue_get), a subsequent post? operation results in a get_return! message and eventually in the successful interaction location trans_succ (Eq. 1). At the same time, the sender is notified of the end of the post operation with post_end!. Note that we employ the urgent location qualifier for glue_get_post; thus, the glue instantly completes the successful interaction and is ready for a new one. At the glue_get location, if the get_end? action is received from the receiver automaton (indicating that delta_get has reached time_on), the glue is reset to the initial location glue_init.
- If the post? operation occurs initially (leading to location glue_post), a get? operation occurring while delta_post_on <= lifetime holds results again in a successful interaction (Eq. 2). Exceeding the lifetime period without any get? results in location trans_fail, and the automaton returns to its initial location glue_init, notifying at the same time the sender with post_end!. This is done without any delay, thanks to the invariant delta_post_on <= lifetime at the glue_post location.
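The two ways of reaching trans_succ, and the expiry leading to trans_fail, can be sketched as a small function over absolute times. This is only an illustration of the overlap logic, not the glue automaton itself: the function name, the use of absolute timestamps, and the treatment of boundary instants (a post arriving exactly when a get ends is treated as a miss) are assumptions made here.

```python
def one_way_outcome(post_start, lifetime, get_starts, time_on):
    """Classify a single post against a sorted list of get start times.

    Returns (location, time), where location is 'trans_succ' or
    'trans_fail' and time is the instant at which it is reached.
    Assumption: get operations do not overlap and each lasts time_on.
    """
    post_end = post_start + lifetime
    for get_start in get_starts:
        # First way: the get is already active when the post arrives.
        if get_start <= post_start < get_start + time_on:
            return ("trans_succ", post_start)
        # Second way: a get starts while the post is still alive.
        if post_start <= get_start <= post_end:
            return ("trans_succ", get_start)
    # No overlapping get: the message expires when lifetime is reached.
    return ("trans_fail", post_end)
```

In the first case the interaction completes at the post event, in the second at the get event, which matches the intuition that one of the two clocks is zero when the interaction succeeds.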
5.1.1 Verification of properties
We verify reachability and safety properties of the combined automata DeX sender, DeX receiver and DeX glue one-way, by using the model checker of UPPAAL. A reachability property, specified in UPPAAL as E<>φ, expresses that, starting at the initial state, a path exists such that the condition φ is eventually satisfied along that path. A safety property, specified in UPPAAL as A[]φ, expresses that the condition φ invariantly holds in all reachable states.
Sender Automaton. We verify a set of reachability and safety properties that characterize the timings of the sender’s stochastic behavior.
$$ \texttt{A[] sender.post\_event imply delta\_post==0} $$
(8)
$$ \texttt{A[] sender.post\_on imply delta\_post<=lifetime} $$
(9)
$$ \begin{aligned} \texttt{A[] sender.post\_off imply (delta\_post>=lifetime and}\\ \texttt{ delta\_post<=max\_delta\_post)} \end{aligned} $$
(10)
$$ \texttt{E<> sender.post\_end\_event and delta\_post< lifetime} $$
(11)
Equation 8 states that post events occur at time 0 captured by the delta_post clock. Equations 9 and 11 together state that [0,lifetime] is the maximum interval in which a post operation is active; nevertheless, the operation can end before lifetime is reached. Equation 10 states that [lifetime,max_delta_post] is the maximum interval in which there is no active post operation. This confirms the fact that we artificially “advance time” to lifetime at the end of the post operation.
Receiver Automaton. We verify similar properties that characterize the timings of the receiver’s stochastic behavior.
$$ \texttt{A[] receiver.get\_event imply delta\_get==0} $$
(12)
$$ \texttt{A[] receiver.get\_on imply delta\_get<=time\_on} $$
(13)
$$ \begin{aligned} \texttt{A[] receiver.get\_off imply (delta\_get>=time\_on and}\\ \texttt{ delta\_get<=max\_delta\_get)} \end{aligned} $$
(14)
$$ \texttt{A[] receiver.get\_end\_event imply delta\_get==time\_on} $$
(15)
Hence, Eq. 12 states that get events occur at time 0 captured by the delta_get clock. Equations 13 and 15 together state that a get operation precisely and invariantly terminates at the end of the [0,time_on] interval. Equation 14 states that [time_on,max_delta_get] is the maximum interval in which there is no active get operation.
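To make the meaning of A[] and E<> concrete, the following sketch explores a discrete-time abstraction of the receiver in isolation and evaluates Eqs. 12 to 15 over its reachable states. It is a simplified illustration under several assumptions: the clock is integer-valued, the glue and the get_return? and no_trans behavior are omitted, the constant values are hypothetical, and the initial state is assumed to be get_off with delta_get equal to time_on. UPPAAL itself works symbolically on dense time, so this is not how the tool performs verification.

```python
from collections import deque

TIME_ON = 3          # hypothetical value of the time_on constant
MAX_DELTA_GET = 7    # hypothetical value of the max_delta_get constant


def successors(state):
    """Discrete-time successors of a (location, delta_get) state."""
    loc, clock = state
    if loc == "get_event":        # committed: left immediately, no delay
        return [("get_on", clock)]
    if loc == "get_end_event":    # committed: left immediately, no delay
        return [("get_off", clock)]
    if loc == "get_on":
        if clock < TIME_ON:       # invariant delta_get <= time_on allows a delay
            return [("get_on", clock + 1)]
        return [("get_end_event", clock)]  # get_end! once time_on is reached
    # loc == "get_off"
    result = [("get_event", 0)]   # new get!, delta_get is reset to zero
    if clock < MAX_DELTA_GET:     # invariant delta_get <= max_delta_get
        result.append(("get_off", clock + 1))
    return result


def reachable(initial):
    """Breadth-first exploration of all reachable states."""
    seen, queue = {initial}, deque([initial])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def always(states, phi):
    """A[] phi: phi holds in every reachable state."""
    return all(phi(s) for s in states)


def eventually(states, phi):
    """E<> phi: some reachable state satisfies phi."""
    return any(phi(s) for s in states)


STATES = reachable(("get_off", TIME_ON))  # assumed initial state

EQ12 = always(STATES, lambda s: s[0] != "get_event" or s[1] == 0)
EQ13 = always(STATES, lambda s: s[0] != "get_on" or s[1] <= TIME_ON)
EQ14 = always(STATES, lambda s: s[0] != "get_off"
              or TIME_ON <= s[1] <= MAX_DELTA_GET)
EQ15 = always(STATES, lambda s: s[0] != "get_end_event" or s[1] == TIME_ON)
GET_END_REACHABLE = eventually(STATES, lambda s: s[0] == "get_end_event")
```

Each `imply` in the queries is encoded as "not in the location, or the clock condition holds", which is how an implication over states is evaluated.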
Glue one-way Automaton. We verify conditions for successful interactions using the glue automaton.
$$ \begin{aligned} \texttt{A[] glue.trans\_succ imply (sender.post\_on and receiver.get\_on}\\ \texttt{ and (delta\_post==0 or delta\_get==0))} \end{aligned} $$
(16)
In addition to the reachability property (E <> glue.trans_succ), we verify the safety property in Eq. 16. According to this, a successful interaction event implies that while a post operation is active a get event occurs, or while a get operation is active a post event occurs.
$$ \begin{aligned} \texttt{A[] glue.trans\_fail imply (sender.post\_on and receiver.get\_off}\\ \texttt{ and delta\_post==lifetime and delta\_get-time\_on>=lifetime)} \end{aligned} $$
(17)
We verify both the reachability property (E <> glue.trans_fail) and the safety property in Eq. 17. A failed interaction event means that lifetime is reached for an active post operation and no get operation is active. Additionally, the ongoing inactive get interval entirely includes the terminating active post interval. With regard to the stochastic post and get processes of our specific setting, we explicitly checked that if the condition max_delta_get-time_on>=lifetime does not hold for the given values of the included constants, then the reachability property E<> glue.trans_fail is indeed not satisfied.
$$ \begin{aligned} \texttt{A[] receiver.no\_trans imply (receiver.get\_on and sender.post\_off}\\ \texttt{ and delta\_get==time\_on and delta\_post-lifetime>=time\_on)} \end{aligned} $$
(18)
We verify both the reachability property (E <> receiver.no_trans) and the safety property in Eq. 18. Symmetrically to Eq. 17, a no-interaction event implies that time_on is reached for an active get operation and no post operation is active. Additionally, the ongoing inactive post interval entirely includes the terminating active get interval. Similarly to Eq. 17, we check that if the condition max_delta_post-lifetime>=time_on does not hold for the given values of the included constants, then the state receiver.no_trans is indeed not reachable.
Observing Eqs. 16, 17, 18, we see (as intuitively expected) that successful, failed and no-interactions are determined by the durations and relative positions in time of the δpost-on, δpost-off, δget-on and δget-off intervals. These depend on the deterministic parameter constants lifetime, time_on and on the stochastic parameters δpost and δget. It is also worth noting that Eqs. 16, 17, 18 are expressed in a general way, independently of the specific post and get stochastic processes. For example, Eq. 17 states that, to have a failed interaction, a lifetime period must be shorter than the time_off period. But time_off is probabilistic. Therefore, to avoid failed interactions, a system designer can tune the system by changing (or trying to affect) these parameters accordingly, while they can employ any probability distribution for the disconnection parameter. Similarly, Eq. 18 provides the developer with hints on how to possibly avoid no-interactions. Hence, the analysis results of this section provide general formal conditions for successful DeX interactions and their reliance on observable and potentially tunable system and environment parameters. Using these results, we perform experiments to quantify the effect of varying these parameters for successful interactions in Section 6.
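As a tuning aid, the necessary conditions implied by these properties can be written as simple checks on the constants. Combining Eq. 17 with the bound delta_get<=max_delta_get of Eq. 14 gives max_delta_get-time_on>=lifetime for a failed interaction; combining Eq. 18 with the bound delta_post<=max_delta_post of Eq. 10 gives max_delta_post-lifetime>=time_on for a no-interaction. The function names below are hypothetical, and the checks state necessary conditions only.

```python
def failed_interaction_possible(lifetime, time_on, max_delta_get):
    """Necessary condition for glue.trans_fail (from Eqs. 14 and 17)."""
    return max_delta_get - time_on >= lifetime


def no_interaction_possible(lifetime, time_on, max_delta_post):
    """Necessary condition for receiver.no_trans (from Eqs. 10 and 18)."""
    return max_delta_post - lifetime >= time_on
```

A designer who wants to rule out failed interactions could, for example, increase lifetime or time_on until `failed_interaction_possible` returns False for the chosen constants.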
5.2 Analysis of two-way synchronous interactions
We represent two-way synchronous DeX interactions with the connector roles DeX client, DeX server, and DeX two-way sync glue. We detail in the following the modeling of these components.
Figure 10 shows the client behavior. Typically, a client emits a post_req (request) to the glue and waits for up to timeout to receive the get_res (response). The committed location post_req_sent is introduced to detect the event of sending a request (post_req!) to the glue. Upon such an event, the automaton stays in the post_req_on location until it either receives the response or the timeout expires, which corresponds to the δpost-req-on interval presented in Fig. 6. Upon the timeout expiration or the get_res reception, the automaton stays in the post_req_off location for the δpost-req-off time period.
delta_post is a clock that controls the δpost interval between two successive post_req operations. delta_post is reset upon a new post_req operation and set to timeout upon a get_res? (prior to the timeout expiration). On the other hand, when the timeout period is reached, the delta_post clock is already set to timeout on the post_req_off location. We initialize delta_post at the beginning of the client’s execution (post_init location). The invariant condition delta_post<=max_delta_post (where max_delta_post is a constant) at the post_req_off location ensures that a new post_req operation will be initiated before the identified boundary.
Based on the above setup, the client has at most one post_req operation active at a time. This request remains active (δpost-req-on interval) for the full timeout period (after which it expires) or for less than the timeout period (in case of a successful interaction). In both cases, delta_post equals timeout at the end of the post_req operation. Hence, the immediately following δpost-req-off interval will last a stochastic time uniformly distributed in the interval [timeout,max_delta_post]. Such a model is sufficient for verifying the condition (Eq. 3) for successful DeX two-way sync interactions. This condition relates any post_req operation with an overlapping get operation, by taking also into account the deterministic parameter serve_time at the server side.
Figure 11 shows the server behavior. Typically, a server entity repeatedly becomes online (location get_on) to receive requests from the glue. Thus, the server automaton oscillates between the locations get_off and get_on. The get_event committed location is used to detect the online status of the server. It is worth noting that the server entity operates independently from the glue – i.e., it does not notify the glue when changing between the get_on and get_off locations. The automaton stays on the get_on location for a specific interval, which is controlled by the server with a local time_on; upon the time_on, the automaton returns to the get_off location. Similar to the client entity, the delta_get clock is used to measure the time_on interval and switch between the two locations (get_on and get_off). Furthermore, the invariant conditions delta_get<=time_on (at get_on) and delta_get<=max_delta_get (at get_off) guarantee the correct operation of our automaton.
Before time_on is reached, multiple requests (posted by clients) may be delivered to the server by the glue, each with a get_req? action. Upon a get_req?, the automaton stays in the proc_req location for the serve_time interval, which corresponds to the time needed to process a request. We use the urgent res_event_1 and res_event_2 locations to detect successful responses through the post_res! action. In particular, the res_event_1 location is reached only if the server is still online (delta_get<=time_on). However, while being in location proc_req, the server entity may become offline. In such a case, the res_event_2 location is reached after serving the request (because of the invariant delta_serve_time<=serve_time), and then the automaton returns to the get_off location. Finally, the automaton returns to the get_on or get_off location upon a fail_to_s? action received from the glue, which corresponds to the expiration of the request (post_req) due to the timeout period.
This setup results in a succession of δget-req-on and δget-req-off intervals (see Fig. 6), with the former lasting time_on time and the latter lasting a stochastic time uniformly distributed in the interval [time_on,max_delta_get].
The glue two-way sync automaton is shown in Fig. 12. It determines the synchronization of the incoming (post_req? and post_res?) and outgoing (get_req!) operations. A successful synchronization between such operations leads to a successful interaction, which is represented in the automaton by the trans_succ location. Note that the timing constraints specified in Section 4 regarding the timeout of sent requests have been applied here with the additional clock delta_req_on employed to guard transitions dependent on the timeout period.
The trans_succ location is reached through the following operations: if the post_req? operation occurs from the initial location and the invariant delta_req_on <= timeout is satisfied, a subsequent get_req! request is sent to the server (if the server automaton is in the location get_on). While the request is processed on the server side, the glue automaton waits for the reply. After the specified serve_time, a post_res? operation is received by the glue, which eventually leads to the successful interaction location trans_succ (Eq. 3 is satisfied). At the same time, the client is notified of the end of the post_req operation with get_res!. Note that we declare the get_req channel as urgent. In this way, upon a post_req? and if the server is online, the get_req! action occurs instantly, without any delay otherwise permitted by the invariant delta_req_on<=timeout.
With regard to the timeout, time_on and serve_time parameters, we identify failed interactions in the glue through the trans_fail_1 and trans_fail_2 locations. Two ways for reaching the fail locations are considered:
- If the post_req? operation occurs from the initial location and the server automaton is offline (stays in the get_off location) for a time period that leads to the timeout expiration (delta_req_on>=timeout), the trans_fail_1 location is reached. At the same time, the client is notified with fail_to_c! in order to move to the post_req_off location.
- If the post_req? operation occurs from the initial location and the server automaton is online (stays in the get_on location), a subsequent get_req! request is sent to the server. While the request is processed for serve_time, the timeout period may expire (delta_req_on>=timeout) and the trans_fail_2 location is reached. At the same time, the client is notified with fail_to_c! to move to the post_req_off location, and the server is notified with fail_to_s! to move either to the get_on or to the get_off location, depending on the delta_get clock.
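The success path and the two failure paths of the two-way sync glue can be summarized in a short sketch over absolute times. This is an illustration only: the function name and the timestamp-based formulation are assumptions, and so is the handling of boundary instants (a server coming online exactly at the timeout instant is treated as too late, and a response arriving exactly at the timeout instant is treated as successful).

```python
def two_way_sync_outcome(req_time, timeout, online_starts, time_on, serve_time):
    """Classify one request against a sorted list of server online start times.

    Assumption: online periods do not overlap and each lasts time_on.
    Returns 'trans_succ', 'trans_fail_1' or 'trans_fail_2'.
    """
    deadline = req_time + timeout
    delivered_at = None
    for start in online_starts:
        # Server already online: get_req! is sent at once (urgent channel).
        if start <= req_time < start + time_on:
            delivered_at = req_time
            break
        # Server comes online before the timeout expires.
        if req_time < start < deadline:
            delivered_at = start
            break
    if delivered_at is None:
        return "trans_fail_1"   # request never reached the server in time
    if delivered_at + serve_time <= deadline:
        return "trans_succ"     # response produced before the timeout
    return "trans_fail_2"       # timeout expired while the request was processed
```

The sketch makes explicit that timeout must cover both the waiting time until the server is online and the serve_time needed to process the request.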
5.2.1 Verification of properties
We verify reachability (E<>φ) and safety (A[]φ) properties of the combined automata DeX client, DeX server and DeX two-way sync glue, by using the model checker of UPPAAL.
Client Automaton. We verify a set of safety properties that characterize the timings of the client’s stochastic behavior.
$$ \texttt{A[] client.post\_req\_sent imply delta\_post==0} $$
(19)
$$ \texttt{A[] client.post\_req\_on imply delta\_post<=timeout} $$
(20)
$$ \begin{aligned} \texttt{A[] client.post\_req\_off imply (delta\_post>=timeout and}\\ \texttt{ delta\_post<=max\_delta\_post)} \end{aligned} $$
(21)
Equation 19 states that post_req events occur at time 0 captured by the delta_post clock. Equation 20 states that [0,timeout] is the maximum interval in which a post_req operation is active; nevertheless, the operation can end before timeout is reached. Equation 21 states that [timeout,max_delta_post] is the maximum interval in which there is no active post_req operation. Similar to the DeX sender automaton, we artificially “advance time” to timeout at the end of the post_req operation.
Server Automaton. We verify similar properties that characterize the timings of the server’s stochastic behavior.
$$ \texttt{A[] server.get\_event imply delta\_get==0} $$
(22)
$$ \texttt{A[] server.get\_on imply delta\_get<=time\_on} $$
(23)
$$ \begin{aligned} \texttt{A[] server.get\_off imply (delta\_get>=time\_on and}\\ \texttt{ delta\_get<=max\_delta\_get)} \end{aligned} $$
(24)
Equation 22 states that at the beginning of the server’s online period, the automaton passes through the location get_event at time 0 captured by the delta_get clock. Equation 23 states that the server stays online (at the location get_on) for at most the time_on interval. Equation 24 states that [time_on,max_delta_get] is the maximum interval in which the server is offline (at the location get_off).
Glue two-way sync Automaton. Finally, we verify conditions for successful interactions using the glue automaton.
$$ \begin{aligned} \texttt{A[] glue.trans\_succ imply (client.post\_req\_on and delta\_post<=timeout}\\ \texttt{ and (server.res\_event\_1 or server.res\_event\_2)}\\ \texttt{ and delta\_get<=time\_on+serve\_time)} \end{aligned} $$
(25)
We verify both the reachability property (E <> glue.trans_succ) and the safety property in Eq. 25. According to this, a successful interaction event implies that while a post_req operation is active the timeout period is not reached. Additionally on the server side, one of the committed locations res_event_1 or res_event_2 is active and the condition delta_get<=time_on+serve_time holds.
$$ \begin{aligned} \texttt{A[] glue.trans\_fail\_1 imply (client.post\_req\_on and delta\_post==timeout}\\ \texttt{ and ((server.get\_off and delta\_get-time\_on>=timeout)}\\ \texttt{ or (server.get\_on and delta\_get==0)))} \end{aligned} $$
(26)
We verify both the reachability property (E <> glue.trans_fail_1) and the safety property in Eq. 26. A failed interaction event means that timeout is reached for an active post_req operation. Additionally, the request cannot reach the server, either because the server has been offline (get_off location) for a time period of at least timeout (delta_get-time_on>=timeout), or because the server automaton moves to location get_on at the very instant at which the timeout period is reached.
$$ \begin{aligned} \texttt{A[] glue.trans\_fail\_2 imply (client.post\_req\_on and delta\_post==timeout}\\ \texttt{ and server.proc\_req and delta\_get<=serve\_time)} \end{aligned} $$
(27)
If the failure condition of Eq. 26 does not arise during an interaction, the request is processed at the server side. However, an additional failure, captured by location trans_fail_2, can occur while the request is processed. In addition to the reachability property (E<> glue.trans_fail_2), we verify the safety property in Eq. 27. Such a failed interaction event means that timeout is reached for an active post_req operation. Additionally, the request is still being processed in location proc_req, since the condition delta_get<=serve_time holds.
Equations 25, 26 and 27 provide us with general formal conditions which can be utilized by system designers to tune timing parameters such as timeout, time_on and serve_time and achieve successful interactions.
5.3 Analysis of two-way asynchronous and streaming interactions
Based on subsections 4.3 and 4.4, the time models for one-way interactions can be leveraged to model the timing behavior of two-way async and streaming interactions. Similarly, the timed automata models provided in this section can be leveraged to derive general formal conditions for successful two-way async and streaming interactions. In particular, we leverage the glue one-way automaton shown in Fig. 9 to derive conditions similar to Eqs. 16, 17, 18 using two-way async and streaming DeX operations based on Table 1. For example, in two-way async interactions, we can verify the safety property for successful request transmissions as follows:
$$ \begin{aligned} \texttt{A[] glue.trans\_succ imply (client.post\_req\_on and server.get\_on}\\ \texttt{ and (delta\_post\_req==0 or delta\_get==0))} \end{aligned} $$
(28)
According to the above, a successful request transmission implies that while a post_req operation is active a get event occurs at the server side, or while a get operation is active at the server side, a post_req event occurs. Similarly, formal conditions for successful/failed requests, responses, open stream requests and delivery of stream items can be derived. Such conditions can be leveraged by system designers to tune timing parameters.
5.4 Summary of verification outputs
In this section, we have provided a detailed view of timed automata modeling, verification and parameter tuning of one-way, two-way and streaming interactions. Having a unified framework for verification allows us to study the effect of timing delays, lifetime parameters and success rates of transmissions. This process does not need to be repeated by users of DeXM within unique deployment scenarios. The verified properties demonstrate that the get and post operations can be managed in a safe way via the timing constraints provided. While we have concentrated on safety and reachability properties composing the ordering of event operations, the models may be reused to verify other associated properties. This formalizes the notion of successful interactions within DeXM.