Conceptually, a tuple space can be seen as a shared memory object that provides operations to store and retrieve ordered data sets, called tuples. Processes in a distributed system can then interact through this shared memory abstraction. A tuple is an ordered sequence of fields, where a field that contains a value is said to be defined. A tuple t in which all fields are defined is called an entry (or simply a tuple). A tuple \(\overline {t}\) is called a template if any of its fields does not have a defined value. A tuple t and a template \(\overline {t}\) combine (or match) if, and only if, both have the same number of fields and all the values and types of the defined fields in \(\overline {t}\) are identical to the values and types of the corresponding fields in t. For example, the tuple 〈JISA,2017,SBC〉 matches the template 〈JISA,∗,∗〉 ('∗' denotes an undefined field, called a wildcard).
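The matching rule can be illustrated with a minimal Python sketch. The names `WILDCARD` and `matches` are hypothetical and introduced only for this illustration; they are not part of LINDA or DEPSPACE.

```python
from typing import Any, Sequence

# Hypothetical sentinel standing for an undefined field ('*' in the text).
WILDCARD = object()


def matches(entry: Sequence[Any], template: Sequence[Any]) -> bool:
    """Return True if `entry` matches `template` under the rule stated above."""
    # Both must have the same number of fields.
    if len(entry) != len(template):
        return False
    for value, pattern in zip(entry, template):
        if pattern is WILDCARD:
            continue  # an undefined field matches any value
        # Defined fields must agree in both type and value.
        if type(value) is not type(pattern) or value != pattern:
            return False
    return True


entry = ("JISA", 2017, "SBC")
print(matches(entry, ("JISA", WILDCARD, WILDCARD)))  # same arity, defined field agrees
print(matches(entry, ("JISA", "2017", WILDCARD)))    # type of second field differs
```

Using a dedicated sentinel object rather than a string such as "*" avoids confusing a wildcard with a field whose actual value is "*".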
Process coordination through tuple spaces, introduced by the programming language LINDA for parallel systems [4], supports communication that is decoupled in space (processes do not need to know each other's locations) and in time (processes do not need to be active at the same time). In addition, this coordination model provides some synchronization power.
Manipulations performed in tuple spaces consist of invocations of three basic operations [4]: out(t), which stores the tuple t in the space; \(in(\overline {t})\), which removes from the space a tuple that matches the template \(\overline {t}\); and \(rd(\overline {t})\), which reads from the space a tuple that matches the template \(\overline {t}\) without removing it. Operations in and rd are blocking, i.e., if no tuple in the space matches the template, the process blocks until one becomes available. A common extension to this model is the inclusion of non-blocking variants of these operations, called inp and rdp. These operations work exactly like their blocking counterparts, except that they return even if no tuple matches the template (indicating its nonexistence).
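As an illustration, the five operations can be sketched as a local, single-process tuple space in Python. This is only a sketch under stated assumptions: the class `TupleSpace`, the sentinel `WILDCARD` and the method name `in_` (since `in` is a reserved word in Python) are hypothetical, and the choice of the oldest matching tuple is an assumption, since the model does not prescribe which matching tuple is returned.

```python
import threading

WILDCARD = object()  # hypothetical sentinel for an undefined field


def matches(entry, template):
    """Same arity, and every defined template field equals the entry field."""
    return len(entry) == len(template) and all(
        p is WILDCARD or (type(v) is type(p) and v == p)
        for v, p in zip(entry, template)
    )


class TupleSpace:
    """Local sketch of out/in/rd and the non-blocking variants inp/rdp."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def _find(self, template):
        # Index of the oldest matching tuple, or None (assumed selection rule).
        for i, t in enumerate(self._tuples):
            if matches(t, template):
                return i
        return None

    def out(self, entry):
        with self._cond:
            self._tuples.append(tuple(entry))
            self._cond.notify_all()  # wake up callers blocked in in_/rd

    def rdp(self, template):
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples[i]

    def inp(self, template):
        with self._cond:
            i = self._find(template)
            return None if i is None else self._tuples.pop(i)

    def rd(self, template):
        with self._cond:
            while True:
                i = self._find(template)
                if i is not None:
                    return self._tuples[i]
                self._cond.wait()  # block until some out() happens

    def in_(self, template):
        with self._cond:
            while True:
                i = self._find(template)
                if i is not None:
                    return self._tuples.pop(i)
                self._cond.wait()
```

For example, after `ts.out(("JISA", 2017, "SBC"))`, a call to `ts.rdp(("JISA", WILDCARD, WILDCARD))` returns the tuple and leaves it in the space, whereas `ts.inp` with the same template returns it and removes it.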
Another operation implemented in some tuple spaces (e.g., DEPSPACE [5]) is \(cas(\overline {t}, t)\) (conditional atomic swap) [12, 13]. This operation works like an atomic execution of the code: if \(\neg \ rdp(\overline {t})\) then out(t) (\(\overline {t}\) is a template and t an entry/tuple). The operation inserts t in the space iff \(rdp(\overline {t})\) does not return any tuple, i.e., if there is no tuple in the space that matches \(\overline {t}\); otherwise it returns a tuple that matches \(\overline {t}\).
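A minimal Python sketch of the cas semantics is given below. The class `CasSpace` and the convention of returning `None` when the insertion succeeds are assumptions made for this illustration; the text only specifies that a matching tuple is returned when one exists.

```python
import threading

WILDCARD = object()  # hypothetical sentinel for an undefined field


def matches(entry, template):
    return len(entry) == len(template) and all(
        p is WILDCARD or (type(v) is type(p) and v == p)
        for v, p in zip(entry, template)
    )


class CasSpace:
    """Local sketch: the check (rdp) and the insertion (out) happen atomically."""

    def __init__(self):
        self._tuples = []
        self._lock = threading.Lock()

    def cas(self, template, entry):
        with self._lock:  # no other operation can interleave between check and insert
            for t in self._tuples:
                if matches(t, template):
                    return t  # a matching tuple exists: nothing is inserted
            self._tuples.append(tuple(entry))
            return None  # assumed convention: None signals that `entry` was inserted


space = CasSpace()
# Two processes compete for a lock represented by a tuple ("lock", owner).
print(space.cas(("lock", WILDCARD), ("lock", "p1")))  # p1 inserts the lock tuple
print(space.cas(("lock", WILDCARD), ("lock", "p2")))  # p2 learns that p1 holds it
```

The usage above hints at the synchronization power of the operation: since only one insertion can succeed, the processes agree on a single lock owner.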
Notice that, according to the previous definitions, tuple spaces work as an associative memory: tuples/data are accessed by their contents, not by their addresses. Figure 1 illustrates the out(t), \(rdp(\overline {t})\) and \(inp(\overline {t})\) operations, showing the operation sent by the client, the servers' replies and the final state of the tuple space.
2.1 DEPSPACE: A BFT coordination system
The DEPSPACE [5] system provides a Byzantine Fault-Tolerant (BFT) [14] coordination service based on the tuple space model. The following security and dependability attributes (or properties) are necessary for this model [1]: (1) reliability – the operations executed in the tuple space change its state according to their specification; (2) availability – the tuple space is always ready to execute the operations required by authorized parties; (3) integrity – no improper alteration of the tuple space can occur; (4) confidentiality – the content of tuple fields cannot be disclosed to unauthorized parties. To ensure these properties, DEPSPACE is built as a set of layers, each one responsible for a different functionality.
2.1.1 DEPSPACE layers
This section introduces the DEPSPACE layers, emphasizing the confidentiality layer, which is responsible for the aspects related to this work. Figure 2 shows the layers and their location in the stack at both clients and servers.
Replication. To maintain consistency in the tuple space, DEPSPACE uses State Machine Replication [15, 16] as the bottom layer. This mechanism is mainly related to the properties of availability, integrity and confidentiality. Considering a system with n replicas/servers, it ensures that operations are executed according to their specification even if up to f=⌊(n−1)/3⌋ replicas are malicious (the correct replicas mask the behavior of the malicious ones). Through these protocols, the correct replicas execute the same sequence of operations and return the same values, evolving in a synchronized way.
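The relation between n and f can be made concrete with a short Python sketch, reading the bound as f=⌊(n−1)/3⌋, i.e., n≥3f+1. The function names are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest number f of malicious replicas tolerated by n replicas (n >= 3f + 1)."""
    if n < 1:
        raise ValueError("at least one replica is required")
    return (n - 1) // 3  # integer division implements the floor


def min_replicas(f: int) -> int:
    """Smallest number of replicas needed to tolerate f malicious ones."""
    if f < 0:
        raise ValueError("f must be non-negative")
    return 3 * f + 1


for n in (4, 5, 6, 7, 10):
    print(n, max_faulty(n))
```

Note that adding replicas does not always increase resilience: with n=4, 5 or 6 the system still tolerates a single malicious replica, and a second one is tolerated only from n=7 on.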
Confidentiality. Since tuples are replicated across a set of servers, the provision of confidentiality (and privacy) must not be entrusted to a single server, because up to f of them could fail and expose the tuple contents to unauthorized parties.
Consequently, DEPSPACE implements confidentiality through the use of an (n,f+1)-Publicly Verifiable Secret Sharing (PVSS) [15] scheme. The clients, which play the role of dealers in the scheme, generate a secret that they use to encrypt the tuples. They then generate a set of n shares of this secret and send a different share to each server. The secret can be recovered only by combining f+1 shares, which makes it impossible for a collusion of up to f malicious servers to expose the tuple contents.
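The (n,f+1) threshold property can be illustrated with plain Shamir secret sharing, sketched below in Python. This is not the PVSS scheme used by DEPSPACE: it omits the proofs that make shares publicly verifiable, and the field modulus, function names and parameters are assumptions chosen for the illustration.

```python
import secrets

# Toy prime field (2**127 - 1 is a Mersenne prime); an assumption of this sketch.
PRIME = 2**127 - 1


def split(secret: int, n: int, threshold: int):
    """Split `secret` into n shares such that any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be an element of the field")
    if not 1 <= threshold <= n:
        raise ValueError("threshold must be between 1 and n")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = (num * -xm) % PRIME
                den = (den * (xj - xm)) % PRIME
        secret = (secret + yj * num * pow(den, -1, PRIME)) % PRIME
    return secret


n, f = 4, 1
s = secrets.randbelow(PRIME)
shares = split(s, n, f + 1)           # one share per server
print(combine(shares[:f + 1]) == s)   # any f+1 shares recover the secret
```

With a threshold of f+1, the f shares held by colluding malicious servers lie on a polynomial of degree f and therefore do not determine its constant term.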
As servers cannot access the tuple contents (since they are encrypted by the client), the protocol employs a fingerprint of the tuple, which allows the servers to compute matches between tuples and templates. The fingerprint is computed according to the type of each tuple field, which can be classified as follows:
-
Public (PU): the field content itself is used as its fingerprint, i.e., no cryptographic method is applied to the field content and it remains exposed.
-
Comparable (CO): a hash of the field content is used as its fingerprint (using a collision-resistant hash function), allowing servers to execute searches/matches on this type of field while, at the same time, providing some level of security.
-
Private (PR): a special symbol (PR) is used as fingerprint of these fields. Although it provides a level of security higher than the CO classification, no information in this field is available at the servers to verify if a tuple matches a template.
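The three classifications can be sketched in Python as follows. SHA-256, the `repr`-based encoding of field values and all names (`Protection`, `PRIVATE_MARK`, `fingerprint`) are assumptions of this illustration and do not describe the actual DEPSPACE implementation.

```python
import hashlib
from enum import Enum


class Protection(Enum):
    PU = "public"
    CO = "comparable"
    PR = "private"


WILDCARD = object()      # hypothetical sentinel for an undefined template field
PRIVATE_MARK = object()  # hypothetical sentinel for the special symbol PR


def field_fingerprint(value, protection):
    if value is WILDCARD:
        return WILDCARD  # an undefined field keeps the wildcard as its fingerprint
    if protection is Protection.PU:
        return value  # content is exposed as is
    if protection is Protection.CO:
        # Assumed encoding: hash of the textual representation of the value.
        return hashlib.sha256(repr(value).encode("utf-8")).hexdigest()
    if protection is Protection.PR:
        return PRIVATE_MARK  # nothing about the content is revealed
    raise ValueError(f"unknown protection type: {protection!r}")


def fingerprint(fields, protections):
    if len(fields) != len(protections):
        raise ValueError("one protection type per field is required")
    return tuple(field_fingerprint(v, p) for v, p in zip(fields, protections))


types = (Protection.PU, Protection.CO, Protection.PR)
fp = fingerprint(("JISA", 2017, "SBC"), types)
print(fp[0])                   # public field: the content itself
print(fp[1])                   # comparable field: hex digest
print(fp[2] is PRIVATE_MARK)   # private field: only the PR marker
```

Because the same function is applied to entries and templates, two equal comparable values always yield the same digest, which is what allows matching at the servers.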
Since it is not possible, in the state machine replication approach, to send different versions of a request to different servers (each containing only that server's share of the secret used to encrypt the tuple), the client encrypts each share with a secret key shared with the server that will store it. Consequently, each server can access only its own share and, as a malicious server does not have access to all shares, it cannot recover and expose the tuple contents.
In a nutshell, an insertion operation (out) works as follows:
-
The client generates a secret s and encrypts the tuple using this secret.
-
The client uses the PVSS scheme to generate n shares of s.
-
The client encrypts each share with a secret key shared with each server (one share per server).
-
The client computes the fingerprint according to the field classification.
-
The client uses the state machine replication protocol to send a request to the servers (in this protocol it must wait for f+1 replies to finish the request execution). The request contains the encrypted tuple, the encrypted shares, the proof that these shares are valid and the tuple fingerprint.
-
When a server executes this request, it only stores all received data and sends an acknowledgment to the client as a reply.
On the other hand, the protocol to read/remove a tuple works as follows:
-
The client computes the fingerprint for the template according to the field classification. The fingerprint of an undefined field is the wildcard itself.
-
The client uses the state machine replication protocol to send a read/remove operation to the servers containing the generated fingerprint.
-
When a server executes this request, it deterministically chooses a tuple whose fingerprint matches the received fingerprint (if it is a removal operation, this tuple is removed from the space). If its share has not yet been verified, it extracts the share and verifies whether it is valid using the proofs received during the out operation. Afterward, the server replies to the client with the encrypted tuple, its encrypted share (to avoid eavesdropping on the replies), the tuple fingerprint and proofs that the share is valid.
-
The client waits for f+1 replies, decrypts the shares, verifies their validity and combines them to recover the secret s. Finally, the client decrypts the tuple using s.
-
The client verifies if the fingerprint it used is valid for the recovered tuple. If the fingerprint is valid, the operation is finished. Otherwise, a repair procedure is executed to remove invalid data from the space and the operation is repeated.
Notice that, according to the fingerprint definitions, searches are possible only on public and comparable fields, i.e., private fields cannot be used to verify whether a tuple matches a template and are always treated as undefined fields in the template. This limitation has at least two consequences. On the one hand, a tuple with many private fields makes searches very restricted and reduces flexibility in application development, because a template with many undefined fields does not allow a fine-grained match at the servers. On the other hand, a tuple with many public and/or comparable fields is susceptible to many attacks, such as correlation and preimage attacks.
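The server-side selection and the limitation of private fields can be illustrated with the Python sketch below. The fingerprints are written by hand, the digests are placeholders, and the rule of choosing the oldest matching tuple is an assumption: the protocol only requires the choice to be deterministic.

```python
WILDCARD = object()      # hypothetical sentinel for an undefined template field
PRIVATE_MARK = object()  # hypothetical sentinel for the special symbol PR


def fp_matches(tuple_fp, template_fp):
    """Field-by-field comparison of fingerprints, as a server would perform it."""
    if len(tuple_fp) != len(template_fp):
        return False
    return all(t is WILDCARD or t == s for s, t in zip(tuple_fp, template_fp))


def select(stored_fps, template_fp):
    """Index of the oldest stored fingerprint that matches, or None."""
    for i, fp in enumerate(stored_fps):
        if fp_matches(fp, template_fp):
            return i
    return None


# Fields: (public, comparable, private); "h(...)" strings stand in for digests.
stored = [
    ("JISA", "h(2017)", PRIVATE_MARK),
    ("JISA", "h(2017)", PRIVATE_MARK),  # differs from the first only in the private field
    ("JISA", "h(2018)", PRIVATE_MARK),
]

print(select(stored, ("JISA", "h(2018)", WILDCARD)))      # comparable field narrows the search
print(select(stored, ("JISA", "h(2017)", PRIVATE_MARK)))  # private field cannot narrow it
```

The first two stored fingerprints are identical, so the servers cannot tell the corresponding tuples apart; a client interested in the second one can only filter by the private field after decrypting the tuple it receives.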
Policy enforcement. This layer allows the enforcement of fine-grained access policies [7] that take into account three parameters (the identifier of the invoker, the operation and its arguments, and the tuples currently stored in the space) to decide whether an operation is approved or denied. These policies are defined by the users and are loaded at the servers during the system setup.
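A policy of this kind can be pictured as a predicate over the three parameters. The Python sketch below is purely illustrative: the function signature, the operation names as strings, the quota and the rule itself are hypothetical and do not reproduce the policy language of [7].

```python
MAX_TUPLES_PER_CLIENT = 2  # hypothetical quota


def policy(invoker, operation, args, space):
    """Return True to approve the operation and False to deny it.

    invoker:   identifier of the client issuing the operation
    operation: name of the operation ("out", "rd", "rdp", "in", "inp")
    args:      arguments of the operation (entry or template first)
    space:     tuples currently stored (assumed non-empty, owner in field 0)
    """
    if operation in ("rd", "rdp"):
        return True  # reads are always approved in this sketch
    if operation == "out":
        entry = args[0]
        owned = sum(1 for t in space if t[0] == invoker)
        # A client may only insert tuples tagged with its own identifier,
        # and only while it is below its quota.
        return entry[0] == invoker and owned < MAX_TUPLES_PER_CLIENT
    if operation in ("in", "inp"):
        template = args[0]
        return template[0] == invoker  # a client may only remove its own tuples
    return False  # deny anything not explicitly covered


space = [("alice", "x")]
print(policy("alice", "out", (("alice", "y"),), space))
print(policy("bob", "inp", (("alice", None),), space))
```

The quota rule shows why the current contents of the space are part of the decision: the same request can be approved or denied depending on how many tuples the invoker already stores.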
Access control. Access control is a fundamental mechanism to preserve the integrity and confidentiality of the information (tuples) stored in DEPSPACE, since it prevents unauthorized clients from accessing the tuples. Moreover, this mechanism prevents malicious clients from saturating the tuple space by inserting a large number of tuples. Currently, DEPSPACE implements access control based on credentials: for each tuple inserted in DEPSPACE, a set of credentials is required to access it, both to read it and to remove it from the space (access control at tuple level). These credentials are defined by the process that inserts the tuple. Moreover, it is possible to define, during the setup of a space, which credentials are required to insert a tuple into it (access control at space level). This functionality is implemented by associating access control lists with each tuple and space.
2.1.2 Security analysis
Below we briefly summarize some security definitions. According to [17], attacks against cryptographic schemes aim to obtain the plaintext or the decryption key through the following methods:
-
Ciphertext-only attack (COA): In this kind of attack, an adversary tries to obtain the decryption key or the plaintext having only the ciphertext at its disposal. This is the weakest type of attack and, therefore, a system vulnerable to it is considered insecure.
-
Known-plaintext attack (KPA): In this attack, the adversary has at its disposal a significant amount of plaintexts and their corresponding ciphertexts. Through the comparison of plaintexts and their corresponding ciphertexts, the adversary tries to discover the decryption key or to decrypt another ciphertext.
-
Chosen-plaintext attack (CPA): The adversary chooses a plaintext and receives the corresponding ciphertext for analysis, which may allow him/her to discover the plaintext corresponding to another ciphertext.
-
Adaptive chosen-plaintext attack (CPA2): This attack is similar to CPA, but the attacker can choose new plaintexts depending on the answers received.
-
Chosen-ciphertext attack (CCA): In this kind of attack, the adversary chooses a ciphertext and receives (without access to the decryption key) the corresponding plaintext. The adversary uses the analysis of this correlation to discover the plaintext corresponding to another ciphertext.
-
Adaptive chosen-ciphertext attack (CCA2): This attack is similar to CCA, but the attacker can choose new ciphertexts depending on the answers received. This attack is considered very strong and much harder to implement.
The attacks above are presented in order of increasing complexity. A system vulnerable to a weak attack will be classified at a lower security level, even if it resists a stronger attack. Although these are the main attacks considered in the literature, many other attacks could be possible depending on the system characteristics. For example, in [3] the authors show that it is possible to perform inference attacks by correlating the ciphertexts with additional public information. In this case, if there is a strong correlation between the encrypted and the public data, the plaintexts could be recovered with high accuracy. Considering encrypted databases of hospitals, [3] presented a study in which more than 60% of the data encrypted deterministically (Section 3.1) (e.g., sex, race and mortality risk) could be discovered in 60% of the hospitals, while more than 80% of the data encrypted with order-preserving encryption (Section 3.2) (e.g., age and disease severity level) were recovered in 95% of the hospitals.
As in practice it is impossible to achieve total security against these attacks for all mathematically possible adversaries, a weaker security definition is necessary, taking into account only computationally feasible adversaries. In this context, a system is informally defined as semantically secure if it is able to resist, with high probability, attacks performed by any computationally efficient adversary [18]. Based on the formal definitions of [19], we informally define that, for any efficient adversary \(\mathcal {A}\), a cipher \(\mathcal {E}=(E,D)\) defined over \((\mathcal {K},\mathcal {M},\mathcal {C})\) offers:
-
Indistinguishability against chosen-plaintext attacks (IND-CPA): the cipher offers IND-CPA if for all attempts i=1,2,...,q, given two messages \(m_{i0}, m_{i1}\in \mathcal {M}\) of the same size, chosen by \(\mathcal {A}\) and submitted to an oracle that answers with the ciphertext \(c_{i}=E(k, m_{ib}) \in \mathcal {C}\) for some key k selected randomly in \(\mathcal {K}\) and b∈{0,1}, the probability that \(\mathcal {A}\) can distinguish between \(c_{i}=E(k,m_{i0})\) and \(c_{i}=E(k,m_{i1})\) is negligible.
-
Indistinguishability against chosen-ciphertext attacks (IND-CCA): the cipher offers IND-CCA if, under the same conditions as IND-CPA, the adversary \(\mathcal {A}\) can also get access to an oracle that, given a ciphertext \(c_{i} \notin \{c_{1},\ldots ,c_{i-1}\}\), answers with the corresponding plaintext \(m_{i}=D(k,c_{i})\) and, in the same way, the probability that \(\mathcal {A}\) can distinguish between \(c_{i}=E(k,m_{i0})\) and \(c_{i}=E(k,m_{i1})\) is negligible. In this case, \(\mathcal {A}\) can make as many requests as it wants to the decryption oracle, but only until it has received the challenge ciphertext from the encryption oracle.
-
Indistinguishability against adaptive chosen-ciphertext attacks (IND-CCA2): the cipher offers IND-CCA2 if, besides the conditions established for IND-CCA, the adversary can continue using the decryption oracle even after it has received the challenge ciphertext. The only restriction is that it is not allowed to submit this ciphertext for decryption.
Additionally, we have the following IND-CPA relaxations for both deterministic (Section 3.1) and order-preserving (Section 3.2) ciphers, respectively:
-
Indistinguishability against distinct chosen-plaintext attacks (IND-DCPA): the cipher \(\mathcal {E}\) offers IND-DCPA if it is deterministic and for all attempts i=1,2,...,q, given two messages \(m_{i0}, m_{i1}\in \mathcal {M}\) of the same size chosen by \(\mathcal {A}\), distinct for each attempt (\(\forall i,j\in \{1,2,\ldots ,q\}\) with \(i\neq j\), \(m_{i0}\neq m_{j0}\) and \(m_{i1}\neq m_{j1}\)), submitted to the oracle that answers with the ciphertext \(c_{i}=E(k, m_{ib}) \in \mathcal {C}\) for some key k selected randomly in \(\mathcal {K}\) and b∈{0,1}, the probability that \(\mathcal {A}\) can distinguish between \(c_{i}=E(k,m_{i0})\) and \(c_{i}=E(k,m_{i1})\) is negligible [19].
-
Indistinguishability against ordered chosen-plaintext attacks (IND-OCPA): the cipher \(\mathcal {E}\) offers IND-OCPA if it preserves the order between the plaintexts and for all attempts i=1,2,...,q, given two messages \(m_{i0}, m_{i1}\in \mathcal {M}\) of the same size, chosen by \(\mathcal {A}\) and submitted always in the same order (i.e., \(m_{i0}<m_{j0} \Leftrightarrow m_{i1}<m_{j1}\) for all \(1\leq i,j\leq q\)) to an oracle that answers with the ciphertext \(c_{i}=E(k, m_{ib}) \in \mathcal {C}\) for any key k selected randomly in \(\mathcal {K}\) and b∈{0,1}, the probability that \(\mathcal {A}\) can distinguish between \(c_{i}=E(k,m_{i0})\) and \(c_{i}=E(k,m_{i1})\) is negligible [20].
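The reason why deterministic ciphers need the relaxed IND-DCPA notion can be seen in the following Python sketch of the indistinguishability game. The toy cipher (XOR with a key-derived stream and no nonce), the oracle class and the adversary are assumptions made only for this illustration; the toy cipher is not a secure construction.

```python
import hashlib
import secrets


def keystream(key: bytes, length: int) -> bytes:
    """Key-derived byte stream (SHA-256 in counter mode), identical on every call."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def det_encrypt(key: bytes, message: bytes) -> bytes:
    """Toy deterministic cipher: equal plaintexts give equal ciphertexts."""
    return bytes(m ^ k for m, k in zip(message, keystream(key, len(message))))


class LeftRightOracle:
    """Encrypts m_{ib} for a hidden bit b and a random key k."""

    def __init__(self):
        self._key = secrets.token_bytes(16)
        self.b = secrets.randbelow(2)

    def encrypt(self, m0: bytes, m1: bytes) -> bytes:
        if len(m0) != len(m1):
            raise ValueError("messages must have the same size")
        return det_encrypt(self._key, m1 if self.b else m0)


def adversary(oracle) -> int:
    """Guess b by repeating a left message, which IND-DCPA forbids."""
    a, other = b"AAAA", b"BBBB"
    c1 = oracle.encrypt(a, a)      # ciphertext of `a`, whatever b is
    c2 = oracle.encrypt(a, other)  # m_{20} = m_{10}: not allowed under IND-DCPA
    return 0 if c2 == c1 else 1


oracle = LeftRightOracle()
print(adversary(oracle) == oracle.b)
```

Since the adversary always recovers b, no deterministic cipher can offer IND-CPA; requiring the messages of different attempts to be distinct rules out exactly this strategy.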
Using these definitions, it is possible to highlight some vulnerabilities of DEPSPACE. The main focus of the investigation is the way the fingerprint is generated. In the following, we discuss the vulnerabilities related to the comparable and public field classifications:
-
Comparable fields allow tuple selection/matches without servers knowing the field contents, but the use of hash functions makes the system vulnerable to collision and preimage attacks. In fact, an adversary is able to obtain any desired number of inputs and their respective outputs simply by computing their hashes. Consequently, if the set of values that a comparable field can assume is small and known, the attacker can compute the hashes of all possible values, learning the correspondence between plaintexts and ciphertexts. This attack is similar to the known-plaintext attack, except that in this case there are no encryption and decryption functions.
-
Public fields are not subject to a disclosure attack since their contents are already public. However, these fields could provide useful information to an attacker, which could correlate the encrypted tuple contents with a public database and execute an inference attack. Comparable fields also could be used for these attacks since their contents could be inferred.
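The attack on comparable fields with a small, known domain can be sketched in a few lines of Python. SHA-256, the encoding of values and the example domain are assumptions of this illustration.

```python
import hashlib


def h(value: str) -> str:
    """Unkeyed hash used as the fingerprint of a comparable field (assumed SHA-256)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def build_dictionary(domain):
    """Precompute the fingerprint of every value the field can assume."""
    return {h(v): v for v in domain}


def invert(fingerprints, dictionary):
    """Map each observed fingerprint back to its plaintext, when it is in the domain."""
    return [dictionary.get(fp) for fp in fingerprints]


# Fingerprints observed at a server for a comparable field with few possible values.
observed = [h("high"), h("low"), h("high")]

dictionary = build_dictionary(["low", "medium", "high"])
print(invert(observed, dictionary))
```

The attacker needs no key and no interaction with the clients: since the hash function is public and unkeyed, enumerating the domain is enough to recover every value of the field.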