Secure Data Computation

Digitalization has brought many conveniences, but it has also turned users into "gold mines" for others: monopolistic organizations exploit personal data recklessly for massive returns.

Meanwhile, as businesses conduct more activities, different organizations have accumulated a large amount of data, forming their own "data lakes".

In this context, making these "data lakes" flow and become "living water" while remaining under regulation poses new challenges for computing technology.

In recent years, with rapid advances in the relevant fields, privacy-preserving computation has become an essential means of meeting this demand.

Looking ahead, privacy-preserving computation is expected to become part of the infrastructure of the digital age, promoting data flow while safeguarding data privacy and security.

This is why Datum provides a set of application-oriented, out-of-the-box privacy-preserving computing capabilities built on privacy-enhancing technologies such as PSI and MPC (both provided by the privacy-preserving AI framework Rosetta) and HE.

Specifically, Datum has built a decentralized off-chain computation network through decentralized scheduling and on-chain financial incentives, and fully supports the types of data computation described in the sections below: outsourcing computation, data cooperation, and joint machine learning.

Users can execute the relevant functions through clear, concise API calls, without needing to understand the execution details of the underlying cryptographic algorithms. This makes the platform more accessible, addresses the risks and concerns of traditional data sharing, and enables smoother data collaboration and data flows.

Outsourcing Computation

Homomorphic encryption is used to encrypt data before it is sent to a third-party computing node for outsourced computation. The computation is performed directly on the ciphertext, so it does not disclose any information about the raw data.

The above figure shows that Datum uses homomorphic encryption computation to fully decouple the data party and the computing party, which fundamentally solves the problem of confidentiality when entrusting data and data operations to third parties.
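The outsourcing flow can be illustrated with a minimal sketch of an additively homomorphic scheme. The example uses a toy Paillier cryptosystem, which is an assumption chosen for illustration and not necessarily the scheme Datum uses. The function names (`keygen`, `encrypt`, `add_encrypted`, `scale_encrypted`, `decrypt`) are hypothetical, and the tiny fixed primes are insecure and serve only to keep the arithmetic readable.

```python
import math
import secrets


def keygen(p=1789, q=1867):
    """Return (public, private) keys. The default primes are toy-sized."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # standard simple choice of generator
    mu = pow(lam, -1, n)      # valid inverse because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """Data party: encrypt an integer 0 <= m < n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def add_encrypted(pub, c1, c2):
    """Computing party: add two plaintexts by multiplying ciphertexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def scale_encrypted(pub, c, k):
    """Computing party: multiply the hidden plaintext by a public constant k."""
    n, _ = pub
    return pow(c, k, n * n)


def decrypt(pub, priv, c):
    """Data party: recover the plaintext with the private key."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


if __name__ == "__main__":
    pub, priv = keygen()
    # The data party encrypts locally and ships only ciphertexts.
    c1, c2 = encrypt(pub, 1200), encrypt(pub, 3400)
    # The computing node works on ciphertexts without holding the private key.
    c_sum = add_encrypted(pub, c1, c2)
    # Only the data party can decrypt the result.
    print(decrypt(pub, priv, c_sum))
```

In this sketch the computing node only ever holds the public key and ciphertexts, which mirrors the decoupling of the data party and the computing party described above. A scheme like this supports addition and multiplication by constants; arbitrary computation would need a fully homomorphic scheme.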

Data Cooperation

Let's consider the following case:

Company A is an automobile vendor and has a list of potential clients. It wants to answer one question: Who can afford this new car model?

To answer the question, a simple method is to find out who has enough money.

Obviously, a bank will not tell you how much a customer has on deposit. But it might be willing to offer a list of clients that fall within a certain income class (the middle class, for example).

This allows Company A to identify which of its potential clients also appear on the bank's list, i.e., those who can afford the new car model.

In this process, no participant discloses any private data beyond the necessary information. This is an instance of private set intersection (PSI).
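The car vendor and bank scenario can be sketched with a Diffie-Hellman style PSI. This is a simplified illustration under stated assumptions: both parties are simulated in one process, the group (integers modulo the Mersenne prime 2^127 - 1) is toy-sized and not a secure choice, and the helper names (`hash_to_group`, `blind`, `psi`) are hypothetical rather than part of Datum or Rosetta.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # Mersenne prime; toy group for illustration only


def random_exponent():
    """Pick a secret exponent coprime to P - 1 so blinding is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def hash_to_group(item):
    """Map a string identifier to an integer in [1, P - 1]."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def blind(values, key):
    """Raise every value to a secret exponent modulo P."""
    return [pow(v, key, P) for v in values]


def psi(vendor_items, bank_items):
    """Return the vendor's items that also appear in the bank's list."""
    a = random_exponent()  # vendor's secret
    b = random_exponent()  # bank's secret

    # Step 1 (vendor -> bank): hashed items blinded with a.
    vendor_once = blind([hash_to_group(x) for x in vendor_items], a)

    # Step 2 (bank -> vendor): the vendor's values re-blinded with b, in the
    # same order, plus the bank's own hashed items blinded with b.
    vendor_twice = blind(vendor_once, b)
    bank_once = blind([hash_to_group(y) for y in bank_items], b)

    # Step 3 (vendor): blind the bank's values with a and compare.
    # H(x)^(a*b) == H(y)^(b*a) exactly when x == y (barring hash collisions).
    bank_twice = set(blind(bank_once, a))
    return [x for x, t in zip(vendor_items, vendor_twice) if t in bank_twice]


if __name__ == "__main__":
    potential_clients = ["alice", "bob", "carol"]
    middle_class_clients = ["bob", "dave", "carol"]
    print(psi(potential_clients, middle_class_clients))
```

Each side only ever sees the other's items in blinded form, so the vendor learns which of its own clients are on the bank's list and nothing about the bank's remaining clients. A production protocol would also shuffle the bank's blinded set and use a vetted elliptic-curve group.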

Similarly, many real-world problems that involve privacy-preserving computation can be expressed as follows:

Datum fully supports this paradigm. Based on MPC protocols, the platform supports different types of multi-party computation, covering joint query, joint matching, and joint statistics and analysis, without disclosing any additional information about the parties involved. It also provides data services to the chain through Data Oracle.
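As a rough illustration of a joint statistic, the sketch below computes a sum over several parties' private values using additive secret sharing, one of the basic building blocks of MPC. It is an assumption chosen for clarity, not a description of the protocol Datum or Rosetta actually runs. The names `share` and `secure_sum` and the modulus are hypothetical, and all parties are simulated in a single process.

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this value


def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Compute the sum of all inputs without any party revealing its own."""
    n = len(private_inputs)

    # Row i holds the shares created by party i; column j is sent to party j.
    distributed = [share(v, n) for v in private_inputs]

    # Each party j adds up the shares it received. A single column looks
    # uniformly random and reveals nothing about any individual input.
    partial_sums = [
        sum(row[j] for row in distributed) % MODULUS for j in range(n)
    ]

    # Publishing the partial sums reveals only the total.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Three organizations each hold a private count.
    print(secure_sum([120, 45, 300]))
```

The result is correct as long as the true sum is smaller than `MODULUS`. Joint queries and matching need richer protocols, but they rest on the same idea: each party sees only random-looking shares.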

Joint Machine Learning

More complex applications may require automated modeling based on data to cope with unknown events efficiently.

Various technologies have been developed for this purpose, and they differ in how they keep data secure. For instance, the security of data transmission and computation can be guaranteed by cryptographic technologies such as MPC or by hardware-based methods such as TEE. In federated learning, users do not transmit the raw data at all; they only transfer incremental updates of the model.
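The federated learning idea, where only model updates leave each participant, can be sketched with a small federated averaging loop. This is a minimal illustration under stated assumptions: a one-variable linear model, plain gradient descent, and hypothetical function names (`local_update`, `federated_round`, `train`). It does not reflect Datum's actual training interface.

```python
def local_update(weights, data, lr=0.05, epochs=5):
    """Client side: train on local data and return only the weight delta."""
    w, b = weights
    for _ in range(epochs):
        # Gradients of the mean squared error for the model y = w * x + b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    # The raw (x, y) pairs never leave this function.
    return (w - weights[0], b - weights[1]), len(data)


def federated_round(weights, client_datasets):
    """Server side: average client deltas, weighted by local sample count."""
    updates = [local_update(weights, data) for data in client_datasets]
    total = sum(n for _, n in updates)
    dw = sum(delta[0] * n for delta, n in updates) / total
    db = sum(delta[1] * n for delta, n in updates) / total
    return (weights[0] + dw, weights[1] + db)


def train(client_datasets, rounds=200):
    """Run several federated rounds starting from a zero model."""
    weights = (0.0, 0.0)
    for _ in range(rounds):
        weights = federated_round(weights, client_datasets)
    return weights


if __name__ == "__main__":
    # Two clients hold disjoint samples of the same relation y = 2x + 1.
    client_a = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
    client_b = [(3.0, 7.0), (4.0, 9.0)]
    w, b = train([client_a, client_b])
    print(round(w, 3), round(b, 3))
```

Only the pair of weight deltas and a sample count cross the client boundary. In practice the updates themselves can leak information, which is one reason federated learning is often combined with MPC or TEE.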

Datum provides a unified call interface for privacy-preserving AI computation. By integrating mainstream privacy-preserving computation technologies, the interface supports a wide range of operations on private data, including feature engineering, machine learning, and deep learning, so that multiple parties can jointly train AI models efficiently. Moreover, Datum provides on-chain prediction and inference services through Data Oracle.

Note:

  1. The fact that the training methods in the figure above include MPC, TEE, and federated learning does not indicate that the training is centralized. Instead, it shows that the APIs are fully integrated.

  2. The trained model may be held by multiple parties, depending on the requirements and training methods.

Please refer to APIs for Secure Data Computation for more instructions.

Terms:

  • Private Set Intersection (PSI) allows two parties holding raw data to compute the intersection of their data without disclosing any other information about the raw data. It supports the Diffie-Hellman (DH) algorithm and the Homomorphic Encryption (HE) algorithm.

  • Privacy Label Query is based on Labeled Private Set Intersection. It allows the Requester to obtain the intersection of its own data with the Provider's data, together with the Label data that corresponds to the intersection, while ensuring that the Requester does not disclose the query content and the Provider does not disclose any other data. The function supports the Diffie-Hellman (DH) algorithm and the Homomorphic Encryption (HE) algorithm.

  • Secure Multi-Party Computation (MPC) allows a group of mutually distrusting parties to jointly compute a function of their inputs and obtain the correct result while keeping those inputs private. No participant may learn any private information beyond its own data, the computation result, and whatever the other participants intentionally disclose.

  • Homomorphic Encryption (HE) is a form of encryption that permits computations on encrypted data without decrypting it first. The result, once decrypted, is identical to the result of the same computation on the plaintext data.

  • Privacy-preserving AI Framework Rosetta is based on TensorFlow and combines three typical technologies: AI, privacy-preserving AI, and blockchain. The framework is designed to be accessible to AI developers. With Rosetta, AI developers can convert existing AI code into programs with privacy protection functions by changing only two or three lines of code, without having to be an expert in privacy-preserving AI technologies.

  • Federated Learning (FL) is a machine learning technique that trains an algorithm across multiple decentralized edge devices or servers holding local data samples, without exchanging them.

  • Trusted Execution Environment (TEE) is a secure area of a device's main processor (for example in smartphones, tablets, and smart televisions) that offers an isolated space for executing code and processing data while ensuring their confidentiality and integrity.