
Bringing Privacy to AI – Leveraging DeAI, DePIN, PETs and Synthetic Data

  • Writer: Pete Harris, Principal, Lighthouse Partners, Inc.
  • Dec 17, 2025
  • 7 min read

As Artificial Intelligence (AI) embeds itself into nearly every aspect of life – both business and personal – concern is growing about its impact on privacy, including how to ensure that the data used to train and run models can be governed and managed to maintain its confidentiality and usage rights.


In some cases, data concerns reflect individuals’ desire to keep everyday information – social security numbers, home addresses, financial details, and purchase histories – private in order to prevent fraud. Other personal information, such as health history, is generally considered highly sensitive (and in some cases is protected by law). Creatives, including bloggers, authors, artists, and musicians, generally want to control how their intellectual property is used by others, and for what purpose. And businesses own vast quantities of commercially confidential data, including product plans, client records, financials, project reports, and personnel information.


While individuals and businesses are looking to benefit from using AI tools to process their data, that processing can lead to data leakage – the exposure of sensitive, private, or proprietary information through the training, deployment, or usage of AI systems.

 

A generally cited business best practice is to not use confidential data as inputs for public AI chatbots, such as OpenAI’s ChatGPT. That’s because such inputs can become part of the data used for the AI’s training, potentially exposing sensitive business information, such as client data or internal product details. In some cases, such exposures can lead to regulatory compliance issues. As examples, both JP Morgan and Samsung have experienced such data leaks when computer source code and meeting transcripts were fed into AI systems.

 

As a result of such concerns and restrictions, several different approaches have emerged to allow AI to leverage confidential data. These include:

 

Decentralized AI (DeAI)


A key aim of DeAI is that control over artificial intelligence is removed from large organizations and corporations (such as OpenAI, Meta, and Google) and placed in the hands of individuals, smaller companies, or a community of users. Practically speaking, that means installing server hardware locally and loading AI models (often open source) to run on it. Typically, servers are interconnected with others via the internet to form a private cluster that can run models more efficiently and share data inputs or outputs.

 

One company that is transforming the promise of DeAI into reality is PAI3, a US-based startup that offers what it calls Power Nodes. Each Power Node is a desktop “AI Data Center in a Box” that packs a considerable compute punch – containing a 14-core CPU, a 20-core GPU, 64GB of RAM and 5TB of disk storage. Pre-configured AI models include Llama and OpenAI OSS 4.0.

 

The PAI3 network is designed to allow privacy-conscious owners (including financial services, healthcare, and legal entities) to run AI models on their own premises, with sensitive data remaining encrypted and private.

 

PAI3’s Power Node – An AI Data Center in a Box


With a list price of $34K+, a Power Node is certainly an investment, but it’s one that should pay off when it is connected to the network and benefits from a Decentralized Physical Infrastructure Network (DePIN) economic model based on the $PAI3 token, which rewards network participants for their contributions. Power Node owners earn $PAI3 tokens in several ways. Simply operating a node – supply is limited to just 3,141 units – generates a fixed number of tokens over a set period. Ahead of the network’s token generation event, PAI3 estimates that a Power Node owner (more than 500 units have already been sold) might expect to cover a substantial portion of the purchase price over 36 months from this basic token supply alone.

 

Once the PAI3 network is launched – scheduled for Q1 of 2026 – additional tokens are earned when owners contribute a node’s processing power to run AI workloads and inference for the network, and for contributing data for model training. Tokens are also earned by contributing models and AI agents for other network nodes to use.

 

The network’s Mainnet blockchain will not only support the tokenization capability that underpins the DePIN approach – rewarding nodes for uptime, reliability, and capacity – but will also provide an audit function for the services offered by Power Nodes.


Fundamentally, PAI3’s vision is to provide a platform that champions data privacy and supports ethical AI development through community governance based on decentralized ownership of AI infrastructure. 


Privacy Enhancing Technologies (PETs)


Privacy-enhancing technologies (PETs) have emerged as an important approach to safeguarding sensitive data while still allowing it to be used for business processes, analytics, and AI. While PETs generally leverage cryptography, they extend encryption functionality by enabling data to be processed while remaining protected. Examples of PETs that have already found practical AI uses include:

  • Zero-Knowledge Proofs – which enable one party to prove to another that a statement is true without revealing any information beyond the validity of the statement itself. Useful for identity or age verification without revealing other personal details (a toy example follows this list).

  • Secure Multi-Party Computation – a technique that allows multiple parties to jointly compute a function or analyze combined datasets without revealing their individual inputs to one another. Used by companies to benchmark their financial performance against others.

  • Homomorphic Encryption – permitting computations to be performed directly on encrypted data without first decrypting it. The result remains encrypted and can only be decrypted by the data owner. A use case would be collaborating with an untrusted third party to perform some mathematical processing.
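To make the zero-knowledge idea concrete, here is a minimal sketch of a Schnorr-style identification protocol in Python, in which a prover demonstrates knowledge of a secret exponent without revealing it. The tiny numbers are purely illustrative and offer no real security; production systems use large groups and standardized, audited constructions.

```python
import secrets

# Toy Schnorr identification protocol: the prover convinces the verifier
# that it knows x such that y = g^x mod p, without revealing x.
# WARNING: toy parameters for illustration only -- not secure.

p = 23   # small prime; p = 2*q + 1 (a "safe prime")
q = 11   # prime order of the subgroup generated by g
g = 4    # generator of the order-q subgroup of Z_p*

x = secrets.randbelow(q)   # prover's secret (e.g., a private key)
y = pow(g, x, p)           # public value known to the verifier

# 1. Commitment: prover picks a random nonce r and sends t = g^r mod p
r = secrets.randbelow(q)
t = pow(g, r, p)

# 2. Challenge: verifier sends a random challenge c
c = secrets.randbelow(q)

# 3. Response: prover sends s = r + c*x mod q (s alone reveals nothing about x)
s = (r + c * x) % q

# 4. Verification: g^s must equal t * y^c mod p,
#    since g^(r + c*x) = g^r * (g^x)^c
assert pow(g, s, p) == (t * pow(y, c, p)) % p
print("proof accepted: verifier learned nothing beyond the statement's truth")
```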


ZK Proofs Explained by The Glitch AI


Since AI models and agents require large amounts of data for training and inference, and that data often includes sensitive information, PETs provide ways to leverage it effectively while upholding strong privacy guarantees.


Founded in 2021 and headquartered in Zug, Switzerland, with a significant presence in New York City, Nillion has emerged as a leading startup in the PET space. Its blockchain-based ‘Blind Computer’ network enables applications to perform processing on sensitive data while the data remains fully encrypted throughout its lifecycle (storage, processing, and transfer).


Nillion leverages a set of technologies to support secure processing on encrypted data, including: (1) Secure Multi-Party Computation to split data into secret shares and distribute them across multiple independent network nodes, with no single node holding enough information to reconstruct the original data, making collusion ineffective; and (2) Fully Homomorphic Encryption to operate directly on encrypted data with no need to decrypt it.
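To illustrate the first of these ideas, the sketch below shows additive secret sharing, the basic building block of Secure Multi-Party Computation. This is a generic illustration of the technique, not Nillion’s actual implementation: a value is split into random-looking shares, each simulated node holds one share per input, and only an aggregate result is ever reconstructed.

```python
import secrets

P = 2**61 - 1  # a public prime modulus; all arithmetic is done mod P

def share(secret: int, n_nodes: int) -> list[int]:
    """Split a secret into n additive shares that sum to it mod P.
    Any n-1 shares look uniformly random and reveal nothing."""
    shares = [secrets.randbelow(P) for _ in range(n_nodes - 1)]
    shares.append((secret - sum(shares)) % P)
    return shares

def reconstruct(shares: list[int]) -> int:
    return sum(shares) % P

# Two parties secret-share their private salaries across 3 nodes
alice_shares = share(83_000, 3)
bob_shares = share(91_000, 3)

# Each node adds the pair of shares it holds -- it never sees a raw salary
node_sums = [(a + b) % P for a, b in zip(alice_shares, bob_shares)]

# Only the combined result is reconstructed: the sum leaks nothing about
# the individual inputs beyond what the output itself implies
print(reconstruct(node_sums))  # 174000
```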


The Blind Computer design makes use of two network layers: an orchestration layer where private data is stored and processed, and a coordination layer for governance and payments using the $NIL token. The coordination layer currently uses the Cosmos blockchain, though a move to Ethereum is planned for 2026 to leverage its large developer community.


In essence, Nillion shifts the trust model for data used to power AI from relying on a central authority's promises to mathematical, cryptographic guarantees, ensuring user data remains private and secure by default.

 

Synthetic Data


Often considered another PET approach, synthetic data fundamentally differs from techniques like Homomorphic Encryption in that it does not protect privacy by transforming the actual data. Instead, it is newly generated data that mathematically mimics the statistical properties of real data while leaving out any identifiable information.


Synthetic data can be generated at scale and used for training AI models without introducing privacy risks. As such, it is a useful approach not only for keeping data private but also when original data is scarce. Relatedly, it can be used to create diverse datasets that reduce bias in model training.
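A minimal sketch of the core idea, assuming NumPy is available: fit simple statistics (here, a mean vector and covariance matrix) to real records, then sample brand-new rows that mimic those statistics without copying any individual.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Pretend these are real, sensitive records: columns = (age, income)
real = np.array([[34, 52_000], [45, 61_000], [29, 48_000],
                 [51, 75_000], [38, 58_000]], dtype=float)

# Fit a simple statistical model of the real data
mu = real.mean(axis=0)             # per-column means
cov = np.cov(real, rowvar=False)   # covariance between columns

# Sample synthetic rows: statistically similar, but no row is a real person
synthetic = rng.multivariate_normal(mu, cov, size=1000)
print(synthetic[:3].round(0))
```

Real-world generators are far more sophisticated (GANs, diffusion models, statistical copulas), but the principle of sampling from a fitted model rather than copying records is the same.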


Because of its flexibility, synthetic data is growing in popularity, with estimates putting the global market size as high as $3.7 billion by 2030. While major tech companies including Microsoft and IBM create and manage synthetic data, current demand has fueled a significant rise in startups offering products.


One emerging player that’s creating a suite of tools for synthetic data is CUBIG Corp. Founded in 2021 in Seoul, South Korea, CUBIG plans to expand operations to Europe and the US in 2026, leveraging financial and operational backing from Iona Star, a UK-based venture firm with a focus on furthering data access for AI applications.


Using a technique known as Differential Privacy, which adds calibrated amounts of statistical “noise” to datasets, CUBIG says the synthetic data it generates for models preserves up to 99% of the utility of the original data, but with zero traceability back to real individuals.



The Math of Differential Privacy. Source: differentialprivacy.org.
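Formally, a mechanism M is ε-differentially private if, for any two datasets D and D′ differing in one record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D′) ∈ S]. As a rough illustration of the Laplace mechanism, a standard way to satisfy this guarantee for numeric queries (a generic sketch, not CUBIG’s implementation):

```python
import numpy as np

rng = np.random.default_rng()

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count with epsilon-differential privacy via the Laplace
    mechanism: noise scale = sensitivity / epsilon. Adding or removing one
    person changes a count by at most 1, so sensitivity = 1 here.
    Smaller epsilon means more noise and stronger privacy."""
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# 1,042 patients match a query; publish a privatized count instead
print(dp_count(1042, epsilon=0.5))   # e.g. ~1039.7 -- varies per call
```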


To support its global expansion, CUBIG has begun to offer its tools via the AWS Marketplace, beginning with its LLM Capsule, which allows organizations to feed synthetic data in real time into popular LLMs, such as OpenAI’s ChatGPT, Anthropic’s Claude, and Google’s Gemini. These generative AI LLMs can then operate without the risk of exposing private data sources.
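As a hypothetical sketch of the general pattern (closer to tokenization than full synthesis, and not CUBIG’s actual LLM Capsule implementation): sensitive values are swapped for stand-ins before a prompt leaves the organization, the mapping stays local, and real values are restored when the response comes back.

```python
import re

# Hypothetical sketch: replace sensitive values (here, US SSNs) with
# stand-ins before a prompt is sent to a hosted LLM, keep the mapping
# local, and restore real values in the returned response.

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def sanitize(prompt: str) -> tuple[str, dict[str, str]]:
    mapping: dict[str, str] = {}
    def swap(match: re.Match) -> str:
        placeholder = f"SSN_{len(mapping)}"    # stand-in leaves the premises
        mapping[placeholder] = match.group(0)  # real value stays local
        return placeholder
    return SSN_RE.sub(swap, prompt), mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    for placeholder, real in mapping.items():
        text = text.replace(placeholder, real)
    return text

safe_prompt, mapping = sanitize("Summarize the case for client 123-45-6789.")
print(safe_prompt)  # "Summarize the case for client SSN_0."
# send safe_prompt to the hosted LLM; apply restore(reply, mapping) afterwards
```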


Other products that CUBIG provides currently in its domestic market include its Azoo data marketplace, the DTS data generation system (which supports data augmentation, bias reduction, and data labelling), and SYNFLOW for data integration between organizations. Leveraging these tools, and others, CUBIG is looking to provide a broad range of synthetic data infrastructure services.


Delivering on Data Privacy for AI is a Team Sport


Companies looking to implement a robust data privacy framework to support their full AI needs are likely to adopt different approaches and solutions from several vendors, at least for the foreseeable future. Expect many vendor partnerships and integrations to emerge to fill such needs.


Already, partnerships such as one between Nillion and io.net – a DePIN service providing GPUs for AI processing – are rolling out to provide data privacy alongside compute scale with geographic reach.


Both training and inference workloads are likely to benefit. In practice, companies will be able to ramp up low-latency AI applications across countries and regions while keeping data assets fully private, meeting the requirements of different industries and regulatory and compliance regimes.

 
