Flyy Tech

BigDL Privacy Preserving Machine Learning with Occlum OSS on Azure Confidential Computing

flyytech by flyytech
November 14, 2022


 

Typical security measures can protect data at rest and in transit but can fall short of fully protecting data while it is actively used in memory. Intel® Software Guard Extensions (Intel® SGX) provides a protected hardware environment to secure data while it is in use in memory. For confidential computing, users can create Virtual Machines (VMs) with Intel® SGX to secure their applications during computation. However, building an end-to-end confidential computing application is knowledge intensive: it requires a sound understanding of the application, Intel® SGX, and other security components.

 

This blog introduces you to a confidential computing solution for Privacy-Preserving Machine Learning (PPML) made available by Occlum and BigDL on the Azure cloud. This blog demonstrates the solution using a sample analytics application built for the NYTaxi dataset. This sample application leverages Azure confidential computing (ACC) components such as SGX Nodes for Azure Kubernetes Service (AKS), Microsoft Azure Attestation, Azure Key Vault (AKV), etc., as well as Occlum LibOS and BigDL PPML.

 

 

Let’s first review a typical PPML workflow on a Kubernetes cluster, as illustrated below. The Azure solution applies the same workflow using ACC components.

 

image 1.png 

 

Users can follow the steps in the diagram above to walk through the PPML flow:

  1. User submits job to K8s (Kubernetes) and creates the driver node
  2. Client attests the Attestation Service and submits the policy
  3. Driver initiates additional Executor nodes
  4. Driver and Executor nodes attest with Attestation Service
  5. Driver and Executor request keys from KMS (Key Management Service)
  6. Executors read and decrypt input data
  7. Executors run distributed Big Data, ML and DL programs
  8. Executors encrypt and write output data
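The ordering of steps 2, 4 and 5 above can be sketched in a few lines of Python. This is only a toy model, not the API of any real attestation service or KMS: the class names, the measurement strings and the `demo-key` value are all hypothetical, and the sketch assumes that the KMS refuses keys to nodes that have not attested, which the step order suggests but the list does not state outright.

```python
from dataclasses import dataclass, field


@dataclass
class AttestationService:
    """Toy stand-in for an attestation service (hypothetical, not a real API)."""
    trusted_measurements: set = field(default_factory=set)
    attested_nodes: set = field(default_factory=set)

    def submit_policy(self, measurement: str) -> None:
        # Step 2: the client registers which enclave measurement is trusted.
        self.trusted_measurements.add(measurement)

    def attest(self, node: str, measurement: str) -> bool:
        # Step 4: a node passes only if its measurement matches the policy.
        ok = measurement in self.trusted_measurements
        if ok:
            self.attested_nodes.add(node)
        return ok


@dataclass
class KeyManagementService:
    """Toy KMS that releases the data key only to attested nodes (assumption)."""
    attestation: AttestationService
    data_key: bytes = b"demo-key"

    def request_key(self, node: str) -> bytes:
        # Step 5: refuse key release to nodes that have not attested.
        if node not in self.attestation.attested_nodes:
            raise PermissionError(f"{node} has not passed attestation")
        return self.data_key


def run_workflow() -> dict:
    service = AttestationService()
    kms = KeyManagementService(service)
    service.submit_policy("expected-enclave-hash")   # step 2

    nodes = {
        "driver": "expected-enclave-hash",           # created in step 1
        "executor-1": "expected-enclave-hash",       # created in step 3
        "executor-2": "tampered-enclave-hash",       # simulated untrusted node
    }
    outcome = {}
    for node, measurement in nodes.items():
        service.attest(node, measurement)            # step 4
        try:
            kms.request_key(node)                    # step 5
            outcome[node] = "key released"
        except PermissionError:
            outcome[node] = "key denied"
    return outcome


if __name__ == "__main__":
    for node, result in run_workflow().items():
        print(f"{node}: {result}")
```

In this model, a node with an unexpected measurement never receives the key, so it could not decrypt the input data in step 6.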

 

Now, let’s apply this same workflow on the Azure cloud. The diagram below illustrates the Azure PPML solution built with ACC components, Occlum LibOS and BigDL PPML.  

 

image 2.png

In this Azure PPML solution, the User Application is a Spark application that can be written in Scala or Java. Our sample application is a simple Spark application for querying the NYTaxi dataset. In this case, the Spark Driver takes the Spark jobs submitted by the Spark Client, distributes and schedules the work across the Spark Executors, and responds to the User Application. The Spark Executors execute the code assigned to them and report the state of the computation back to the Spark Driver.

 

Intel® BigDL, which is the core enabler behind the end-to-end and distributed AI processing, works together with the Occlum LibOS to enable the User Application, Spark Driver and Spark Executors to run on an SGX-enabled AKS cluster. Azure Attestation handles the attestation process, Azure Data Lake Storage hosts the data to be processed, and Azure Key Vault can serve as the KMS in the end-to-end workflow.

 

Deployment

We use a sample NYTaxi dataset analytics application to demonstrate the PPML deployment procedure on an ACC cluster. The steps to deploy the solution are as follows:

 

image 3.png 

Step 0: Deploy the Azure cloud services

  • Create the AKS cluster with Intel® SGX
  • Create an Azure storage account and upload data to the account. The NYTaxi dataset in this example is a 50 GB dataset containing 1.5 billion records. It is already hosted on a Microsoft public Azure Storage Account.
  • Set up Microsoft Azure Attestation Service. In this example, we use the default Microsoft Azure Attestation Service Provider and the default policies.
  • Create an Azure Linux VM, then download, extract, and install Spark Client 3.1.2 on the VM. It doesn’t need to be an Azure confidential computing VM.
  • Then install OpenJDK-8 and export SPARK_HOME=${Spark_Binary_dir}

Note that the NYTaxi data is not encrypted on storage, so secret key provisioning is not included in this demo. In a real-world deployment, it’s recommended to encrypt data on storage (encryption at rest), then set up a key management service (e.g., Azure Key Vault) and secret key provisioning as part of the deployment.

 

Step 1: Build the sample application

Create the NYTaxi query sample application using standard Spark SQL with an Azure Storage data source.

 

Step 2: Submit job to the AKS Cluster

On the Azure VM, submit the NYTaxi query to AKS as follows:

  • Clone the example repository:

        git clone https://github.com/intel-analytics/BigDL-PPML-Azure-Occlum-Example.git

  • Configure the environment variables in the run_nytaxi_k8s.sh, driver.yaml, and executor.yaml files.
  • Configure the AKS address and submit the NYTaxi query task by running run_nytaxi_k8s.sh:

        bash run_nytaxi_k8s.sh

 

Step 3: Execute the job on the AKS Cluster

The job is executed on the AKS cluster. In each Spark driver/executor pod, the Microsoft Azure Attestation Service runs the attestation process to verify the trustworthiness of the platform and the integrity of the binaries running inside it. Once attestation completes, the Spark executors run the data analysis with the Spark SQL query.

 

Step 4: Review the results

Upon successful completion, you should get the NYTaxi dataframe count and the aggregation duration.

 

Performance data

To evaluate the performance of this solution, we ran a simple benchmark based on our sample application. The benchmark runs in both an SGX environment and a non-SGX environment to give an intuitive performance comparison.

 

Scenarios:

  • No Intel SGX: The driver and the executors run without SGX support, using the regular Spark image (vanilla Spark).
  • Occlum: The driver and the executors are encrypted and run on Intel CPUs with SGX support, using the BigDL Occlum image.

 

Cluster info:

  • The cluster consists of 4 Standard_DC8ds_v3 nodes, the same for both scenarios.
  • There is one Spark driver plus 1, 2, or 3 executors for benchmarking.
  • Each Spark driver/executor runs on a different cluster node.
  • All Pods (driver or executor) have the same CPU (4 cores) and memory requests/limits (8GB for No SGX, 8GB EPC for Occlum).

 

Results:

spark_benchmark.jpg

We ran the benchmark multiple times with 1, 2, and 3 executors, and plotted the average duration in the chart above.

The run time of the sample application consists of Initialization Time and Execution Time. For this specific sample application, the Execution Time of BigDL PPML on Occlum is 130% of vanilla Spark when running on 1 executor, and drops to 116% of vanilla Spark when running on 3 executors. This indicates that BigDL PPML on Occlum has a limited performance impact on existing Spark applications (at most 30% in this benchmark), and that the overhead decreases as more executors are added.
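To make the percentages concrete, the short Python sketch below converts the two reported ratios into overhead figures and applies them to a vanilla Spark execution time. The 200-second vanilla time is a made-up example value, not a measurement from this benchmark, and the function names are illustrative only.

```python
def overhead_percent(relative_runtime_percent: float) -> float:
    """Turn 'X% of vanilla Spark' into the extra time over vanilla, in percent."""
    return relative_runtime_percent - 100.0


def occlum_execution_time(vanilla_seconds: float,
                          relative_runtime_percent: float) -> float:
    """Estimate Occlum execution time from a vanilla time and the reported ratio."""
    return vanilla_seconds * relative_runtime_percent / 100.0


# Ratios reported in the text: executors -> Execution Time as % of vanilla Spark.
reported = {1: 130.0, 3: 116.0}

if __name__ == "__main__":
    hypothetical_vanilla_seconds = 200.0  # assumed value for illustration only
    for executors, relative in reported.items():
        extra = overhead_percent(relative)
        estimate = occlum_execution_time(hypothetical_vanilla_seconds, relative)
        print(f"{executors} executor(s): {extra:.0f}% overhead, "
              f"{estimate:.0f} s instead of {hypothetical_vanilla_seconds:.0f} s")
```

Going from 1 to 3 executors lowers the overhead by 14 percentage points in the reported figures.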

The Initialization Time is treated as a fixed cost in this SGX environment: it takes around 50 seconds regardless of whether the job runs on 1 executor or 3. Initialization Time is related to the SGX enclave size: the larger the enclave, the longer initialization takes. In the near future, Initialization Time is expected to be greatly reduced by SGX Enclave Dynamic Memory Management (EDMM). For real-world Big Data or AI applications with longer execution times, the relative performance impact of the initialization time shrinks.
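The amortization effect can be illustrated with a small Python calculation. It is a rough model under stated assumptions: the 50-second initialization and the 116% execution ratio come from the text, the vanilla run is assumed to have no comparable initialization cost, and the vanilla execution times are hypothetical inputs rather than measured values.

```python
def total_slowdown(vanilla_exec_seconds: float,
                   relative_exec_percent: float = 116.0,
                   init_seconds: float = 50.0) -> float:
    """Ratio of (SGX init + SGX execution) to vanilla execution time.

    Assumes the vanilla run has no comparable initialization cost.
    """
    sgx_total = init_seconds + vanilla_exec_seconds * relative_exec_percent / 100.0
    return sgx_total / vanilla_exec_seconds


if __name__ == "__main__":
    # Hypothetical vanilla execution times, from a short job to a long one.
    for vanilla in (100.0, 1_000.0, 10_000.0):
        print(f"vanilla {vanilla:>8.0f} s -> total slowdown "
              f"{total_slowdown(vanilla):.3f}x")
```

As the job gets longer, the ratio approaches the pure execution ratio (1.16 here), because the fixed initialization cost is spread over more work.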

 

 

These key components have been leveraged to build the end-to-end confidential computing workflow.

 

Azure Cloud Services:

 

Intel® SGX

 

Intel® SGX helps protect data in use via application isolation technology. By protecting selected code and data from modification, developers can partition their applications into hardened enclaves or trusted execution modules to help increase application security.

 

image 5.png

Occlum

 

Occlum is a memory-safe, multi-process library OS (LibOS) for Intel SGX. As a LibOS, it enables legacy applications to run on Intel® SGX with little to no modifications of source code, thus protecting the confidentiality and integrity of user workloads transparently.

Here is a high-level overview of Occlum.

 

image 6.png

 

Occlum also has a unique “Occlum -> init -> application” boot flow. Generally, any operation that is required but is not part of the application, such as remote attestation, can be put into the “init” process. This feature makes Occlum highly compatible with any remote attestation solution without requiring application changes. For example, to support Azure Attestation, Occlum provides the boot flow below.

 

image 7.png

This design offloads the remote attestation burden from the application. For more details, please refer to the Occlum MAA init demo and the Occlum GitHub repo.
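The idea behind the “Occlum -> init -> application” flow can be pictured with the following Python sketch. It is a conceptual analogy only: in Occlum, init is a separate process inside the enclave, whereas here plain Python functions with hypothetical names stand in for each stage, and no Occlum or Azure Attestation API is called.

```python
from typing import Callable, List

Stage = Callable[[List[str]], None]


def remote_attestation(log: List[str]) -> None:
    # Stand-in for work done in the init stage, e.g. attesting with a service.
    log.append("init: remote attestation")


def unmodified_app(log: List[str]) -> None:
    # Stand-in for the user application; it contains no attestation logic.
    log.append("application: run Spark workload")


def boot(init_steps: List[Stage], app: Stage) -> List[str]:
    """Run every init step first, then hand control to the application."""
    log = ["occlum: start LibOS"]
    for step in init_steps:
        step(log)
    app(log)
    return log


if __name__ == "__main__":
    for line in boot([remote_attestation], unmodified_app):
        print(line)
```

Swapping in a different attestation step only changes the list passed to `boot`; the application function stays untouched, which is the property the boot flow is meant to provide.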

 

BigDL PPML

 

BigDL PPML provides a distributed platform for securing and protecting the end-to-end Big Data AI pipeline, including data ingestion, data analysis, machine learning, and deep learning. In addition, it extends the single-node Trusted Execution Environment (TEE) to a Trusted Cluster Environment and allows unmodified Big Data analysis and ML/DL programs to run securely on a private or public cloud. The diagram and tasks below show the work behind BigDL PPML:

  • Computation and memory protected by Intel® SGX Enclaves
  • Network communication is protected by remote attestation and Transport Layer Security (TLS)
  • Storage (e.g., data and model) protected by encryption
  • Optional Federated Learning support

 

image 8.png

Please refer to the BigDL GitHub repository and documentation site for more details.

References

  1. Deploying Confidential computing Intel SGX VM Nodes with AKS
  2. https://github.com/intel-analytics/BigDL-PPML-Azure-Occlum-Example
  3. https://www.intel.com/content/www/us/en/developer/tools/software-guard-extensions/overview.html
  4. https://www.databricks.com/glossary/what-are-spark-applications
  5. https://github.com/occlum/occlum
  6. https://github.com/intel-analytics/BigDL
  7. https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow
  8. https://azure.microsoft.com/en-us/services/storage/data-lake-storage/
  9. https://azure.microsoft.com/en-us/services/key-vault/
  10. https://azure.microsoft.com/en-us/services/azure-attestation/
  11. https://github.com/Azure-Samples/confidential-container-samples/blob/main/confidential-big-data-spar…
  12. https://bigdl.readthedocs.io/en/latest/doc/PPML/Overview/ppml.html

 


