Why the future of AI inferencing is at the edge

When it comes to AI inferencing, concerns over latency, cost and compliance have led many organizations to rethink their reliance on the cloud.

For implementing and adapting IT strategies, the cloud was a game changer. Once IT managers experienced the ease of spinning up servers and storage in the cloud, many never looked back.

Now, however, as AI assumes a greater role in the IT world, many organizations have discovered that the same cloud strategies that once drove innovation can just as easily hold it back. Instead, hybrid approaches that incorporate cloud and edge computing are growing in popularity.

 

Public cloud disadvantages

The disadvantages of a cloud-only approach to AI typically become evident during inferencing.

After training their models in the cloud using historical data, organizations often stay in the cloud for the next training phase, as new information is introduced to those models. When the goal is then to run inference, extracting insights for long-range decision-making or planning, and there is no real-time processing requirement, continued reliance on the cloud can work out just fine.

However, for training and inference that depend on processing information in real time, such as video feeds or voice interactions, the advantages the cloud offers are likely to be outweighed by the disadvantages of increased latency, high bandwidth costs, and privacy/compliance concerns.

Latency

Latency—the delay in getting data from point A to point B—is a challenge in any information system. It has many causes—the number of network hops, the type of switching equipment in use, the amount of caching that occurs—but ultimately distance is the primary contributor. For instance, if a data file generated in Singapore needs to be processed by a cloud server in Dubai and then make the trip back to Singapore, there will be more latency than if the file were processed entirely by edge servers located in Singapore.
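To put rough numbers on that (the distance figures and the roughly two-thirds-of-light-speed fiber assumption below are illustrative approximations, not measurements), propagation delay alone sets a latency floor that grows with distance:

```python
# Rough, illustrative estimate of round-trip propagation delay over fiber.
# Distances and the ~200 km/ms propagation speed are approximations; real-world
# latency adds switching, queuing, and routing overhead on top of this floor.

SPEED_IN_FIBER_KM_PER_MS = 200  # light travels roughly 200 km per millisecond in fiber

def round_trip_ms(distance_km: float) -> float:
    """Return the theoretical best-case round-trip propagation delay in ms."""
    return 2 * distance_km / SPEED_IN_FIBER_KM_PER_MS

print(f"Singapore -> Dubai -> Singapore: ~{round_trip_ms(5800):.0f} ms floor")
print(f"Singapore -> local edge site:    ~{round_trip_ms(30):.1f} ms floor")
```

Real round trips add switching, queuing, and routing overhead on top of that floor, so the gap between a distant cloud region and a nearby edge site is usually even wider in practice.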

How much latency is tolerable depends on the application and the desired user experience. An extra few seconds might be tolerable for populating an ecommerce web page. But if the user experience involves near-instant interaction—such as sports betting, gaming, financial trading, or supporting driverless vehicles—the extra seconds or even milliseconds involved in making the round trip to the cloud can render the application useless.

Bandwidth costs

For many AI applications—especially those with a major video component—escalating bandwidth costs also work against a cloud-only approach. For example, consider an application that involves video monitoring of industrial systems for predictive maintenance or quality control.

A thousand or more cameras operating 24/7, streaming raw video at 2 Mbps, will generate hundreds of terabytes of data every month. Sending that data to the cloud and back for AI inferencing will trigger monthly cloud egress costs amounting to tens of thousands of dollars. However, setting up a system to process that video locally on edge devices—with only anomalies transmitted to the cloud—can shrink those cloud processing costs dramatically.
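As a back-of-the-envelope check of those figures (the roughly $0.09 per GB egress rate below is an assumed, typical list price; actual pricing varies by provider and region):

```python
# Back-of-the-envelope check of the monthly data volume and egress cost above.
# The $0.09/GB egress rate is an assumed, typical public-cloud list price.

cameras = 1000
bitrate_mbps = 2
seconds_per_month = 30 * 24 * 3600

total_gb = cameras * bitrate_mbps / 8 / 1000 * seconds_per_month  # Mbps -> MB/s -> GB
egress_cost = total_gb * 0.09

print(f"~{total_gb / 1000:.0f} TB streamed per month")   # ~648 TB
print(f"~${egress_cost:,.0f} per month in egress fees")  # ~$58,000
```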

Security and compliance

Security and regulatory compliance also make edge computing an attractive option for inferencing. Because so many AI applications involve highly private information—such as medical records or personal messages—transmitting that data across long distances to the cloud increases the vulnerability to interception, especially if encryption protocols are weak or not configured correctly. This is why many data privacy regulations, such as GDPR in Europe and PIPL in China, emphasize strict controls over cross-border data transfers and, in some cases, require sensitive data to remain within specific geographic boundaries.

 

Continuity, scalability and personalization

Many organizations make the shift to the edge to address latency, cost or security/compliance concerns and then discover a range of additional benefits, including:

Continuity

Edge solutions can be set up to operate in environments with limited or intermittent connectivity, with applications handling essential functions in offline mode and re-syncing when connectivity is restored. Many AI-based smart city applications work this way, with edge nodes managing real-time operations and then sending cloud servers global updates when connectivity allows.
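As a simple illustration of that pattern, an edge node can keep making real-time decisions locally, buffer the results, and drain the backlog once the uplink returns. The sketch below uses placeholder functions for the model, connectivity check, and upload rather than any specific product API:

```python
# Minimal sketch of an edge node that keeps inferencing while offline and
# re-syncs buffered results when connectivity returns. The model, connectivity
# check, and upload call are stand-ins, not any specific product API.
from collections import deque

pending = deque()  # results waiting to be synced to the cloud

def run_inference(frame):
    return {"frame": frame, "anomaly": False}  # placeholder for a local model

def cloud_reachable():
    return False                               # pretend the uplink is down

def upload(result):
    print("uploaded", result)

def handle_frame(frame):
    result = run_inference(frame)  # real-time decisions are made locally
    pending.append(result)         # buffer the result regardless of connectivity
    flush()

def flush():
    while pending and cloud_reachable():
        upload(pending.popleft())  # drain the buffer once the link is back

for f in range(3):
    handle_frame(f)
print(len(pending), "results buffered for later sync")
```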

Scalability

AI traffic patterns are often unpredictable, with bursts of data from sensors, cameras, or IoT devices. Distributed edge servers can absorb these bursts without waiting on cloud resources: tasks can be allocated dynamically across multiple servers to balance the load during peak usage, and more servers can be added to the network as demand grows.
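One common way to absorb those bursts, sketched below with made-up server names and a simplistic notion of load, is to dispatch each incoming task to whichever edge server is currently least loaded:

```python
# Illustrative sketch of spreading burst traffic across edge servers by always
# dispatching to the least-loaded node; server names and costs are made up.
import heapq

# (current_load, server_name) pairs kept as a min-heap
servers = [(0, "edge-sg-1"), (0, "edge-sg-2"), (0, "edge-sg-3")]
heapq.heapify(servers)

def dispatch(task_cost: int) -> str:
    load, name = heapq.heappop(servers)         # pick the least-loaded server
    heapq.heappush(servers, (load + task_cost, name))
    return name

burst = [3, 1, 4, 1, 5, 9, 2, 6]                # e.g., sensor/camera events arriving at once
assignments = [dispatch(cost) for cost in burst]
print(assignments)
```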

Personalization

Processing data locally enables AI models to better adapt to specific user behaviors or regional characteristics. These might include customized product recommendations or language translation tuned to individual speech patterns.

 

Inferencing chips and federated learning

The rise of the edge for handling AI workloads is already well underway, as evidenced by hardware trends and the move to federated learning:

Inference chips

There are now a number of hardware companies—both established players and newcomers—focused on developing specialized AI accelerators to compete in the inference market.

Federated learning

More organizations are adopting federated learning approaches specifically designed to facilitate AI model training and inferencing across decentralized devices without transferring raw data to a central server. Open-source frameworks such as TensorFlow Federated, PySyft, and Flower are helping to accelerate this trend, supported by high-performance, resilient networks that ensure fast communication and data processing across distributed systems.
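Conceptually, federated learning comes down to devices training on their own data and sharing only model parameters, which a coordinator then averages. The toy sketch below illustrates that federated-averaging loop in plain Python; it is not the API of any of the frameworks named above, which wrap the same pattern with real networking, scheduling, and security:

```python
# Toy federated-averaging loop: each device trains on its own private data and
# shares only its updated weight; the server averages weights, never raw samples.
import random

def local_update(w, local_data, lr=0.1):
    # One pass of least-squares SGD on y ~ w*x using only this device's data
    for x, y in local_data:
        w -= lr * (w * x - y) * x
    return w

def federated_average(client_weights):
    # Simple FedAvg aggregation on the coordinator
    return sum(client_weights) / len(client_weights)

def make_device_data(n=20):
    # Each device's private samples, drawn from y = 2x plus a little noise
    data = []
    for _ in range(n):
        x = random.random()
        data.append((x, 2 * x + random.gauss(0, 0.05)))
    return data

devices = [make_device_data() for _ in range(5)]

global_w = 0.0
for _ in range(10):
    updates = [local_update(global_w, data) for data in devices]  # local training
    global_w = federated_average(updates)                         # central aggregation

print(f"global weight after 10 rounds: {global_w:.2f}")  # converges toward 2
```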

 

Zenlayer’s approach to distributed AI inferencing

Realistically, there is no one IT configuration that is right for every AI application. Both the cloud and the edge have their place. An obvious scenario—and one Zenlayer encounters often with our customers—is using the cloud for model training and then relying on a hybrid edge/cloud approach for inferencing.

With over 350 Points of Presence across five continents, our primary goal at Zenlayer is to make it possible to deploy AI applications as close as possible to end users while meeting your global network requirements.

Our edge options include dedicated bare metal servers (delivering the predictable performance latency-sensitive applications demand), virtual machines (ideal for light to moderate inferencing workloads), and edge GPUs for more intensive applications.

At the same time, Zenlayer provides direct, high-speed connections to major cloud providers like AWS, Microsoft Azure, Google Cloud, Alibaba Cloud, Tencent Cloud, and Oracle Cloud.

Tying everything together is Zenlayer’s hyperconnected network with its ultra-low latency and high-bandwidth connections.

By delivering instant access to computing resources on demand, the cloud made it possible to integrate technology ever more deeply into our lives. Now edge computing is poised to do something similar for AI, delivering the low latency performance that’s essential for bringing AI capabilities out of the model training phase and into the inferencing mainstream.

 

Streamline your AI infrastructure with Zenlayer

Zenlayer’s integrated service ecosystem and end-to-end solutions are designed to meet the evolving needs of your AI business, ensuring seamless operations, ultra-low latency connectivity, and high-performance GPU compute to power your AI builds.

Contact a Zenlayer AI expert today to kickstart your project with a free consultation.

