    NVIDIA Unveils TensorRT 8 to Accelerate AI Inferencing

    The latest generation of NVIDIA AI software is aimed at improving chatbots, search and recommendations.

    By Zeus Kerravala, July 20, 2021

      On July 20, NVIDIA launched TensorRT 8, a software development kit (SDK) designed to help companies build smarter, more interactive language apps from cloud to edge. The latest version of the SDK is available for free to members of NVIDIA’s developer program. Plug-ins, parsers, and samples are also available to developers from the TensorRT GitHub repository.

      TensorRT 8 incorporates the latest advances in deep learning inference, the process of using a trained neural network model to make predictions on new data. According to NVIDIA, TensorRT 8 cuts inference time for language queries in half using two key features:

      • Sparsity is a new performance technique supported by NVIDIA Ampere architecture graphics processing units (GPUs) that increases efficiency by reducing the number of computational operations. Not all parts of a deep learning model are equally important, and some of its “weights,” or parameters, can be set to zero. Computations on those zeroed weights can then be skipped. Using sparsity on its GPUs, NVIDIA can zero out nearly half of the weights in certain models, improving performance, throughput, and latency.
      • Quantization lets developers run inference with trained models using 8-bit integer arithmetic (known as INT8), which substantially reduces the compute and memory needed for inference on Tensor Cores. INT8 has become a popular optimization in machine learning frameworks such as TensorFlow and in NVIDIA’s TensorRT because it lowers memory and compute requirements. NVIDIA says that with this technique, TensorRT 8 retains accuracy while delivering very high performance.
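      To make the two techniques concrete, here is a toy sketch in plain Python. It is not how TensorRT or Ampere Tensor Cores implement them. It assumes the 2:4 pattern used by Ampere structured sparsity (in every group of four consecutive weights, the two largest by magnitude are kept and the other two are zeroed) and a simple symmetric, per-tensor INT8 scheme. The function names prune_2_of_4, quantize_int8, and dequantize_int8 are hypothetical.

```python
from typing import List, Tuple


def prune_2_of_4(weights: List[float]) -> List[float]:
    """Toy 2:4 structured sparsity: in each group of four consecutive weights,
    keep the two with the largest magnitude and set the other two to zero."""
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    pruned: List[float] = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank positions in the group by absolute value, largest first.
        ranked = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)
        keep = set(ranked[:2])
        pruned.extend(w if j in keep else 0.0 for j, w in enumerate(group))
    return pruned


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Toy symmetric per-tensor INT8 quantization.

    The scale maps the largest magnitude to 127; each value becomes
    round(x / scale), clamped to the signed 8-bit range [-127, 127].
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 values back to floats; the error per value is at most scale / 2."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
    sparse = prune_2_of_4(weights)  # half of the weights are now zero
    q, scale = quantize_int8(sparse)
    print(sparse)
    print(q, scale)
    print(dequantize_int8(q, scale))
```

      For the sample weights, pruning keeps 0.9 and -0.7 in the first group and 0.3 and -0.4 in the second, and quantizing the result gives the integers [127, 0, 0, -99, 0, 42, -56, 0] with a scale of 0.9 / 127. In practice, pruned models are usually fine-tuned to recover accuracy, and INT8 scales are calibrated on representative data rather than taken from a single tensor.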

      TensorRT is widely deployed across many industries

      Over the past five years, developers in industries spanning healthcare, automotive, financial services, and retail have downloaded TensorRT nearly 2.5 million times.

      For example, GE Healthcare is using TensorRT to power its cardiovascular ultrasound systems. The digital diagnostics solutions provider implemented automated cardiac view detection, accelerated with TensorRT, on its Vivid E95 scanner. With an improved view detection algorithm, cardiologists can make more accurate diagnoses and identify diseases in their early stages. Other companies using TensorRT include Verizon, Ford, the U.S. Postal Service, and American Express.

      NVIDIA also introduced in TensorRT 8 a flexible set of compiler optimizations that, according to the company, deliver twice the performance of TensorRT 7 regardless of which transformer model a company uses. TensorRT 8 can run BERT-Large, a widely used transformer-based model, in 1.2 milliseconds. Because faster inference leaves more room within an application's latency budget, companies can double or triple their model size for greater accuracy without slowing responses.
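      The link between speed and model size is simple arithmetic under one big assumption: that inference latency grows roughly in proportion to model compute. The sketch below, with a hypothetical function name and a hypothetical latency budget, computes how much larger a model could be while staying within a fixed budget.

```python
def model_scale_headroom(latency_budget_ms: float, latency_ms: float) -> float:
    """Factor by which a model could grow and still fit the latency budget.

    Simplifying assumption: inference latency scales linearly with model compute.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return latency_budget_ms / latency_ms


if __name__ == "__main__":
    # Hypothetical budget sized for the implied TensorRT 7 latency
    # (1.2 ms x 2, if the 2x speedup claim held for this model).
    budget_ms = 2.4
    trt8_bert_large_ms = 1.2  # figure quoted for TensorRT 8
    print(model_scale_headroom(budget_ms, trt8_bert_large_ms))
```

      With a hypothetical 3.6 ms budget, the same calculation gives roughly 3x, the upper end of the "double or triple" range. Real models rarely scale perfectly linearly, so measured latency on the target hardware is what ultimately counts.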

      Many inference services run language models such as BERT-Large behind the scenes. However, language-based apps often fail to pick up on nuance or emotion, which leads to a poor user experience. With TensorRT 8, companies can now run an entire inference workflow in about a millisecond. These advancements could enable a new generation of conversational AI apps that offer users a smarter, low-latency experience.

      “This is a huge improvement beyond what we have ever delivered in the past,” said NVIDIA’s Siddharth Sharma. “We look forward to seeing how developers are going to use TensorRT 8.”

      Real-Time Apps with AI

      Real-time applications that use artificial intelligence (AI), such as chatbots, are on the rise. But as AI gets smarter and better at delivering new kinds of services, its models become more complex and more computationally demanding. This creates challenges for those building AI-based services.

      Today’s developers must make hard trade-offs, such as between accuracy and latency, when deploying complex AI models. A data center may serve hundreds of models, all of which need to return results within a few milliseconds.

      “This is one of the biggest challenges in deploying AI apps today. How do you maximize or retain the amount of accuracy that you train with and then offer it to your customers with the least amount of latency?” said Siddharth Sharma, NVIDIA’s head of product marketing for AI software, during a news briefing.

      AI has the potential to have the most transformative effect on society since the birth of the Internet. Its success depends on the quality of models and the speed of execution. NVIDIA’s GPUs are widely regarded as the best silicon for AI processing, but its software, such as TensorRT, is equally important in making AI mainstream and usable in everyday life.

      Zeus Kerravala
      https://zkresearch.com/
      Zeus Kerravala is an eWEEK regular contributor and the founder and principal analyst with ZK Research. He spent 10 years at Yankee Group and prior to that held a number of corporate IT positions. Kerravala is considered one of the top 10 IT analysts in the world by Apollo Research, which evaluated 3,960 technology analysts and their individual press coverage metrics.
