Close
  • Latest News
  • Artificial Intelligence
  • Big Data and Analytics
  • Cloud
  • Networking
  • Cybersecurity
  • Applications
  • IT Management
  • Storage
  • Sponsored
  • Mobile
  • Small Business
  • Development
  • Database
  • Servers
  • Android
  • Apple
  • Innovation
  • Blogs
  • PC Hardware
  • Reviews
  • Search Engines
  • Virtualization
Read Down
Sign in
Close
Welcome!Log into your account
Forgot your password?
Read Down
Password recovery
Recover your password
Close
Search
Logo
Logo
  • Latest News
  • Artificial Intelligence
  • Big Data and Analytics
  • Cloud
  • Networking
  • Cybersecurity
  • Applications
  • IT Management
  • Storage
  • Sponsored
  • Mobile
  • Small Business
  • Development
  • Database
  • Servers
  • Android
  • Apple
  • Innovation
  • Blogs
  • PC Hardware
  • Reviews
  • Search Engines
  • Virtualization
More
    Home Applications
    • Applications
    • Big Data and Analytics
    • Cloud
    • IT Management

    Scaling Deep Learning/AI Applications: 4 Best Practices

    Derived from industry trends, practitioner literature, and codified MLOps guidelines, these four best practices will improve the process of deploying models in production.

    By
    eWEEK EDITORS
    -
    March 29, 2022
    Share
    Facebook
    Twitter
    Linkedin
      enterprise IT

      eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

      A common lament among machine learning engineers is that, in general, putting machine learning (ML), deep learning (DL), and AI applications into production is problematic. Many engineers would attest that a large percentage – estimated at 90% – of ML models never see the light of the day in production.

      Let’s examine four best practices to prevent these problems. Derived from industry trends, practitioner literature, and codified MLOps best practices, these four best practices will improve the process of deploying models in production.

      Also see: Top AI Software 

      1) Get Your Data Story Sorted

      Data is complex and dirty. It’s complex because data varies in form and format: batch or streaming, structured, unstructured, or semi-structured data. It’s dirty because often, data has missing or erroneous values. No deep learning or AI model is valuable without voluminous, cleansed, and feature-engineered data.

      You need a robust data store to process the big data needed to build accurate models. A modern data lake stores both structured and unstructured data. A modern data lake has the following data properties:

      • Provides ACID transactions so multiple readers and writers can update and read data without contention or conflict, allowing concurrent ETL (extract, transform, and load) jobs and constant inserts and updates.
      • Stores data in an open format such as Parquet, JSON, or CSV, allowing popular deep learning frameworks to openly and efficiently access data.
      • Holds structured data as tables that can be versioned and adhere to a schema, allowing model reproducibility and accountability.
      • Offers fast and searchable access to the various kinds of data (labeled, images, videos, text, etc.), allowing machine learning engineers to fetch the right data sets for the right business problem.

      Modern Data Lake

      A modern data lake as a data store.

      Without a cohesive data store as the canonical reliable and cleansed data source to all your ML models for training and inference, you’ll stumble onto your first “data” pitfall. This is true no matter which deep learning framework you use — which brings us to the next consideration.

      Also see: DevOps, Low-Code and RPA:  Pros and Cons 

      2) Use Well-Adopted and Widely Used Frameworks

      Just as good data begets good models, so do widely adopted deep learning toolkits and frameworks. No one would dispute among the ML/DL practitioners that PyTorch, PyTorch Lightning, TensorFlow, HuggingFace, Horovod, XGBoost, or MXNet are popular frameworks for building scalable deep learning models. Aside from being widely used, they have become the go-to framework to build sophisticated deep learning and AI applications.

      Adopt these frameworks because:

      • They have a large supporting community, copious documentation, tutorials, and examples to learn from.
      • They support training and inference on modern accelerated hardware (CPUs, GPUs, etc.).
      • They are battle-tested in production (used by notable companies).
      • They are natively integrated with Ray, an emerging open-source general-purpose framework for building and scaling distributed AI applications.
      • They have Python API bindings, a de facto language for deep learning.
      • They have integrations with MLflow, KubeFlow, TFX, SageMaker, and Weights & Biases for model provisioning and deploying and orchestrating model pipelines in production.

      ML-DL Registries

      ML/DL Framework integrations with Ray and Model Registries.

      Not using the battle-proven frameworks well-integrated with popular open-source tools for model development cycle and management or distributed training at scale is a sure way to stumble onto another pitfall: wrong tools and frameworks. The need for a model development lifecycle leads us to why you want to track model training using a model store.

      Also see: Data Mining Techniques 

      3) Track Model Training with a Model Store

      The model development lifecycle paradigm is unlike the general software development lifecycle. The former is iterative, experimental, data and metric-driven. Several factors affect the model’s success in production:

      • Are the model’s measured metrics accurate? A model registry will track all the metrics and parameters during training and testing that you can peruse to evaluate.
      • Can I use more than one DL framework to obtain the best model? Today’s model registries support multiple frameworks to log models as artifacts for deployments.
      • Were the data sources reliable and sufficient in volume to represent a general sample? This is the successful outcome of getting the data story right.
      • Can we reproduce the model with the data used for training? The model registry can store meta-data as references to data tables in the data store. Access to the original training data and meta-data allows reproducing a model.
      • Can we track a model’s lineage of evolution — from development to deployment — for provenance and accountability? Model registries track versioning of each experiment run so that you can extract an exact version of the trained model with all its input and output parameters and metrics.

      Tracking experiments for models and persisting all aspects of its outcome — metrics, parameters, data tables, and artifacts — gives you the confidence to deploy the right and the best model into production. Not having a data-and-metric-driven model for production is another pitfall to avoid. But that’s not enough; observing models’ behavior and performance in production is as important as tracking experiments.

      Also see: Data Mining Techniques 

      4) Observe the Model in Production

      Model observability in production is imperative. It’s often an afterthought – but this is at your peril and pitfall. It should be foremost. ML and DL models can degrade or drift over time for a few reasons:

      • Data and model concept drift over time: Data is complex and never static. The data with which the model was trained may change over time. For example, a particular image trained for classification or segmentation might have additional features unaccounted for. As a result, the model concept of inference may drift, resulting in errors, drifting away from the ground truth. Such detection of drifts requires retraining and redeploying a model.
      • Models’ inferences fail over time: Model predictions fail over time, giving wrong predictions, resulting from the above data drift.
      • Systems degrade over a heavy load: Though you account for reliability and scalability as part of your data infrastructure, keeping vigilant health of your infrastructure – especially your dedicated model servers during spikes of heavy traffic – is vital.

      Also see: Tech Predictions for 2022: Cloud, Data, Cybersecurity, AI and More

      About the Author: 

      Robert Nishihara is the Co-Founder and CEO of Anyscale. He has a PhD in machine learning and distributed systems. 

      eWEEK EDITORS
      eWEEK EDITORS
      eWeek editors publish top thought leaders and leading experts in emerging technology across a wide variety of Enterprise B2B sectors. Our focus is providing actionable information for today’s technology decision makers.

      Get the Free Newsletter!

      Subscribe to Daily Tech Insider for top news, trends & analysis

      Get the Free Newsletter!

      Subscribe to Daily Tech Insider for top news, trends & analysis

      MOST POPULAR ARTICLES

      Artificial Intelligence

      10 Best Artificial Intelligence (AI) 3D Generators

      Aminu Abdullahi - November 17, 2023 0
      AI 3D Generators are powerful tools for creating 3D models and animations. Discover the 10 best AI 3D Generators for 2023 and explore their features.
      Read more
      Cloud

      RingCentral Expands Its Collaboration Platform

      Zeus Kerravala - November 22, 2023 0
      RingCentral adds AI-enabled contact center and hybrid event products to its suite of collaboration services.
      Read more
      Artificial Intelligence

      8 Best AI Data Analytics Software &...

      Aminu Abdullahi - January 18, 2024 0
      Learn the top AI data analytics software to use. Compare AI data analytics solutions & features to make the best choice for your business.
      Read more
      Latest News

      Zeus Kerravala on Networking: Multicloud, 5G, and...

      James Maguire - December 16, 2022 0
      I spoke with Zeus Kerravala, industry analyst at ZK Research, about the rapid changes in enterprise networking, as tech advances and digital transformation prompt...
      Read more
      Applications

      Datadog President Amit Agarwal on Trends in...

      James Maguire - November 11, 2022 0
      I spoke with Amit Agarwal, President of Datadog, about infrastructure observability, from current trends to key challenges to the future of this rapidly growing...
      Read more
      Logo

      eWeek has the latest technology news and analysis, buying guides, and product reviews for IT professionals and technology buyers. The site’s focus is on innovative solutions and covering in-depth technical content. eWeek stays on the cutting edge of technology news and IT trends through interviews and expert analysis. Gain insight from top innovators and thought leaders in the fields of IT, business, enterprise software, startups, and more.

      Facebook
      Linkedin
      RSS
      Twitter
      Youtube

      Advertisers

      Advertise with TechnologyAdvice on eWeek and our other IT-focused platforms.

      Advertise with Us

      Menu

      • About eWeek
      • Subscribe to our Newsletter
      • Latest News

      Our Brands

      • Privacy Policy
      • Terms
      • About
      • Contact
      • Advertise
      • Sitemap
      • California – Do Not Sell My Information

      Property of TechnologyAdvice.
      © 2024 TechnologyAdvice. All Rights Reserved

      Advertiser Disclosure: Some of the products that appear on this site are from companies from which TechnologyAdvice receives compensation. This compensation may impact how and where products appear on this site including, for example, the order in which they appear. TechnologyAdvice does not include all companies or all types of products available in the marketplace.

      ×