Team: NeuroVision
In the digital era, machine learning models are increasingly being applied in fields such as economics, education, and finance. However, the real-world value of these models is only realized when they are deployed and integrated into business and management processes, and it is at this integration stage that many companies and organizations struggle to apply machine learning models effectively.
A key hurdle in the machine learning field is the gap between training a model and deploying it. This gap arises from inconsistencies in data formats, in the logic used to derive variables (the input features of machine learning models), and from a lack of continuity between deployment stages. These factors often lead to long deployment times and increase the likelihood of discrepancies between a model's performance during development and its behavior in production, which can degrade the accuracy of predictions. This is a major concern for data scientists working with machine learning models.
Recognizing these challenges, the NCB-CDS-AIML team has proposed a Cloud-Based Integrated Platform for Machine Learning Model Deployment, which includes two main features:
Feature serving - Centralized variable store
- Automates the calculation of millions of variables from raw data generated daily by business systems.
- Centralized management of variables and variable groups, with flexible configuration changes and logic customization.
- Stores variable sets in a centralized database, known as the "feature store," allowing easy sharing and reuse across different domains and use cases.
- Provides the list of variables through APIs, which can be customized and easily integrated with various systems or user applications (a usage sketch follows this list).
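A minimal sketch of how a downstream system might consume the feature-serving API is shown below. The base URL, endpoint path, parameter names, and returned fields are illustrative assumptions, not the platform's actual contract.

```python
# Hypothetical client for the feature-serving API (names are assumptions).
import requests

FEATURE_API = "https://feature-serving.example.com/api/v1"  # assumed base URL

def get_features(entity_id: str, variable_group: str) -> dict:
    """Fetch the latest computed variables for one entity from the feature store."""
    resp = requests.get(
        f"{FEATURE_API}/features",
        params={"entity_id": entity_id, "group": variable_group},
        timeout=5,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"avg_txn_amount_30d": 125.4, "login_count_7d": 12}

if __name__ == "__main__":
    features = get_features(entity_id="CUST-001", variable_group="credit_risk")
    print(features)
```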
Model serving - Centralized machine learning model results
- Provides comprehensive management of deployed machine learning model versions across the system.
- Integrates automatically with the Feature Serving system to fetch variable values as input for models, generating predictions based on machine learning algorithms.
- Offers the list of deployed machine learning models via APIs, allowing users or systems to select the models they need; the platform then automatically processes the request and returns predictions from the chosen models (see the sketch after this list).
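The model-serving flow described above could be consumed roughly as follows: list the deployed model versions, pick one, and request a prediction while the platform fetches the input variables from feature serving behind the scenes. The URLs and payload shapes are assumptions for illustration only.

```python
# Hypothetical client for the model-serving API (names are assumptions).
import requests

MODEL_API = "https://model-serving.example.com/api/v1"  # assumed base URL

def list_models() -> list[dict]:
    """Return the catalogue of deployed model versions."""
    resp = requests.get(f"{MODEL_API}/models", timeout=5)
    resp.raise_for_status()
    return resp.json()

def predict(model_name: str, version: str, entity_id: str) -> dict:
    """Score one entity; the server pulls the input variables from feature serving."""
    resp = requests.post(
        f"{MODEL_API}/models/{model_name}/versions/{version}/predict",
        json={"entity_id": entity_id},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"score": 0.87, "label": "high_risk"}

if __name__ == "__main__":
    print(list_models())
    print(predict("churn_model", "v3", "CUST-001"))
```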
To meet the demands of real-time computation with large, asynchronous datasets, the team has selected and built upon several advanced technologies:
Real-time data processing and variable calculation
- Feature Generation: Functions that automatically convert data and calculate variables using a variety of optimized algorithms.
- Oracle GoldenGate: A technology for real-time data synchronization from various business data sources to Kafka.
- Redis: Serves as a low-latency cache for computed variables, enabling quick storage and fast responses to requests (see the sketch after this list).
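The real-time path sketched below is a simplified assumption of how these pieces fit together: GoldenGate streams change events into a Kafka topic, a consumer incrementally recomputes a variable, and Redis holds the result for fast lookups. The topic, key, and field names are hypothetical.

```python
# Sketch: consume change events from Kafka and keep running variables in Redis.
import json

import redis
from kafka import KafkaConsumer

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

consumer = KafkaConsumer(
    "transactions_cdc",                      # hypothetical topic fed by GoldenGate
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for event in consumer:
    txn = event.value
    key = f"features:{txn['customer_id']}"
    # Incrementally update simple running variables for this customer.
    r.hincrbyfloat(key, "total_spend", float(txn["amount"]))
    r.hincrby(key, "txn_count", 1)
```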
Asynchronous data processing
- Data Connector: A function that connects to, validates, and retrieves data from caches, APIs, or other data sources, with mechanisms to manage connections and processing as needed (a sketch follows this list).
- Apache Kafka: Uses a pub/sub model to handle asynchronous data streams, accommodating data sources with varying response times.
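A minimal sketch of an asynchronous data connector in the spirit of the description above: it checks the cache first, falls back to an upstream API with a timeout, and gathers several sources concurrently. The endpoint, variable names, and entity identifier are hypothetical.

```python
# Sketch: async connector that tries the cache, then an upstream API.
import asyncio

import aiohttp
import redis.asyncio as aioredis

async def fetch_variable(session, cache, name: str, entity_id: str):
    cached = await cache.get(f"{name}:{entity_id}")
    if cached is not None:
        return name, cached
    # Cache miss: fall back to a (hypothetical) upstream source with a timeout.
    async with session.get(
        f"https://data-source.example.com/{name}",
        params={"entity_id": entity_id},
        timeout=aiohttp.ClientTimeout(total=3),
    ) as resp:
        resp.raise_for_status()
        return name, await resp.text()

async def main():
    cache = aioredis.Redis(host="localhost", port=6379, decode_responses=True)
    async with aiohttp.ClientSession() as session:
        results = await asyncio.gather(
            fetch_variable(session, cache, "avg_txn_amount_30d", "CUST-001"),
            fetch_variable(session, cache, "login_count_7d", "CUST-001"),
        )
    print(dict(results))

asyncio.run(main())
```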
Big data storage and processing
- BigQuery: A tool for storing and processing large datasets, optimizing the execution of complex queries and calculations (see the sketch below).
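A short sketch of the kind of batch aggregation the platform might push down to BigQuery; the project, dataset, table, and column names are hypothetical.

```python
# Sketch: compute 30-day variables for all customers in BigQuery.
from google.cloud import bigquery

client = bigquery.Client()  # uses default GCP credentials

query = """
    SELECT customer_id,
           AVG(amount) AS avg_txn_amount_30d,
           COUNT(*)    AS txn_count_30d
    FROM `my_project.raw_data.transactions`
    WHERE txn_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY customer_id
"""

for row in client.query(query).result():
    print(row.customer_id, row.avg_txn_amount_30d, row.txn_count_30d)
```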
Integration and Query Management
- GraphQL: Uses a Strawberry-based schema so that variable or model-prediction queries can be tailored to user needs.
- FastAPI: Exposes the platform's APIs for rapid integration with other systems (a sketch of the GraphQL and FastAPI layer follows this list).
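The sketch below shows how a Strawberry GraphQL schema can be mounted on a FastAPI application; the query, types, and resolver logic are placeholders rather than the platform's real schema.

```python
# Sketch: Strawberry GraphQL schema served by FastAPI (hypothetical types).
import strawberry
from fastapi import FastAPI
from strawberry.fastapi import GraphQLRouter

@strawberry.type
class Prediction:
    model_name: str
    entity_id: str
    score: float

@strawberry.type
class Query:
    @strawberry.field
    def prediction(self, model_name: str, entity_id: str) -> Prediction:
        # The real platform would call the model-serving layer here;
        # a constant score stands in for illustration.
        return Prediction(model_name=model_name, entity_id=entity_id, score=0.87)

schema = strawberry.Schema(query=Query)

app = FastAPI()
app.include_router(GraphQLRouter(schema), prefix="/graphql")
```

A client could then POST a query such as { prediction(modelName: "churn_model", entityId: "CUST-001") { score } } to /graphql; Strawberry converts snake_case names to camelCase by default.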
The platform applies a variety of optimized algorithms across the pipeline, from raw data processing to variable calculation, so that requests are answered quickly and accurately.
With its high degree of customization, wide applicability, and easy integration, the Cloud-Based Integrated Platform for Machine Learning Model Deployment can effectively support the deployment of machine learning models, optimizing technology and AI resources across societal sectors such as healthcare management, education, and security.