Dataiku

Job Description:

Headquartered in New York City, Dataiku was founded in Paris in 2013 and achieved unicorn status in 2019. Today, more than 1,000 employees work across the globe, in our offices and remotely. Backed by a renowned set of investors and partners including CapitalG, Tiger Global, and ICONIQ Growth, we’ve set out to build the future of AI.

Internship goal

Enhance the Data Drift computation feature to address clients’ production use cases: data drift on big data, image drift, drift on LLMs, etc.

Detailed description

Once deployed in production, AI models need to be monitored in order to ensure their consistency over time. In Dataiku, such monitoring is done through the Evaluation Recipe.

In the Evaluation, we compute:

Drift metrics: Is the input data distribution different from the distribution of the training dataset? What about the distribution of the predictions? Which features are drifting, and how? Drift metrics quantify the answers to these questions.

Performance metrics: Given the ground truth of a prediction, we can score the model and obtain metrics such as Accuracy, F1 Score, Precision, etc.
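As an illustration of the performance metrics named above, here is a minimal sketch for a binary classifier when the ground truth is available. The function name `binary_classification_metrics` and its signature are hypothetical and chosen for this example; this is not the Evaluation Recipe's actual code.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Usage: ground truth versus model predictions.
metrics = binary_classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```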

In a real production environment, clients most likely do not have the ground truth. In that case only drift metrics can be computed, which makes them essential in MLOps.
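To make the idea of a drift metric concrete, here is a minimal sketch of one common choice, the Population Stability Index (PSI), for a single numerical feature. PSI is used here as an assumption for illustration; the posting does not say which metrics the Evaluation Recipe computes, and the function name, binning scheme, and `eps` floor are all hypothetical.

```python
import math
import random


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two numeric samples, with equal-width bins over the reference range.

    Hypothetical helper for illustration only.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


# Usage: a training sample versus production data with a shifted mean.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(5000)]
production = [rng.gauss(1.5, 1.0) for _ in range(5000)]
psi = population_stability_index(train, production)
```

A PSI of zero means the binned distributions are identical; larger values indicate stronger drift. No ground truth is needed, only the feature values themselves.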

As of today, drift computation is almost always performed on samples of the training and production data. However, the samples might not be representative of the data, which can lead to misleading results. On the other hand, running the computation on all the data might be far too expensive in computation time. We need to find a smart solution to this problem, e.g. adaptive sampling, a metric to evaluate the relevance of a sample, or allowing drift computation on the whole dataset while disabling some model evaluation capabilities.
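One possible reading of "adaptive sampling" is sketched below: keep doubling the sample size until the drift estimate stops moving by more than a tolerance, and fall back to the full data only if it never stabilizes. This is an assumption about what such a solution could look like, not an existing Dataiku feature. The names `ks_statistic` and `adaptive_drift`, the starting size, and the tolerance are hypothetical, and the two-sample Kolmogorov-Smirnov statistic is just one drift metric that could be plugged in.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def adaptive_drift(reference, current, start=500, tol=0.01, seed=0):
    """Double the sample size until the drift estimate changes by less than tol.

    Returns (estimate, sample_size_used). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    size = start
    previous = None
    while True:
        n_ref = min(size, len(reference))
        n_cur = min(size, len(current))
        estimate = ks_statistic(rng.sample(reference, n_ref), rng.sample(current, n_cur))
        used_everything = n_ref == len(reference) and n_cur == len(current)
        stable = previous is not None and abs(estimate - previous) < tol
        if used_everything or stable:
            return estimate, max(n_ref, n_cur)
        previous = estimate
        size *= 2


# Usage: stop early if a small sample already gives a stable estimate.
rng = random.Random(42)
train = [rng.gauss(0.0, 1.0) for _ in range(20000)]
production = [rng.gauss(0.5, 1.0) for _ in range(20000)]
estimate, sample_size = adaptive_drift(train, production)
```

The stopping rule trades accuracy for cost: a looser tolerance stops on smaller samples, while a tighter one approaches the full-data computation. Two consecutive estimates agreeing is only a heuristic signal of stability, not a guarantee that the sample is representative.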

Also, only numerical and categorical data are currently supported for drift computation. Other feature types are becoming important, such as images for deep learning models and text for LLMs, and they need to be supported as well.

During this internship, you will:

Get familiar with Dataiku and the Evaluation code base

Research a way to compute drift efficiently on big data

Suggest ways to integrate this research into the existing evaluation feature

Develop the suggested feature, implementing both frontend and backend

If time permits, extend data drift support to other feature types: images, text, etc.

Stack

Java/Spring

AngularJS, HTML/CSS

Python

Software Engineer Intern – Data Drift at Large (MLOps) Contract
