‘Data lineage is essential,’ says Milestone CEO

Milestone CEO Thomas Jensen discusses Project Hafnia & ethically sourced data

COPENHAGEN, Denmark — Milestone Systems’ Project Hafnia aims to democratize AI-model training with NVIDIA’s Cosmos Curator, built around data that is ethically sourced and regulation-compliant, says Milestone CEO Thomas Jensen.

In a two-part interview, Jensen discusses Milestone’s partnership with NVIDIA, emphasizing its long-standing collaboration in AI-driven video modeling. He highlights the importance of ethical AI development, data lineage and compliance with global regulations like GDPR. 

This is the first part of the interview, edited for clarity and length. 

SSN: How did the partnership between Milestone and NVIDIA come about? 

Thomas Jensen: Well, we've been partners with NVIDIA for eight years. Anyone who has been modeling vast amounts of video data for more than 27 years cannot overlook the importance of compute power and GPUs. 

Naturally, we worked closely with NVIDIA on many different elements. As AI became a prominent topic, especially with the rise of OpenAI and ChatGPT, we started discussing its implications for AI modeling in video processing—what is known as a vision language model. 

One of the key discussions we had with NVIDIA revolved around responsible technology principles. We emphasized ethical sourcing of data and respecting individual privacy and human rights, which are high on our agenda. A major challenge with large language models, and even in our field, is that vast amounts of data are not always traceable. Data lineage is essential to proving where data comes from, how it was sourced and how it led to a particular model’s conclusions. 

This shared focus on ethical AI strengthened our relationship with NVIDIA. We asked ourselves: What if someone could deliver finely tuned AI models for video data—vision language models that are fully compliant, ethically sourced, and adhere to major regulations like GDPR and equivalent U.S. laws? This would create a unique value proposition for computer vision developers struggling with limited access to ethically sourced data. 

SSN: Let's dig into that a little bit. Could you elaborate on the measures that Project Hafnia takes to ensure that all video data is ethically sourced and compliant with global regulations? 

Thomas Jensen: First and foremost, we do not train models on data we don’t have the rights to use. As part of this project, we’ve introduced what we call a “license-to-data” framework. When a customer or technology partner wants to collaborate, the first step is securing a data license, which provides data lineage, usage rights and the ability to curate, annotate and fine-tune a vision language model. 
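
For illustration, here is a minimal sketch of the kind of metadata a license-to-data record might carry; the `DataLicense` class, its field names and the example values are assumptions made for this sketch, not Milestone’s actual schema.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataLicense:
    """Hypothetical record pairing a dataset with its lineage and usage rights."""
    dataset_id: str          # identifier of the licensed video dataset
    licensor: str            # party granting the usage rights
    licensee: str            # customer or technology partner receiving them
    source_description: str  # where and how the footage was collected (lineage)
    granted_rights: list = field(default_factory=lambda: ["curate", "annotate", "fine_tune"])
    regulations: list = field(default_factory=lambda: ["GDPR"])
    valid_until: date = date(2026, 12, 31)

    def permits(self, operation: str) -> bool:
        """Return True only if the requested operation is covered by the license."""
        return operation in self.granted_rights


record = DataLicense(
    dataset_id="city-traffic-batch-042",
    licensor="Example municipal operator",
    licensee="Example computer vision developer",
    source_description="Fixed roadside cameras; licensed municipal deployment",
)
print(record.permits("fine_tune"))  # True: fine-tuning is within the granted rights
print(record.permits("resell"))     # False: anything outside the license is refused
```

Checking every downstream step against a record like this is one way the “license first” principle could be enforced in practice.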

The next step is analyzing the data. It’s crucial to determine if any personally identifiable information (PII) is present—anything that could compromise privacy or expose individuals to discrimination or targeting, such as biometric data or license plate recognition. 
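
As a rough illustration of that analysis step, the sketch below flags frames containing faces, using OpenCV’s bundled Haar face detector as a stand-in for whatever PII detectors the platform actually runs; the detector choice and parameters are assumptions for the example.

```python
import cv2  # OpenCV, used here only as a readily available face detector

# Stand-in PII detector: OpenCV's pretrained frontal-face Haar cascade.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

def frame_contains_pii(frame) -> bool:
    """Flag a video frame that appears to contain a face (one kind of PII).

    A production pipeline would combine several detectors (faces, license
    plates, other biometrics); this sketch checks faces only.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0

def frames_needing_anonymization(video_path: str) -> list:
    """Walk a clip and collect the indices of frames that need anonymization."""
    capture = cv2.VideoCapture(video_path)
    flagged, index = [], 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_contains_pii(frame):
            flagged.append(index)
        index += 1
    capture.release()
    return flagged
```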

If such data is present, we anonymize it using deep natural anonymization technology instead of conventional blurring methods. Blurring prevents AI models from training on obscured features—so if a face is blurred, for example, we cannot train a model for facial recognition. Deep natural anonymization retains facial features while making them untraceable to any known algorithm. The same applies to license plates; they remain trainable but are altered to ensure privacy. 

Additionally, to prevent de-anonymization through pattern recognition, we apply different anonymizations each time a person appears in a video. This means that even if the same person appears multiple times, they will look different in each instance, preventing backward tracking. 
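
A toy sketch of that per-appearance principle, assuming an abstract synthetic-face generator driven by a random seed (the interview does not describe the actual generator):

```python
import secrets

def seed_for_new_appearance() -> int:
    """Draw an independent random seed for each detected appearance.

    Feeding a fresh seed to the (abstract) synthetic-face generator means the
    same real person is rendered with a different replacement identity every
    time they show up, so replacements cannot be correlated across clips.
    """
    return secrets.randbits(64)

# The same real person detected in three separate clips gets three
# unrelated replacement identities:
appearance_seeds = [seed_for_new_appearance() for _ in range(3)]
print(appearance_seeds)  # three independent 64-bit seeds, almost surely distinct
```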

These are the two key principles we use to ensure both privacy and ethically sourced data from the outset. 

SSN: The platform claims to accelerate AI development by up to 30 times. What specific features or tools within Project Hafnia contribute to this significant improvement? 

Thomas Jensen: Several factors contribute to this acceleration. First, the entire NVIDIA technology stack is a key enabler. We’ve also optimized conventional computer vision models by curating and fine-tuning data on behalf of developers and technology partners. 

By ensuring that the data is already curated, filtered, annotated and compliant with regulatory requirements, we significantly reduce the time needed for AI development. This optimization enables up to 30 times greater efficiency. Additionally, NVIDIA’s tools, including their curation technology, further enhance the speed and efficiency of our platform. 
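
As a loose sketch of the gate such curation implies before training, the filter below keeps only clips whose metadata marks them as annotated, anonymized and license-covered; the metadata keys are invented for this example and are not the platform’s actual schema.

```python
# Hypothetical clip metadata as it might look after curation and compliance checks.
clips = [
    {"id": "clip-001", "annotated": True,  "anonymized": True,  "license_ok": True},
    {"id": "clip-002", "annotated": True,  "anonymized": False, "license_ok": True},
    {"id": "clip-003", "annotated": False, "anonymized": True,  "license_ok": True},
]

def training_ready(clip: dict) -> bool:
    """A clip enters the training set only if every prerequisite is satisfied."""
    return clip["annotated"] and clip["anonymized"] and clip["license_ok"]

training_set = [clip["id"] for clip in clips if training_ready(clip)]
print(training_set)  # ['clip-001'] is the only clip that passes every gate
```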
