Our Underlying Technology

Purpose-built for performance: speed, accuracy and flexibility like no other.

At Vintra, our approach is simple: we believe in creating exceptional and ethical AI technology. Our customers demand it, our engineering team thrives on it and our mission brings it to life every single day for customers with some of the most demanding security needs in the market. Our goal? To empower security professionals to create the safest and smartest environments possible.

Our Solution

Vintra delivers AI-powered video analytics solutions that transform any real-world video into actionable, tailored and trusted intelligence. Our enterprise-grade software solution makes existing security cameras – whether fixed or mobile – smarter and improves how organizations and governments automatically monitor and search video for critical security and safety events.

As a result, our customers are able to:

  • Detect potential threats earlier, respond to ongoing incidents smarter, and dramatically increase investigative results
  • Deliver ROI beyond security via valuable insights generated from security cameras
  • Scale security and safety capabilities, not headcount

Our Vintra Fortify solution uses purpose-built deep learning models to extract relevant and potent information about the scenes that matter for security, safety, and operations applications. It powers the automation of two important security workflows: the real-time monitoring of hundreds or thousands of live cameras for critical events, and the lightning-fast, accurate post-event search of recorded video.

How it works

Powering our solution is a set of core technology building blocks that incorporate the very latest in computer vision expertise, AI and deep learning to create an industry-leading approach that is purpose-built for the security industry.

Purpose-built for security professionals: faster, more accurate and more flexible – at a lower price point

At the heart of these building blocks sits Vintra’s suite of multi-class algorithms for detection, classification, tracking, and re-identification. The suite works with fixed or mobile video – either live or recorded – and automatically finds the events that matter so they can be alerted on or quickly searched. This all comes together in our purpose-built, end-to-end solution that is faster, more accurate and more flexible than the open-source algorithms that underpin many other systems on the market.

At Vintra, our experienced research team has demonstrated the distinct advantages of using purpose-built deep learning technology. This means that an algorithm is imagined, designed, trained and deployed for a single purpose – such as making sense of security camera footage.

Research around the globe has made available a series of open-source detectors, which some providers have adopted as their core AI technology in the hope of expediting their development efforts or minimizing the load on their research and engineering teams. However, these models were (a) created and trained to serve as general-purpose detectors; and (b) not designed for a single, end-to-end application.

These two points represent an important limitation for security applications. For mission-critical video analytics in which accuracy and speed are critical, incorporating multiple models that are domain-optimized is of massive importance – and it is a key differentiator in our approach.

As a result, benchmarks consistently show that our models are faster, more accurate, more secure and more flexible (in both how they can be retrained and how they can be deployed) than others that use a purely open-source approach. It’s why 3 of the Fortune 10 trust Vintra with their most demanding video intelligence applications.

Multi-class detector

A multi-class detector means that a single algorithm can identify more than one type of object or event in a given video. The current standard objects and events that Vintra can detect in live or recorded, fixed or mobile video, are represented in the blue circles below.

Quickly localize key objects in a given scene, focusing on the things that matter

The list is always growing and Vintra’s detector is incredibly flexible, with the ability to create custom detectors that are specific to your use case without sacrificing speed or accuracy of the already-deployed detectors. This flexibility is a key feature of Vintra’s technology as we believe enterprise users should be selecting tech that can grow with their unique use cases over time, not limit their potential.


Once those objects have been detected, our novel ML architecture can provide more details about what is going on in the video. The Vintra platform can:

  1. Extract fine information regarding the objects such as gender, vehicle type, specific colors and more;
  2. Track specific objects in a given scene;
  3. Identify the same object across multiple scenes over time; and
  4. Create a knowledge graph that correlates people with objects in space-time domains.
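
One way to picture the knowledge graph in item 4 is as a co-occurrence graph: two tracked identities are linked when the same camera sees them at overlapping times. The sketch below is only an illustration of that idea, not Vintra's implementation; the `Observation` record, the identity and camera names, and the `max_gap` parameter are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    # Hypothetical record: one tracked identity seen by one camera over a time span.
    identity: str   # made-up IDs such as "person-17" or "vehicle-3"
    camera: str
    start: float    # seconds
    end: float      # seconds


def build_cooccurrence_graph(observations, max_gap=0.0):
    """Link identities that one camera sees at overlapping (or near-overlapping) times."""
    by_camera = defaultdict(list)
    for obs in observations:
        by_camera[obs.camera].append(obs)

    graph = defaultdict(set)
    for cam_obs in by_camera.values():
        for i, a in enumerate(cam_obs):
            for b in cam_obs[i + 1:]:
                if a.identity == b.identity:
                    continue
                # Two intervals overlap when each one starts before the other ends.
                if a.start <= b.end + max_gap and b.start <= a.end + max_gap:
                    graph[a.identity].add((b.identity, a.camera))
                    graph[b.identity].add((a.identity, a.camera))
    return graph


observations = [
    Observation("person-17", "lobby", 10.0, 25.0),
    Observation("vehicle-3", "lobby", 20.0, 40.0),
    Observation("person-17", "garage", 60.0, 90.0),
    Observation("person-42", "garage", 200.0, 210.0),
]
graph = build_cooccurrence_graph(observations)
print(dict(graph))
```

Each edge records where the two identities were seen together, so the graph can be queried for questions such as "which vehicles appeared alongside this person, and on which camera".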

Once objects and events have been detected, the platform analyzes them to provide more detail about what is going on in the video. Our approach to understanding the detected objects has two parts:

  1. We associate a unique signature with each detection, which is a numerical representation of the visual characteristics of the detected object. The signature is the output of a deep learning model that is trained to be robust to changes in illumination, camera angle and field of view, weather conditions, resolution, body pose and more. This means that two detections of the same object, person or face captured by two different cameras would have (almost) the same signature (we will cover this in the Re-Identification section below). We designed an efficient training procedure that allows us to use training samples that encourage learning, without adding extra computational time. Additionally, unlike others, our training directly optimizes for the final goal, without using extra information such as color, orientation, etc.
  2. We classify the detected objects based on their descriptive features, such as gender of a person, color of a car, etc. Our classifiers do not only take a detection as an input, but they also rely on the intermediate results of the deep learning network that is trained to generate unique signatures. As a result, our classifiers are lightweight, as they share the architecture with the signature generator (see above), as well as highly accurate and flexible.
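
The two parts above can be sketched as a shared feature extractor feeding two lightweight heads: one that produces the signature and one that classifies attributes. This is a toy illustration under loud assumptions: the hand-written `shared_features` function stands in for a trained deep network, and the pixel values and `color_heads` weights are made up.

```python
import math


def shared_features(pixels):
    # Stand-in for the shared network's intermediate result (hypothetical).
    # Dividing by the mean removes a global brightness gain.
    mean = sum(pixels) / len(pixels)
    return [p / mean for p in pixels]


def signature(features):
    # Signature head: L2-normalise so signatures are compared by direction only.
    norm = math.sqrt(sum(f * f for f in features))
    return [f / norm for f in features]


def classify(features, heads):
    # Classifier head: one linear score per label, computed on the SAME features.
    scores = {label: sum(w * f for w, f in zip(weights, features))
              for label, weights in heads.items()}
    return max(scores, key=scores.get)


def similarity(sig_a, sig_b):
    # Cosine similarity of two unit-length signatures.
    return sum(a * b for a, b in zip(sig_a, sig_b))


# The same object under two lighting levels, plus a different object (made-up values).
bright = [0.8, 0.4, 0.2, 0.6]
dim = [0.4, 0.2, 0.1, 0.3]
other = [0.1, 0.9, 0.7, 0.2]

# Hypothetical attribute heads; a real system would learn these weights.
color_heads = {"red": [1.0, 0.0, 0.0, 0.5], "blue": [0.0, 1.0, 1.0, 0.0]}

feats_bright = shared_features(bright)              # computed once ...
sig_bright = signature(feats_bright)                # ... used by the signature head
label_bright = classify(feats_bright, color_heads)  # ... and by the classifier head

sig_dim = signature(shared_features(dim))
sig_other = signature(shared_features(other))
print(similarity(sig_bright, sig_dim), similarity(sig_bright, sig_other), label_bright)
```

Because the classifier reuses features that were already computed for the signature, it adds only a few multiplications per label, which is the sense in which such heads can be called lightweight.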

A smarter approach to training data, resulting in a more-accurate and faster model.

The training data that we use for both tasks is collected carefully in order to represent the final scenarios. As deep learning needs large volumes of data in order to give accurate results, we use real images, synthetic images and combinations of the two.


Tracking is useful for a number of reasons, such as associating the same object identity over multiple timeframes. It plays a major role in the video surveillance space (for example, reconstructing the trajectories of an object, unifying views for forensic investigations and more).

Our tracking model is purpose-built, generating tracklets based on the unique appearance of the target object. It can still associate identities efficiently in very adverse situations, such as illumination changes and severe occlusions, and it is also robust when used with low-fidelity cameras.
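
As a rough illustration of appearance-based tracklets, the sketch below greedily links each detection to the tracklet whose most recent appearance vector is most similar, and starts a new tracklet when nothing is similar enough. It is a simplified stand-in rather than Vintra's tracking model: the two-dimensional vectors, the `threshold` value and the greedy matching rule are all assumptions made for the example.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_tracklets(frames, threshold=0.9):
    """Greedily link per-frame detections into tracklets by appearance.

    frames: list of frames, each a list of appearance vectors (one per detection).
    Returns a list of tracklets; each tracklet is a list of (frame_index, vector).
    """
    tracklets = []
    for frame_index, detections in enumerate(frames):
        used = set()  # tracklets already extended (or created) in this frame
        for vec in detections:
            best, best_score = None, threshold
            for t_index, tracklet in enumerate(tracklets):
                if t_index in used:
                    continue
                score = cosine(vec, tracklet[-1][1])  # compare with latest appearance
                if score >= best_score:
                    best, best_score = t_index, score
            if best is None:
                tracklets.append([(frame_index, vec)])
                used.add(len(tracklets) - 1)
            else:
                tracklets[best].append((frame_index, vec))
                used.add(best)
    return tracklets


frames = [
    [[1.0, 0.1], [0.1, 1.0]],  # frame 0: two objects
    [[0.1, 0.9]],              # frame 1: the first object is occluded
    [[0.9, 0.2], [0.2, 1.0]],  # frame 2: both are visible again
]
tracklets = build_tracklets(frames)
for tracklet in tracklets:
    print([frame_index for frame_index, _ in tracklet])
```

Because matching relies on appearance rather than position alone, the first object is picked up again in frame 2 after being missed in frame 1. Production trackers usually replace the greedy step with a global assignment and add motion cues.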



Vintra’s platform also includes a powerful application called “Re-ID”, which can be performed on a face or on the whole detection of a person or vehicle. You’re probably familiar with facial re-identification, but what should security leaders do when they do not or cannot use face recognition due to stakeholder concerns, legal issues, or just plain challenging camera angles or conditions, but still need the ability to re-identify a person during a critical event?

Person Re-ID is the task of localizing a person in videos coming from various cameras that might not share the same field of view (e.g., localizing a person who was captured by the lobby entrance camera and searching for them among the hundreds of cameras at the facility). This means that, given a single reference shot of the person of interest, a user should get back all instances of that person over time across various cameras.

This task is very challenging due to variations in pose over time, difficult lighting conditions, occlusions, etc. Similar to face recognition, person re-identification systems retrieve images of one identity, but Re-ID relies on the entire detection of a person rather than on the face. You can use this same technology on vehicles (without needing a license plate) and, in the future, we’ll be able to deliver it on other objects as well.

Think of it like reverse image search for the things that matter to security and safety pros. Fundamentally, this technology enables you to balance your stakeholder privacy (by not using the face as a reference and by storing no long-term PII) and security to build a person of interest journey and find out where they have been and where they are right now when you have a critical event.


The novelty of this technology is that, unlike other systems available in the market today, the Vintra platform can understand the composition and appearance of an object and then search for it. While other approaches make use of precomputed labels (for example, “red passenger car”) to execute similarity searches, our Re-ID technology learns the unique features of the object and uses them in its comparison logic.

For example, when the Vintra platform performs a Re-ID, our underlying models define the search effort based on a reference image that our system quickly “learns”. Then, knowing those unique characteristics, Re-ID can compare that reference image with thousands of images – across multiple scenes – in just seconds, comparing the composition and appearance of the reference image to find the most-similar objects.
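
To make the contrast with label-based search concrete, here is a minimal sketch that ranks a small gallery by similarity to a reference signature and compares the result with a plain label lookup. Everything in it is hypothetical: the `GalleryItem` fields, the three-dimensional signatures, the camera names and the labels are made up, and cosine similarity is simply one common choice of comparison.

```python
import math
from dataclasses import dataclass


@dataclass
class GalleryItem:
    camera: str
    timestamp: float
    label: str        # precomputed label, as used by label-based search
    signature: list   # appearance signature (made-up 3-D vectors here)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_by_label(gallery, label):
    # Label-based search: exact match on a coarse, precomputed description.
    return [item for item in gallery if item.label == label]


def search_by_signature(gallery, reference, top_k=2):
    # Re-ID style search: rank every item by similarity to the reference signature.
    ranked = sorted(gallery, key=lambda item: cosine(reference, item.signature),
                    reverse=True)
    return ranked[:top_k]


gallery = [
    GalleryItem("lobby", 10.0, "red passenger car", [0.9, 0.1, 0.2]),
    GalleryItem("garage", 95.0, "red passenger car", [0.1, 0.9, 0.3]),       # a different red car
    GalleryItem("gate", 140.0, "orange passenger car", [0.88, 0.15, 0.22]),  # same car, labeled differently
]
reference = [0.9, 0.12, 0.2]

by_label = search_by_label(gallery, "red passenger car")
by_signature = search_by_signature(gallery, reference)
print([item.camera for item in by_label])
print([item.camera for item in by_signature])
```

In this made-up gallery, the label lookup returns a different red car and misses the sighting that was labeled orange, while the signature ranking returns both sightings of the reference vehicle. Sorting the hits by `timestamp` would then give the order in which the cameras saw it.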

The result? A faster, smarter platform that can quickly locate individuals and objects across different cameras with fewer false positives, creating higher levels of situational awareness as security professionals search for the things that matter most.
