Flying Fuzzy: A Privacy Preserved No-Fly List for Global Airlines Using Fuzzy String Matching

Author: Conor Moran (Senior Director Business Development, Inpher) 

Introduction

As of February 22nd, the Federal Aviation Administration (FAA) had received 607 reports of unruly passengers from airline flight crews in 2022 alone. The overwhelming majority of these incidents relate to facemasks and their use on board, and the number of investigations has spiked correspondingly since the outbreak of COVID-19. There are serious repercussions for being found guilty of creating an onboard disruption, including civil penalties of up to $37,000 per violation or prosecution on criminal charges. In addition to any actions taken by the FAA, the passenger can also be placed on the airline’s no-fly list.

Each airline maintains a list of previously unruly passengers whom it bars from future flights. Several weeks ago, the CEO of Delta Air Lines, Edward H. Bastian, took it a step further and requested that the federal government establish a centralized, nationwide no-fly list, so that a passenger barred by one airline for disruptive behavior could not simply fly on another carrier. In his view, this national no-fly list “would serve as a strong” deterrent to passengers “of not complying with crew member instructions.”

While I think most of us who frequently travel would agree that preventing a potentially disruptive passenger from being onboard an aluminum tube hurtling through the air at 600 miles per hour is a good thing, there are several logistical and legal challenges to the type of collaboration that Mr. Bastian is suggesting, including:

  • While disruptions in the past might have resulted in less severe consequences for the passengers in question, the FAA and airlines are currently operating under a zero-tolerance policy. As a result, significantly more passengers are now being placed on airline no-fly lists.

  • Each airline keeps its no-fly list in its own format. There is no standardization.

  • Airlines like Delta often have operations in multiple countries, and an individual country’s privacy laws can prohibit the transfer of explicit personal data about a passenger.

  • There are currently a number of issues with false positives in airline no-fly lists; passengers with similar-sounding names or attributes can be incorrectly prevented from flying.

  • The free exchange of information in a centralized database maintained by the federal government raises fundamental concerns about civil liberties. In fact, the American Civil Liberties Union has challenged previous attempts at a centralized no-fly list.

  • The federal government has historically been reluctant to take ownership of a centralized database. Such a database is inherently risky, as it pools data in a central location that each airline would need to query periodically.

What if I told you that both the airlines and civil liberties advocates could have their cake and eat it too? We not only brainstormed a solution, but we also went out and built one and made it available so that anyone could try it (nudge @delta).

Building a Collaborative No-Fly List with XOR Platform

Inpher’s XOR Platform is based on secure multi-party computation (MPC). MPC provides a cryptographically secure way to compute across distributed data sources without revealing the inputs (the passenger data) and without the data leaving the privacy zone, which in this case is each individual airline. Each airline would keep its data local in an XOR Machine, also referred to as a networked virtual machine, which would contain the passenger data fields. For example, Delta would have an XOR Machine deployed in London, New York, Mexico City, and other locations worldwide, as would other airlines, like American Airlines and United. Each airline would essentially serve as a node in the network and be able to run analytics that match its no-fly list against other airlines’ no-fly lists, while systematically ignoring passengers who are not on the lists.
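To give a feel for how a computation can run on inputs that no single party sees, here is a minimal sketch of additive secret sharing, one common MPC building block. It is an illustration only and is not a description of how XOR Platform is implemented; the modulus, the function names, and the list sizes are all assumptions made up for the example.

```python
import secrets

# Assumed modulus for the illustration; real protocols choose this carefully.
MODULUS = 2**61 - 1

def share(value, parties=3):
    """Split `value` into random shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def reconstruct(shares):
    """Recombine all shares; any strict subset looks like random noise."""
    return sum(shares) % MODULUS

def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, never seeing a or b."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]

# Hypothetical inputs: the size of each airline's private no-fly list.
delta_shares = share(120)
american_shares = share(85)

# Only the combined result is revealed, not the individual inputs.
total = reconstruct(add_shared(delta_shares, american_shares))
print(total)  # 205
```

Each individual share is a uniformly random number, so holding one share reveals nothing about the input; only the agreed-upon output is reconstructed.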

The above setup addresses several of the previously stated concerns: it obviates the need for a federally operated, centralized database, it keeps sensitive passenger data protected, and it keeps data resident in the country in which it originated. However, a few challenges remain: how do you ensure that a prospective passenger is not unfairly excluded from flying, and how do you deal with the lack of data standardization?

 

To solve these problems, XOR supports probabilistic “fuzzy matching.” A relatively easy way of finding someone, obliviously, across multiple data sets is a private join based on a search for a unique identifier, also known as an explicit match. However, passengers rarely enter that kind of information, such as a unique national ID, when booking plane tickets. A private join therefore becomes harder, because the data inputs are inherently more variable than an explicit ID. Users of our XOR Platform can now use fuzzy matching, a process that finds approximately matching phrases or sentences across data sets rather than requiring identical values.
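As a rough illustration of what fuzzy string matching means, the sketch below scores two strings with Python’s standard-library `difflib.SequenceMatcher`. This runs on plaintext and is not XOR’s algorithm, which performs the matching privately; the helper names and the 0.85 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

def normalize(text):
    """Lowercase, drop periods, and collapse whitespace before comparing."""
    return " ".join(text.lower().replace(".", "").split())

def similarity(a, b):
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def is_fuzzy_match(a, b, threshold=0.85):
    """Treat two strings as the same entity if they are similar enough."""
    return similarity(a, b) >= threshold

# Formatting differences disappear after normalization.
print(is_fuzzy_match("John P. Smith", "john p smith"))
# A small typo still clears the threshold.
print(is_fuzzy_match("Jon Smith", "John Smith"))
# Unrelated names do not.
print(is_fuzzy_match("John Smith", "Maria Garcia"))
```

Note that “John P. Smith” and “John O. Smith” also score above 0.9 with this measure, which is why a name alone is not enough to avoid false positives.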

An example of such a scenario is below.

  • Passenger John P. Smith from Poughkeepsie, New York, buys a ticket to fly American Airlines to Miami, Florida. He is not on an airline no-fly list.

  • Passenger John O. Smith from Poughkeepsie, New York, is on Delta Air Lines’ internal no-fly list.

  • Let’s assume that both Delta and American are using Inpher’s XOR Platform to ensure that they do not board passengers whom either airline has barred.

  • Using Inpher’s XOR Platform, American Airlines can now perform a fuzzy matching computation to look for John Smith of Poughkeepsie, but also start modifying the parameters on other fields, like his address or other data points they may have collected, to filter out the wrong person and ensure that John P. Smith is not improperly flagged.

  • XOR allows the airlines to process multiple rounds of iterations on these fields while at the same time allowing the airlines to restrict which fields are revealed, ensuring that the right person is allowed to get on the plane while ignoring everyone else on the respective lists.
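The scenario above can be sketched as two rounds of matching, where the second round adds a field to rule out the wrong person. This is a plaintext illustration of the matching logic only; in XOR the comparison would run under MPC without exposing either record. The `Passenger` fields, the street addresses, and the thresholds are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass(frozen=True)
class Passenger:
    name: str
    city: str
    street: str  # hypothetical extra field an airline may have collected

def sim(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches(p, q, thresholds):
    """True only if every listed field is at least as similar as required."""
    return all(
        sim(getattr(p, field), getattr(q, field)) >= minimum
        for field, minimum in thresholds.items()
    )

# Made-up records for the two passengers in the scenario.
traveler = Passenger("John P. Smith", "Poughkeepsie", "12 Main Street")
listed = Passenger("John O. Smith", "Poughkeepsie", "45 Oak Avenue")

# Round one: name and city only, which wrongly flags the traveler.
round_one = matches(traveler, listed, {"name": 0.85, "city": 0.85})
# Round two: adding the street field clears him.
round_two = matches(traveler, listed, {"name": 0.85, "city": 0.85, "street": 0.85})
print(round_one, round_two)  # True False
```

Tightening the name threshold would also separate these two records, but it would miss genuine typos; adding an independent field keeps the name match tolerant while still filtering out the wrong person.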

We made sure John P. Smith is on the flight, but what about John O. Smith, who is already on the no-fly list? One issue frequently raised is that once a passenger (or someone whose data points closely resemble that passenger’s) is on a no-fly list, it is very difficult to get off it in a zero-tolerance environment. One of the significant advantages of MPC and XOR is that we allow for vertical data stacking. Imagine that Delta Air Lines has a field for its carrier code and another for the date on which John O. Smith was added to its no-fly list. Perhaps ten years from now, fliers will look differently at passengers who were put on the list during the stress of the COVID outbreak, and airlines will be more forgiving. By “stacking” features like the airline and date in the revealed output, American Airlines might opt to let him fly if enough time has passed since the original incident. Additionally, if John O. Smith had been put on the list incorrectly, it would be easier to track down the carrier that originally listed him and rectify the situation than if all of his information had been dumped into a centralized database.
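To make the idea of stacked output fields concrete, here is a small sketch of how an airline might act on a revealed carrier code and listing date. The `MatchResult` structure, the five-year window, and the example date are assumptions for illustration; they are not XOR’s output format or any airline’s actual policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class MatchResult:
    carrier_code: str  # which airline listed the passenger, e.g. "DL"
    date_added: date   # when the passenger was added to that list

def allow_boarding(match: Optional[MatchResult], today: date,
                   lookback_years: int = 5) -> bool:
    """Allow boarding if there is no match or the listing is old enough."""
    if match is None:
        return True
    age_in_days = (today - match.date_added).days
    return age_in_days >= lookback_years * 365

# Hypothetical revealed output for a matched passenger.
match = MatchResult(carrier_code="DL", date_added=date(2022, 2, 1))

print(allow_boarding(match, today=date(2023, 1, 1)))   # False
print(allow_boarding(match, today=date(2032, 2, 22)))  # True
print(allow_boarding(None, today=date(2023, 1, 1)))    # True
```

Because the carrier code travels with the result, a passenger who disputes a listing also knows which airline to contact.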

Conclusion

As I mentioned, we built this solution and made it a reality that anyone can try. You do not need to download or install any software, nor do you need to work for an airline or be a data analyst. Just head to our XOR Trial platform, sign up, and get started on a use case involving two hypothetical international carriers: National Airlines and SunSet Airlines. While we have applied fuzzy matching to the airline no-fly list challenge, you can imagine using fuzzy matching in XOR for any data problem that requires non-explicit matches on private data inputs, from advertising technology to customer comparisons.