What Authentication Means
We use the term authentication
to mean three different things [KEN03]:
We authenticate an individual, identity, or attribute. An individual is a
unique person. Authenticating an individual is what we do when we allow a
person to enter a controlled room: We want only that human being to be allowed
to enter. An identity is a character string or similar descriptor, but it does
not necessarily correspond to a single person, nor does each person have only
one name. We authenticate an identity when we acknowledge that whoever (or
whatever) is trying to log in as admin has presented an authenticator valid for
that account. Similarly, authenticating an identity in a chat room as SuzyQ
does not say anything about the person using that identifier: It might be a
16-year-old girl or a pair of middle-aged male police detectives, who at other
times use the identity FrereJacques.
Finally, we authenticate an
attribute if we verify that a person has that attribute. An attribute is a
characteristic. Here's an example of authenticating an attribute. Some places
require one to be 21 or older in order to drink alcohol. A club's doorkeeper
verifies a person's age and stamps the person's hand to show that the patron is
21 or over. To decide, the doorkeeper may have looked at an identity card
listing the person's birth date, and so learned that the person's exact age is
24 years, 6 months, 3 days. Alternatively, the doorkeeper might be authorized to
look at someone's face and decide that the person looks so far beyond 21 that
there is no need to verify. The stamp authenticator signifies only that the person
possesses the attribute of being 21 or over.
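The doorkeeper example can be sketched in code. This is a minimal illustration, not a real age-verification system: the `IdentityCard` class, its fields, and the `hand_stamp` function are hypothetical names chosen for the example. The point it shows is that the check reads the birth date but the resulting authenticator carries only the attribute "21 or over".

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IdentityCard:
    """Hypothetical identity card; the field names are assumptions."""
    name: str
    birth_date: date


def is_at_least(card: IdentityCard, years: int, today: date) -> bool:
    """Report only whether the card holder has reached the given age."""
    born = card.birth_date
    # Whole years elapsed; subtract one if this year's birthday is still ahead.
    not_yet = (today.month, today.day) < (born.month, born.day)
    age = today.year - born.year - int(not_yet)
    return age >= years


def hand_stamp(card: IdentityCard, today: date) -> dict:
    """The stamp records one attribute and nothing about name or exact age."""
    return {"over_21": is_at_least(card, 21, today)}
```

A caller who later inspects the stamp learns a single yes-or-no fact; the name and birth date used to produce it are not carried along.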
In computing applications we
frequently authenticate individuals, identities, and attributes. Privacy issues
arise when we confuse these different authentications and what they mean. For
example, the U.S. social security number was never intended to be an
identifier, but now it often serves as an identifier, an authenticator, a
database key, or all of these. When one data value serves two or more uses, a person
acquiring it for one purpose can use it for another.
Relating an identity to a
person is tricky. In Chapter 7 we tell
the story of rootkits, malicious software by which an unauthorized person can
acquire supervisory control of a computer. Suppose the police arrest Ionut for
chewing gum in public and seize his computer. By examining the computer the
police find evidence connecting that computer to an espionage case. The police
show incriminating e-mail messages from Ionut on Ionut's computer and charge
him. In his defense, Ionut points to a rootkit on his computer. He acknowledges
that his computer may have been used in the espionage, but he denies that he
was personally involved. The police have, he says, drawn an unjustifiable
connection between Ionut's identity in the e-mail and Ionut the person. The
rootkit is a plausible explanation for how some other person acted under the
identity of Ionut. This example shows why we must carefully distinguish
individual, identity, and attribute authentication.
We examine the privacy
implications of authentication in the next section.
Individual Authentication
There are relatively few ways
of identifying an individual. When we are born, for most of us our birth is
registered at a government records office, and we (probably our parents)
receive a birth certificate. A few years later our parents enroll us in school,
and they have to present the birth certificate, which then may lead to
receiving a school identity card. We submit the birth certificate and a photo to
get a passport or a national identity card. We receive many other
authentication numbers and cards throughout life.
The whole process starts with
a birth certificate issued to (the parents of) a baby, whose physical
description (height, weight, even hair color) will change significantly in just
months. Birth certificates may contain the baby's fingerprints, but matching a
poorly taken fingerprint of a newborn baby to that of an adult is challenging
at best. (For additional identity authentication problems, see Sidebar 10-2.)
Fortunately, in most settings
it is acceptable to settle for weak authentication for individuals: A friend
who has known you since childhood, a schoolteacher, neighbors, and coworkers
can support a claim of identity.
Sidebar 10-2: Will the Real Earl of Buckingham Please Step Forward?
In a recent case [PAN06], a man
claiming to be the Earl of Buckingham was identified as Charlie Stopford, who
had disappeared from his family in Florida in 1983 and assumed the identity of
Christopher Buckingham, an 8-month-old baby who died in 1963. Stopford was
questioned in England in 2005 after a check of passport details revealed the
connection to the Buckingham baby and then arrested when he didn't know other
correlating family details. (His occupation at the time of his arrest? Computer
security consultant.) So the British authorities knew he was not Christopher
Buckingham, but who was he? The case was solved only because his family in the
United States thought they recognized him from photos and a news story as a
husband and father who had disappeared more than 20 years earlier. Because he
had been in the U.S. Navy (in military intelligence, no less) and his adult
fingerprints were on file, authorities were able to make a positive
identification.
As for the title he appropriated for
himself, there has been no Earl of Buckingham since 1687.
Consider the case of people who,
for various reasons, need to change their identity. When the government does
this, for example when a witness goes into hiding, it creates a
full false identity, including school records, addresses, employment records,
and so forth. How can we authenticate the identity of war refugees whose home
country may no longer exist, let alone have a civil government and a records office?
How does an adult confirm an identity after fleeing a hostile territory without
waiting at the passport office for two weeks for a document?
Identity Authentication
We all use many different
identities. When you buy something with a credit card, you do so under the
identity of the credit card holder. In some places you can pay road tolls with
a radio frequency device in your car, so the sensor authenticates you as the
holder of a particular toll device. You may have a meal plan that you can
access by means of a card, so the cashier authenticates you as the owner of
that card.
You check into a hotel and
get a magnetic stripe card instead of a key, and the door to your room
authenticates you as a valid resident for the next three nights. If you think
about your day, you will probably find 10 to 20 different ways some identity of
you has been authenticated.
From a privacy standpoint,
there may or may not be ways to connect all these different identities. A
credit card links to the name and address of the card payer, who may be you,
your spouse, or anyone else willing to pay your expenses. Your auto toll device
links to the name and perhaps address of whoever is paying the tolls: you, the
car's owner, or an employer. When you make a telephone call, there is an
authentication to the account holder of the telephone, and so forth.
Sometimes we do not want an
action associated with an identity. For example, an anonymous tip line or
"whistle-blower's" telephone line is a means of reporting illegal or
inappropriate activity without revealing who is reporting it. If you know your
boss is cheating the company, confronting your boss might not be a
career-enhancing move.
You probably don't even want there to be a record that would allow your boss to
determine who reported the fraud. So you report it anonymously. You might take
the precaution of calling from a public phone so there would be no way to trace
the person who called. In that case, you are purposely taking steps so that no
common identifier could link you to the report.
Because of the accumulation
of data, however, linking may be possible. As you leave your office to go to a
public phone, there is a record of the badge you swiped at the door. A
surveillance camera shows you standing at the public phone. The record of the
coffee shop has a timestamp showing when you bought your coffee (using your
customer loyalty card) before returning to your office. The time of these
details matches the time of the anonymous tip by telephone. In the abstract
these data items do not stand out from millions of others. But someone probing
a few minutes around the time of the tip can construct those links. In this
example, linking would be done by hand. Ever-improving technology permits
more parallels like these to be drawn by computers from seemingly unrelated and
uninteresting data points.
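The kind of linking described above can be sketched as a search for names that appear in several independent logs within a few minutes of one moment. This is only an illustration under simplifying assumptions: the function names and sample data are invented, and each log is assumed to have already been resolved to a person's name (in practice a badge number and a loyalty card number would first have to be mapped to their holders).

```python
from datetime import datetime, timedelta


def names_near(log, moment, window):
    """Names with at least one record within `window` of `moment`."""
    return {name for when, name in log if abs(when - moment) <= window}


def link_candidates(logs, moment, window=timedelta(minutes=10)):
    """Names that appear in every log close to the moment of interest."""
    nearby = [names_near(log, moment, window) for log in logs]
    return set.intersection(*nearby) if nearby else set()


# Invented sample data: (timestamp, name) pairs from two unrelated systems.
day = datetime(2024, 3, 5)
badge_log = [
    (day.replace(hour=10, minute=2), "pat"),
    (day.replace(hour=9, minute=0), "lee"),
]
loyalty_log = [
    (day.replace(hour=10, minute=14), "pat"),
    (day.replace(hour=10, minute=11), "sam"),
]
tip_time = day.replace(hour=10, minute=7)
suspects = link_candidates([badge_log, loyalty_log], tip_time)
```

Neither log is remarkable on its own; the intersection around the time of the tip is what narrows the field, and each additional log narrows it further.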
Therefore, to preserve our
privacy we may thwart attempts to link records. A friend gives a fictitious
name when signing up for customer loyalty cards at stores. Another friend makes
dinner reservations under a pseudonym. In one store they always ask for my
telephone number when I buy something, even if I pay cash. Records clerks do
not make the rules, so it is futile to ask them why they need my number. If all
they want is a number, I gladly give them one; it just doesn't happen to
correspond to me.
Anonymized Records
Part of privacy concerns linkages:
Some person is named Erin, and some person has the medical condition diabetes;
neither of those facts alone is sensitive. The linkage, that Erin has diabetes,
is what becomes sensitive.
Medical researchers want to
study populations to determine incidence of diseases, common factors, trends,
and patterns. To preserve privacy, researchers often deal with anonymized
records, records from which identifying information has been removed. If those
records can be reconnected to the identifying information, privacy suffers. If,
for example, names have been removed from records but telephone numbers remain,
a researcher can use a different database of telephone numbers to determine the
patient, or at least the name assigned to the telephone. Removing enough
information to prevent identification is difficult and can also limit the
research possibilities.
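The telephone-number example amounts to a join between two data sets on a shared field. The following sketch assumes invented record layouts (`phone`, `condition`) and a hypothetical directory mapping numbers to subscriber names; it is meant only to show how little is needed to undo the anonymization.

```python
def reidentify(anonymized_records, phone_directory):
    """Attach a name to each record whose phone number is in the directory."""
    linked = []
    for record in anonymized_records:
        name = phone_directory.get(record["phone"])
        if name is not None:
            # The name is the one assigned to the telephone, which may
            # or may not be the patient.
            linked.append({"name": name, "condition": record["condition"]})
    return linked


# Invented sample data; the numbers are fictional.
records = [
    {"phone": "555-0100", "condition": "diabetes"},
    {"phone": "555-0199", "condition": "asthma"},
]
directory = {"555-0100": "Erin"}
linked = reidentify(records, directory)
```

Records whose number is absent from the directory stay anonymous, which is why removing or coarsening every such linking field matters, at the cost of making the data less useful for research.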
As described in Chapter 6, Ross Anderson was asked to study a
major database being prepared for citizens of Iceland. The database would have
brought together several healthcare databases for the benefit of researchers
and healthcare professionals. Anderson's analysis was that even though the
records had been anonymized, it was still possible to relate specific records
to individual people [AND98a, JON00]. Even though there were significant
privacy difficulties, Iceland went ahead with plans to build the combined
database.
In one of the most stunning
analyses on deriving identities, Sweeney [SWE01]
reports that 87 percent of the population of the United States is likely to be
uniquely identified by the combination of 5-digit zip code, gender, and date of birth.
That statistic is striking when you consider that close to 10,000 U.S. residents
must share any given date of birth and that the average population in any 5-digit zip code
area is 30,000. Sweeney backs up her statistical analysis with a real-life
study. In 1997 she analyzed the voter rolls of Cambridge, Massachusetts, a city
of about 50,000 people, one of whom was the then-current governor. She took him
as an example and found that only six people had his birth date, only three of
those were men, and he was the only one of those three living in his 5-digit
zip code. As a public figure, he had published his date of birth in his
campaign literature, but birth dates are sometimes available from public
records. Similar work on deriving identities from anonymized records [SWE04, MAL02]
showed how likely one is to deduce an identity from other easily obtained data.
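One way to see what such a statistic measures is to count, for a given data set, how many records have a combination of field values that no other record shares. The sketch below is not Sweeney's method or data; the function name, field names, and the four sample records are invented for illustration.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "gender", "dob")):
    """Fraction of records whose values for `keys` occur exactly once."""
    if not records:
        return 0.0
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Invented sample data with names already removed.
sample = [
    {"zip": "02138", "gender": "M", "dob": "1950-02-03"},
    {"zip": "02138", "gender": "F", "dob": "1950-02-03"},
    {"zip": "02139", "gender": "F", "dob": "1980-01-01"},
    {"zip": "02139", "gender": "F", "dob": "1980-01-01"},
]
fraction_all_three = unique_fraction(sample)
fraction_gender_only = unique_fraction(sample, keys=("gender",))
```

Comparing the result for different choices of `keys` gives a rough sense of how much each retained field contributes to the risk of re-identification.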
Sweeney's work demonstrates
compellingly how difficult it is to anonymize data effectively. Many medical
records are coded with at least gender and date of birth, and those records are
often thought to be releasable for anonymous research purposes. Furthermore,
medical researchers may want a zip code to relate medical conditions to
geography and demography. Few people would think that adding zip codes would
lead to such high rates of breach of privacy.
Conclusions
As we have just seen,
identification and authentication are two different activities that are easy to
confuse. Part of the confusion arises because people do not clearly distinguish
the underlying concepts. The confusion is also the result of using one data
item for more than one purpose.
Authentication depends on
something that confirms a property. In life few sound authenticators exist, so
we tend to overuse those we do have: an identification number, birth date, or
family name. But, as we described, those authenticators are also used as
database keys, with negative consequences to privacy.
We have also studied cases in
which we do not want to be identified. Anonymity and pseudonymity are useful in
certain contexts. But data collection and correlation, on a scale made possible
only with computers, can defeat anonymity and pseudonymity.
As we computer professionals
introduce new computer capabilities, we need to encourage a public debate on
the related privacy issues.
In the next section we study
data mining, a data retrieval process involving the linking of databases.