
Big data is a term used to describe the large amounts of data that organizations use on a day-to-day basis. Big data comes from a variety of sources; it is usually mixed in type and is generated at a very large scale and at a very fast rate. Big data is a double-edged sword. It brings huge benefits: it allows organizations to personalize a good or service on a mass scale, and it helps researchers better understand a topic. At the same time, big data has its pitfalls, because it allows data scientists, researchers and organizations to analyze and exploit personal information, and it can enable access to data that compromises an individual’s privacy. Even though there are two sides, the pitfalls of big data, which concern privacy and ethics, are discussed far less than its benefits.

An industry analyst, Doug Laney, defined big data in terms of three V’s: Volume, Velocity, and Variety [1]. The first V, Volume, refers to the scale of data collected from sources such as social media and sensors; this can range from megabytes to zettabytes. The second V, Velocity, refers to the speed at which big data must be processed: the data must be ingested, correlated and analyzed with respect to the question being asked, and the analysis can range from batch to real time. The third V, Variety, refers to the different types of data, from structured numeric data in traditional relational (SQL) databases to unstructured data.


Data in today’s world is far more valuable than it once was. It can be stolen, bought and sold, and this can happen with any major database, including:

–    Social Media (Facebook, Twitter, Instagram, and many more)

–    Government Data (Canada’s Open Government and others)

–    Location-Based Data (GPS and others)

–    Marketing Data (datasets stored by companies such as Amazon)

–    Banking Data (used for loan assessment, transfers, and others)

–    Political Data

And many others.

Data collection is not going away; in fact, it is becoming more standard. Every organization collects data in one form or another, and technological advances are making it easier to mine customer and user behavior. There are ways for organizations to collect and analyze data without alarming users or compromising their privacy, yet most organizations choose to collect every kind of data on a user, even when it is not needed. People’s biggest concern about big data is that organizations are able to use the personally identifiable data stored about them. For example, an app may ask users for permission to use their device storage for app data as well as their personal account information. Even a simple game app, such as Angry Birds, asks for large amounts of user data in order to run.

Consumers are becoming more aware of the risks of their data being used without their consent, and as a result companies like Google are restricting apps from extracting and selling certain user data. This is forcing apps like Angry Birds to rethink the permissions they request and how they handle their users’ data. Even though efforts are being made to protect consumer data, the growing connectivity between devices and the internet has users asking hard questions about their privacy. A key example is voice-activated devices such as the Google Home Mini, Amazon’s Alexa and similar products, which record user data much of the time. Moreover, people are increasingly worried about third parties using their data to make inferences about their likes, dislikes, affiliations, and beliefs and preferences such as political or religious ones. One example is Internet Service Providers (ISPs), which have access to their users’ data. Furthermore, in a study titled “Social Media Mining to Understand Public Mental Health”, two Waterloo researchers used unofficial data to guide their research [3]. This data was collected from Reddit and a journaling app without users being made aware that their data was being used for research purposes, which was unethical.

The public has become more conscious that their personal data is being collected through surveillance and that, as a result, some personal freedoms are being limited. Big data is a challenging topic for many people because it tests their understanding and trust. People’s ethical concerns about big data include personal privacy, consent, transparency and data ownership. In general, people do not have a trusting relationship with companies, because they do not trust that their data is being used appropriately.

As more and more predictive algorithms are developed, it is getting easier to determine an individual’s beliefs and patterns of behavior. However, these algorithms have not been optimized to pick up only what is important and relevant. This in itself is a huge violation of an individual’s privacy, since the individual did not give consent. An argument can be made that this restriction on personal rights is justified, because nation-states need to impose such restrictions to uphold the security and well-being of their citizens. However, governments have been shown to take this too far and infringe upon the privacy of their own citizens. The clearest example is Edward Snowden’s leak of materials from the NSA [4].

As the years progress, more whistleblowers will be needed, as governments and organizations collect and draw inferences from more data than ever. Moreover, some countries are now using geolocation to control and restrict their citizens. Governments are also becoming more aware of data breaches. Meanwhile, more and more governments are passing data-privacy legislation such as Canada’s Digital Privacy Act (DPA), which replaces and updates some of the existing rules on how organizations collect, use and disclose personal information. The act is intended to encourage organizations to properly safeguard any private data they hold or will collect on their users. In most cases, the DPA requires organizations to report any breaches involving private information, notify all affected individuals, and keep records of all breaches [7].

When organizations use big data, they usually disregard its inconsistency, incompatibility and content, because they simply want to use all the data available to them. They can afford not to care because systems such as Hadoop can handle this jumbled assortment of data. Organizations also use Bayesian methods to extract probabilistic inferences. These methods are prone to error, because the data is incompatible and unverified, and there is often no way of verifying it. This data comes from traffic patterns, sensors, devices, audio/video, network log files, transactional applications, web data and social media.

For statisticians, big data represents an opportunity to use data from a variety of different sources, as the volume of data being collected has increased. Big data is often described as a by-product of other data collection. Combining statistics and new technologies with data collected by the government, such as the census, can help provide close-to-real-time information about society.

As data evolves, so will the ethical concerns. Data generated by the public through social media such as Twitter, Facebook, Instagram and Snapchat raises completely different ethical concerns from data generated through administrative government surveys. Automated systems for cryptocurrencies, which carry an abundance of data, are already in place; such systems do not require any human intervention for the data to keep flowing.

Big data has three main stakeholders: collectors, utilizers and generators. These stakeholders can be defined as follows:

a.    Collectors: determine which data is collected and stored, and for how long. The collectors run the collection.

b.    Utilizers: redefine the purpose for which the data is used.

c.    Generators: individuals who contribute to the recording of massive amounts of data, whether knowingly or unknowingly, voluntarily or involuntarily. Devices such as sensors and home assistants also generate copious amounts of data.

These are the entities directly affected by big data. All of these stakeholders interact with one another at some point, and because of their interdependent relationship, each stakeholder is as powerful as the others.

There are many ethical dilemmas in big data, including data masking, privacy, group privacy, research ethics and propensity.

Data masking is the process of replacing authentic data with an inauthentic version of it in order to protect valuable and sensitive information. If masking is not applied properly, analysis of big data could easily reveal an individual’s thoughts, beliefs and, in general, their identity.
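Because data masking is a concrete technique, a small example may make it easier to picture. The sketch below is a minimal illustration, assuming a hypothetical record with `name`, `email`, `age` and `mood` fields and a placeholder secret key: it replaces the name with a keyed pseudonym (HMAC-SHA256), hides most of the e-mail address and generalizes the exact age into a ten-year band, while leaving the field being analyzed untouched. It is not the method of any particular organization or study.

```python
import hashlib
import hmac

# Hypothetical placeholder key; a real system would store it separately from the masked data.
SECRET_KEY = b"example-key-not-for-production"


def pseudonym(value: str) -> str:
    """Replace an identifier with a stable, keyed pseudonym (truncated HMAC-SHA256)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "user_" + digest[:12]


def mask_email(email: str) -> str:
    """Hide the local part of an e-mail address, keeping its first character and the domain."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain


def age_band(age: int) -> str:
    """Generalize an exact age into a ten-year band such as '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def mask_record(record: dict) -> dict:
    """Return a masked copy of a record with the hypothetical fields name, email, age, mood."""
    return {
        "id": pseudonym(record["name"]),
        "email": mask_email(record["email"]),
        "age": age_band(record["age"]),
        "mood": record["mood"],  # the field being analyzed is kept as-is
    }


if __name__ == "__main__":
    original = {"name": "Jane Doe", "email": "jane.doe@example.com", "age": 34, "mood": "anxious"}
    print(mask_record(original))
```

The keyed pseudonym keeps one person’s records linkable for analysis without exposing the name. Masking done poorly, for example leaving the exact age and full e-mail address in place, would leave enough detail for the record to be traced back to the individual.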

Privacy is one of the biggest concerns when it comes to big data. The more people put online, the more of their data is recorded, which makes everything about them increasingly transparent.

Furthermore, group privacy is a major issue, because big data can be used to find out information about an individual, such as their location, age, gender, shopping preferences and friendships. This can be done by extracting elements from heterogeneous data sources and connecting them to one specific dataset. Examples can be seen everywhere, for instance in targeted marketing, where online ads are shown to an individual based on their search history.
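The way separate datasets can be connected to single out one person can be illustrated with a toy linkage example. The sketch below assumes two small, entirely hypothetical datasets: an “anonymized” one with names removed but postal prefix, birth year and gender kept (so-called quasi-identifiers), and a public one that lists names with the same attributes. Joining them on those attributes re-identifies anyone whose combination is unique; the smallest group size it reports is the “k” in k-anonymity. This is a standard illustration of the risk, not a description of any specific company’s practice.

```python
from collections import Counter

# Hypothetical "anonymized" dataset: names removed, quasi-identifiers kept.
anonymized = [
    {"zip": "N2L", "birth_year": 1990, "gender": "F", "purchases": "baby products"},
    {"zip": "N2L", "birth_year": 1990, "gender": "M", "purchases": "camping gear"},
    {"zip": "M5V", "birth_year": 1985, "gender": "F", "purchases": "running shoes"},
    {"zip": "M5V", "birth_year": 1985, "gender": "F", "purchases": "textbooks"},
]

# Hypothetical public dataset listing names alongside the same attributes.
public = [
    {"name": "Alice", "zip": "N2L", "birth_year": 1990, "gender": "F"},
    {"name": "Carol", "zip": "M5V", "birth_year": 1985, "gender": "F"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(row: dict) -> tuple:
    """Combine a row's quasi-identifier values into one lookup key."""
    return tuple(row[q] for q in QUASI_IDENTIFIERS)


def link(anonymized: list, public: list) -> dict:
    """Re-identify anonymized rows whose quasi-identifier combination is unique."""
    counts = Counter(key(r) for r in anonymized)
    unique_rows = {key(r): r for r in anonymized if counts[key(r)] == 1}
    matches = {}
    for person in public:
        row = unique_rows.get(key(person))
        if row is not None:
            matches[person["name"]] = row["purchases"]
    return matches


def smallest_group(rows: list) -> int:
    """k-anonymity level: size of the smallest group sharing the same quasi-identifiers."""
    return min(Counter(key(r) for r in rows).values())


if __name__ == "__main__":
    print(link(anonymized, public))    # {'Alice': 'baby products'}
    print(smallest_group(anonymized))  # 1
```

No names appear in `anonymized`, yet Alice’s purchases are recovered because her combination of postal prefix, birth year and gender is unique; Carol is protected only because two rows share her attributes.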

Another concern with big data is research ethics, since many standards and ethical codes are out of date. Research ethics raises questions of privacy, because the use of data from social media such as Twitter and Facebook for research is still contested. This is partly because individual data can be valuable to many companies, and massive databases give them access to it. Many researchers use these databases to reveal information that may be unsettling. For example, the study by two Waterloo researchers on mental health accessed data from Reddit and from a journaling application that people used to record their moods, in order to show that their algorithm, which used text mining and topic modeling, worked. Although data on Twitter, Facebook and other social media sites is meant to be public, no user actually consents to being part of research that studies their data on these sites.

Lastly, propensity refers to making predictions from big data. A good example is when companies such as Amazon use their user data to suggest items a customer may be interested in buying. Propensity can also have negative repercussions, however, because the algorithm’s output can be completely wrong. Propensity also makes it very hard for researchers or organizations to find specific outliers, because it makes everything in a dataset appear connected.
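A propensity-style suggestion can be sketched in a few lines. The source does not describe how Amazon’s recommender works, so the example below swaps in a simple item-to-item co-occurrence count (“customers who bought X also bought Y”) over hypothetical purchase histories; the basket contents and the `suggest` helper are illustrative assumptions only.

```python
from collections import Counter
from itertools import permutations

# Hypothetical purchase histories: one set of items per customer.
baskets = [
    {"tent", "sleeping bag", "lantern"},
    {"tent", "sleeping bag"},
    {"tent", "lantern"},
    {"novel", "reading lamp"},
]


def co_occurrence(baskets: list) -> Counter:
    """Count how often each ordered pair of distinct items was bought by the same customer."""
    counts = Counter()
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            counts[(a, b)] += 1
    return counts


def suggest(item: str, baskets: list, top_n: int = 2) -> list:
    """Suggest the items most often bought together with `item` (ties broken alphabetically)."""
    counts = co_occurrence(baskets)
    candidates = [(other, n) for (a, other), n in counts.items() if a == item]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in candidates[:top_n]]


if __name__ == "__main__":
    print(suggest("tent", baskets))          # ['lantern', 'sleeping bag']
    print(suggest("sleeping bag", baskets))  # ['tent', 'lantern']
```

The model only knows what was bought together. It cannot tell, for example, that a tent bought as a gift says nothing about the buyer’s own interests, which is one way such a prediction can be completely wrong.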

There are four ethical theories that can be used to analyze the topic of big data: Kantian ethics, utilitarianism, social contract theory and virtue ethics.

Under Kant’s theory, researchers and organizations that use individuals’ data to gain a competitive advantage or to prove a certain concept are acting in a morally wrong way. For example, using the data of Reddit users and of users of the journaling app to prove that an algorithm based on text mining and topic modeling [3] was working effectively is morally wrong, because it treats those users merely as a means to an end without their consent, which violates Kant’s categorical imperative.

Under the utilitarian approach, researchers or organizations can use data to benefit their users or society as a whole. The example used for Kantian theory also applies here: since utilitarianism focuses on consequences, what matters is that the study was designed to help identify which mental health issues were discussed more publicly, and it succeeded in detecting several users expressing suicidal thoughts on social media [3]. Many organizations and researchers use the data they collect to benefit their users or society as a whole. However, if there is a data breach, the organizations and researchers should be held morally responsible.

When it comes to social contract theory in connection with big data, people assume that the government is there to protect them and their data. The mass collection of individuals’ data challenges this social contract. Even though there are many benefits, the unethical and unregulated collection of big data causes greater challenges. An example is the NSA’s mass surveillance, which was leaked by Snowden [4]. In addition, the collection of data by monopolies undermines the large benefits of big data.

Lastly, virtue ethics holds that an action is good if the person performing it is virtuous. When collecting data, large entities must ensure that no individual’s rights are violated. Yet in many cases, if the organizations and researchers collecting and analyzing an individual’s data for a certain purpose are virtuous, then according to virtue ethics it does not matter that the individual’s privacy is at stake. This is one of many reasons why it is so necessary to set specific rules for collecting and analyzing individual and group data.

To make sure that no data that was, has been or is being collected by data collectors falls into the wrong hands, the data has to be anonymized or masked. This ensures that it will not reveal any identifying information about an individual and put their privacy at risk. If identifying data is collected and analyzed for some reason, all the affected individuals should be notified. In addition, organizations that collect data (identifying or non-identifying) from their users, for whatever reason, should not sell it to third parties. If the data is to be sold, all users should be notified beforehand, as it is a matter of their rights and privacy.

In conclusion, big data is, and always will be, a topic with blurred ethical boundaries. But rules such as the ones described above can be put in place to make sure that data collection and analysis are done correctly and ethically. As discussions about big data continue, many new policies and rules will have to be introduced to prevent its abuse.

Post Author: admin
