Is Big Data Ethical?

How much data do you create every day? Do you mind who sees it, uses it and makes money from it? How important is your privacy? Big Data has revolutionised the way we analyse and process information, allowing organisations to make better-informed decisions by drawing on very large data sets. However, the collection, storage, and analysis of personal information have raised ethical concerns about privacy, security, and bias. In this essay, I will explore both sides of the argument and examine the ethical implications of Big Data.

Over the past decade, mobile phone ownership in UK households alone has risen by 51%. Since the 1990s, the number of people using the internet has increased from 2.62 million to a whopping 4.7 billion in 2020. All the data being sent and received leaves a ‘digital footprint’: a trace of your activity online. Right now, if you visit Facebook and click on a Recommended post, pushed to you by its recommendation algorithm, Facebook tracks and records that click. It also files whose posts you most enjoy seeing, whose posts you like frequently, and your interests based on what you look at. All that data is fed back into the Facebook algorithm, which aggregates it and recommends topics that it predicts you’ll like. This vicious circle exists because Facebook’s main goal is to keep you engaged on the platform while it collects, and profits from, the data you agreed to share when you accepted its cookies, gradually building up its large data sets.
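The feedback loop described above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not Facebook's actual algorithm: the function names, the topic list, and the rule that the user clicks every recommended post are all hypothetical.

```python
from collections import Counter


def recommend(interest_counts, catalogue):
    """Pick the topic with the most recorded clicks (ties go to the earliest topic)."""
    return max(catalogue, key=lambda topic: interest_counts[topic])


def simulate_feed(catalogue, first_click, rounds):
    """Toy engagement loop: every click is recorded and shapes the next recommendation."""
    interest_counts = Counter({first_click: 1})  # the user's first tracked click
    shown = []
    for _ in range(rounds):
        topic = recommend(interest_counts, catalogue)
        shown.append(topic)
        # Hypothetical assumption: the user clicks whatever is recommended,
        # so the click is tracked and fed straight back into the profile.
        interest_counts[topic] += 1
    return shown, interest_counts


shown, counts = simulate_feed(["sport", "politics", "cooking"], "cooking", 5)
print(shown)
print(counts["cooking"])
```

In this sketch a single early click is enough for the same topic to be recommended in every later round, which is the self-reinforcing pattern the paragraph describes.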

Let’s define what we mean. Big data means “extremely large data sets that have been analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions.” In practice, this is the large-scale collection and manipulation of aggregated data: new abilities to process data, extract trends and patterns from it, and then use them to gain an advantage. What are ethics? This is slightly more ambiguous, but in this essay I define data ethics as the principles governing the collection, storage, analysis, and use of large data sets in a way that is fair, transparent, and respectful of individual rights and privacy.

On the one hand, Big Data can be ethical and can bring ethical benefits to consumers: it allows corporations to make better-informed and less biased decisions by giving them a fuller picture of their user base and of the wider population around the product. Collecting this data can be mutually beneficial for User Experience (UX) developers and consumers, as it leads to a better-informed, less biased product that is better for everyone.

But there are serious ethical challenges involved too. The number of international regulations around the use of data suggests that governments and citizens don’t fully trust the tech companies involved to make the right ethical choices themselves. Positive data ethics, meaning the ethics behind the processing of data, relies on transparency to protect users’ safety and privacy. For this reason, most of the world’s data privacy laws are built around the ‘Fair Information’ principles, which in summary are as follows. Companies must inform users what data is being collected; what they are going to do with it; how it is going to be secured and stored; and how this complies with data privacy laws such as the GDPR (General Data Protection Regulation, UK & EU) and the CCPA (California Consumer Privacy Act, California, US), to take two examples of regional data privacy laws. Companies must also minimise the collection of personally identifiable data and collect only what they need to analyse. This ‘Fair Information’ framework conflicts with the principles of big data, which are, roughly, to collect as much data as possible to build a ‘bigger picture’ aggregation, with little control over who gets a choice, and without knowing exactly what the data will be used for, since an unexpected pattern might emerge from the dataset. From a privacy and consumer standpoint, this is a major issue, because individuals lack control and choice over what their data is used for. This is why whether big data is ethical depends on the transparency practices and privacy communication that companies offer consumers.
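As an illustration of the data minimisation principle above, here is a minimal sketch of how a record might be trimmed before analysis. The field names, the `NEEDED_FIELDS` set, and the helper functions are hypothetical, and this is one possible approach rather than a method prescribed by the GDPR or the CCPA.

```python
import hashlib
import hmac

# Hypothetical: the only fields this particular analysis actually needs.
NEEDED_FIELDS = {"age_band", "region"}


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimise(record: dict, secret_key: bytes) -> dict:
    """Keep only the needed fields and swap the user ID for a pseudonym."""
    reduced = {key: value for key, value in record.items() if key in NEEDED_FIELDS}
    reduced["user_ref"] = pseudonymise(record["user_id"], secret_key)
    return reduced


# Made-up example record and key, for illustration only.
raw = {
    "user_id": "u-1001",
    "name": "Example Person",
    "email": "person@example.com",
    "age_band": "25-34",
    "region": "North West",
}
minimal = minimise(raw, secret_key=b"example-key-kept-apart-from-the-data")
print(sorted(minimal))
```

The name and email never reach the analysis data set, and the same user always maps to the same pseudonym, so aggregate patterns can still be studied. Pseudonymised data can still be linked back to a person by whoever holds the key, so this reduces risk rather than removing it.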

When implemented correctly in healthcare, with good data privacy practices and a transparent agreement between the patient and the medical professional, Big Data can be used to improve patient outcomes and reduce costs by identifying high-risk patients, predicting treatment responses, and optimising treatment plans. But it’s not just the medical sector that can benefit from big data when it is used ethically: in other sectors, such as finance and marketing, it can benefit both businesses and consumers. For example, Big Data could be used to identify fraud, assess credit risk, and optimise investment strategies, creating a safer and better-informed banking community and enabling more people to get access to loans and money to improve their quality of life. Furthermore, in marketing, Big Data could be used to target consumers with more relevant and personalised messages, improving the customer experience and enabling marketing departments to make more sales and grow their brand.

Opponents of the growth of Big Data argue that it is used unethically because it can infringe on people’s privacy and lead to discrimination and bias. The collection, storage, and analysis of personal information can put individuals at risk of identity theft and other forms of cybercrime. Moreover, they argue that the algorithms used to analyse Big Data can reproduce and amplify bias and discrimination, resulting in unfair treatment of certain individuals or groups and leading to a lower standard of living.

To take a medical example, in the Lancet’s Big Data and Health, Snyder & Zhou (2019) argue that, whilst it is currently illegal in the US to discriminate in employment and health insurance on the basis of genetic information, “use of big data in healthcare…can be considered in long-term disability and life insurance contracts”. Used in this way, the data can seriously harm people’s quality of life, disregard individual privacy concerns and lead to biased, non-transparent results. With no way of opting out, such use of big data may not be ethical.

Another example of the ethical concerns around the use of big data is the Cambridge Analytica scandal, which came to light in 2018 and concerned the 2016 US presidential election campaign for Donald Trump. It was revealed that Cambridge Analytica had collected the personal data of millions of Facebook users without their consent and used it to push targeted adverts, customised to each user’s political leanings, in an attempt to influence millions of voters to cast their ballots in favour of Trump. When this breach of privacy and trust hit the headlines of newspapers across the world, concerns were raised about the ethical implications of covertly collecting and using personal data without consent for political purposes. Not all was lost, however: governments around the world were prompted to revisit and strengthen their data protection rules to ensure that companies were being transparent about their data practices.

In conclusion, the question of whether big data is ethical is complex and multifaceted. But it’s important to recognise that big data itself has no ethical charge or choice: it’s what we humans choose to do with it that counts. There are many potential benefits to be gained from the correct, ethical use of big data, including cures for diseases and increased efficiencies in society, and these matter all the more with the rise of generative AI large language models, which are trained on and inherit features of big datasets. There are also significant risks and ethical concerns associated with the collection and use of large-scale data sets. Therefore, it is crucial that data practices are transparent and mutually agreed, so that our big data is used ethically. Ultimately, the ethical use of big data must be evaluated on a case-by-case basis against these ethical values.

The Tech Stack 2024