Inspect

Version 3: Sensemaking

Sensemaking is a research lens on how people make sense of the world, or some part of it, when they are surprised, for example by work interruptions or by misinformation published online.

Version 2: Web App

Based on the findings from the version 1 mobile app, I built another version to mitigate news information overload.

The resulting web application is available at https://inspect.datagotchi.net.

After publishing this web app and creating some example insights that cite news articles and technical blog posts, I learned a few things:

  • My initial approach of saving news articles as they appear online, to be cited later in insights, is a lot of work.
  • Although I have created some insights as examples, I have a hard time communicating things that are important and surprising to people.
  • Comments and tags are useful for explaining what insights are about, what to do about them, and how articles/other online information inform them, but it’s a lot of text that overloads users.
  • The list of insights and comments/tags inside them has helped me understand the example ones I created, but it does not show how they are related or why anyone should care.

Version 1: Mobile App

Online information overloads us because it is no longer geographically or socially constrained. Since we can no longer rely on many cultural institutions, we will need to make sense of the world ourselves.

Therefore, we need to:

  • Reliably create and share source trust data,
  • Consistently evaluate the truth of claims, and
  • Use true claims to improve source trust data.

To support these needs, I envisioned:

  • An Ontology-Driven Source Evaluator,
  • A User-Centered Claim Evaluator, and
  • Combined, an Iterative Truth Propagation Process.

I created a mobile app with React Native so it would work on both Apple devices (iPhones, iPads, Apple TVs) and Android. Because people have largely converged on only ingesting information they subscribe to via newsletters, social media, or other niche websites and apps (e.g., Google News, Apple News+, etc.), I focused the app on following authors you know and trust.

I tested this mobile app with friends and family members, and found that:

  • News often takes the form of clickbait headlines and is hidden behind paywalls because news companies are incentivized by profit, not by spreading important information.
  • However, this information is still very important for us to make good decisions and live our lives.
  • People are still overloaded by the many news articles published every day.
  • People don’t want to download yet another mobile app, especially for something they do not do very often.

Inspect: A Multiplatform Social Network for Trustworthy News

Problem Space

As the world becomes more connected and more complex, it is increasingly difficult to know what to believe: events happen far away from us, to other people, and we usually hear about them after the fact. A long time ago, we would get this information from newspapers. More recently, there were also television stations that focused on local news. 

Nowadays, there are so many online sources of information, from newspaper websites to social media posts, that they overload us and make it difficult to discern what information is important. As a result, many people have started ignoring the news entirely. For the sake of media literacy in our democracy, those people should still be kept aware of important news. Therefore, we need a new way to get the news that is important to us.

It is also difficult to know which information sources are reliable. Unreliable sources frequently publish fake news and unverified information that serves political or ideological agendas. As a result, echo chambers and filter bubbles are created that limit exposure to diverse perspectives. This is also very destructive to our democracy. Therefore, we need a new way to determine what news sources are reliable.

Even if a news source is otherwise reliable, it often faces pressure to publish quickly, attract clicks, and make more money. As a result, there is less thorough fact-checking and investigative reporting, as well as less context and background information. Even quality news articles are often framed with clickbait headlines and sensationalist content, and are often hidden behind subscription paywalls, even articles that are essential information for our democracy! Therefore, we need a new way to obtain the content and context hidden in articles from reliable news sources.

Technical Requirements

To help people get the news that is important to them, different people’s conceptions of importance and different usage contexts throughout the day need to be supported. For different conceptions of importance, the author can communicate why something matters to their followers in a way they understand because of their personal connection to the author, or, failing that, in a way that most people in their social groups would probably understand. For different usage contexts throughout the day, a technology can be designed to “fit” into a user’s life with different mediums or platforms they can choose at different times of the day. Therefore, a solution is needed to enable social communication of important news throughout the day across multiple platforms.

To help people determine which news sources are reliable, we must recognize that critically evaluating a source’s reliability is very taxing, so it should be assisted somehow, either with crowdsourcing from multiple people or with technological automation. A source can be said to be more reliable if it publishes more true articles. Automation is good at counting things, and crowdsourcing the counting seems unnecessary, but judging an article to be true is ultimately a contextual judgment and therefore must be done by humans. A solution thus needs to provide ways for one or more users to decide whether an article is true, and it can be enhanced with information that suggests the truth of an article and the reliability of the source. Therefore, a solution is needed to provide information about source reliability and article truth adjacent to a way for users to decide if an article is true.

To help people obtain the content and context hidden in articles from reliable news sources, the articles need to be summarized and the headlines need to represent the content of the articles. Summarizing news articles can be done manually, but then creating summaries could not be done quickly and easily. To make creating summaries more efficient, it can be assisted by crowdsourcing to other people or by automation. Because deciding what content readers need to broadly understand the article is a human judgment, automated or crowdsourced assistance can be used to suggest content, rather than forcing the author to copy content from the article themselves, but the author still needs to make the final call. Similarly, assistance can mark a headline as likely clickbait and suggest another headline, but, again, the human author needs to make the final call. Therefore, a solution is needed to assist authors in summarizing articles and improving headlines.

Our Solution

For our solution to enable social communication of important news throughout the day across multiple platforms, the types of social communication and the various times and platforms that are supported need to be determined. Nowadays, social communication online is often done with text and emojis, and emojis are now included in Unicode text; what is not included in Unicode text is images. Therefore, our solution will support communicating news summaries with Unicode text and images. For multiple platforms and usage over time, our solution will include a mobile app for both iOS and Android, web pages representing summaries, email digests, and easy sharing to all social media platforms. We will combine these ideas into a Multiplatform Social Network for News that will start out as the mobile app being used to create summaries and share their web pages on social media. We hope this will lead to people signing up for Inspect to get email digests and eventually downloading the mobile app, where they will get more immediate notifications of new summaries and create their own summaries.

For our solution to provide information about source reliability and article truth adjacent to a way for users to decide if an article is true, we will use a combination of the human author and crowdsourcing from other users. The first component to communicate source reliability and article truth will be snippets taken from the article. These are discussed more in the next paragraph, but for source reliability and article truth, they can communicate the rationale behind the article and any sources that it cites. Not only are the snippets helpful for people who don’t have subscriptions to the news source, they also help avoid legal issues with the news source for reproducing their entire articles. Then, when other users see these snippets and understand what the article is about, they can discuss among themselves whether or not they think the article is true. Our solution will use discussion threading like other popular social media platforms, where users can reply to one another and easily see the discussion history. As users become more confident that they know whether the article is true, they will be able to vote on it, and the truth value of the article summary will be whichever of true or false has more votes. Finally, the news source’s reliability will be recalculated based on how many of its articles summarized on Inspect are true, and it will be visualized with common meta-information methods like color (e.g., red, yellow, green) or opacity. The resulting summaries will be called Human-Evaluated Article Summaries.
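
As a rough sketch of how this voting and reliability logic could work (the names, thresholds, and color breakpoints below are illustrative assumptions, not the final implementation), in TypeScript:

  // Sketch of article truth voting and source reliability (illustrative names and thresholds).
  interface Vote { userId: string; isTrue: boolean; }

  interface ArticleSummary {
    url: string;
    sourceId: string;
    votes: Vote[];
  }

  // The summary's truth value is whichever of true/false has more votes (null on a tie).
  function articleTruth(summary: ArticleSummary): boolean | null {
    const trueVotes = summary.votes.filter(v => v.isTrue).length;
    const falseVotes = summary.votes.length - trueVotes;
    if (trueVotes === falseVotes) return null;
    return trueVotes > falseVotes;
  }

  // Source reliability = fraction of its summarized articles currently judged true.
  function sourceReliability(summaries: ArticleSummary[], sourceId: string): number {
    const judged = summaries
      .filter(s => s.sourceId === sourceId)
      .map(articleTruth)
      .filter((t): t is boolean => t !== null);
    if (judged.length === 0) return 0.5; // no information yet
    return judged.filter(t => t).length / judged.length;
  }

  // Map reliability onto a red / yellow / green indicator for display.
  function reliabilityColor(reliability: number): string {
    return reliability < 0.33 ? "red" : reliability < 0.66 ? "yellow" : "green";
  }
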
For our solution to assist authors in summarizing articles and improving headlines, we will again use article summaries with snippets, but here we will discuss how authors can be assisted by crowdsourcing from other users and, eventually, some technological automation. As implied above, the author can insert the snippets, but we will make it so that other users can suggest snippets to the author too, and, eventually, an automated algorithm will also be able to suggest snippets by intelligently parsing the articles. Similarly, other users can mark headlines as clickbait, notifying the author that they should change them, and also suggest new headlines that are more informative. Eventually, automation will also be able to mark headlines as clickbait and suggest more informative replacements. The resulting component will be called Semi-Automated Snippet and Headline Selection.
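
A minimal sketch of the data this component might track, with assumed field names for illustration only:

  // Sketch of snippet and headline assistance data (assumed shapes, not a production schema).
  interface SnippetSuggestion {
    articleUrl: string;
    text: string;                        // excerpt copied from the article
    suggestedBy: "author" | "user" | "algorithm";
    acceptedByAuthor: boolean;           // the author always makes the final call
  }

  interface HeadlineFeedback {
    articleUrl: string;
    markedClickbaitBy: string[];         // ids of users who flagged the headline
    suggestedHeadlines: { text: string; suggestedBy: string }[];
    authorChoice?: string;               // headline the author ultimately adopts
  }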

My Solution for Tracking the Reliability of Information

To know which sources to trust, we rely on institutions in a similar way as we do for information itself. However, just as institutions are failing us with information, they are also failing us on which sources to trust. This is why a canonical model of trustworthy information sources created by an organization like Facebook, Google, or the government would not be trusted by a large number of people. Therefore, to reliably create and share source trust data, we need to work with people we still trust, such as our friends or family. To share trust data with these close connections, my approach is to model the information sources (including our personal connections) and their relationships computationally.

To computationally model these information sources, relational formalisms like social networks, concept maps, and semantic web languages provide some inspiration in that they explicitly enumerate all the ways that entities are related to each other. Information sources can thus be modeled as having made information claims, as having authors, as being a news agency or a social media poster, and as being trusted or mistrusted by various people, all in the same model. These properties and relationships can then be used to evaluate the trustworthiness of the sources, e.g., making “is trusted by my friend” one metric of a source’s trustworthiness.
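
As an illustrative sketch (assumed type and field names, not a finished ontology), such a model could look like this in TypeScript:

  // Sketch of an ontology-style model of sources and their relationships (assumed names).
  type SourceKind = "news_agency" | "social_media_poster" | "person";

  interface Source {
    id: string;
    kind: SourceKind;
    authorIds: string[];        // sources who authored this source's content
    claimIds: string[];         // claims this source has made
    trustedBy: string[];        // ids of people who trust this source
    mistrustedBy: string[];     // ids of people who mistrust this source
  }

  // One possible trustworthiness metric: how many of my friends trust this source.
  function trustedByFriends(source: Source, myFriendIds: string[]): number {
    return source.trustedBy.filter(id => myFriendIds.includes(id)).length;
  }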

With this additional “meta-information,” the trustworthiness of information sources can be estimated when you see a news article or social media post, and this estimate can be visualized. Visualizing meta-information can be done with color, size, or opacity — e.g., maybe a source is more trustworthy if it is more opaque, or maybe it is less trustworthy if it is outlined in red instead of green. Along with these basic visualizations, the details on how those visualizations were generated could also be exposed to users who are interested, such as which connections of yours labeled it as trustworthy. This method of exposing the evaluation’s rationale (also known as “explaining” itself) is called progressive disclosure. To combine these ideas, I will create an Ontology-Driven Source Evaluator.
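
A small sketch of how a trust estimate might carry its rationale for progressive disclosure and map onto opacity and outline color (the names and numbers here are illustrative assumptions):

  // Sketch: a trust estimate that carries its rationale for progressive disclosure.
  interface TrustEstimate {
    score: number;              // 0 (untrusted) to 1 (trusted)
    rationale: string[];        // e.g., ["trusted by your friend Alice"], shown on request
  }

  // More trustworthy sources render more opaque; less trustworthy ones get a red outline.
  function trustStyle(estimate: TrustEstimate): { opacity: number; outlineColor: string } {
    return {
      opacity: 0.4 + 0.6 * estimate.score,
      outlineColor: estimate.score >= 0.5 ? "green" : "red",
    };
  }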

To consistently evaluate the truth of claims, two separate steps are required: 1. enumerate the claims a source is making in an article, post, or other communication, and then 2. evaluate the truth value of these claims. 

To enumerate the claims a source is making, it would be tempting to automatically scrape the content of a news article or social media post from the web page it’s on. However, this is not a great idea, at least initially, for a few reasons. First, lazy loading of content with scripting makes it difficult for an algorithm to know when to parse content. Second, document structures change over time, so successful scraping of content at one point in time will eventually break when the document structure changes. Finally, there are inconsistent document structures across sources, which means that the scraping code will have to be specialized for every information source. 

It would also be tempting to use natural language processing (NLP) to extract the claims from a news article or social media post, but this is problematic as well. First, it’s very difficult, since claims are written in different dialects, with slang, and so on. Second, extracting sentences, phrases, and claims by evaluating text against a language model is computationally expensive. Finally, it’s untrustworthy when done with something like a deep neural network, because its evaluation of a claim is completely opaque to the user.

A better approach, especially to start with, is to enable the user to label text in a news article or social media post as a claim. This is rather manual, so it can later be supplemented with source-specific scraping templates for high-popularity sources (e.g., the New York Times or Twitter.com) and NLP suggestions of what might be a claim in a body of text. These algorithms can then be further improved by training on previous manual labels to learn which words or phrases (n-grams) tend to indicate a claim statement.
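
A minimal sketch of such an n-gram suggestion step, assuming bigrams and a simple hit-rate score (both are illustrative choices):

  // Sketch of n-gram-based claim suggestion learned from previous manual labels.
  function bigrams(sentence: string): string[] {
    const words = sentence.toLowerCase().split(/\s+/).filter(Boolean);
    const grams: string[] = [];
    for (let i = 0; i < words.length - 1; i++) grams.push(words[i] + " " + words[i + 1]);
    return grams;
  }

  // Count how often each bigram appears in text that users labeled as claims.
  function buildClaimModel(labeledClaims: string[]): Map<string, number> {
    const counts = new Map<string, number>();
    for (const claim of labeledClaims) {
      for (const gram of bigrams(claim)) counts.set(gram, (counts.get(gram) ?? 0) + 1);
    }
    return counts;
  }

  // Score a candidate sentence by the share of its bigrams seen in labeled claims.
  function claimScore(sentence: string, model: Map<string, number>): number {
    const grams = bigrams(sentence);
    if (grams.length === 0) return 0;
    return grams.filter(g => (model.get(g) ?? 0) > 0).length / grams.length;
  }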

To evaluate the truth value of these claims, user labeling is a tempting approach given its use to identify claims mentioned above. However, while users can be trusted to identify language that indicates a claim, they cannot be expected to reliably evaluate the claims for their truth value because they likely do not have the knowledge to make this judgment, and they very likely have their own biases about what is true and what isn’t. Therefore, it would be better to corroborate a claim made by a source with other trusted sources (including news articles, social media posts, and our personal connections). If they agree on the claim, then it is likely true. Also, the claim can still be manually labeled when a user has additional external knowledge on the subject matter. The resulting component will be the User-Centered Claim Evaluator.
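
As a rough sketch, corroboration could be as simple as counting how many sufficiently trusted sources agree with the claim (the shapes and threshold values here are assumptions):

  // Sketch of corroborating a claim against other trusted sources.
  interface CorroboratingSource {
    id: string;
    trustScore: number;         // from the source trust model, 0 to 1
    agreesWithClaim: boolean;
  }

  // A claim is likely true if enough sufficiently trusted sources agree with it.
  function isLikelyTrue(sources: CorroboratingSource[], minTrust = 0.6, minAgreeing = 2): boolean {
    const trusted = sources.filter(s => s.trustScore >= minTrust);
    return trusted.filter(s => s.agreesWithClaim).length >= minAgreeing;
  }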

To use true claims to improve source trust data, an iterative process needs to be used because neither the truth of claims nor the trustworthiness of information sources can be entirely determined by themselves. Once you label a claim as true, your connections will see a claim populated with your trust label before it is evaluated against their source trust model. For your and your connections’ source trust models, the trustworthiness of a source can be incrementally increased as it makes more true claims. Over time, this Iterative Truth Propagation Process would converge into a distributed knowledge graph. Eventually, such a process could also generate new insights by synthesizing claims from one’s record of true claims.
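
A minimal sketch of one step of this loop, with an assumed fixed increment as the update rule:

  // Sketch of one iteration of truth propagation (illustrative update rule).
  interface TrustModel {
    sourceTrust: Map<string, number>;   // source id -> trust score in [0, 1]
  }

  // When a claim from a source is labeled true, nudge that source's trust upward.
  function propagateTrueClaim(model: TrustModel, sourceId: string, step = 0.05): void {
    const current = model.sourceTrust.get(sourceId) ?? 0.5;   // start unknown sources at 0.5
    model.sourceTrust.set(sourceId, Math.min(1, current + step));
  }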

Because this process is a recursive loop, it needs to start out with some data for both source trustworthiness and claim truth. Therefore, I will create initial models based on my own knowledge and research. Eventually, additional “experts” can restart this process with their own initial models.


I am working on addressing this problem right now. For more information, feel free to email me at bob@datagotchi.net.

Tracking the Reliability of Information

As the world becomes more connected and more complex, it is increasingly difficult to know what to believe. Ideally, we would simply believe what happens to be true, but events happen far away from us, to other people, and we usually hear about them after the fact. Therefore, we need to trust that other people are telling us the truth. As a society grows, this sort of trust in others results in institutions like religions, cultures, and organizations, which members then look to for beliefs, norms, and values. 

However, the creation of new institutions and changes to existing institutions are done by those in power. As a result, much of our society is based on assumptions created by the powerful, such as “it’s important to work and be productive” (capitalism), “it’s more important to take care of yourself and your family first” (individualism), and even things like “smoking is cool (even though it kills you)” or “being skinny is the only way for others to like or respect you” so tobacco and beauty product companies can sell more. Lately, institutions have increasingly been used to serve the powerful rather than to function in their original framing, including businesses and states reopening to make money during COVID-19 rather than protecting their customers’ health; federal agencies like the EPA focusing on reducing regulations instead of protecting the environment; and the Catholic Church being more about protecting child molesters than providing religious community.

Since we can no longer rely on many institutions, we will need to make sense of the world ourselves. That way, we can know the risks and the medical and other effects of reopening businesses; the effects of pollution and their severity; and when we can trust religious leaders. Nowadays, there are so many different sources of information, from newspaper websites to social media posts, that they overload us. Some information sources are more reliable than others, and it’s difficult to know which ones are. Furthermore, information sources contradict each other, either because they have different agendas or because they are based on different belief systems.

In spite of these issues with sources, we will still need to refer to external sources for most of our information given the complexity of the world and the information in it. 

  • Therefore, we need to determine which sources we should trust and which ones we should not. Once we know which sources to trust, we can also trace information shared by others back to its original source and determine whether or not to trust it.
  • We can then also start to be confident that what we learn from them is the truth, but we may also want other ways to verify the truth of claims.
  • Once we are confident that what we read is true, we can share it with others. However, for it to be valuable to them, we would still need to share true information in a way that they can use both immediately and in the future.

To determine what sources we should trust, people should not just continue using their currently-trusted sources because they are often trusted for reasons we are not aware of. We could also simply ask our friends and family for their trusted sources, but they may fail to tell us the full truth (intentionally or not), and they probably don’t have a good reason for trusting certain sources, either. Therefore, a more reliable way to determine trusted sources is needed, preferably supplemented by insights from our connections. If we scope the problem to news on the internet, then a solution is needed to reliably create and share source trust data.

To verify the truth of claims, people cannot trust their own or their connections’ gut feelings, because those feelings are often as opaque to us as our gut feelings about which sources to trust. Furthermore, our gut feelings change over time, even though the truth of a claim does not. Therefore, a solution is needed to consistently evaluate the truth of claims.

Finally, to share true information with others in a way that they can use it both immediately and in the future, it is not sufficient simply to send it to them. They might read it and learn a piece of information, but a week or month from now they may forget it, or at least the medium in which they received it from you. To be able to use this information in the future, it needs to be saved in a way that helps them interpret information they see later. Therefore, a solution is needed to use true claims to improve source trust data, which will then help them evaluate future claims against the trustworthiness of the source.


I am working on addressing this problem right now. For more information, feel free to email me at bob@datagotchi.net.