Technology Innovation
Many frame the primary problem with artificial intelligence (AI) systems as users not trusting them, or as the AI systems not being trustworthy. However, the longstanding problem with automated technologies is that they are not resilient in the face of unforeseen circumstances. In other words, AI systems are untrustworthy because they are unhelpful on data they were not trained on.
One way to make AI systems more resilient is to show where their outputs came from and, when a system cannot provide a high-quality answer, why it was not trained on relevant data. However, many modern AI systems are based on deep neural networks, and because those networks comprise billions or even trillions of parameters, they cannot practically show the path through the network that produced a given output.
(1) To be resilient in the face of unforeseen circumstances, AI systems must show rationale that users can understand and react to when the systems fail, rather than forcing users to recall how the systems work or to assume they are always right. Therefore, our high-risk technology innovation will be a Collaborative Copilot that uses interpretable models. This is high risk because interpretable models often do not scale well with data, lack the mature development tooling that tech giants have built for deep neural networks, and are expensive to train.
For interpretable models, linear rules would only be accurate for a very narrow domain of data. Their extension into decision trees would not be easy to use because the trees cannot be automatically pruned. Bayesian networks would have similar problems, so we will use Causal Influence Models (CIMs). CIMs present Bayesian networks as UIs whose connections point in the direction of influence, with color showing the sign of each influence and line weight showing its strength.
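To make the rendering concrete, here is a minimal sketch of how a CIM view could be drawn, assuming networkx (a library beyond the tools named in this proposal) alongside Matplotlib. The variables and signed weights are hypothetical examples loosely drawn from the hiring use case, not outputs of any real model:

```python
# Sketch of a CIM rendering: edges point in the direction of influence,
# color encodes the sign (blue = positive, red = negative), and line
# width encodes strength. All variables and weights are hypothetical.
import matplotlib.pyplot as plt
import networkx as nx

# Hypothetical influences: (source, target, signed strength in [-1, 1])
influences = [
    ("YearsExperience", "InterviewScore", 0.6),
    ("SkillMatch", "InterviewScore", 0.8),
    ("InterviewScore", "HireRecommendation", 0.9),
    ("ScheduleConflicts", "HireRecommendation", -0.4),
]

graph = nx.DiGraph()
for source, target, weight in influences:
    graph.add_edge(source, target, weight=weight)

pos = nx.spring_layout(graph, seed=42)  # deterministic layout
colors = ["tab:blue" if w > 0 else "tab:red"
          for _, _, w in graph.edges(data="weight")]
widths = [4 * abs(w) for _, _, w in graph.edges(data="weight")]

nx.draw_networkx(graph, pos, edge_color=colors, width=widths,
                 node_color="lightgray", node_size=2500, font_size=8)
plt.axis("off")
plt.show()
```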
Although CIMs are relatively easy to understand, forcing users to read through full CIMs would still result in significant cognitive overload.
(2) To minimize cognitive overload, information system UIs must fit users’ mental states while they are doing work. To explicitly consider those mental states, they should be objectively modeled rather than guessed at by developers or inferred from biased user surveys. Therefore, we will supplement the Collaborative Copilot development by designing and empirically evaluating Mixed-Initiative Workflows using a family of research methods called Cognitive Task Analysis (CTA). However, CTA models of users may not match users in the real world, and they do not capture the higher-level objectives and constraints of users’ work. Therefore, we will also measure users’ professional contexts using a family of research methods called Cognitive Work Analysis (CWA), which outlines users’ objectives and goals and maps them to their activities and tasks.
Technical Objectives and Challenges
1. Prototype and Evaluate the Collaborative Copilot
We will build interpretable models with Python tools such as pandas, NumPy, scikit-learn, and Matplotlib, and will compare them against deep neural networks built with tools such as TensorFlow and PyTorch. We will evaluate both kinds of models on four metrics:
- High output quality: We will start with accuracy and machine-learning refinements of it, such as the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Deep neural networks are famous for very high output quality, so this will be our primary metric (a comparison sketch follows this list).
- Understandability of outputs: We will verify that users can understand the machine’s outputs and determine when the machine is mistaken. We will do this by comparing users’ mental states with the Copilot’s models.
- Ease of development: Because interpretable models do not train and scale with backpropagation like neural networks, more technical expertise is likely required. We will specifically consider the technical expertise needed to run expectation-maximization (EM) training on the networks (a minimal EM sketch also follows this list).
- Training costs: Large corporations have the resources to build deep neural networks with large amounts of data, while smaller businesses and academia often do not. Therefore, we will explicitly measure the costs of EM training. For expensive cases, we will consider incremental learning, in which models start out as linear rules, then become networks of rules, and finally become Bayesian networks.
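As a sketch of the output-quality comparison, the snippet below scores an interpretable model against a neural network stand-in by ROC AUC using scikit-learn. The synthetic dataset is a placeholder for our real evaluation corpora, and the specific model choices (a shallow decision tree, a small MLP) are illustrative, not the final models:

```python
# Sketch: compare an interpretable model vs. a neural-network stand-in
# on ROC AUC, using held-out test data. Dataset is synthetic.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "decision_tree (interpretable)": DecisionTreeClassifier(
        max_depth=4, random_state=0),
    "neural_net (stand-in for DNN)": MLPClassifier(
        hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # P(positive class)
    print(f"{name}: ROC AUC = {roc_auc_score(y_test, scores):.3f}")
```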
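And as a sketch of what EM training involves, here is a minimal NumPy implementation for the smallest possible network with a hidden variable: one latent binary cause C with two observed binary effects. Real CIM training would involve far larger networks, but the E-step/M-step structure is the same. The data and parameters below are synthetic:

```python
# Sketch: EM for a tiny Bayesian network (latent C -> observed X1, X2).
import numpy as np

rng = np.random.default_rng(0)

# Generate synthetic data from a known model so we can sanity-check EM.
true_p_c = 0.3                      # P(C = 1)
true_p_x = np.array([[0.1, 0.8],    # P(X1 = 1 | C = 0), P(X1 = 1 | C = 1)
                     [0.2, 0.9]])   # P(X2 = 1 | C = 0), P(X2 = 1 | C = 1)
c = rng.random(5000) < true_p_c
X = (rng.random((5000, 2)) < true_p_x[:, c.astype(int)].T).astype(float)

# Random initial parameter guesses.
p_c = 0.5
p_x = rng.uniform(0.2, 0.8, size=(2, 2))

for _ in range(100):
    # E-step: posterior responsibility P(C = 1 | x) for each data row.
    lik1 = p_c * np.prod(p_x[:, 1] ** X * (1 - p_x[:, 1]) ** (1 - X), axis=1)
    lik0 = (1 - p_c) * np.prod(p_x[:, 0] ** X * (1 - p_x[:, 0]) ** (1 - X), axis=1)
    resp = lik1 / (lik1 + lik0)

    # M-step: re-estimate parameters from expected counts.
    p_c = resp.mean()
    p_x[:, 1] = (resp[:, None] * X).sum(axis=0) / resp.sum()
    p_x[:, 0] = ((1 - resp)[:, None] * X).sum(axis=0) / (1 - resp).sum()

# Note: EM may converge to the label-swapped solution (C's values flipped).
print(f"estimated P(C=1) = {p_c:.2f} (true {true_p_c})")
print("estimated P(X=1|C):", np.round(p_x, 2))
```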
2. Design and Evaluate Mixed-Initiative Interaction Workflows
We will design these workflows with mixed-initiative interaction: the software processes data, generates models, and makes suggestions to the user, while users can not only request suggestions when they need them but also critique suggestions that are confusing or mistaken.
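The following sketch illustrates the shape of that interaction loop. All names here (Copilot, Suggestion, suggest, critique) are hypothetical illustrations of the protocol, not a final API, and the suggestion contents are placeholders:

```python
# Sketch of the mixed-initiative loop: the software proposes, and the
# user can accept, critique, or override.
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    action: str
    rationale: list[str]   # human-readable CIM influence path
    confidence: float      # model confidence in [0, 1]


@dataclass
class Copilot:
    critiques: list[str] = field(default_factory=list)

    def suggest(self, context: dict) -> Suggestion:
        # Placeholder: a real Copilot would query the interpretable model.
        return Suggestion(
            action="advance candidate to interview",
            rationale=["SkillMatch=high (+0.8)", "InterviewScore=high (+0.9)"],
            confidence=0.85,
        )

    def critique(self, suggestion: Suggestion, feedback: str) -> None:
        # Record the critique so the model and workflow can be revised.
        self.critiques.append(f"{suggestion.action}: {feedback}")


copilot = Copilot()
proposal = copilot.suggest({"candidate_id": "C-123"})
print(proposal.action, proposal.rationale)

# The user disagrees with part of the rationale and says so.
copilot.critique(proposal, "SkillMatch overweights keyword overlap")
```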
The workflows will be evaluated in two ways: (1) by matching users’ mental states during tasks to the system’s outputs, to ensure that users understand those outputs; and (2) by measuring the joint performance of users and the software on the tasks, to ensure that users can correct or take over for the system when necessary, increasing resilience.
3. Validate Commercial Viability and Impact
In Phase II, we will apply the innovations to a use case we have researched over the past few years: reducing information asymmetries between job candidates and recruiters to improve the efficiency of hiring. Therefore, in Phase I, we will work with user representatives from our previous research and development to confirm the innovations’ commercial viability and impact.
Market Opportunity
- Models: To ensure that individuals can make general use of the models we develop, the models must always be free. Furthermore, to enable academic research on them and both individual and commercial modification for different uses, they should be open source. However, we could offer paid support services to businesses.
- Mixed-Initiative Workflows: The mixed-initiative workflows would likely be published in research papers, and thus would be de facto free and open source. However, for commercial usage, they could also be patented and licensed.
- Candidate and Recruiter Tools: The candidate tools were always envisioned to be free to ensure high adoption. Furthermore, since the initial target users were senior software engineers, the tools could also be open source to enable customization and improvement. The recruiter tools, on the other hand, are the primary market opportunity for Phase II’s use case. Whether the users are external or internal recruiters, they likely have budgets they can use to pay us.
Company and Team
- Founder: I have a diverse background in technology and data science. I hold a BS in Computer Science and an MS in Information Science, and I researched AI throughout my undergraduate and graduate studies. After grad school, I worked in government R&D as a decision-support analytics research scientist and software engineer. I wrote SBIR and STTR proposals on human-machine collaboration and mixed-initiative interfaces, won many of them, and led the resulting projects. In other words, I have focused my entire career on these types of problems. I have also collaborated with many of the researchers who have championed resilience engineering, cognitive task analysis, and cognitive agent rationale, and who would likely be happy to consult on this project. I have since moved to California and worked with several technology startups as a full-stack engineer, UX expert, and product manager.
- Company Background and Status: Datagotchi Labs is an R&D lab I am incubating to apply the skills I have gained over my career. I have explored several ways to generate revenue, starting with a Patreon page, spinning off this project to raise investment, and running Kickstarter campaigns for the consumer side of the projects. Since none of these have been successful, in addition to pitching to the NSF Seedfund, I am working on a marketing campaign to establish myself on social media, attract consulting leads, and offer freelancing services.