Joel Chan | Sensemaking technologies for collaboration at scale

In many collective creativity systems (where contributors can number in the thousands or even tens of thousands), enabling true collaboration is a difficult challenge. Without effective mechanisms for surfacing key insights and sharing them among collaborators, many collective creativity efforts devolve into largely independent work, which often yields a preponderance of redundant, shallow, and low-quality ideas. We want to know: how might we build sensemaking technologies that support effective collaboration at scale?

This research is primarily conducted with awesome collaborators, since I lack a background in machine learning. What I bring to the table is a deep understanding of which kinds of sensemaking are useful for creativity and what representations those kinds of sensemaking require (e.g., relational knowledge for analogy), along with techniques from crowdsourcing that can support, or work in combination with, novel machine learning systems.

What we’ve learned so far

Even simple, off-the-shelf machine sensemaking (e.g., Latent Semantic Analysis and k-means clustering) can improve how people engage with prior knowledge, and in turn their creativity (Chan et al., 2016).
Most effective computational sensemaking tools are powered by human semantic judgments that are very tedious to provide, which makes these tools expensive to build and improve, especially when domain knowledge is required. We’ve discovered a design pattern called “integrated crowdsourcing” that makes large-scale collection of semantic judgments tractable by seamlessly integrating those judgments into primary tasks that people are already intrinsically motivated to perform (Siangliulue et al., 2016).
Analogy is really useful, but it is really hard to do computationally (Fu et al., 2013, Design Studies; Fu et al., 2013, Journal of Mechanical Design). Fortunately, we’ve found that combining crowdsourcing, machine learning, and a relaxed requirement for fully specified relational knowledge can get us surprisingly close to human-like analogical reasoning over real-world documents (Kittur et al., 2019), ranging from relatively simple consumer product descriptions (Chan et al., 2016; Hope et al., 2017) to complex research paper abstracts (Chan et al., 2018). We’ve also developed new techniques that help users explore different ways of expressing their queries (using abstractions) so that our algorithms can do a better job of finding analogies that are both useful and from different domains (Gilon et al., 2018).
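The first finding mentions off-the-shelf Latent Semantic Analysis (LSA) and k-means clustering. As an illustration only, here is a minimal, dependency-free Python sketch of that kind of pipeline; it is not the workflow used in Chan et al. (2016). Assumptions: whitespace tokenization, raw term counts (no tf-idf or log-entropy weighting), a truncated SVD computed by power iteration on the document Gram matrix, and deterministic farthest-first seeding for k-means. The toy idea snippets and every function name (`lsa_embed`, `kmeans`, and so on) are hypothetical.

```python
import math
from collections import Counter


def term_doc_matrix(docs):
    """Raw-count term-document matrix: one row per term, one column per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    counts = [Counter(toks) for toks in tokenized]
    return [[c[term] for c in counts] for term in vocab]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration with deflation.

    Assumes the uniform start vector is not orthogonal to each target eigenvector.
    """
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next eigenpair.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs


def lsa_embed(docs, k=2):
    """Project documents into a k-dimensional LSA space (rows of V_k * Sigma_k)."""
    a = term_doc_matrix(docs)
    n = len(docs)
    # Eigenvalues of the document Gram matrix A^T A are the squared singular values of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[d] for lam, v in pairs] for d in range(n)]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-first seeding."""
    def dist2(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical prior ideas from two themes (charging devices vs. gardening).
ideas = [
    "solar panel charges phone battery",
    "solar charger for phone battery",
    "battery pack charges phone",
    "community garden grows vegetables",
    "rooftop garden grows vegetables",
    "school garden teaches kids vegetables",
]
print(kmeans(lsa_embed(ideas, k=2), k=2))  # [0, 0, 0, 1, 1, 1]
```

On real idea sets, a library SVD and k-means with tf-idf weighting would be the more practical choice; the hand-rolled version here only keeps the sketch self-contained.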

What’s next

Here are some things we’re currently working on and/or pondering:

Can we create a scientific communication ecosystem (or modify the existing ones) where it becomes natural and effortless to communicate knowledge in more machine-readable ways?
How far can we get with “analogy-lite” representations of documents?
How can we scalably (e.g., [quasi]automatically) create human-readable/interpretable representations of large document collections?
Why is literature reviewing so painful? And how might computing systems help?

Related publications

Scaling up Analogical Innovation with Crowds and AI. Kittur, Aniket, Yu, Lixiu, Hope, Tom, Chan, Joel, Lifshitz-Assaf, Hila, Gilon, Karni, Ng, Felicia, Kraut, Robert E., and Shahaf, Dafna. Proceedings of the National Academy of Sciences, 2019.
Analogy—the ability to find and apply deep structural patterns across domains—has been fundamental to human innovation in science and technology. Today there is a growing opportunity to accelerate innovation by moving analogy out of a single person’s mind and distributing it across many information processors, both human and machine. Doing so has the potential to overcome cognitive fixation, scale to large idea repositories, and support complex problems with multiple constraints. Here we lay out a perspective on the future of scalable analogical innovation and first steps using crowds and artificial intelligence (AI) to augment creativity that quantitatively demonstrate the promise of the approach, as well as core challenges critical to realizing this vision.
SOLVENT: A Mixed Initiative System for Finding Analogies between Research Papers. Chan, Joel, Chang, Joseph Chee, Hope, Tom, Shahaf, Dafna, and Kittur, Aniket. Proceedings of ACM Human-Computer Interaction: CSCW, 2018.
Scientific discoveries are often driven by finding analogies in distant domains, but the growing number of papers makes it difficult to find relevant ideas in a single discipline, let alone distant analogies in other domains. To provide computational support for finding analogies across domains, we introduce SOLVENT, a mixed-initiative system where humans annotate aspects of research papers that denote their background (the high-level problems being addressed), purpose (the specific problems being addressed), mechanism (how they achieved their purpose), and findings (what they learned/achieved), and a computational model constructs a semantic representation from these annotations that can be used to find analogies among the research papers. We demonstrate that this system finds more analogies than baseline information-retrieval approaches; that annotators and annotations can generalize beyond domain; and that the resulting analogies found are useful to experts. These results demonstrate a novel path towards computationally supported knowledge sharing in research communities.
Analogy Mining for Specific Design Needs. Gilon, Karni, Chan, Joel, Ng, Felicia Y, Assaf, Hila Lifshitz, Kittur, Aniket, and Shahaf, Dafna. In Proceedings of the 2018 ACM SIGCHI Conference on Human Factors in Computing Systems, 2018.
Finding analogical inspirations in distant domains is a powerful way of solving problems. However, as the number of inspirations that could be matched and the dimensions on which that matching could occur grow, it becomes challenging for designers to find inspirations relevant to their needs. Furthermore, designers are often interested in exploring specific aspects of a product; for example, one designer might be interested in improving the brewing capability of an outdoor coffee maker, while another might wish to optimize for portability. In this paper we introduce a novel system for targeting analogical search for specific needs. Specifically, we contribute an analogical search engine for expressing and abstracting specific design needs that returns more distant yet relevant inspirations than alternate approaches.
Accelerating Innovation Through Analogy Mining. Hope, Tom, Chan, Joel, Kittur, Aniket, and Shahaf, Dafna. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017. Best Paper.
The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. In this paper we explore the viability and value of learning simpler structural representations, specifically, “problem schemas”, which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people’s likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations.
Scaling up Analogy with Crowdsourcing and Machine Learning. Chan, Joel, Hope, Tom, Shahaf, Dafna, and Kittur, Aniket. In ICCBR Workshops, 2016.
Despite tremendous advances in computational models of human analogy, a persistent challenge has been scaling up to find useful analogies in large, messy, real-world data. The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery in a way never previously possible. Previous approaches have been limited by relying on hand-created databases that have high relational structure but are very sparse (e.g., predicate calculus representations). Traditional machine-learning/information-retrieval similarity metrics (e.g., LSA) can scale to large, natural-language datasets; however, while these methods are good at detecting surface similarity, they struggle to account for structural similarity. In this paper, we propose to leverage crowdsourcing techniques to construct a dataset with rich “analogy-tuning” signals, used to guide machine learning models towards matches based on relations rather than surface features. We demonstrate our approach with a crowdsourced analogy identification task, whose results are used to train deep learning algorithms. Our initial results suggest that a deep learning model trained on positive/negative example analogies from the task can find more analogous matches than an LSA baseline, and that incorporating behavioral signals (such as queries used to retrieve an analogy) can further boost its performance.
IdeaHound: Improving Large-scale Collaborative Ideation with Crowd-Powered Real-time Semantic Modeling. Siangliulue, Pao, Chan, Joel, Dow, Steven P., and Gajos, Krzysztof Z. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, 2016.
Prior work on creativity support tools demonstrates how a computational semantic model of a solution space can enable interventions that substantially improve the number, quality and diversity of ideas. However, automated semantic modeling often falls short when people contribute short text snippets or sketches. Innovation platforms can employ humans to provide semantic judgments to construct a semantic model, but this relies on external workers completing a large number of tedious micro tasks. This requirement threatens both accuracy (external workers may lack expertise and context to make accurate semantic judgments) and scalability (external workers are costly). In this paper, we introduce IdeaHound, an ideation system that seamlessly integrates the task of defining semantic relationships among ideas into the primary task of idea generation. The system combines implicit human actions with machine learning to create a computational semantic model of the emerging solution space. The integrated nature of these judgments allows IdeaHound to leverage the expertise and efforts of participants who are already motivated to contribute to idea generation, overcoming the issues of scalability inherent to existing approaches. Our results show that participants were equally willing to use (and just as productive using) IdeaHound compared to a conventional platform that did not require organizing ideas. Our integrated crowdsourcing approach also creates a more accurate semantic model than an existing crowdsourced approach (performed by external crowds). We demonstrate how this model enables helpful creative interventions: providing diverse inspirational examples, providing similar ideas for a given idea and providing a visual overview of the solution space.
Comparing Different Sensemaking Approaches for Large-Scale Ideation. Chan, Joel, Dang, Steven, and Dow, Steven P. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 2016.
Large-scale idea generation platforms often expose ideators to previous ideas. However, research suggests people generate better ideas if they see abstracted solution paths (e.g., descriptions of solution approaches generated through human sensemaking) rather than being inundated with all prior ideas. Automated and semi-automated methods can also offer interpretations of earlier ideas. To benefit from sensemaking in practice with limited resources, ideation platform developers need to weigh the cost-quality tradeoffs of different methods for surfacing solution paths. To explore this, we conducted an online study where 245 participants generated ideas for two problems in one of five conditions: 1) no stimuli, 2) exposure to all prior ideas, or solution paths extracted from prior ideas using 3) a fully automated workflow, 4) a hybrid human-machine approach, and 5) a fully manual approach. Contrary to expectations, human-generated paths did not improve ideation (as measured by fluency and breadth of ideation) over simply showing all ideas. Machine-generated paths sometimes significantly improved fluency and breadth of ideation over no ideas (although at some cost to idea quality). These findings suggest that automated sensemaking can improve idea generation, but we need more research to understand the value of human sensemaking for crowd ideation.
Expert representation of design repository space: A comparison to and validation of algorithmic output. Fu, Katherine, Chan, Joel, Schunn, Christian, Cagan, Jonathan, and Kotovsky, Kenneth. Design Studies, 2013.
Development of design-by-analogy tools is a promising design innovation research avenue. Previously, a method for computationally structuring patent databases as a basis for an automated design-by-analogy tool was introduced. To demonstrate its strengths and weaknesses, a computationally-generated structure is compared to four expert designers’ mental models of the domain. Results indicate that, compared to experts, the computationally-generated structure is sensible in clustering of patents and organization of clusters. The computationally-generated structure represents a space in which experts can find common ground/consensus making it promising to be intuitive/accessible to broad cohorts of designers. The computational method offers a resource-efficient way of usefully conceptualizing the space that is sensible to expert designers, while maintaining an element of unexpected representation of the space.
The Meaning of Near and Far: The Impact of Structuring Design Databases and the Effect of Distance of Analogy on Design Output. Fu, Katherine, Chan, Joel, Cagan, Jonathan, Kotovsky, Kenneth, Schunn, Christian, and Wood, Kristin. Journal of Mechanical Design, 2013.
This work lends insight into the meaning and impact of “near” and “far” analogies. A cognitive engineering design study is presented that examines the effect of the distance of analogical design stimuli on design solution generation, and places those findings in context of results from the literature. The work ultimately sheds new light on the impact of analogies in the design process and the significance of their distance from a design problem. In this work, the design repository from which analogical stimuli are chosen is the U.S. patent database, a natural choice, as it is one of the largest and easily accessed catalogued databases of inventions. The “near” and “far” analogical stimuli for this study were chosen based on a structure of patents, created using a combination of latent semantic analysis and a Bayesian based algorithm for discovering structural form, resulting in clusters of patents connected by their relative similarity. The findings of this engineering design study are juxtaposed with the findings of a previous study by the authors in design by analogy, which appear to be contradictory when viewed independently. However, by mapping the analogical stimuli used in the earlier work into similar structures along with the patents used in the current study, a relationship between all of the stimuli and their relative distance from the design problem is discovered. The results confirm that “near” and “far” are relative terms, and depend on the characteristics of the potential stimuli. Further, although the literature has shown that “far” analogical stimuli are more likely to lead to the generation of innovative solutions with novel characteristics, there is such a thing as too far. That is, if the stimuli are too distant, they then can become harmful to the design process. Importantly, as well, the data mapping approach to identify analogies works, and is able to impact the effectiveness of the design process. 
This work has implications not only in the area of finding inspirational designs to use for design by analogy processes in practice, but also for synthesis, or perhaps even unification, of future studies in the field of design by analogy.
