By: Rebecca Bilbro, Lewis Broerman, and Jordan Higgins
The widespread adoption of advanced machine learning techniques is changing the very nature of analytic work. Very soon, the analyst's job will become that of a model maker who selects and trains machines. To date, this model selection has largely been accomplished using scientific computing languages, advanced statistics, probability theory, and iterative hypothesis testing. These tools and techniques require deep knowledge of both software and the underlying math, which limits the workforce available to apply them effectively and rapidly.
The ByteCubed concept is to enable human-machine collaboration through augmented reality, leveraging human intuition and visual pattern recognition in model selection. The goal is to increase the size of the workforce able to select and implement effective machine learning for analytical missions.
Machine learning is the process of creating predictive models by fitting an algorithm to a set of training data, and then using the fitted model to extrapolate target values, classes, or other patterns of behavior from new unseen data. In practice, the workflow includes (1) selecting and/or engineering the smallest and most predictive feature set, (2) choosing a set of relevant algorithms from a model family, and (3) tuning the algorithm hyperparameters to optimize for performance.
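The three workflow steps above can be sketched with scikit-learn. This is a minimal illustration and not ByteCubed's implementation: the synthetic data set, the two candidate algorithms, and every grid value are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic data stands in for a real analytic data set (an assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),  # (1) feature selection
    ("model", LogisticRegression(max_iter=1000)),   # (2) placeholder, swapped below
])

# One grid per candidate algorithm; each grid also tunes hyperparameters (3).
param_grid = [
    {
        "select__k": [5, 10],
        "model": [LogisticRegression(max_iter=1000)],
        "model__C": [0.1, 1.0, 10.0],
    },
    {
        "select__k": [5, 10],
        "model": [SVC()],
        "model__C": [0.1, 1.0, 10.0],
    },
]

# Exhaustive search over every combination, scored by 3-fold cross-validation.
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Even this toy grid evaluates 12 candidate configurations, each fitted three times for cross-validation.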
Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and intervention are more effective at selecting quality models than exhaustive search. This human steering process is best supported by visualization techniques that enable practitioners to visually interpret the model selection process, diagnose problems throughout the workflow, and steer toward more predictive and performant models.
Why pure automation does not work
Automation in machine learning relies on model and hyperparameter search. The search space is large, however, because there are many possible models and many possible hyperparameters for each model. Optimization techniques exist (e.g. genetic algorithms or particle swarm) that can learn the search space and make traversal more efficient, but even with such optimizations, many algorithms will not converge on a solution. Traversing this search space becomes exponentially slower with big data and high-dimensional data. In addition, a purely automated mathematical model fit may not account for undesired or unweighted negative consequences or biases, and can make it difficult to leverage subject matter expertise.
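To make the size of that search space concrete, the sketch below counts the candidate configurations in a small grid. The model families, hyperparameter names, and option lists are hypothetical values chosen for illustration, not a recommended search space.

```python
from math import prod

# Hypothetical search space: model family -> hyperparameter -> options to try.
search_space = {
    "logistic_regression": {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]},
    "random_forest": {
        "n_estimators": [50, 100, 200],
        "max_depth": [None, 5, 10],
        "min_samples_leaf": [1, 5],
    },
    "svm": {
        "C": [0.1, 1, 10],
        "kernel": ["linear", "rbf"],
        "gamma": [0.01, 0.1, 1],
    },
}


def count_candidates(space):
    """Return the number of hyperparameter combinations per model family."""
    return {
        name: prod(len(options) for options in grid.values())
        for name, grid in space.items()
    }


counts = count_candidates(search_space)
total = sum(counts.values())
cv_folds = 5  # assumed number of cross-validation folds

print(counts)            # {'logistic_regression': 8, 'random_forest': 18, 'svm': 18}
print(total)             # 44
print(total * cv_folds)  # 220 model fits

# Adding a single hyperparameter with three options triples that family's grid.
wider = {
    **search_space,
    "random_forest": {
        **search_space["random_forest"],
        "max_features": ["sqrt", "log2", None],
    },
}
wider_counts = count_candidates(wider)
print(wider_counts["random_forest"])  # 54
```

Because the counts multiply, each additional hyperparameter or finer grid inflates the number of fits quickly, which is one reason exhaustive search becomes expensive on large data sets.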
Human-machine collaboration tools enable interventions in machine optimization. Users are able to engage in the modeling process through visualization, using their visual cortex to guide the process. This engagement leads to better performing models, produced more quickly and with greater insight.
Augmented steering and mixed reality
Augmented, virtual, and mixed reality applications stand to produce significant gains in the area of human-machine collaboration along five main parameters: immersion, dimensionality, collaboration, visual processing, and aural processing.
- AR/VR provides the ability to more fully engage multiple senses (visual, aural, haptic) in immersive experiences that provide greater context across more dimensions.
- Multi-dimensional data is particularly difficult to explore because humans struggle to visualize more than a few dimensions. AR/VR better supports the visualization of high-dimensional data.
- Immersive visualization in AR/VR may result in better understanding and enhanced perception of relationships in data. These benefits are multiplied when people can work together. AR/VR can also provide these benefits with no geographical limitation, providing a sense of presence to participants regardless of physical location.
- The optic nerve processes visual information at roughly 1.25 MB/s, and when we focus on reading words or charts on a screen we use only a fraction of this bandwidth. AR/VR has the potential to greatly enhance people's ability to process and draw meaning from immersive data visualizations.
- Auditory cues are a proven means of drawing individual attention to critical information in complex environments with a high visual load, such as an aircraft cockpit. 360-degree positional audio in AR/VR applications can provide similar benefits, compensating for the limits of human capacity to process data.
The intended outcome of further development is a proof-of-concept mixed reality application for visualizing the model selection process. Measures of success would include benchmarks against current methods and analyst usability measures.
ByteCubed seeks a mission partner to collaboratively engage with a problem and data pipeline currently using machine learning. The partner would have access to analysts who could evaluate and validate the effectiveness of the concept and benchmark it against the methods currently used for model selection. Collaborative, venture, or other accelerative funding for research and development would be of interest.