ing these iterative and hierarchical
steps toward qualitative-to-quantitative intelligence transformation would
thus disclose and quantify the initial
problem “unknownness.” Finally, actionable knowledge and insight would
be identified and delivered to businesspeople who would address data
complexities and business goals.
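The iterative loop described for MCE (preset goals, form hypotheses, model, evaluate, and feed findings back until the problem is better understood) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the article defines no numeric measure of "unknownness," so the `unknownness` score in [0, 1], the `propose` and `evaluate` callbacks, and the stopping `tolerance` are all hypothetical stand-ins rather than part of the MCE approach itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProblemState:
    """Current understanding of a problem (all fields are illustrative)."""
    goals: List[str]
    hypotheses: List[str] = field(default_factory=list)
    # Hypothetical score: 1.0 = nothing understood, 0.0 = fully understood.
    unknownness: float = 1.0


def metasynthesis_loop(
    state: ProblemState,
    propose: Callable[[ProblemState], str],
    evaluate: Callable[[str], float],
    tolerance: float = 0.1,
    max_rounds: int = 10,
) -> List[float]:
    """Iterate hypothesis -> evaluation -> feedback; return unknownness per round.

    `propose` stands in for human/domain input that yields a hypothesis.
    `evaluate` stands in for modeling and analytics; it returns the assumed
    fraction of remaining unknownness that the hypothesis resolves.
    """
    history = [state.unknownness]
    for _ in range(max_rounds):
        if state.unknownness <= tolerance:
            break  # the problem is understood well enough to act on
        hypothesis = propose(state)
        reduction = min(max(evaluate(hypothesis), 0.0), 1.0)  # clamp to [0, 1]
        # Feedback step: findings refine the shared problem state.
        state.hypotheses.append(hypothesis)
        state.unknownness *= 1.0 - reduction
        history.append(state.unknownness)
    return history


if __name__ == "__main__":
    demo = ProblemState(goals=["explain customer churn"])
    trace = metasynthesis_loop(
        demo,
        propose=lambda s: f"hypothesis-{len(s.hypotheses) + 1}",
        evaluate=lambda h: 0.5,  # assume each round halves what is unknown
    )
    print(trace)
```

The sketch shows only the control flow of the feedback loop. In practice each callback would hide substantial human, domain, and machine effort, and a single scalar is a deliberately crude proxy for problem understanding.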
As an example of how to deliver
actionable knowledge, domain-driven data mining7 aims to integrate X-intelligence and X-complexities for
complex knowledge-discovery problems. Domain-driven data mining
advocates a comprehensive process
of synthesizing data intelligence with
other types of intelligence to prompt
new intelligence to address gaps in
existing data-driven methods, delivering actionable knowledge to business users. The metasynthesis of X-complexities and X-intelligence in
complex data science problems might
ultimately produce even super machine intelligence. Super-intelligent
machines could then understand,
represent, and learn X-complexities,
particularly data characteristics; acquire and represent unstructured,
ill-structured, and uncertain human
knowledge; support involvement of
business experts in the analytics process; acquire and represent imaginative and creative thinking in group
heuristic discussions among human
experts; acquire and represent group/
collective interaction behaviors; and
build infrastructure involving X-intelligence. While a data brain cannot
mimic special human imagination,
curiosity, and intuition, the simulation and modeling of human behavior
and human-data systems interaction
and cooperation promise to approach
human-like machine intelligence.
The low-level X-complexities and X-intelligence characterizing complex
data science problems reflect the gaps between the world of hidden data
and the immaturity of existing data science. Filling them requires a
discipline-wide effort to build complex data science thinking and
corresponding methodologies from a complex-system perspective. The
emerging data science evolution means opportunities for breakthrough
research, technological innovation, and a new data economy. If parallels
are drawn between the evolution of the Internet and the evolution of
data science, the future and the socioeconomic and cultural impact of
data science will be unprecedented indeed, though as yet unquantifiable.
References
1. Cao, L.B. In-depth behavior understanding and use:
The behavior informatics approach. Information
Sciences 180, 17 (Sept. 2010), 3067–3085.
2. Cao, L.B. Non-IIDness learning in behavioral and
social data. The Computer Journal 57, 9 (Sept. 2014),
3. Cao, L.B. Metasynthetic Computing and Engineering
of Complex Systems. Springer-Verlag, London, U.K.,
4. Cao, L.B. Data science: Nature and pitfalls. IEEE
Intelligent Systems 31, 5 (Sept.-Oct. 2016), 66–75.
5. Cao, L.B. Data science: A comprehensive overview.
ACM Computing Surveys (to appear).
6. Cao, L.B. Understanding Data Science. Springer, New
York (to appear).
7. Cao, L.B., Yu, P.S., Zhang, C., and Zhao, Y. Domain
Driven Data Mining. Springer, Springer-Verlag, New
8. Cao, L.B., Yu, P.S., and Kumar, V. Nonoccurring behavior
analytics: A new area. IEEE Intelligent Systems 30, 6
(Nov. 2015), 4–11.
9. Cleveland, W.S. Data science: An action plan for
expanding the technical areas of the field of statistics.
International Statistical Review 69, 1 (Dec. 2001),
10. Diggle, P.J. Statistics: A data science for the 21st
century. Journal of the Royal Statistical Society: Series
A (Statistics in Society) 178, 4 (Oct. 2015), 793–813.
11. Donoho, D. 50 Years of Data Science. Computer
Science and Artificial Intelligence Laboratory, MIT,
Cambridge, MA, 2015; http://courses.csail.mit.
12. Huber, P.J. Data Analysis: What Can Be Learned
from the Past 50 Years. John Wiley & Sons, Inc.,
New York, 2011.
13. Jagadish, H., Gehrke, J., Labrinidis, A.,
Papakonstantinou, Y., Patel, J.M., Ramakrishnan, R.,
and Shahabi, C. Big data and its technical challenges.
Commun. ACM 57, 7 (July 2014), 86–94.
14. Kramer, A.D., Guillory, J.E., and Hancock, J.T.
Experimental evidence of massive-scale emotional
contagion through social networks. Proceedings of the
National Academy of Sciences 111, 24 (Mar. 2014),
15. Lazer, D., Kennedy, R., King, G., and Vespignani, A.
The parable of Google flu: Traps in big data analysis.
Science 343, 6176 (Mar. 2014), 1203–1205.
16. Manyika, J. and Chui, M. Big Data: The Next Frontier for
Innovation, Competition, and Productivity. McKinsey
Global Institute, 2011.
17. Matsudaira, K. The science of managing data science.
Commun. ACM 58, 6 (June 2015), 44–47.
18. Mattmann, C.A. Computing: A vision for data science.
Nature 493, 7433 (Jan. 24, 2013), 473–475.
19. Mitchell, M. Complexity: A Guided Tour. Oxford
University Press, Oxford, U.K., 2011.
20. Qian, X., Yu, J., and Dai, R. A new discipline of science—
The study of open complex giant system and its
methodology. Journal of Systems Engineering and
Electronics 4, 2 (June 1993), 2–12.
21. Rowley, J. The wisdom hierarchy: Representations
of the DIKW hierarchy. Journal of Information
Science 33, 2 (Apr. 2007), 163–180.
22. Suchman, L. Human-Machine Reconfigurations: Plans
and Situated Actions. Cambridge University Press,
Cambridge, U.K., 2006.
23. Tukey, J.W. The future of data analysis. The Annals of
Mathematical Statistics 33, 1 (Mar. 1962), 1–67.
24. Tukey, J.W. Exploratory Data Analysis. Pearson, 1977.
Longbing Cao (firstname.lastname@example.org) is a professor
in the Advanced Analytics Institute at the University of
Technology Sydney, Australia.
Copyright held by the author.
Publication rights licensed to ACM. $15.00
ing existing analytical methodologies,
typical cross-enterprise, global, and
Internet-based data science projects
(such as those on the global financial crisis and
terrorist activities) satisfy most if
not all such complexities. This level
of complex data science involves problems with X-complexities, and their
resolution must first synthesize the
X-intelligence in the problems. One
approach to instantiate the systematism methodology is
“qualitative-to-quantitative metasynthesis,”3,20 as
proposed by Chinese scientist Xue-sen Qian (also known as Hsue-Shen
Tsien) to guide system engineering
in large-scale open systems.20 Such
qualitative-to-quantitative metasynthesis supports exploration of open
complex systems through an iterative
cognitive and problem-solving process on a human-centered, human-machine-cooperative problem-solving
platform in which human, data, and
machine intelligence, along with X-intelligence, must be engaged, quantified, and synthesized. Implementing
it for open complex intelligent systems, the “metasynthetic computing
and engineering” (MCE) approach3
provides a systematic computing and
engineering guide and suite of system-analysis tools.
Figure 7 outlines the process of applying the qualitative-to-quantitative
metasynthesis methodology to complex data science problems. MCE supports an iterative, hierarchical problem-solving process, incorporating
internal and external inputs, including data, information, domain knowledge, initial hypotheses, and underlying environmental factors. Data
scientists would start by presetting
analytics goals and tasks to be explored on the given data by incorporating domain, organizational, social, and
environmental complexities and intelligence. They would then use preliminary observations obtained from the domain and from experience to identify and
verify qualitative and quantitative hypotheses and estimations that guide
development of modeling and analytics methods. Findings would then be
evaluated and fed back to the corresponding procedures for refining and
optimizing understanding of previously unknown problem challenges,
goals, and discovery methods. Follow-