vanced data-analysis techniques are
applied, perhaps divided into two
main groups: research and modeling.
Valuable information is obtained as
a result of applying these techniques
to the collected data. Metadata is also
generated, which reduces the complexity
and processing cost of the queries or operations that must be performed, while
endowing the data with meaning.
Data and metadata are stored in a database for future queries, processing,
generation of new metadata, and/or
training and validation of the models.
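
As a rough illustration of storing data alongside generated metadata so that later queries can use it, the following minimal Python sketch uses an in-memory SQLite database. The table names, the `derive_metadata` helper, and the two metadata keys (`word_count`, `is_question`) are assumptions made for this example; the framework does not prescribe a schema or a storage engine.

```python
import sqlite3

# In-memory database; the schema is a hypothetical illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE record (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
CREATE TABLE metadata (
    record_id INTEGER NOT NULL REFERENCES record(id),
    key TEXT NOT NULL,
    value TEXT NOT NULL
);
""")

def derive_metadata(text):
    """Derive simple metadata from a record (assumed keys, for illustration)."""
    return {
        "word_count": str(len(text.split())),
        "is_question": str(text.strip().endswith("?")),
    }

def store(text):
    """Store a record together with the metadata generated for it."""
    cur = conn.execute("INSERT INTO record (text) VALUES (?)", (text,))
    record_id = cur.lastrowid
    for key, value in derive_metadata(text).items():
        conn.execute(
            "INSERT INTO metadata (record_id, key, value) VALUES (?, ?, ?)",
            (record_id, key, value),
        )
    return record_id

def find_by(key, value):
    """Answer a query from metadata alone, without re-processing the text."""
    rows = conn.execute(
        "SELECT r.text FROM record r "
        "JOIN metadata m ON m.record_id = r.id "
        "WHERE m.key = ? AND m.value = ? ORDER BY r.id",
        (key, value),
    )
    return [row[0] for row in rows]

store("Where is my refund?")
store("Delivery arrived today")
questions = find_by("is_question", "True")
```

Because the metadata is computed once at storage time, a query such as `find_by("is_question", "True")` does not need to scan and re-analyze the raw text.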
Inquiry. Here, the system can ac-
waterfall mode, with each subproject begun
when the previous one has finished;
for example, each subproject can cover
an individual knowledge block or a tool.
Data architecture dimension. This
dimension identifies the proposed
steps the software engineer performs
during data analysis. The order in
which each task is executed in each of
the steps and its relationship with the
other dimensions of the framework are
specified in the methodology dimension. The data architecture dimension
is divided into levels ranging from
identifying the location and structure
of the data to the display of the results
requested by the organization. Figure
2 outlines the levels that make up the
data architecture, including:
Content. Here, the location and characteristics of the data are identified
(such as format and source of required
data, both structured and unstructured). In addition, the software engineer performs a verification process to
check that data location and characteristics are valid for the next level. Data
can be generated offline, through the
traditional ways of entering data (such
as open data sources and relational
databases in enterprise resource planning, customer relationship management systems, and other management
information systems). Data can also be
obtained online through
social media (such as LinkedIn, Facebook, Google+, and Twitter).
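
The verification step at the Content level can be pictured as a check on a small descriptor for each data source. The sketch below is only an illustration: the `DataSource` fields, the `KNOWN_FORMATS` set, the example locations, and the three rules in `verify` are assumptions, not part of the framework.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical descriptor for one data source; the field names are
# illustrative assumptions.
@dataclass(frozen=True)
class DataSource:
    name: str
    location: str      # e.g. a connection string or URL
    fmt: str           # e.g. "sql", "csv", "json", "text"
    structured: bool   # structured vs. unstructured data
    online: bool       # True for social media, False for offline systems

# Assumed set of formats that the next level knows how to handle.
KNOWN_FORMATS = {"sql", "csv", "json", "text"}

def verify(source: DataSource) -> List[str]:
    """Return a list of problems; an empty list means the source is valid."""
    problems = []
    if not source.location.strip():
        problems.append("missing location")
    if source.fmt not in KNOWN_FORMATS:
        problems.append(f"unknown format: {source.fmt}")
    if source.fmt == "sql" and not source.structured:
        problems.append("relational source marked as unstructured")
    return problems

# Example sources (made-up locations).
crm = DataSource("crm", "postgresql://crm.example/customers", "sql", True, False)
feed = DataSource("social-feed", "", "text", False, True)

crm_problems = verify(crm)
feed_problems = verify(feed)
```

Returning a list of problems, rather than a single boolean, lets the engineer see every reason a source is not yet ready for the next level.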
Acquisition. Here, filters and patterns
are applied by software engineers
to ensure only valuable data is collected.
Traditional data sources are easier
to link to because they consist of structured
data. But social software poses
a greater technological challenge, as
it contains human information that
is complex, unstructured, ubiquitous,
multi-format, and multi-channel.
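
One way to picture "filters and patterns" is as a chain of predicates that every incoming record must pass before it is collected. The following Python sketch assumes records are plain dictionaries with an optional `text` field and uses a made-up keyword pattern; both are assumptions for illustration, not the framework's own definitions.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Filter = Callable[[Record], bool]

# Hypothetical pattern: keep only records that mention a topic of interest.
KEYWORDS = re.compile(r"\b(invoice|delivery|refund)\b", re.IGNORECASE)

def has_text(record: Record) -> bool:
    """Filter: discard records with no usable text."""
    return bool(record.get("text", "").strip())

def matches_keywords(record: Record) -> bool:
    """Pattern: keep records whose text mentions an assumed keyword."""
    return KEYWORDS.search(record.get("text", "")) is not None

def acquire(records: Iterable[Record], filters: List[Filter]) -> Iterator[Record]:
    """Yield only the records that pass every filter."""
    for record in records:
        if all(f(record) for f in filters):
            yield record

# Made-up sample input resembling short social media posts.
posts = [
    {"text": "Where is my refund?"},
    {"text": ""},
    {"text": "Nice weather today"},
    {"source": "no text field"},
]
kept = list(acquire(posts, [has_text, matches_keywords]))
```

Keeping each filter as a separate function makes it easy to add or remove rules per source, which matters when the same pipeline has to handle both structured and unstructured input.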
Enhancement. The main objectives here are to endow the collected
data with value, identify and extract
information, and discover otherwise
unknown relationships and patterns.
To add such endowment, various ad-
Figure 1. BD-IRIS framework dimensions.
Figure 2. Proposed data architecture levels.