vanced data-analysis techniques are
applied, perhaps divided into two
main groups: research and modeling.
Valuable information is obtained as
a result of applying these techniques
to the collected data. Metadata is also
generated, reducing the complexity of,
and the processing required by, the queries or operations that must be performed, while
endowing the data with meaning.
Data and metadata are stored in a database for future queries, processing,
generation of new metadata, and/or
training and validation of the models.
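As a minimal sketch of how metadata might be generated once and stored alongside the data for future queries, the following example runs a very simple term analysis and saves the result in a database. The function names (`term_metadata`, `store`), the table layout, and the choice of an in-memory SQLite database are illustrative assumptions, not part of the framework described here.

```python
import json
import re
import sqlite3
from collections import Counter


def term_metadata(text, top_n=3):
    """Hypothetical 'research' step: a very simple term analysis."""
    terms = re.findall(r"[a-z]+", text.lower())
    # Ignore very short words so the top terms carry some meaning.
    counts = Counter(t for t in terms if len(t) > 3)
    return {
        "n_terms": len(terms),
        "top_terms": [t for t, _ in counts.most_common(top_n)],
    }


def store(conn, text):
    """Store the data together with its generated metadata (as JSON)."""
    meta = term_metadata(text)
    cur = conn.execute(
        "INSERT INTO documents (text, metadata) VALUES (?, ?)",
        (text, json.dumps(meta)),
    )
    return cur.lastrowid


# Assumed storage layout: one table holding data and metadata side by side.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT, metadata TEXT)"
)

doc_id = store(conn, "Invoice error: the invoice total is wrong")

# A later query reads the stored metadata instead of re-analyzing the text.
row = conn.execute(
    "SELECT metadata FROM documents WHERE id = ?", (doc_id,)
).fetchone()
meta = json.loads(row[0])
```

Because the metadata is computed at storage time, later queries can read it directly rather than reprocessing the raw text.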
Inquiry. Here, the system can ac-
waterfall mode, with each subproject begun
when the previous one has finished;
for example, each subproject can cover
an individual knowledge block or a tool.
Data architecture dimension. This
dimension identifies the proposed
steps the software engineer performs
during data analysis. The order in
which each task is executed in each of
the steps and its relationship with the
other dimensions of the framework are
specified in the methodology dimension. The data architecture dimension
is divided into levels ranging from
identifying the location and structure
of the data to the display of the results
requested by the organization. Figure
2 outlines the levels that make up the
data architecture, including:
Content. Here, the location and characteristics of the data are identified
(such as format and source of required
data, both structured and unstructured). In addition, the software engineer performs a verification process to
check that data location and characteristics are valid for the next level. Data
can be generated offline, through the
traditional ways of entering data (such
as open data sources and relational
databases in enterprise resource planning, customer relationship management systems, and other management
information systems). Data
can also be obtained online through
social media (such as LinkedIn, Facebook, Google+, and Twitter).
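The identification and verification step at this level could be sketched as follows. This is only an illustration under stated assumptions: the `DataSource` record, its fields, the `SUPPORTED_FORMATS` set, and the `verify_source` checks are hypothetical and are not defined by the framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataSource:
    """Hypothetical description of a data source at the Content level."""
    name: str
    location: str      # e.g., a URL or a database connection string
    data_format: str   # e.g., "csv", "json", "sql"
    structured: bool   # structured vs. unstructured data
    online: bool       # online (e.g., social media) vs. offline


# Assumed set of formats that the next level is able to handle.
SUPPORTED_FORMATS = {"csv", "json", "sql", "text"}


def verify_source(source: DataSource) -> List[str]:
    """Return a list of problems; an empty list means the source is valid."""
    problems = []
    if not source.location.strip():
        problems.append("missing location")
    if source.data_format not in SUPPORTED_FORMATS:
        problems.append(f"unsupported format: {source.data_format}")
    return problems


crm = DataSource("crm", "postgresql://crm-host/customers", "sql",
                 structured=True, online=False)
feed = DataSource("social_feed", "", "xml", structured=False, online=True)

crm_problems = verify_source(crm)
feed_problems = verify_source(feed)
```

Only sources whose problem list is empty would be passed on to the next level; the others would be corrected or discarded first.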
Acquisition. Here, filters and patterns
are applied by software engineers
to ensure only valuable data is collected.
Traditional data sources are easier
to link to because they consist of structured
data. But social software poses
a greater technological challenge, as
it contains human information that
is complex, unstructured, ubiquitous,
multi-format, and multi-channel.
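To illustrate how filters and patterns might be combined so that only valuable data is collected, here is a minimal sketch. The record layout, the `acquire` function, and the example filter and pattern are assumptions made for illustration; real connectors to social media or enterprise systems would be considerably more involved.

```python
import re


def acquire(records, filters, pattern):
    """Keep only records that pass every filter and match the pattern."""
    kept = []
    for record in records:
        passes_filters = all(f(record) for f in filters)
        matches_pattern = pattern.search(record.get("text", "")) is not None
        if passes_filters and matches_pattern:
            kept.append(record)
    return kept


# Hypothetical raw input mixing a traditional source and social media.
raw = [
    {"source": "crm", "lang": "en", "text": "Customer asked about invoice 4411"},
    {"source": "social", "lang": "en", "text": "lol"},
    {"source": "social", "lang": "es", "text": "Problema con la factura"},
    {"source": "social", "lang": "en", "text": "My invoice is wrong again"},
]

# Filter: keep English records only. Pattern: the text must mention "invoice".
filters = [lambda r: r["lang"] == "en"]
pattern = re.compile(r"\binvoice\b", re.IGNORECASE)

collected = acquire(raw, filters, pattern)
```

Filters here test record attributes, while the pattern tests the content itself; keeping the two separate makes it easy to reuse the same pattern across different sources.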
Enhancement. The main objectives here are to endow the collected
data with value, identify and extract
information, and discover otherwise
unknown relationships and patterns.
To add such endowment, various ad-
Figure 1. BD-IRIS framework dimensions: Methodology, Data Architecture, Organizational, Privacy and Security, Data Sources, Data Quality, and Support Tools.
Figure 2. Proposed data architecture levels: Content, Acquisition, Enhancement, Inquiry, and Visualization.