Data Virtualization Primer

The Concepts

  •  Virtual Databases - Learn how DV encapsulates your data access as a database.
  •  Models - Learn how DV represents metadata.
  •  Connectors - Learn how DV connects to enterprise information systems.
  •  Data Services - Learn how DV exposes data services.
  •  SOAs - Learn about DV's role in an SOA world.

Originally published at http://teiid.jboss.org/basics/index.html

The Concepts - VDBs

A virtual database (or VDB) is a container for components used to integrate data from multiple data sources, so that they can be accessed in an integrated manner through a single, uniform API. 

A VDB contains models, which define the structural characteristics of data sources, views, and Web services.

VDB Types

There are two types of VDBs.

  • A dynamic VDB is defined using a simple XML file. The file declares the sources to be integrated and exposes them through JDBC, so that user queries can be written against the VDB as if all of its sources were a single source. Dynamic VDBs do not offer view/abstraction layers.
  • Teiid Designer, an Eclipse-based GUI tool, can be used to create VDBs. It lets you not only define source models and import metadata and statistics from them, but also define relational and XML views on top of those sources. This allows you to abstract the structure of the information you expose to and use in your applications from the underlying physical data structures.
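To illustrate the dynamic VDB approach, the sketch below embeds a minimal dynamic VDB descriptor and parses it with Python's standard `xml.etree.ElementTree` to list the sources it integrates. The VDB name, model names, translator names, and JNDI names are hypothetical. The element and attribute names (`vdb`, `model`, `source`, `translator-name`, `connection-jndi-name`) reflect our understanding of the Teiid `vdb.xml` format, so check them against the schema in the Teiid documentation.

```python
import xml.etree.ElementTree as ET

# A hypothetical dynamic VDB descriptor; names and JNDI bindings are illustrative only.
DYNAMIC_VDB_XML = """
<vdb name="SalesVDB" version="1">
    <description>Integrates two physical sources</description>
    <model name="Customers">
        <source name="crm" translator-name="oracle" connection-jndi-name="java:/CrmDS"/>
    </model>
    <model name="Orders">
        <source name="erp" translator-name="postgresql" connection-jndi-name="java:/ErpDS"/>
    </model>
</vdb>
"""


def list_sources(vdb_xml: str) -> list[tuple[str, str, str]]:
    """Return (model, translator, JNDI name) for every source declared in the VDB."""
    root = ET.fromstring(vdb_xml)
    sources = []
    for model in root.findall("model"):
        for source in model.findall("source"):
            sources.append(
                (
                    model.get("name"),
                    source.get("translator-name"),
                    source.get("connection-jndi-name"),
                )
            )
    return sources


if __name__ == "__main__":
    for model, translator, jndi in list_sources(DYNAMIC_VDB_XML):
        print(f"{model}: translator={translator}, jndi={jndi}")
```

Each model in the descriptor names a translator and a connection, and queries against the VDB see both models as if they lived in a single source.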

VDB and Models

VDBs can contain one or more models representing the information to be integrated and exposed to consuming applications. Models must be in a valid state for the VDB to be used for data access. A single model is valid when it is self-consistent and complete: there are no "missing pieces" and no references to non-existent entities. Validation across multiple models checks that all inter-model dependencies are present and resolvable.
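The two validation levels can be sketched as a reference check. This uses a hypothetical in-memory representation, not Teiid Designer's internal model format. Each model maps entity names to the entities they reference, written as `model.entity`. Single-model validation flags references within a model that do not resolve. Multi-model validation flags cross-model references whose target model or entity is missing.

```python
# Hypothetical representation: model -> entity -> set of referenced "model.entity" names.
Models = dict[str, dict[str, set[str]]]


def validate(models: Models) -> list[str]:
    """Return a list of problems; an empty list means every reference resolves."""
    problems = []
    for model_name, entities in models.items():
        for entity, refs in entities.items():
            for ref in sorted(refs):
                target_model, _, target_entity = ref.partition(".")
                if target_model == model_name:
                    # Single-model validation: the model must be self-consistent.
                    if target_entity not in entities:
                        problems.append(f"{model_name}.{entity}: missing local entity {ref}")
                elif target_entity not in models.get(target_model, {}):
                    # Multi-model validation: inter-model dependencies must resolve.
                    problems.append(f"{model_name}.{entity}: unresolved dependency {ref}")
    return problems


if __name__ == "__main__":
    models = {
        "Source": {"Customer": set(), "Order": {"Source.Customer"}},
        "View": {"CustomerOrders": {"Source.Customer", "Source.Order", "Source.Invoice"}},
    }
    print(validate(models))  # ['View.CustomerOrders: unresolved dependency Source.Invoice']
```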

VDB deployment

The VDB must be deployed to a Teiid Server. If deployment succeeds without errors and the underlying data sources are configured correctly, the VDB becomes accessible to your client applications. Once deployed, the VDB can be accessed through JDBC-SQL, SOAP (Web Services), SOAP-SQL, or XQuery.
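For JDBC access, a client needs a connection URL that names the deployed VDB. The helper below builds one in the `jdbc:teiid:<vdb>@mm://<host>:<port>` form that, to our understanding, the Teiid JDBC driver uses. The default port 31000, the optional `version` property, and the `mms` scheme for SSL are assumptions to confirm against the Teiid client documentation for your release. The host and VDB names are hypothetical.

```python
from typing import Optional


def teiid_jdbc_url(vdb: str, host: str = "localhost", port: int = 31000,
                   ssl: bool = False, version: Optional[int] = None) -> str:
    """Build a Teiid-style JDBC URL for a deployed VDB (format assumed, see lead-in)."""
    scheme = "mms" if ssl else "mm"  # "mms" is assumed to select an SSL socket
    url = f"jdbc:teiid:{vdb}@{scheme}://{host}:{port}"
    if version is not None:
        url += f";version={version}"  # optional connection property
    return url


if __name__ == "__main__":
    print(teiid_jdbc_url("SalesVDB", host="dv.example.com", version=1))
    # jdbc:teiid:SalesVDB@mm://dv.example.com:31000;version=1
```

A Java client would pass such a URL to `java.sql.DriverManager.getConnection`, with the Teiid JDBC driver on its classpath.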

Model Types

VDBs contain two primary types of models:

  • Source models represent the structure and characteristics of physical data sources
  • View models represent the structure and characteristics of abstract structures you want to expose to your applications.

Source models must be associated with a Translator and a Resource Adaptor.
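The rule that source models need a Translator and a Resource Adaptor can be expressed as a small data structure. In this sketch, view models are exempt because they are defined over other models rather than over a physical source. The class and field names are hypothetical and chosen for illustration; they are not Teiid's metadata API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ModelType(Enum):
    SOURCE = "source"  # structure of a physical data source
    VIEW = "view"      # abstract structure exposed to applications


@dataclass
class Model:
    """Hypothetical model descriptor; not Teiid's metadata API."""
    name: str
    kind: ModelType
    translator: Optional[str] = None        # e.g. "oracle"; required for source models
    resource_adaptor: Optional[str] = None  # e.g. a JNDI name; required for source models

    def binding_errors(self) -> list[str]:
        """Return missing bindings; view models need neither a Translator nor an adaptor."""
        if self.kind is not ModelType.SOURCE:
            return []
        errors = []
        if not self.translator:
            errors.append(f"{self.name}: no Translator")
        if not self.resource_adaptor:
            errors.append(f"{self.name}: no Resource Adaptor")
        return errors


if __name__ == "__main__":
    models = [
        Model("Customers", ModelType.SOURCE, "oracle", "java:/CrmDS"),
        Model("Orders", ModelType.SOURCE, translator="postgresql"),
        Model("CustomerOrders", ModelType.VIEW),
    ]
    for m in models:
        print(m.name, m.binding_errors())
```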

Translator

A Translator provides an abstraction layer between the Teiid query engine and a physical data source. It knows how to convert query commands issued by Teiid into source-specific commands and execute them using the Resource Adaptor. It also knows how to convert the result data returned by the physical source into the form the Teiid query engine expects.
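A Translator has two responsibilities: converting engine commands into source-specific commands, and converting source results back into the form the engine expects. The sketch below models them as an abstract interface with one toy implementation. Everything here is hypothetical and far simpler than Teiid's real translator API. The "engine command" is a small dataclass, the target dialect limits rows with `TOP n` (as SQL Server does), and result conversion only turns digit strings into integers.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Select:
    """Hypothetical engine-level command: SELECT columns FROM table [limit]."""
    table: str
    columns: list[str]
    limit: Optional[int] = None


class Translator(ABC):
    @abstractmethod
    def translate(self, command: Select) -> str:
        """Convert an engine command into a source-specific command."""

    @abstractmethod
    def convert_row(self, row: tuple) -> tuple:
        """Convert one source row into the types the engine expects."""


class TopDialectTranslator(Translator):
    """Toy translator for a source that limits rows with TOP n."""

    def translate(self, command: Select) -> str:
        top = f"TOP {command.limit} " if command.limit is not None else ""
        return f"SELECT {top}{', '.join(command.columns)} FROM {command.table}"

    def convert_row(self, row: tuple) -> tuple:
        # Assume this source returns numbers as text; the engine wants ints where possible.
        return tuple(int(v) if isinstance(v, str) and v.isdigit() else v for v in row)


if __name__ == "__main__":
    t = TopDialectTranslator()
    print(t.translate(Select("Orders", ["id", "total"], limit=10)))
    # SELECT TOP 10 id, total FROM Orders
```

A translator for a different source would implement the same two methods for its own dialect and result types, which is what lets the engine treat all sources uniformly.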

resource adaptor

A Resource Adaptor provides the connectivity to the physical data source, including a way to natively issue commands and gather results. A Resource Adaptor can be an RDBMS data source, a Web service, a text file, a connection to a mainframe, etc. It is often a JCA connector.
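To illustrate the split between connectivity and translation, the sketch below models a Resource Adaptor as an object that only opens connections and runs native commands. Python's built-in `sqlite3` in-memory database stands in for the physical source. The class and method names are hypothetical; a real Teiid Resource Adaptor is typically a JCA connector deployed in the application server.

```python
import sqlite3


class SQLiteResourceAdaptor:
    """Hypothetical Resource Adaptor: owns connectivity, knows nothing about the engine."""

    def __init__(self, database: str = ":memory:"):
        self._conn = sqlite3.connect(database)

    def execute_native(self, command: str, params: tuple = ()) -> list[tuple]:
        """Issue a source-native command and gather the raw results."""
        cursor = self._conn.execute(command, params)
        rows = cursor.fetchall()
        self._conn.commit()
        return rows

    def close(self) -> None:
        self._conn.close()


if __name__ == "__main__":
    ra = SQLiteResourceAdaptor()
    ra.execute_native("CREATE TABLE customer (id INTEGER, name TEXT)")
    ra.execute_native("INSERT INTO customer VALUES (?, ?)", (1, "Acme"))
    print(ra.execute_native("SELECT id, name FROM customer"))  # [(1, 'Acme')]
    ra.close()
```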

translator and RA configuration

You can define configuration for Translators and Resource Adaptors in Teiid Designer.  Once defined, Translator information along with the JNDI name of the Resource Adaptor is stored with a VDB, so that when a VDB is exchanged, the existing settings can be used.

 

Resource Adaptor configuration typically contains user IDs, passwords, and URLs for the physical data sources. This information is not stored with the VDB. Designer creates these configurations automatically for development purposes. For the production environment, however, users need to migrate them or create new ones themselves, using the provided tools such as the Admin Console.
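A VDB carries only the Translator name and the Resource Adaptor's JNDI name, while credentials and URLs live in the server environment. The sketch below models this as two separate structures that are joined by JNDI name when the VDB runs in a given environment. All names, URLs, and the `resolve` function are hypothetical. In practice the environment-side entries are managed with server tools such as the Admin Console.

```python
# What travels with the VDB (hypothetical): translator and JNDI name only, no secrets.
VDB_BINDINGS = {
    "Customers": {"translator": "oracle", "jndi": "java:/CrmDS"},
}

# What lives in each server environment (hypothetical): connection details keyed by JNDI name.
ENVIRONMENTS = {
    "dev": {"java:/CrmDS": {"url": "jdbc:oracle:thin:@dev-db:1521:CRM", "user": "dev_user"}},
    "prod": {"java:/CrmDS": {"url": "jdbc:oracle:thin:@prod-db:1521:CRM", "user": "svc_crm"}},
}


def resolve(model: str, env: str) -> dict:
    """Combine a model's VDB binding with the target environment's data source settings."""
    binding = VDB_BINDINGS[model]
    datasources = ENVIRONMENTS[env]
    if binding["jndi"] not in datasources:
        raise LookupError(f"{env}: no data source bound to {binding['jndi']}")
    return {**binding, **datasources[binding["jndi"]]}
```

The same VDB can be exchanged between environments unchanged; only the environment-side entry for `java:/CrmDS` differs.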

VDB execution and testing

VDBs can be tested in Teiid Designer by issuing SQL queries in the SQL Explorer perspective. In this way, you can iterate between defining your integration models and testing them out to see if they are yielding the expected results.

 

For a VDB to be executable, every source model in it must have a Translator and a Resource Adaptor defined.

 

VDBs are stored in an archive file format, similar to a standard Java JAR format.

 

Dynamic VDBs are XML files. The schema for the XML file can be found in the Teiid documentation.

The Concepts - Models

A model is a representation of a set of information constructs. A familiar model is the relational model, which defines tables composed of columns and containing records of data. Another familiar model is the XML model, which defines hierarchical data sets.  In Teiid, models are used to define the entities, and relationships between those entities, required to fully define the integration of information sets so that they may be accessed in a uniform manner using a single API and access protocol.

Source and view models

Source models define the structural and data characteristics of the information contained in data sources. Teiid uses the information in source models to access the information in multiple sources, so that from a user's viewpoint these all appear to be in a single source.

 

In addition to source models, Teiid provides the ability to define a variety of view models. These can be used to define a layer of abstraction above the physical layer, so that information can be presented to end users and consuming applications in business terms rather than as it is physically stored. These business views can be in a variety of forms: relational, XML, or Web services. Views are defined using transformations between models.
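A view model can be thought of as a named transformation over source models. The sketch below uses two in-memory lists as hypothetical source tables from different systems. It defines a business-level view as a function that joins and renames them. In Teiid the transformation would be expressed in SQL in the view model, not in application code.

```python
# Hypothetical source models from two different systems, with physical naming and units.
CRM_CUSTOMERS = [
    {"cust_id": 1, "cust_nm": "Acme Corp"},
    {"cust_id": 2, "cust_nm": "Globex"},
]
ERP_ORDERS = [
    {"order_no": 100, "cust_ref": 1, "amt_cents": 125000},
    {"order_no": 101, "cust_ref": 1, "amt_cents": 4999},
    {"order_no": 102, "cust_ref": 2, "amt_cents": 78000},
]


def customer_orders_view() -> list[dict]:
    """View model: presents orders in business terms, hiding physical names and units."""
    names = {c["cust_id"]: c["cust_nm"] for c in CRM_CUSTOMERS}
    return [
        {
            "customer": names[o["cust_ref"]],
            "order_number": o["order_no"],
            "amount": o["amt_cents"] / 100,  # physical cents -> business currency units
        }
        for o in ERP_ORDERS
        if o["cust_ref"] in names  # inner-join semantics
    ]
```

Consumers of `customer_orders_view` never see `cust_nm` or `amt_cents`, so the physical schemas can change without affecting them as long as the transformation is updated.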

Model classes

Teiid Designer can be used to create several classes of models, each representing a conceptually different classification:

  • Relational, which model data that can be represented in table – columns and records – form. Relational models can represent structures found in relational databases, spreadsheets, text files, or simple Web services.
  • XML, which model the basic structures of XML documents. These can be “backed” by XML Schemas. XML models represent nested structures, including recursive hierarchies.
  • XML Schema, the W3C standard for formally defining the structure and constraints of XML documents, as well as the datatypes defining permissible values in XML documents.
  • Web Services, which define Web service interfaces, operations, and operation input and output parameters (in the form of XML Schemas).
  • Model Extensions, for defining property name/value extensions to other model classes.

The Concepts - connectors

A Translator provides an abstraction layer between the DV query engine and a physical data source. It knows how to convert query commands issued by DV into source-specific commands and execute them using the Resource Adaptor.

translators

A Translator provides an abstraction layer between the Teiid query engine and a physical data source. It knows how to convert query commands issued by Teiid into source-specific commands and execute them using the Resource Adaptor. It also knows how to convert the result data returned by the physical source into the form the Teiid query engine expects.

 

Teiid provides various pre-built translators for sources such as Oracle, DB2, SQL Server, MySQL, PostgreSQL, XML, files, etc.

translators (Continued)

A Translator also defines the capabilities of a particular source, such as whether it can natively support joins (inner joins, cross joins, etc.) or criteria.

 

A Translator, along with its Resource Adaptor, must always be configured on a source model. A cross-source query issued against a VDB running in Teiid results in source queries being issued to the translators, which interact with the physical data sources.
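In this flow, one cross-source query fans out into per-source queries, and the engine then combines their results. The sketch below uses two `sqlite3` in-memory databases to stand in for separate physical sources. The table names, the data, and the engine-side hash join are illustrative assumptions. Teiid's planner chooses among several join strategies and pushes down as much work as each source's capabilities allow.

```python
import sqlite3


def make_source(ddl: str, rows: list[tuple], insert: str) -> sqlite3.Connection:
    """Create an in-memory database standing in for one physical source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


# Two independent "physical sources" (hypothetical data).
crm = make_source("CREATE TABLE customer (id INTEGER, name TEXT)",
                  [(1, "Acme"), (2, "Globex")], "INSERT INTO customer VALUES (?, ?)")
erp = make_source("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)",
                  [(100, 1, 50.0), (101, 2, 75.0), (102, 1, 20.0)],
                  "INSERT INTO orders VALUES (?, ?, ?)")


def federated_customer_totals() -> dict[str, float]:
    """Engine-side plan: one query per source, then join the results in the engine."""
    customers = dict(crm.execute("SELECT id, name FROM customer"))
    # Aggregation is pushed down to the orders source, which supports GROUP BY natively.
    totals = erp.execute("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id")
    # Hash join on customer id, performed by the engine.
    return {customers[cid]: total for cid, total in totals if cid in customers}
```

From the client's point of view this is a single query over one VDB; the split into two source queries is an engine concern.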

translators (Continued)

A Translator is defined either by using one of the pre-built ones as-is or by overriding the default properties of a pre-built one to define your own. The tooling will provide mechanisms to define override translators.
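Overriding a pre-built Translator amounts to starting from its default properties and replacing a few of them. The sketch below shows that merge. The translator names and property names are hypothetical placeholders, not actual Teiid translator properties. Those are listed per translator in the Teiid documentation.

```python
from types import MappingProxyType

# Hypothetical defaults for pre-built translators, exposed read-only.
PREBUILT = {
    "oracle": MappingProxyType({"max_in_criteria": 1000, "supports_native_queries": False}),
    "file": MappingProxyType({"encoding": "UTF-8"}),
}


def override_translator(name: str, base: str, **overrides) -> dict:
    """Define a named override translator from a pre-built one (hypothetical structure)."""
    defaults = PREBUILT[base]
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"{base} has no properties {sorted(unknown)}")
    return {"name": name, "type": base, "properties": {**defaults, **overrides}}
```

The pre-built defaults stay untouched, so several override translators can be derived from the same base.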

 

See the "Developer's Guide" for how to create a custom Translator that works with your Resource Adaptor.

resource adaptors

A Resource Adaptor provides the connectivity to the physical data source, including a way to natively issue commands to the source and gather results. A Resource Adaptor can be an RDBMS data source, a Web service, a text file, a connection to a mainframe, or a connection to a custom source you define. It is often a JCA connector; however, there is no restriction on how the connection semantics are provided to the Translator.

 

However, if your source needs to participate in distributed XA transactions, the Resource Adaptor must be a JCA connector. Besides transactions, JCA defines how to handle configuration, packaging, and deployment, and it provides a standard interaction model with the container, connection pools, etc. A JCA connector can also be used for purposes beyond Teiid data integration.

resource adaptors (continued)

An instance of a Resource Adaptor is created by defining a "-ds.xml" file in JBoss AS. This is the same operation used to create data sources in JBoss AS.

 

See the "Developer's Guide" for how to create a custom Resource Adaptor.

translator capabilities

Translator capabilities define what processing each translator/source combination can perform. For example, most relational sources can process joins and unions, whereas when processing delimited text files these operations cannot be performed by the resource adaptor or the "source" (in this case, the file system).

 

Capabilities are used by the Teiid query engine to determine what subsets of the overall federated query plan can be pushed down to each source involved in the query.
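Here is a simplified view of how capabilities drive pushdown. Given the ordered operations a query needs and the operations a source declares it supports, the engine pushes down what the source can do and performs the rest itself. The capability names and the per-operation, in-order split are simplifying assumptions. Teiid's actual capability model is much finer grained.

```python
# Hypothetical capability sets for two translator/source combinations.
CAPABILITIES = {
    "relational_db": {"filter", "join", "union", "aggregate", "sort"},
    "delimited_file": set(),  # a flat file can only return all of its rows
}


def plan_pushdown(source: str, pipeline: list[str]) -> tuple[list[str], list[str]]:
    """Split an ordered pipeline into (pushed to source, done by engine).

    Operations are pushed in order until the first unsupported one. Everything from
    that point on must run in the engine, because later steps consume its output.
    """
    supported = CAPABILITIES[source]
    for i, op in enumerate(pipeline):
        if op not in supported:
            return pipeline[:i], pipeline[i:]
    return pipeline[:], []


if __name__ == "__main__":
    print(plan_pushdown("relational_db", ["filter", "aggregate", "sort"]))
    print(plan_pushdown("delimited_file", ["filter", "sort"]))
```

Pushing work to the source reduces the rows sent to the engine, which is why accurate capabilities matter for federated query performance.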

translator capabilities (continued)

Translator capabilities define the capabilities of a source in terms of language features (joins, criteria, functions, unions, sorts, etc). In addition, the source model defined in a virtual database may specify additional constraints at the metadata level, such as whether a column can be used in an exact match or wildcard string match, whether tables and columns can be updated, etc. In combination, these features can be used to more narrowly constrain how users access a source.
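The metadata-level constraints can be sketched as per-column flags that are checked before a criterion or update is accepted. The flags loosely mirror the ideas of exact-match searchability, wildcard (`LIKE`) searchability, and updatability. The structure is a hypothetical illustration, not Teiid's metadata API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    """Hypothetical per-column constraints declared in a source model."""
    name: str
    exact_match: bool = True   # may appear in "col = value" criteria
    wildcard: bool = False     # may appear in "col LIKE pattern" criteria
    updatable: bool = False


def check_criterion(column: ColumnMetadata, operator: str) -> None:
    """Reject criteria the source model's metadata does not allow."""
    if operator == "=" and not column.exact_match:
        raise ValueError(f"{column.name} cannot be used in an exact match")
    if operator.upper() == "LIKE" and not column.wildcard:
        raise ValueError(f"{column.name} cannot be used in a wildcard match")


def check_update(column: ColumnMetadata) -> None:
    """Reject updates to columns the metadata marks as read-only."""
    if not column.updatable:
        raise ValueError(f"{column.name} is read-only")
```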

 

The Concepts - data services

A data service is a standards-based, uniform means of accessing information in a form useful to business applications.

Abstraction

Since data is rarely in a form required by applications and services, and is often not even in a single data source, a key requirement for data services is that they abstract the data from its physical persistence structure, presenting it in a form that is closer to the needs of the using application. This effectively decouples consuming applications from the structure of the underlying data.

 

Hand-in-hand with abstraction, a federated query engine is required to execute the transformations defining the abstraction layers in an efficient manner, and to expose the abstracted structures through uniform and standard APIs.

key components

The two key components of a data services architecture, then, are:

  • Modeling environment, to define the abstraction layers -- views and Web services
  • Execution environment, to actualize the abstract structures from the underlying data, and expose them through standard APIs. A query engine is a required part of the execution environment, to optimally federate data from multiple disparate sources.

technical and business viewpoints

Teiid provides a suite of projects that provide data services to business applications. That is, Teiid provides a means to access integrated data from multiple data sources, through your preferred standards-based API. Teiid provides access to federated information through JDBC (SQL or XQuery), ODBC (SQL or XQuery), and SOAP (Web services).

 

A more business- or user-centric view of data services is that they are information representations required by business applications. From this perspective, data services are defined and designed by business analysts, modelers, and developers to represent the information structures required by business applications. Often, a key design goal is one of interoperability - the requirement that systems work together seamlessly, including when exchanging data. Teiid provides graphical and other tools for defining these interoperable data services, essentially relational and XML views that can be used by business applications in a semantically-meaningful manner.

technical and business viewpoints (continued)

These two viewpoints roughly correspond to the Execution and Modeling components of a data services solution, respectively.

essential part of SOA and Microservices

Data services are a key part of a service-oriented architecture, or SOA. They provide the necessary interface to data for all business services:

  •  Expose all data through a single uniform interface
  •  Provide a single point of access to all business services in the system
  •  Expose data using the same paradigm as business services - as "data services"
  •  Expose legacy data sources as data services
  •  Provide a uniform means of exposing/accessing metadata
  •  Provide a searchable interface to data and metadata
  •  Expose data relationships and semantics
  •  Provide uniform access controls to information

The Concepts - SOA

The guiding principles of SOAs are based on lessons well-learned over the brief history of computing, most notably that of decoupling of system components. It is these same principles that motivate the use of data services in an SOA.

SOA and abstraction

Decoupling is the key concept in SOAs and is achieved through abstraction based on service interfaces. Business processes in an SOA represent a formalized, executable form of the actual enterprise's processes, but offer a layer of abstraction above the physical processes, be they automated or manual. Business processes are composed of business services. Just as business processes in an SOA represent an abstraction from their real-world counterparts, so do business services offer an abstraction of actual physical services. Decoupling through abstraction imbues SOAs with immense potential to model business operations independent of the IT infrastructure du jour.

SOA and abstraction (continued)

SOAs, as their name makes clear, are architectures. These architectures, as we've seen, involve business processes composed of business services. Business processes and services both make use of business information, which is likely resident in many different types and instances of databases and files. This information can be exposed to business services using the same service-oriented paradigm - as data services.

data services

Just as business processes and services in an SOA represent abstractions - albeit executable ones - of their real-world counterparts, so too do data services represent an abstraction of underlying enterprise information. Data services expose information to business services in a form and through an interface amenable to those services.

data services (continued)

The form is generally some representation of business objects to be manipulated by business services and passed between services by business processes. Business objects may be simple tabular structures or complex nested structures. Almost always, though, they must be composed from information residing in more than one data source, often in different persistence formats. So a key requirement of data services is that they:

  • expose integrated information in one or more desired formats, even if the original data are in different formats.
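As a small illustration of exposing the same integrated information in more than one format, the sketch below renders one set of integrated business-object rows in two forms. It produces tabular rows and an XML document, the latter built with Python's standard `xml.etree.ElementTree`. The element names and the data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Integrated business objects, e.g. produced by a view over several sources (hypothetical data).
CUSTOMERS = [
    {"id": 1, "name": "Acme", "open_orders": 2},
    {"id": 2, "name": "Globex", "open_orders": 0},
]


def as_table(rows: list[dict]) -> list[tuple]:
    """Tabular form (header row first), as a JDBC/ODBC client would see it."""
    if not rows:
        return []
    columns = list(rows[0])
    return [tuple(columns)] + [tuple(r[c] for c in columns) for r in rows]


def as_xml(rows: list[dict]) -> str:
    """Nested XML form, as a SOAP or REST client might receive it."""
    root = ET.Element("customers")
    for r in rows:
        customer = ET.SubElement(root, "customer", id=str(r["id"]))
        ET.SubElement(customer, "name").text = r["name"]
        ET.SubElement(customer, "openOrders").text = str(r["open_orders"])
    return ET.tostring(root, encoding="unicode")
```

Both functions read the same integrated rows, so adding a format does not require touching the underlying sources.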

data services (continued)

The desired interface is dependent on the architecture being used. A Web service-based SOA will provide a SOAP or REST-based interface to XML-formatted business objects. A more traditional Java or C-language RPC-based architecture will require JDBC or ODBC access to tabular information, obtained from multiple data sources. So, a second key requirement of data services is that they:

  • expose information through one or more consistent, standard interfaces, even if the original data are accessed through different interfaces.

data services (continued)

These two key requirements of data services are achieved by two different technologies:

  • modeling to define the required format of data, integrated from the underlying sources; and
  • a query engine for processing these abstract definitions efficiently, exposing the integrated information through one or more interfaces.

THANK YOU!

Data Virtualization Primer - The Concepts

By Kenny Peeples

This part of the DV Primer goes over the concepts around Virtualized Data Services.
