UnifiedViews

UnifiedViews is an Extract-Transform-Load (ETL) framework for sustainable (RDF) data processing. It allows users (publishers, consumers, or analysts) to define, execute, monitor, debug, schedule, and share RDF data processing tasks. UnifiedViews is one of the core components of Open Data Node, a publication platform for Open Data, where it ensures extraction, transformation, and publishing of (Linked) Open Data.

A data processing task (or simply task) consists of one or more data processing units (DPUs). These tasks may use custom plugins (DPUs) created by users. UnifiedViews differs from other ETL frameworks by natively supporting RDF data and ontologies. UnifiedViews has a graphical user interface for the administration, debugging, and monitoring of the ETL process.

 A data processing unit (DPU) encapsulates certain business logic needed when processing data (e.g., one DPU may extract data from an RDF database or apply a SPARQL query). Every DPU has its inputs, outputs, business logic and configuration.
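The four parts of a DPU named above (inputs, outputs, business logic, configuration) can be pictured with a small Java sketch. Every type here (SketchTriple, SketchDpu, PredicateFilterConfig, PredicateFilterDpu) is hypothetical and written only for illustration; none of them belongs to the UnifiedViews plugin API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical types for illustration only; this is NOT the UnifiedViews plugin API.

/** A minimal stand-in for one RDF statement. */
final class SketchTriple {
    final String subject;
    final String predicate;
    final String object;

    SketchTriple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** A DPU takes an input and a configuration and produces an output. */
interface SketchDpu<C> {
    List<SketchTriple> execute(List<SketchTriple> input, C configuration);
}

/** Configuration: the predicate whose triples should be kept. */
final class PredicateFilterConfig {
    final String predicate;

    PredicateFilterConfig(String predicate) {
        this.predicate = predicate;
    }
}

/** Business logic: keep only the triples whose predicate matches the configuration. */
final class PredicateFilterDpu implements SketchDpu<PredicateFilterConfig> {
    @Override
    public List<SketchTriple> execute(List<SketchTriple> input, PredicateFilterConfig configuration) {
        List<SketchTriple> output = new ArrayList<>();
        for (SketchTriple triple : input) {
            if (triple.predicate.equals(configuration.predicate)) {
                output.add(triple);
            }
        }
        return output;
    }
}

final class DpuSketchDemo {
    public static void main(String[] args) {
        List<SketchTriple> input = Arrays.asList(
                new SketchTriple("ex:a", "rdfs:label", "A"),
                new SketchTriple("ex:a", "rdf:type", "ex:Thing"));
        List<SketchTriple> output =
                new PredicateFilterDpu().execute(input, new PredicateFilterConfig("rdfs:label"));
        System.out.println(output.size() + " triple(s) kept");
    }
}
```

Keeping the configuration in its own class, separate from the business logic, mirrors the description above, where configuration is one of the parts every DPU has.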

Screencast: UnifiedViews in 5 minutes (no audio)

 
 

UnifiedViews is a framework, so consumers may create custom DPUs; any tool used by the RDF/Linked Data community can be wrapped as a DPU. UnifiedViews allows consumers to define and adjust data processing tasks using a graphical user interface (see picture).

Graphical interface for pipelines

UnifiedViews takes care of task scheduling. A consumer may configure UnifiedViews to send notifications about errors in task executions; the consumer may also receive daily summaries of the tasks that were executed. UnifiedViews ensures that DPUs are executed in the proper order, so that every DPU has its required inputs when it is launched. UnifiedViews also provides debugging capabilities: a consumer may browse and query (using the SPARQL query language) the RDF inputs to and RDF outputs from any DPU.
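One common way to guarantee that every DPU has its inputs before it is launched is to order the pipeline graph topologically. The sketch below uses Kahn's algorithm; it is an illustration of the idea, not a description of how UnifiedViews schedules DPUs internally, and the class name PipelineOrderSketch and the DPU names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not taken from the UnifiedViews code base.
final class PipelineOrderSketch {

    /**
     * consumersOf maps each DPU to the DPUs that consume its output.
     * Returns an order in which every DPU runs after all of its producers.
     */
    static List<String> executionOrder(Map<String, List<String>> consumersOf) {
        // Count how many producers each DPU is still waiting for.
        Map<String, Integer> pendingInputs = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : consumersOf.entrySet()) {
            pendingInputs.putIfAbsent(entry.getKey(), 0);
            for (String consumer : entry.getValue()) {
                pendingInputs.merge(consumer, 1, Integer::sum);
            }
        }

        // DPUs with no pending inputs can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pendingInputs.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String dpu = ready.remove();
            order.add(dpu);
            // Finishing a DPU satisfies one input of each of its consumers.
            for (String consumer : consumersOf.getOrDefault(dpu, Collections.<String>emptyList())) {
                if (pendingInputs.merge(consumer, -1, Integer::sum) == 0) {
                    ready.add(consumer);
                }
            }
        }

        if (order.size() != pendingInputs.size()) {
            throw new IllegalStateException("Pipeline contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> pipeline = new LinkedHashMap<>();
        pipeline.put("extract", Arrays.asList("transform", "load"));
        pipeline.put("transform", Arrays.asList("load"));
        System.out.println(executionOrder(pipeline));
    }
}
```

Here "load" waits for both "extract" and "transform", so it is scheduled last even though "extract" feeds it directly.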

UnifiedViews allows consumers to share DPUs, configurations of DPUs, and tasks as needed.

UnifiedViews Ver 2.0

UnifiedViews 2.0 is not backward compatible with UnifiedViews 1.X. If you need to use plugins developed for UnifiedViews 1.X in UnifiedViews 2.0, you may install an extra library that allows these older plugins to run in UnifiedViews 2.0.

New features in Ver. 2.0

Plugin template: makes plugin development easier. A developer only needs to specify a few items of plugin metadata, such as the plugin's name, type, and the package for its Java classes; all the important Java classes needed for developing and running the plugin are then prepared for the developer.

Plugin extensions: classes that plugin developers can reuse to get additional functionality when developing their plugins. The most important extensions are:

  • SimpleRDF and SimpleFiles: wrappers on top of the RDF data unit and the files data unit, respectively. They provide plugin developers with easy-to-use methods that cover basic functionality when working with RDF data or files; for example, there are methods that allow DPU developers to query RDF data within an RDF data unit, or to add new RDF data to an RDF data unit, using a single line of code.
  • Dynamic configuration of plugins using RDF configuration: allows a plugin to be configured dynamically over one of its inputs (the configuration input data unit). If the plugin supports dynamic configuration, it may receive its configuration over the configuration input data unit; such configuration is then automatically deserialized from the RDF data format and used instead of the configuration defined via the plugin's configuration dialog.
  • Fault tolerance: operations on RDF data can be time-intensive (e.g., fetching tens of millions of RDF triples over the SPARQL protocol or executing a set of SPARQL Update queries on tens of millions of RDF triples). Such operations typically consist of a set of calls against the target RDF store. Any such call can throw an exception at any time and, as our experiments revealed, the exception is often caused by the target RDF store temporarily not responding to an operation. It therefore makes sense to retry that particular call rather than the whole operation or pipeline execution, which could mean losing hours of work. Developers may use the fault tolerance extension to ensure that calls that may fail are retried for certain types of problems.
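The idea behind retrying a single call rather than the whole operation can be sketched in plain Java. This is a generic retry helper under stated assumptions, not the fault tolerance extension itself: the class RetrySketch, the method callWithRetry, and the choice of IOException as the "temporary" failure type are all hypothetical.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper; it does not reproduce the UnifiedViews fault tolerance extension.
final class RetrySketch {

    /**
     * Runs the call and retries it when it fails with an IOException, which this
     * sketch assumes to signal a temporary problem (e.g. the store not responding).
     * Any other exception is treated as permanent and propagates immediately.
     */
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long pauseMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException temporary) {
                lastFailure = temporary;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis); // give the store time to recover
                }
            }
        }
        // All attempts failed with a temporary problem; report the last one.
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("store not responding");
            }
            return "done";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Only the failing call is repeated, so the work already done by earlier calls in the same operation is kept. Restricting the retry to one exception type reflects the point that only certain types of problems are worth retrying.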

Localisation: in UnifiedViews 2.0, it is not necessary to explicitly initialise the localisation of a plugin; it is set up automatically based on the localisation of the framework. There is also native support for translating 1) messages published during the plugin's execution and 2) labels shown in the plugin's configuration dialog.

Support for versioning of configurations and better support for serialization/deserialization of the configuration: UnifiedViews (UV) also supports automatic migration of plugin configurations. When a plugin developer changes the configuration of a plugin, the developer should 1) create a new version of the configuration class and 2) provide a method that migrates the previous configuration to the new version. As a result, when a new version of an existing plugin is imported into UnifiedViews, its stored configuration may be updated automatically, using the migration method provided by the developer, before it is used.
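The two steps for the plugin developer (a new configuration class plus a migration method) might look roughly like the following. This is a minimal sketch: the classes EndpointConfigV1, EndpointConfigV2, and ConfigLoaderSketch, their fields, and the method names are invented for illustration, and the actual UnifiedViews configuration and migration types may look quite different.

```java
// Hypothetical classes; the actual UnifiedViews configuration and migration types are not shown here.

/** Previous version of a plugin's configuration. */
final class EndpointConfigV1 {
    String endpoint = "";
}

/** Step 1: a new version of the configuration class (renamed field, new timeout). */
final class EndpointConfigV2 {
    String endpointUrl = "";
    int timeoutSeconds = 30;

    /** Step 2: a method that migrates the previous configuration to this version. */
    static EndpointConfigV2 migrateFrom(EndpointConfigV1 previous) {
        EndpointConfigV2 current = new EndpointConfigV2();
        current.endpointUrl = previous.endpoint;
        // timeoutSeconds did not exist in V1, so the default value is kept.
        return current;
    }
}

/** What a framework could do when it loads a persisted configuration. */
final class ConfigLoaderSketch {

    /** Returns a V2 configuration, migrating a persisted V1 configuration if needed. */
    static EndpointConfigV2 load(Object persisted) {
        if (persisted instanceof EndpointConfigV2) {
            return (EndpointConfigV2) persisted;
        }
        if (persisted instanceof EndpointConfigV1) {
            return EndpointConfigV2.migrateFrom((EndpointConfigV1) persisted);
        }
        throw new IllegalArgumentException("Unknown configuration: " + String.valueOf(persisted));
    }

    public static void main(String[] args) {
        EndpointConfigV1 stored = new EndpointConfigV1();
        stored.endpoint = "http://localhost:8890/sparql";
        EndpointConfigV2 current = load(stored);
        System.out.println(current.endpointUrl + " (timeout " + current.timeoutSeconds + "s)");
    }
}
```

The plugin's own code then only ever sees the newest configuration class, while configurations stored by an older plugin version stay usable.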

UnifiedViews 2.0 also changed the way plugin configurations are serialized, so that the persisted configurations contain information about their versions. Furthermore, the way a plugin's configuration is obtained and parsed on one side and stored on the other was refactored in UnifiedViews 2.0, so that the process can be easily configured or adjusted in the future.

Useful links