
Current Issue

2012, Vol. 13, No. 4 (published 2012-04-01)
Recent advances of the Semantic Web
Zhao-hui Wu
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 239-240.   https://doi.org/10.1631/jzus.C1101000
In December 2011, researchers from all over the world gathered together in Hangzhou, China, for the 1st Joint International Semantic Technology Conference (JIST2011). JIST2011 is a joint event for regional Semantic Web related conferences. This year’s JIST brought together two regional conferences: ASWC2011 (Asian Semantic Web Conference 2011) and CSWC2011 (5th Chinese Semantic Web Conference).
As a follow-up to this successful event, this special issue aims to promote discussion of current trends in the Semantic Web. Our goal is not only to select the best papers from the conference, but also to present cutting-edge perspectives and visions that highlight future developments. With this in mind, we have organized the special issue with several new features. Structurally, it has four components: Perspectives, Personal Views, Research Articles, and Application Reports.
For the Perspectives part, we invited Professor Ian Horrocks from Oxford University and Professor Riichiro Mizoguchi from Osaka University to present their own perspectives, with a particular focus on the scalability of the Semantic Web. Professor Ian Horrocks reviews the evolution of semantic technologies to date, and then examines the scalability challenges that arise from deployment in large-scale applications. Professor Riichiro Mizoguchi examines the exact meaning of ‘scalability’ for Semantic Web data, from the perspectives of data at WWW scale and at Linked Data scale.
In the Personal Views part, Mark Greaves from Vulcan presents his view on two properties of the Semantic Web: how existing Internet social (‘crowd’) phenomena can apply to data on the Semantic Web, and how we can use these social Web techniques to improve the dynamic scalability of the Semantic Web. Dr. Jeff Z. Pan from the University of Aberdeen presents his personal view on an important problem in many Semantic Web applications, i.e., local closed world reasoning in ontologies; he also proposes several topics related to this research area and discusses possible technical directions. Professor Zhao-hui Wu presents his personal view on the technical evolution of the Semantic Grid into the Knowledge Service Cloud, which he argues is the future e-Science infrastructure, emphasizing support for knowledge creation activities with intelligent services, anytime, anywhere, and via any device.
The Research Articles part comprises four selected articles. The first proposes a MapReduce-based approach to tackle the scalability challenges of matching large ontologies, such as long run times or strong assumptions about the running environment. The second proposes an interesting approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The third is a study on improving SPARQL query performance with semantic caching approaches, i.e., SPARQL algebraic expression tree (AET) based caching and entity caching. The fourth presents a multi-agent framework for mining hypothetical semantic relations from Linked Data. These agents collaborate in relation mining by publishing and exchanging inter-dependent knowledge elements, e.g., hypotheses, evidence, and proofs, giving rise to an evidentiary network that connects and ranks diverse knowledge elements.
To encourage and promote the adoption of Semantic Web technologies, the last part of this special issue is devoted to reports on typical applications. The first one from Oracle specifically reports on enterprise applications of semantic technologies for business process management. The second one introduces a featured application in traditional Chinese medicine (TCM), and describes an in-use national semantic infrastructure developed for TCM communities.
As another novel feature of this special issue, we append a list of related articles recommended by the authors at the end of each paper, where space allows, so that readers can easily follow up on specific topics through these suggested readings.
Semantics ∧ scalability ⊨ ⊥?
Ian Horrocks
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 241-244.   https://doi.org/10.1631/jzus.C1101001
So-called ‘semantic technologies’ are rapidly becoming mainstream technologies, with RDF and OWL now being deployed in diverse application domains, and with major technology vendors starting to augment their existing systems accordingly. This is, however, only the first step for Semantic Web research; we need to demonstrate that the semantic technologies we are developing can (be made to) exhibit robust scalability if deployments in large-scale applications are to be successful. In this paper I briefly review the evolution of semantic technologies to date, examine the scalability challenges arising from deployment in large-scale applications, and discuss ongoing research aimed at addressing them.
On scalability of the Semantic Web
Riichiro Mizoguchi
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 245-246.   https://doi.org/10.1631/jzus.C1101005
Although people believe that both the Semantic Web (SW) and SW applications scale up, we have to be sure what is meant by ‘scale up’ in order to properly understand research on the SW. It is definitely true that the data on the World Wide Web (WWW) scales up. It is also true that Linked Data scales up. How about SW data? If it scales up, in what sense does it?
Let us look back at the history of knowledge bases and investigate expert systems to understand in what sense the knowledge bases in expert systems did not scale up. Two critical issues with expert systems are that it is very hard to build a very large knowledge base, and also very hard to maintain one. People therefore cannot hope to build an expert system that covers most of the tasks performed within a company. This is why people say “Expert systems do not scale up”.
Note here, however, that people might miss an important factor in expert systems: expert systems have very high functionality, solving real-world problems faster than human experts with comparable performance. High functionality is what people want to realize, because it helps them solve daily problems. So, what they long for is an ideal system with high functionality that covers almost all tasks in the domains where they work (Fig. 1). If we do not care about functionality, it is not difficult to build a knowledge base that scales up, because we just have to collect relevant data and build a huge database. The problem is that such a collection can hardly be called a knowledge base and is rarely useful for solving problems, since all people can do with it is find relevant data; hence, it has only low functionality.
Turning to the WWW and SW, in what sense do people believe they scale up? As far as I know, they scale up but their functionality is quite low. What people can do with the WWW and SW is essentially ‘information finding’. So, if we compare expert systems and the WWW/SW with respect to scalability in a fair way, it would be as follows: the WWW/SW scale up, but only if low functionality is acceptable, and there is no guarantee that they scale up with reasonably high functionality. This is the claim I make with Fig. 1. The above observation suggests that we need to pay more attention to functionality when talking about the scalability issue. More concretely, it does not make sense to claim “This and that scale up” without discussing functionality.
Now, let us discuss the implications. I believe it is beneficial to investigate what functionality we would expect in the context of WWW/SW applications. Fig. 2 is prepared for discussing this topic. In Fig. 2, functionality is replaced with computational semantics. Of course, the two are not equivalent in general; however, high functionality usually requires deep semantics in computation/inference, so they are roughly proportional. Furthermore, we can compare depths of computational semantics more easily than degrees of functionality. Typical applications are placed along the vertical axis according to their depth of computational semantics: the shallowest is data retrieval from databases, the deepest is the knowledge base for problem solvers such as expert systems, and deeper applications are placed one by one from top to bottom. Data retrieval from a database needs little semantics, only purely syntactic processing. The next shallowest is information finding on the WWW, for which we need page-ranking algorithms that evaluate the importance of Web pages by reference analysis. Next is the social Web, which requires network analysis, followed by Linked Open Data (LOD), which requires a schema for retrieval of Resource Description Framework (RDF) data. Next is the SW, which requires metadata based on ontology and simple reasoning over subsumption relations, part-of relations, etc. Down to the SW, the tasks involved are still information finding, though the required computational semantics deepens step by step.
Research on Semantic Web Services (SWS) is a bit different from the applications explained above, in that its function is not just finding information (Web services) but also producing a new Web service by combining the services found, like automatic program synthesis, which requires fairly high functionality. The next one, requiring higher functionality than SWS, is the so-called intelligence amplifier (IA) system, in which I have been deeply involved. The one requiring the highest functionality is, of course, the expert system.
As discussed already, we learned that expert systems do not scale up if we keep their high functionality, while from the top of Fig. 2 down to the SW, systems scale up but their functionality is quite low. So, my question is: what do we really want? Most SW researchers seem to believe that anything that scales up is good, but I am afraid such a criterion does not work well. I strongly believe we have to ponder what we really want, and whether it is something that scales up with reasonably high functionality. If so, then we have to seriously investigate how we can pursue the two conflicting goals, scalability and high functionality, at the same time. This would open our eyes to a new research direction.
Semantics and the crowd
Mark Greaves
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 247-249.   https://doi.org/10.1631/jzus.C1101003
One of the principal scientific challenges that drives my group is to understand the character of formal knowledge on the Web. By formal knowledge, I mean information that is represented on the Web in something other than natural language text—typically, as machine-readable Web data with a formal syntax and a specific, intended semantics. The Web provides a major counterpoint to our traditional artificial intelligence (AI) based accounts of formal knowledge. Most symbolic AI systems are designed to address sophisticated logical inference over coherent conceptual knowledge, and thus the underlying research is focused on characterizing formal properties such as entailment relations, time/space complexity of inference, monotonicity, and expressiveness. In contrast, the Semantic Web allows us to explore formal knowledge in a very different context, where data representations exist in a constantly changing, large-scale, highly distributed network of loosely-connected publishers and consumers, and are governed by a Web-derived set of social practices for discovery, trust, reliability, and use. We are particularly interested in understanding how large-scale Semantic Web data behaves over longer time periods: the way by which its producers and consumers shift their requirements over time; how uniform resource identifiers (URIs) are used to dynamically link knowledge together; and the overall lifecycle of Web data from publication, to use, integration with other knowledge, evolution, and eventual deprecation. We believe that understanding formal knowledge in this Web context is the key to bringing existing AI insights and knowledge bases to the level of scale and utility of the current hypertext Web.
Technically, the scalability of the Semantic Web is rooted in a large number of independently-motivated participants with a shared vision, each following a set of carefully-designed common protocols and representation languages (principally dialects of the Resource Description Framework (RDF), the Web Ontology Language (OWL), and the SPARQL Protocol and RDF Query Language (SPARQL)) that run on top of the standard Web server and browser infrastructure. This strategy builds on the familiar hypertext Web, and has been incredibly successful. The Semantic Web now encompasses more than 50 billion Semantic Web assertions (triples) shared across the world via large numbers of autonomous Web servers, processed by situation-specific combinations of local and remote logic engines, and consumed by a shifting collection of software and users. However, this kind of loosely-coupled scalability strategy comes at a technical price: the Semantic Web is by far the largest formal knowledge base on the planet, and certainly one of the broadest, but also one of the messiest. Semantic coherence can be guaranteed only locally if at all, performance is spotty, data updates are unpredictable, and the raw data can be problematic in many ways. These problems impact the overall scalability of the Semantic Web; beyond simply exchanging large quantities of data, we also want the Semantic Web to scalably support queries, integration, rules, and other data processing tools. If we can solve these problems, though, the Semantic Web promises an exciting new kind of data Web, with practical scaling properties beyond what federated database technology can achieve. In the full Semantic Web vision, massive amounts of partially-integrated data form a dynamically shifting fabric of on-demand information, able to be published and consumed by clients around the world, with transformational impact.
Our current work is inspired by two properties of the Semantic Web: how existing Internet social (‘crowd’) phenomena can apply to data on the Semantic Web, and how we can use these social Web techniques to improve the dynamic scalability of the Semantic Web. Most data currently published on the Semantic Web is originally sourced from existing relational databases, either via front-end systems like the D2R server (http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/), or by offline loading of the relational data into an associated high-performance triplestore to support Semantic Web access and processing. In each case, the core information is usually acquired by conventional means, cleansed and structured into a relational store by a database administrator, and imbued with a particular data semantics that is eventually reflected in the Semantic Web republication. Thus, much of the data presently on the Semantic Web relies heavily on the traditional computer science discipline of database construction.
Local closed world reasoning: a personal view on current status and trends
Jeff Z. Pan, Yuan Ren
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 250-252.   https://doi.org/10.1631/jzus.C1101004
Local closed world reasoning (LCWR) with ontologies has opened a new horizon for ontology applications:

1. Combining ontologies with relational databases. The interoperability between relational databases and DL-based ontologies is one of the most interesting topics in ontology applications, especially for industrial audiences. LCWR bridges one of the biggest gaps between legacy relational database systems and ontology-based systems. With LCWR supported by reasoners, users can specify the closure of certain classes/properties so that the semantics of their database remains the same when combined with ontologies.

2. Ontology-based recommendation systems. Due to the semantics of the open world assumption (OWA), anything derived by legacy reasoning technologies must be an absolutely true fact. Possible (i.e., not definite) solutions cannot be derived by reasoning because they are not logical implications of the ontology. With LCWR, a user can close the class of definitely wrong answers, and then its complement becomes the set of possible answers (see the sketch after this list). Such a pattern can be widely applied in recommendation services, such as system configuration and repair, diagnosis, and matchmaking.

3. Non-monotonic ontology reasoning. Classical ontology reasoning is monotonic in the sense that adding new knowledge to an ontology will not retract existing conclusions. With LCWR, however, non-monotonic reasoning is realized. It will be a very interesting research topic to investigate the relationship between LCWR and other non-monotonic reasoning services, and to develop theories that provide a unified explanation for them.

To sum up, a range of new applications can be incubated by LCWR technologies, and research on these topics will have significant industrial and academic impact. To facilitate the wide acceptance of LCWR, ontology system developers and researchers will have to address different technical issues. For developers, it is crucial to know the pros and cons and the tool support of each LCWR solution and to identify their own requirements, so that they can find the solution that is most suitable and available for them. For researchers, it will be important to investigate the connections between different LCWR solutions and explore the possibility of using them complementarily to overcome each other's limitations. Furthermore, new technologies should be developed with the collaboration of both communities to improve the user-friendliness of LCWR. In particular, because LCWR is non-monotonic, it will be very important to help users understand how the ontology and reasoning results will change if they decide to close or open certain predicates. These are all future directions we will be looking into.
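To make the recommendation pattern in item 2 concrete, here is a minimal sketch of closing a predicate and taking its complement; the toy candidate set, the predicate names, and the close_complement helper are illustrative assumptions, not the API of any particular reasoner.

```python
# Toy illustration of the LCWR recommendation pattern from item 2 above.
# All names here are illustrative assumptions, not the API of a real reasoner.

candidates = {"part_a", "part_b", "part_c", "part_d"}   # configuration options
definitely_wrong = {"part_b"}     # derivable as absolutely true under the OWA

# Under the open world assumption (OWA) alone, nothing more follows: a
# candidate absent from `definitely_wrong` might still be wrong; we simply
# do not know, so no set of "possible answers" is derivable.

def close_complement(domain, closed_class):
    """Close `closed_class` (assert its known instances are all there are);
    its complement w.r.t. the domain then becomes exactly derivable."""
    return domain - closed_class

possible_answers = close_complement(candidates, definitely_wrong)
print(sorted(possible_answers))   # ['part_a', 'part_c', 'part_d']

# Non-monotonicity (item 3): a newly derived fact retracts an old conclusion.
definitely_wrong.add("part_c")
print(sorted(close_complement(candidates, definitely_wrong)))  # part_c gone
```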
From Semantic Grid to Knowledge Service Cloud
Zhao-hui Wu, Hua-jun Chen
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 253-256.   https://doi.org/10.1631/jzus.C1101006
According to Foster et al. (2008), Grid computing and cloud computing are closely related paradigms that share much in their vision, architecture, and technology. They also share some limitations, namely the inability to provide intelligent and autonomous services, the incapacity to address the heterogeneity of systems and data, and the lack of machine-understandable content. Mika and Tummarello (2008) identified the root cause of these limitations as the lack of ‘Web semantics’.
The Semantic Web is an emerging technical movement targeting Web semantics (Berners-Lee et al., 2006). The Semantic Web languages, such as the Resource Description Framework (RDF), RDF Schema (RDFS), the Web Ontology Language (OWL), and SPARQL, together with a rich set of pragmatic tools, enable a Web of data with formally defined semantics (Domingue et al., 2011). The reliability, effectiveness, and efficiency of these technologies have been proven in practical applications in domains such as biology, medical science, healthcare, and pharmaceutics. As Semantic Web technologies reach maturity, computer scientists are exploring the possibilities of integrating them with other Web-based technologies (e.g., service-oriented computing (SOC), Grid computing, and cloud computing) to create more powerful integration solutions. In this paper, we discuss three major trends of technical integration: Semantic Web Services (Payne and Lassila, 2004), the Semantic Grid (de Roure et al., 2001), and the Knowledge Service Cloud (KSC).
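As a concrete taste of the language stack just listed, the following minimal sketch builds and serializes a three-triple RDF/RDFS graph with the rdflib Python library; the ex: namespace and resource names are invented for illustration.

```python
# A small taste of the RDF/RDFS stack mentioned above, using the rdflib
# library. The ex: namespace and resources are invented for illustration.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# A tiny schema plus one instance: data with formally defined semantics.
g.add((EX.Herb, RDFS.subClassOf, EX.Medicine))
g.add((EX.ginseng, RDF.type, EX.Herb))
g.add((EX.ginseng, RDFS.label, Literal("ginseng", lang="en")))

print(g.serialize(format="turtle"))   # Turtle text, ready to publish on the Web
```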
VDoc+: a virtual document based approach for matching large ontologies using MapReduce
Hang Zhang, Wei Hu, Yu-zhong Qu
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 257-267.   https://doi.org/10.1631/jzus.C1101007
Many ontologies have been published on the Semantic Web to be shared for describing resources. Among them, large ontologies from real-world domains pose a scalability problem for semantic technologies such as ontology matching (OM): matchers either suffer from excessively long run times or make strong assumptions about the running environment. To deal with this issue, we propose VDoc+, a three-stage approach for matching large ontologies based on the MapReduce framework and the virtual document technique. Specifically, two MapReduce processes are performed in the first stage to extract the textual descriptions of named entities (classes, properties, and instances) and blank nodes, respectively. In the second stage, the extracted descriptions are exchanged with neighbors in Resource Description Framework (RDF) graphs to construct virtual documents; this construction also benefits from a MapReduce-based implementation. A word-weight-based partitioning method is proposed in the third stage to conduct parallel similarity calculation using the term frequency–inverse document frequency (TF-IDF) model. Experimental results on two large-scale real datasets and the benchmark testbed from the Ontology Alignment Evaluation Initiative (OAEI) are reported, showing that the proposed approach significantly reduces run time with only minor loss in precision and recall.
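The following is a minimal, single-machine sketch of the map/reduce flavor of this pipeline, with TF-IDF similarity over toy virtual documents; the documents, tokenization, and weighting details are assumptions for illustration, not the paper's actual implementation.

```python
# Single-machine sketch of the map/reduce style used by approaches such as
# VDoc+: map emits (word, doc_id) pairs from virtual documents, reduce groups
# them by word so that similarity work can be partitioned by word weight.
# Documents and weighting details are illustrative assumptions.
import math
from collections import Counter, defaultdict

virtual_docs = {                      # entity -> textual description (assumed)
    "ex:Person":  "human being agent person",
    "ex:Student": "person enrolled school student",
    "ex:School":  "organization school teaching",
}

# Map phase: tokenize each virtual document into (word, doc_id) pairs.
pairs = [(w, d) for d, text in virtual_docs.items() for w in text.split()]

# Reduce phase: group by word into posting lists (the unit a word-weight-based
# partitioner would assign to parallel workers).
postings = defaultdict(set)
for word, doc in pairs:
    postings[word].add(doc)

n_docs = len(virtual_docs)

def tfidf(doc):
    """TF-IDF weight vector for one virtual document."""
    counts = Counter(virtual_docs[doc].split())
    total = sum(counts.values())
    return {w: (c / total) * math.log(n_docs / len(postings[w]))
            for w, c in counts.items()}

def cosine(a, b):
    """Cosine similarity over TF-IDF vectors: candidate matches score higher."""
    va, vb = tfidf(a), tfidf(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    na = math.sqrt(sum(x * x for x in va.values()))
    nb = math.sqrt(sum(x * x for x in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine("ex:Person", "ex:Student"))  # shared 'person' -> nonzero score
print(cosine("ex:Person", "ex:School"))   # no shared terms -> 0.0
```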
Knowledge extraction from Chinese wiki encyclopedias
Zhi-chun Wang, Zhi-gang Wang, Juan-zi Li, Jeff Z. Pan
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 268-280.   https://doi.org/10.1631/jzus.C1101008
The vision of the Semantic Web is to build a ‘Web of data’ that enables machines to understand the semantics of information on the Web. The Linked Open Data (LOD) project encourages people and organizations to publish various open datasets as Resource Description Framework (RDF) data on the Web, which promotes the development of the Semantic Web. Among the various LOD datasets, DBpedia has proved to be a successful structured knowledge base and has become the central interlinking hub of the Web of data in English. In the Chinese language, however, little linked data has been published and linked to DBpedia, which hinders the structured sharing of both Chinese and cross-lingual knowledge. This paper presents an approach for building a large-scale Chinese structured knowledge base from Chinese wiki resources, including Hudong and Baidu Baike. The proposed approach first builds an ontology based on the wiki category system and infoboxes, and then extracts instances from wiki articles. From Hudong, our approach builds an ontology containing 19 542 concepts and 2381 properties; 802 593 instances are extracted and described using the concepts and properties of the extracted ontology, and 62 679 of them are linked to equivalent instances in DBpedia. From Baidu Baike, our approach builds an ontology containing 299 concepts, 37 object properties, and 5590 datatype properties; 1 319 703 instances are extracted, and 84 343 of them are linked to instances in DBpedia. We provide RDF dumps and a SPARQL endpoint to access the established Chinese knowledge bases. The knowledge bases built with our approach can be used not only for building Chinese linked data, but also in many applications of large-scale knowledge bases, such as question answering and semantic search.
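The access pattern described at the end of the abstract (a SPARQL endpoint plus owl:sameAs links into DBpedia) looks roughly like the sketch below; since the endpoint URL of the Chinese knowledge base is not given here, the example queries DBpedia's public endpoint instead, using the SPARQLWrapper Python library.

```python
# Illustration of the access pattern described above (SPARQL endpoint plus
# owl:sameAs links). We query DBpedia's public endpoint, because the Chinese
# knowledge base's own endpoint URL is not given in the text.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?other WHERE {
        <http://dbpedia.org/resource/Beijing> owl:sameAs ?other .
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["other"]["value"])   # equivalent resources in other datasets
```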
Improving SPARQL query performance with algebraic expression tree based caching and entity caching
Gang Wu, Meng-dong Yang
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 281-294.   https://doi.org/10.1631/jzus.C1101009
To obtain query performance comparable to that of relational databases, diverse database technologies have to be adapted to confront the complexity posed by both Resource Description Framework (RDF) data and SPARQL queries. Database caching is one such technology: it improves database performance at reasonable space expense, based on the spatial/temporal/semantic locality principle. However, the existing caching schemes exploited in RDF stores prove dysfunctional for complex query semantics. Although semantic caching approaches work effectively in this case, little work has been done in this area. In this paper, we improve SPARQL query performance with two semantic caching approaches: SPARQL algebraic expression tree (AET) based caching and entity caching. Successive queries with multiple identical sub-queries and star-shaped joins can be efficiently evaluated with these approaches. The approaches are implemented on a two-level storage structure: main memory stores the most frequently accessed cache items, while items swapped out are stored on disk for possible future reuse. Evaluation results on three mainstream RDF benchmarks illustrate the effectiveness and efficiency of our approaches, and comparisons with previous research are also provided.
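A minimal sketch of the AET-based caching idea, using rdflib's SPARQL algebra as the cache key: two queries that differ only in surface syntax normalize to the same algebra tree, so the second one is served from cache. The single in-memory dict is a deliberate simplification of the paper's two-level (memory plus disk) design.

```python
# Sketch of algebraic-expression-tree (AET) caching: queries are keyed on
# their normalized algebra tree rather than their text. A plain dict stands
# in for the paper's two-level memory/disk cache.
from rdflib import Graph
from rdflib.plugins.sparql import prepareQuery

g = Graph()
g.parse(data="""
    @prefix ex: <http://example.org/> .
    ex:alice ex:knows ex:bob .
""", format="turtle")

cache = {}                       # AET key -> materialized results

def cached_query(query_text):
    q = prepareQuery(query_text)
    key = str(q.algebra)         # normalized algebra tree as the cache key
    if key not in cache:
        cache[key] = list(g.query(q))   # miss: evaluate and store
    return cache[key]

# Same algebra despite different formatting: only one evaluation happens.
r1 = cached_query("SELECT ?x WHERE { ?x <http://example.org/knows> ?y }")
r2 = cached_query("SELECT ?x  WHERE {  ?x <http://example.org/knows> ?y  }")
print(len(cache))                # 1 -> the second query was a cache hit
```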
A multi-agent framework for mining semantic relations from Linked Data
Hua-jun Chen, Tong Yu, Qing-zhao Zheng, Pei-qin Gu, Yu Zhang
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 295-307.   https://doi.org/10.1631/jzus.C1101010
Linked Data is a decentralized space of interlinked Resource Description Framework (RDF) graphs that are published, accessed, and manipulated by a multitude of Web agents. Here, we present a multi-agent framework for mining hypothetical semantic relations from Linked Data, in which the discovery, management, and validation of relations can be carried out independently by different agents. These agents collaborate in relation mining by publishing and exchanging inter-dependent knowledge elements, e.g., hypotheses, evidence, and proofs, giving rise to an evidentiary network that connects and ranks diverse knowledge elements. Simulation results show that the framework is scalable in a multi-agent environment. Real-world applications show that the framework is suitable for interdisciplinary and collaborative relation discovery tasks in social domains.
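A toy sketch of the evidentiary-network idea, assuming a shared board that agents publish to; the element types, the scoring rule, and all names are illustrative assumptions rather than the paper's actual model.

```python
# Toy sketch of an evidentiary network: agents publish hypotheses and evidence
# as interlinked elements, and a hypothesis is ranked by the support it gathers.
# Element types and the scoring rule are assumptions for illustration.
from dataclasses import dataclass, field

@dataclass
class Element:
    kind: str                                      # 'hypothesis' or 'evidence'
    content: str
    supports: list = field(default_factory=list)   # links to hypotheses

board = []                                         # shared publication space

def publish(kind, content, supports=()):
    e = Element(kind, content, list(supports))
    board.append(e)
    return e

# A discovery agent posts a hypothetical relation between two entities.
h = publish("hypothesis", "ex:geneX ex:associatedWith ex:diseaseY")

# Validation agents independently attach evidence to it.
publish("evidence", "co-occurrence in 12 abstracts", supports=[h])
publish("evidence", "shared pathway annotation", supports=[h])

def rank(hypothesis):
    """Rank = number of evidence elements supporting the hypothesis."""
    return sum(1 for e in board
               if e.kind == "evidence" and hypothesis in e.supports)

print(rank(h))   # 2
```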
Enterprise applications of semantic technologies for business process management
Ralf Mueller
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 308-310.   https://doi.org/10.1631/jzus.C1101011
We would like to give a short perspective on how business process management (BPM) can benefit from the advances in semantic technologies over the last 10 years. BPM spans a wide range of concepts, technologies, and personas. From a top-down perspective, a BPM project typically starts with the definition of the company value chains, the strategies and goals to be achieved, and the key performance indicators (KPIs) used to measure the success or failure of well-defined objectives. The value chains are further decomposed into supporting business processes (typically modeled in BPMN 2.0 (Silver, 2009)), applications, and services. These artifacts are then implemented using the tenets of a Service-Oriented Architecture (SOA). We consider the use of semantic technologies invaluable for formalizing the complex relationships between the entities of a BPM system and for providing a unified method for queries and business rules. Understanding the exact semantics of a BPM system will foster agility and reduce the proliferation of services and processes in an enterprise.
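A hedged sketch of the point above: once BPM entities (value chains, processes, KPIs) are captured as RDF, a single SPARQL query spans them all. The bpm: vocabulary is invented for illustration; it is not an Oracle product schema or part of BPMN 2.0.

```python
# Sketch: BPM entities as RDF so that one uniform query spans process
# structure and KPIs. The bpm: vocabulary is invented for illustration.
from rdflib import Graph, Literal, Namespace

BPM = Namespace("http://example.org/bpm#")
g = Graph()
g.bind("bpm", BPM)

g.add((BPM.orderToCash, BPM.partOfValueChain, BPM.sales))
g.add((BPM.orderToCash, BPM.measuredBy, BPM.cycleTimeKPI))
g.add((BPM.cycleTimeKPI, BPM.targetDays, Literal(5)))

# One query across the formalized relationships.
q = """
    PREFIX bpm: <http://example.org/bpm#>
    SELECT ?process ?kpi WHERE {
        ?process bpm:partOfValueChain bpm:sales ;
                 bpm:measuredBy ?kpi .
    }
"""
for process, kpi in g.query(q):
    print(process, kpi)   # orderToCash, cycleTimeKPI
```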
National semantic infrastructure for traditional Chinese medicine
Hua-jun Chen
Front. Inform. Technol. Electron. Eng., 2012, 13(4): 311-314.   https://doi.org/10.1631/jzus.C1101012
We use a domain ontology to construct a Semantic Web environment that unifies and links legacy databases, which typically have heterogeneous logical structures and physical properties. Users need only interact with the Semantic Web environment, searching, querying, and navigating across an extensible set of databases without being aware of database boundaries. Additional deductive capabilities can then be implemented to increase the usability and reusability of the data.
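A minimal sketch of this ontology-mediated integration, assuming two legacy tables with different schemas mapped onto shared ontology properties; the schemas, the tcm: property names, and the mapping table are invented for illustration.

```python
# Minimal sketch of ontology-mediated integration: two legacy databases with
# different schemas are mapped onto shared ontology properties, so a user
# query never touches database boundaries. All names are invented.
legacy_db_a = [{"herb_name": "ginseng", "effect": "tonic"}]
legacy_db_b = [{"mingcheng": "ginseng", "gongxiao": "tonic"}]

# Mappings from each legacy schema to shared ontology properties.
mappings = [
    (legacy_db_a, {"herb_name": "tcm:name", "effect": "tcm:function"}),
    (legacy_db_b, {"mingcheng": "tcm:name", "gongxiao": "tcm:function"}),
]

def query(prop, value):
    """Query by ontology property; the databases stay invisible to the user."""
    for db, m in mappings:
        for row in db:
            unified = {m[k]: v for k, v in row.items()}
            if unified.get(prop) == value:
                yield unified

print(list(query("tcm:name", "ginseng")))   # unified rows from both databases
```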
In the DartGrid project, we focus on three major TCM requirements, namely academic virtual organizations, personalized healthcare, and drug discovery and safety. Here we present a brief overview of the major applications we have developed to satisfy these requirements.
