|
The paper describes some aspects of the development of the first Russian research and educational internet portal on international relations (IR) and world politics, in the context of modern trends in knowledge management and the design of social knowledge networks. It focuses on an innovative classification scheme for resources in the field of IR and world politics, and on a new method of cataloging and classifying internet resources, naming documents and organizing search queries, based on combinations of mnemonic descriptors and dynamic updating of DNS servers’ databases. The proposed scheme integrates the Uniform Resource Locator (URL) mechanism into the resource classification and retrieval process and facilitates the integration of offline and online environments. Besides easy document retrieval within centralized databases (such as thematic portals), the proposed method potentially enables the addressing of external resources, which makes it a useful means of organizing virtual distributed document repositories. The addressing scheme appears quite flexible and can be used not only for the classification of ontologies and document repositories, but also for cataloguing and retrieving diverse information in various business applications. The paper also addresses some implementation issues and prospects for further development of the proposed concept.

Closing the Circle: Employing a New Method of Internet Resources Classification, Cataloging and Naming for Managing the First Russian Social Knowledge Network

DMITRY PESKOV
Internet politics center, Moscow State Institute of International Relations (MGIMO-University), 76 Prospekt Vernadskogo, Moscow, 119454, Russia
Vitaly Kabernik and Andrei Mikheyev
Internet politics center, Moscow State Institute of International Relations (MGIMO-University), 76 Prospekt Vernadskogo, Moscow, 119454, Russia
Context: designing social knowledge management systems in Russia

The creation and maintenance of knowledge management systems has in recent years been converging with the development of social software, internet addressing and content management systems, as well as with the traditional tasks of structuring and arranging scientific knowledge. This convergence is especially noticeable in the humanities, where the line between the subject and object of action, between observation and experiment, and between examination and dialog is especially vague. The explosive growth in the amount of information increases the demand for high-quality analysis of it. The co-evolution of the internet and society, coupled with the continuing crisis of the humanities, leads to a similarly sharp increase in the number of attempts to link humanitarian and technical knowledge in order to combine the advantages of both approaches. Knowledge management becomes social management, social management becomes ever more dependent on the internet, and managing the internet leads to knowledge management. The circle closes.

Different countries choose different ways of creating systems aimed at integrating knowledge, internet and society management. Much depends both on the characteristics of the society and on the historical groundwork on which knowledge management research and projects develop. For a long time (over the last 15 years) Russia has been effectively excluded from this process for a variety of reasons, the most important of which are:

· The ideological engagement of scientific knowledge in the Soviet Union, which led to a sharp depreciation of its value after the break-up of the country;
· The ensuing crisis of the social sciences in the 1990s. During this period the detachment from the world information development process continued, mostly due to Russia’s exclusion from the world economic system and to language problems.
As a result, Russia has de facto not inherited any large knowledge management system (or any information system) from the USSR, nothing similar to Lexis-Nexis, BBS or Gopher. The rapid change in information management technologies in the 1990s (including the creation of databases on personal computers, then in local networks and finally on the internet) completely devalued Russia’s knowledge management systems, and in the 2000s everything had to be started from scratch.

However, this situation brought certain advantages with it. For instance, designers of knowledge management systems in Russia were not burdened by the need to migrate existing content, to cater for the interests of long-time users accustomed to a familiar interface, or to use outdated criteria for categorizing information. Today information management systems are expected not only to collect, process and distribute knowledge, but also to manage more subtle social and personal processes, from automatically arranging incoming data arrays to analyzing patterns of information search and, de facto, delivering a certain way of thinking to the user on demand. In Russia such systems can be developed virtually “from scratch” and rely on the newest achievements in information management: from Semantic Web ontologies and tagging systems exemplified by Del.icio.us, Livejournal and Flickr to the wiki-names technique of Wikipedia. One can expect such a system to contain quite a few political “Easter eggs”: because civil society institutions in Russia are underdeveloped, their place is soon occupied by members of network communities, whereas information system standards begin to configure social processes (just as ERP and CRM systems configure the business processes of an enterprise). To use a social science analogy, the purpose of a social knowledge management system is to produce cognitive practices of the society’s self-cognition.
International relations and world politics portal: developing classification scheme

One of the projects intended to fill the social knowledge management niche in Russia is the research and educational internet portal on international relations and world politics, which is being developed at the Internet politics center of MGIMO-University, Russia’s leading research and educational center on international relations and diplomacy. The mission of the portal is to become the first major gateway to research and educational materials for professionals in the field of world politics and international relations, as well as a place for research, professional communication, mutual evaluation and reviewing. In other words, the ultimate mission of the portal is to transport the visible part of the professional community’s work into the online environment.

Fig. 1. Major services of the portal

The major problem the MGIMO portal designers faced is quite typical: finding an inexpensive, powerful and scalable web solution to ensure that the project “stays alive” after the initial development funding expires. A solution was needed that would be content-independent while at the same time enabling the evaluation, indexing and analysis of research activities. Ideally the solution should also be platform-independent. Such a solution was found in the course of developing a classification scheme for international studies and world politics resources.

The need to develop an original classification scheme arose because none of the existing ones (such as DDC and LCC, as well as the Russian classification schemes UDK and BBK) met the requirements of a social knowledge management system of such scope. One of the major shortcomings of all existing schemes is the insufficient degree of detail that they provide.
For instance, the Dewey Decimal Classification system suggests only two options for categorizing international relations and world politics resources: 327.1 General topics of international relations; spies and 327.2 Diplomacy. Other possible options are limited to 324.1 International party organizations, auxiliaries, activities; 325 International migration & colonization; 341 International law; 382 International commerce (Foreign trade); 337 International economics; 909 World history. Needless to say, this offers very limited opportunities for high-quality categorization of resources.

Another problem with such classifications is well known to all librarians: these schemes do not cope well with the fact that many resources belong to several categories at once, especially in a field as intertwined (both within itself and with other areas) as world politics. The same document can belong to several issue areas or even to different types of resources, and choosing the “primary” or “most important” one is not always easy.

A need was therefore evident for a new classification scheme, more detailed and better suited to the purposes of organizing social knowledge in the field of international relations and world politics. Such a system of resource classification and categorization was expected to:

· Provide an all-purpose instrument for statistical analysis of the field of study;
· Enable the identification of high-priority problem fields in the study of international relations and world politics;
· Become a model for similar classification schemes in other fields of study;
· Make it possible to create a consistent internal structure of the portal;
· Serve as a basis for navigation and search mechanisms;
· Assist in estimating the match between the research priorities and the allocated resources of the University;
· Lay the foundation for a content audit of research departments’ activity;
· Help avoid duplication in creating new courses and plans;
· Streamline the development of innovative courses.
Based on the experience of some Russian portals and the guidelines and standards of the largest US and international educational portals (such as The Gateway to Educational Materials), a number of requirements for such a classification scheme were developed:

· applicability to various types of objects (including, for instance, peoples, institutions, events);
· broad coverage (in terms of issue areas);
· high level of detail;
· ease of use and intuitiveness;
· flexibility, scalability and adaptability.

The last requirement was especially important. It was understood that no amount of speculation could ensure that the classification scheme would accommodate all new resources (in terms of resource types, issue areas, etc.). Although every effort was made to foresee all possible options, it was important for the scheme to be fundamentally open to new categories of resources. Moreover, a possible expansion of the basic principles of the scheme to other areas, not related to international relations and world politics, was also envisaged.

Another important consideration that was kept in mind already at that stage was the role of the classification scheme as a tool for the integration of online and offline resources. For instance, it was expected that besides describing online resources on the portal, the classification code would be indicated in each printed publication of MGIMO-University (and of other publishers as well), so that the reader could easily identify the location of similar documents within the portal.

The authors of the initial classification scheme (most of the work was done in a small group involving Dmitry Peskov and two MGIMO scholars, Dr. Sergei Afontsev and Dr. Oleg Barabanov) opted for a three-part classification, partly influenced by the categorization method employed in the Journal of Economic Literature: the contents of each document were to be described by its type, issue area and geographical affiliation.
The choice of geography as the third “dimension” was largely conditioned by the specifics of world politics as a research field; it is quite conceivable that for other problem areas geography would be of less importance, while other important classification “dimensions” may appear. To avoid confusion, the three types of descriptors were numbered in different ways: Roman numerals were used for document types, letters of the Roman alphabet (and their combinations) for issue areas and Arabic numerals for geographical regions. Within the issue areas classification, 14 broader groups were identified (marked by one letter), each comprising up to several dozen sub-groups (two letters). The scheme allowed for classifying the same resource as belonging to multiple issue areas, geographical regions and even document types. Thus, a PhD dissertation on Russia’s policy towards the process of democratization in Taiwan would belong to the group “V JG 278,309”, where V would stand for PhD dissertation (resource type), JG for democratization (issue area) and the numbers would indicate Russia and Taiwan respectively.

Logical as it may be, this scheme was not intuitive and memorable enough for users unfamiliar with it and required a certain refinement. In the course of further work on the classification scheme, which involved two other authors of the present paper, it was deemed practical to use two-letter country codes (easier to remember and better known to users, especially those familiar with top-level domain names) rather than numbers. We then tried to apply the same principle, in the form of “mnemonics” (short and easy-to-remember letter combinations, abbreviations or words), to the other parts of the classification scheme. From there, it was only one step to the main idea: integrating the classification code and the portal URL.
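The step from the initial three-part code to mnemonics can be illustrated with a minimal Python sketch. It parses a code such as “V JG 278,309” and maps its parts to mnemonics. The comma as the separator for multiple values in every field, the function names and the lookup table are assumptions made for illustration; in particular, “democ” is a hypothetical mnemonic, since the paper does not give one for democratization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassCode:
    doc_types: tuple[str, ...]    # Roman numerals, e.g. ("V",)
    issue_areas: tuple[str, ...]  # one- or two-letter groups, e.g. ("JG",)
    regions: tuple[int, ...]      # Arabic numbers, e.g. (278, 309)


def parse_class_code(code: str) -> ClassCode:
    """Parse a three-part code; the position decides what each field means."""
    fields = code.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 space-separated fields, got {len(fields)}")
    types, areas, regions = (tuple(f.split(",")) for f in fields)
    if not all(t and set(t) <= set("IVXLCDM") for t in types):
        raise ValueError(f"document types must be Roman numerals: {types}")
    if not all(a.isascii() and a.isalpha() and a.isupper() and len(a) <= 2
               for a in areas):
        raise ValueError(f"issue areas must be 1-2 capital letters: {areas}")
    if not all(r.isascii() and r.isdigit() for r in regions):
        raise ValueError(f"regions must be numbers: {regions}")
    return ClassCode(types, areas, tuple(int(r) for r in regions))


# Illustrative lookup table. "disser", "ru" and "tw" appear in the paper's
# examples; "democ" is a hypothetical mnemonic for issue area JG.
MNEMONICS = {"V": "disser", "JG": "democ", 278: "ru", 309: "tw"}


def to_mnemonics(code: ClassCode) -> list[str]:
    """Translate every descriptor of the code into its mnemonic."""
    keys = [*code.doc_types, *code.issue_areas, *code.regions]
    return [MNEMONICS[k] for k in keys]
```

With this table, `to_mnemonics(parse_class_code("V JG 278,309"))` returns `["disser", "democ", "ru", "tw"]`. Note that “V” is both a Roman numeral and a letter, so in the initial scheme only the position of a field tells a document type from an issue area; mnemonics drawn from a shared vocabulary do not need that positional rule.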
Integrating the URL with resource classification and retrieval

The idea of describing each document with a number of mnemonics suggested that the Uniform Resource Locator mechanism could be used to allow easy retrieval of relevant documents (or their descriptions) available at the portal. Under the proposed scheme each new resource added to the portal by a member of the portal team is marked with a number of mnemonics describing the document type(s), issue area(s) and geographical region(s) the resource deals with, a typical procedure for many databases (in the form of keywords). What is new, however, is that users access groups of documents by entering a number of mnemonics (according to certain semantic rules) into the browser’s URL field. This string of mnemonics is then translated (according to the mechanism described below) into a search query which retrieves the documents matching the search criteria. The user is able to navigate between groups of resources by manipulating the page URL: adding and removing mnemonics. The DNS mechanism converts relatively short and easy-to-understand URLs into search queries and back. Thus, for instance, the URL leading to a list of dissertations dealing with Russia and Taiwan available at the portal looks like http://disser.ru.tw.<portal name>, whereas a list of news concerning Russia, Taiwan and the election process can be found at http://news.elect.ru.tw.<portal name>.

A few examples involving traditional web-addressing methods can help illustrate the idea. One of two major approaches can be used for internet-resource retrieval: GET or POST queries. When a POST query is used, the actual content of the query is not saved in the URL, i.e. the URL string remains unchanged both before the query and after it. This makes saving the search results (for instance, as a bookmark) impossible; to run the same query again (for instance, to find new documents made available since the last use) the search string has to be re-entered.
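The translation of a string of mnemonics into a search query, described earlier in this section, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the portal’s actual software: the vocabulary table, the filter names (`type`, `area`, `region`) and the function names are hypothetical, and worldpolitics.ru is used as the portal name only because it appears in one of the paper’s examples.

```python
from urllib.parse import urlencode

PORTAL = "worldpolitics.ru"  # assumed portal name, taken from a paper example

# Hypothetical vocabulary: each mnemonic belongs to exactly one dimension.
VOCAB = {
    "disser": "type", "news": "type", "org": "type",
    "elect": "area", "dev": "area",
    "ru": "region", "tw": "region", "africa": "region",
}


def host_to_filters(host: str) -> dict[str, list[str]]:
    """Split the host name into mnemonics and sort them by dimension."""
    host = host.lower().rstrip(".")
    suffix = "." + PORTAL
    if not host.endswith(suffix):
        raise ValueError(f"{host!r} is not a mnemonic address of {PORTAL}")
    # Strip the portal name first: its own "ru" label is not a mnemonic.
    labels = host[: -len(suffix)].split(".")
    filters: dict[str, list[str]] = {"type": [], "area": [], "region": []}
    for label in labels:
        if label not in VOCAB:
            raise ValueError(f"unknown mnemonic: {label!r}")
        filters[VOCAB[label]].append(label)
    return filters


def filters_to_query(filters: dict[str, list[str]]) -> str:
    """Build a "traditional-form" GET query string from the filters."""
    return urlencode(filters, doseq=True)
```

For the host `news.elect.ru.tw.worldpolitics.ru` the sketch yields the query string `type=news&area=elect&region=ru&region=tw`. Removing the portal suffix before parsing matters here, because the portal’s own top-level domain coincides with the country mnemonic “ru”.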
Because of this shortcoming, most search systems use GET queries, where the criteria used for document selection are part of the URL string. In such cases the URL looks similar to the following:

Before the query: http://www.google.com
After the query: http://www.google.com/search?sourceid=navclient&hl=ru&ie=UTF-8&oe=UTF-8&q=Some+String

In the above example the significant part of the query is “q=Some+String”. The rest of the URL string defines additional search parameters. In the portal environment, where each user is additionally identified by a session ID, entry parameters, etc., the URL string can expand to something like:

http://www.ibm.com/products/finder/us/finders?Ne=5000000&finderN=1000100&trac=SU1&pg=ddfinder&collectionN=0&sid=982279341122547046037&cc=us&lc=en&oldC1=5000832&tmpl=%2Fproducts%2Ffinder%2Fus%2Fen%2Ffinders&C1=5000832&C2=5000847

As can be seen, the number of query parameters in the portal environment is much larger and includes the session ID parameter (here sid=982279341122547046037), which is different for every new user and even for the same user with each new session. Such a query form is extremely hard to use: the URL is impossible to remember, write down or type in. Moreover, because the session ID changes from session to session, a substantial part of the URL has to be changed every time a resource or group of resources is accessed.

Let us now look at how these schemes work in internet-resource catalogs. For instance, the URL of the catalog list of organizations dealing with development issues in Africa typically looks like http://www.google.com/Top/Society/Organizations/Development/Regional/Africa/ and de facto reproduces the catalog file structure.
However, any attempt to limit the search with keywords will result in a string like http://www.google.com/search?q=Some+string&cat=gwd%2FTop%2FSociety%2FOrganizations%2FDevelopment%2FRegional%2FAfrica&hl=en, which deprives the URL in the previous example of its orderliness and brings us back to the problem of cumbersome queries. In an environment where the user is additionally identified by a session ID, this parameter is added to an already quite complicated string.

The proposed scheme helps solve this problem by using a series of mnemonics as a search string. Let us compare the usual addressing scheme in a catalog with the one proposed.

Typical variant: http://www.google.com/Top/Society/Organizations/Development/Regional/Africa/

Arranging the same descriptors according to the proposed scheme transforms it into: http://Top.Society.Organizations.Development.Regional.Africa.google.com

However, within the classification scheme developed by the authors this string gets even shorter, as some descriptors of upper-level categories are eliminated: http://org.dev.africa.<portal name>, where “org” describes the resource type, “dev” identifies the issue area and “africa” further limits the search results to a particular geographical region.

When a session ID is added, the advantages of the proposed scheme become even more obvious.

Typical variant: http://www.google.com/search?q=Some+string&cat=gwd%2FTop%2FSociety%2FOrganizations%2FDevelopment%2FRegional%2FAfrica&hl=en&sid=1684387965414
Proposed scheme: http://Top.Society.Organizations.Development.Regional.Africa.google.com/?sid=573621658765
Using the mnemonics from the classification scheme: http://org.dev.africa.worldpolitics.ru/?sid=573621658765

Besides being much easier to read and remember, this method of resource naming offers a number of other advantages. In a catalog whose addressing reproduces its file structure, any change in the order of descriptors changes the query contents or even renders the query meaningless.
Thus, the query http://www.google.com/Top/Society/Organizations/Development/Regional/Africa/ is not identical to http://www.google.com/Top/Organizations/Development/Regional/Society/Africa/, although the same descriptors are used. It could not be otherwise in a tree-type resource classification scheme, because the same keywords can potentially occur in different branches of the catalog tree. In the proposed scheme the order of mnemonics is of no fundamental importance. The queries http://Top.Society.Organizations.Development.Regional.Africa.google.com and http://Top.Organizations.Development.Regional.Society.Africa.google.com yield the same result, which makes such a query format much more usable.

A particular document can also be retrieved by its ID (either within the system or any universal ID such as an ISBN), e.g., http://doc.id12345.google.com or http://doc.isbn5922801783.google.com

What is also important is that the requested document can be physically located outside the server to which the mnemonic URL points. This potentially enables a very high level of integration for any information resources, effectively detaching them from their physical location and making the process of finding them and working with them transparent for the end user.

It is worth mentioning that the proposed scheme does not rule out more traditional ways of searching for information, such as attribute-based search (by author, date, title, etc.) and full-text search (for documents within the portal database). Issue areas, document type and other document characteristics can be entered not only by manipulating the URL, but also in a more traditional way, by “ticking off” the relevant fields. Moreover, the interface of the portal under development allows the user to switch from one navigation method to the other at any point (for instance, to add an issue area whose mnemonic s/he doesn’t know).
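Two properties described in this section, the irrelevance of mnemonic order and retrieval by document ID, can be illustrated with a short Python sketch. The function names are hypothetical, the portal name is passed in as a parameter, and the ID syntax (`id` or `isbn` followed by digits) is an assumption inferred from the two example URLs above.

```python
import re


def labels_before(host: str, portal: str) -> list[str]:
    """Return the labels that precede the portal name in a host name."""
    host = host.lower().rstrip(".")
    suffix = "." + portal.lower()
    if not host.endswith(suffix):
        raise ValueError(f"{host!r} does not belong to {portal!r}")
    return host[: -len(suffix)].split(".")


def same_query(host_a: str, host_b: str, portal: str) -> bool:
    """Two addresses are equivalent if they carry the same set of mnemonics."""
    return (frozenset(labels_before(host_a, portal))
            == frozenset(labels_before(host_b, portal)))


# Assumed ID syntax; an ISBN-10 check digit "X" is not handled here.
DOC_ID = re.compile(r"^(id|isbn)([0-9]+)$")


def parse_document_ref(host: str, portal: str):
    """Return (scheme, number) for a doc.<id> address, otherwise None."""
    labels = labels_before(host, portal)
    if len(labels) == 2 and labels[0] == "doc":
        match = DOC_ID.match(labels[1])
        if match:
            return match.group(1), match.group(2)
    return None
```

Comparing sets instead of sequences is what makes the two reordered example addresses equivalent, and `parse_document_ref("doc.isbn5922801783.google.com", "google.com")` returns `("isbn", "5922801783")`. A set also discards repeated mnemonics, which is acceptable only as long as no mnemonic has more than one meaning.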
Implementation issues and further development of the concept

From a technical perspective the proposed system can be based on real-time programming of DNS-server databases by dedicated software. Upon receiving the query, the server software formulates a “traditional-form” query by parsing the string of mnemonics and simultaneously adds the appropriate record(s) to the DNS server’s names database. In other words, the “traditional-form” search query is given an easy-to-read form and is at the same time saved for identical queries in the future. Once enough queries have accumulated, the need for the translating software will be reduced to a minimum, as all the needed records in the database will already have been created. The existing working schemes of DNS servers are potentially able to handle this task. However, if such a scheme is adopted for a particular project, additional revision of the server software may be useful. Standardizing the approach at the level of programming hardware firewall rules may be the best variant.

Further analysis of the proposed notation system shows that, with certain reservations, it reproduces the notation used for addressing class properties and methods in object-oriented programming languages. This brings us to the idea of using contextual and functional mnemonics, for instance:

http://Top.Society.Organizations.Development.Regional.NOT.Africa.google.com

Thus, even before using any query strings after “/” we are able to limit the search area or, on the contrary, broaden it (e.g., by using the functional mnemonic INCLUDE). Of course, the above examples give only a very general notion of the possibilities of format expansion. In practice the language of functional mnemonics requires additional development and standardization with regard to its potential compatibility with the notation for addressing classes and methods in object-oriented web-programming languages such as Java.
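The two ideas of this section, translating a mnemonic string once and then reusing the stored record, and the functional mnemonic NOT, can be sketched together in Python. The sketch makes several assumptions that the paper does not settle: a plain dictionary stands in for the DNS server’s names database (nothing here talks to a real DNS server), NOT is taken to apply only to the mnemonic that immediately follows it, and the class and attribute names are hypothetical.

```python
class MnemonicResolver:
    """Translate mnemonic host names once and reuse the stored result."""

    def __init__(self, portal: str):
        self.portal = portal.lower()
        self.records: dict[str, dict[str, list[str]]] = {}  # stand-in for DNS records
        self.parse_count = 0  # how often the translating code actually ran

    def resolve(self, host: str) -> dict[str, list[str]]:
        key = host.lower().rstrip(".")  # host names are case-insensitive
        if key not in self.records:
            self.records[key] = self._translate(key)
        return self.records[key]

    def _translate(self, host: str) -> dict[str, list[str]]:
        self.parse_count += 1
        suffix = "." + self.portal
        if not host.endswith(suffix):
            raise ValueError(f"{host!r} does not belong to {self.portal!r}")
        include: list[str] = []
        exclude: list[str] = []
        negate = False
        for label in host[: -len(suffix)].split("."):
            if label == "not":  # functional mnemonic, not a descriptor
                negate = True
                continue
            (exclude if negate else include).append(label)
            negate = False  # assumed: NOT covers only the next mnemonic
        if negate:
            raise ValueError("NOT must be followed by a mnemonic")
        return {"include": include, "exclude": exclude}
```

Resolving `Top.Society.Organizations.Development.Regional.NOT.Africa.google.com` with `MnemonicResolver("google.com")` places “africa” in the `exclude` list and the other five mnemonics in `include`; resolving the same address a second time returns the stored record without running the parser again, which mirrors the expectation that the translating software is needed less and less as records accumulate.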
The same principles can be employed for closer integration with such promising technologies as Web Services, which is suggested by the query format. The expansion of the query standard to make it compatible with a database query language is also possible. Here we are only charting possible new developments of the concept.

Even further expansion of the scheme’s capabilities can be achieved by using functional separators: hyphens, underscores, tildes and other symbols, to the extent that they are accepted within a domain name (strict host name syntax permits only letters, digits and hyphens). However, such use requires additional conceptualization and standardization within a particular project or a universal standard.

The major implementation problems of the proposed software algorithms are related to the methods of parsing the mnemonics string (URL). Here the problem of repeating mnemonics and the issue of logical connections need to be dealt with.

To illustrate the first problem we will use one of the examples given above. According to the proposed scheme a collection of dissertations dealing with Russia and Taiwan can be described as http://disser.ru.tw.<portal name>. If we decide to limit this set to documents available in the Russian language, one of the ways to do so would be to introduce an additional mnemonic “ru”, for instance http://disser.ru.tw.ru.<portal name>. Such a query string cannot be successfully parsed by the software’s semantic algorithm because of an obvious ambiguity: it is impossible to identify which of the mnemonics refers to the language of the resource and which one refers to the geographical region the resource deals with.

This problem can be solved in three different ways. The first of them is imposing a strict order of mnemonics, where the classification parameter each mnemonic refers to is determined by its location within the URL.
This method is undesirable for a number of reasons:

· all locations within the URL need to be filled, which necessitates the use of functional/logical mnemonics, such as ANY or ALL, perhaps several of them in a single URL;
· the order of mnemonics within the URL is pre-determined; a single mistake renders the whole query meaningless.

Obviously, such requirements make the system too complicated for an average user.

The second way is to use unique mnemonic identifiers, none of which occurs in more than one part of the classification scheme (which makes their order within the URL insignificant). This method also has its disadvantage: the need to create a large base of mnemonics, which must be constantly checked for non-repetition and can ultimately become quite confusing. Both these methods largely deprive the initial concept of its flexibility and are therefore undesirable.

To solve this problem within the particular project, the use of alternative separators was proposed, with a further possibility of using “meta-mnemonics”, i.e., compound mnemonics, which are analyzed by the server software separately. With this approach the above example can look like:

http://disser.ru.tw.lang_ru.<portal name>

Here “lang” and “ru” are discrete mnemonics connected by a separator, which prompts the server software to parse this part of the URL in a particular way. Similarly, other mnemonics of the classification scheme can be connected according to certain rules, which substantially expands the capabilities of the scheme. Alternatively, the proposed method can be combined with the “traditional” one, e.g.,

http://disser.ru.tw.<portal name>/?lang=ru

Although this approach is less desirable, as it harms the orderliness of the concept and the readability of the URL, the use of such a combination is nevertheless possible.

The second potential problem is the possible logical ambiguity of the query/naming string.
Thus, by default, when dots are used as separators, logical AND is implied, although some search queries may necessitate the use of the OR, exclusive OR and NOT logical operators. For the particular project we have chosen to use alternative separators to solve this problem. The use of functional mnemonics that prompt the server software to parse and transform the URL string in a particular way is another possible option.

Conclusion: The Way to Go

The proposed scheme of internet resource naming, classification and categorization, based on mnemonic combinations and dynamic updating of DNS servers’ databases, makes it possible to create a user-friendly and intuitive system for addressing both internal and external resources. The advantages of the method include, among other things, its potential for the integration of online and offline resources. For instance, a mnemonics-based URL indicated in a printed publication shows its reader an easy way to all related documents within the portal, regardless of when they were published or when the book was read.

Another advantage of the system is the possibility of creating “search expectation” queries for selected categories of documents. As each mnemonics-based URL is associated with a dynamically updated group of documents, easy “bookmarking” of search results pages becomes possible, allowing the user to check for portal updates on topics of interest without re-typing search queries or creating update alerts. In general, we feel that the mnemonics-based method would be particularly helpful for users who often search for documents pertaining to a limited number of issue areas or who are interested in keeping track of new developments in their field of interest (which is often the case for researchers).

The addressing scheme appears quite flexible and can be used not only for the classification of ontologies and document repositories, but also for searching and cataloguing diverse information in various business applications.
The search semantics of the proposed scheme correlate well with the semantics of object-oriented programming languages, which enables easy translation of URLs into database search queries as well as the creation of web services on that basis. The scheme also allows for the use of “functional mnemonics” and any set of standardizable separators. This scheme therefore seems to be a useful means of organizing virtual distributed document repositories, such as professional portals and other social knowledge management instruments.

A similar resource organization method can be employed in social blog systems by developing a scheme that could involve a separate concurrent resource classification (through user-defined descriptors). In the case of the portal developed at MGIMO, provisions are made for user-driven parallel categorization of resources, similar to tagging systems such as Del.icio.us (whether using existing categories or proposing new ones). This feature of the portal is expected not only to help correct mis-categorizations but also to provide valuable data for the analysis of usage patterns. As the audience of the portal mainly consists of scholars and students of international relations, usage trends (including those discovered by analyzing user-driven categorization) can yield interesting observations about the research field of international relations and world politics itself. The collection of information on concurrent classification makes it possible to analyze usage patterns, associational links, etc., which is applicable to tasks of business analysis, information theory and others. The potential of such an approach for marketing and e-government projects is quite obvious as well.

The authors are grateful to MGIMO-University and Mr. Frederick Paulsen, whose invaluable support made this work possible. |