Multidisciplinary Research/Education in Digital Governments


The main project comprises a series of research and education activities expected to generate significant original contributions in Electronic Government Systems, Distributed Database Systems, Information Retrieval Systems, and the Social Trust of Technology. More specifically, the project will conduct research in the following areas:

• Multi-lingual information archiving and retrieval of governmental repositories
• Automatic representation and extraction of semantic information from government documents
• Wide-area secure collaborative government-government databases
• Economic and social barriers to technology adoption by common citizens

These contributions will have an impact on the current research agenda in these fields and will improve the visibility of the University of Puerto Rico, Mayagüez (UPRM) in these research communities. Moreover, the associated educational activities will strengthen the academic programs at UPRM, helping our institution grow into an important center for computing research and education in Latin America and the Caribbean.

Multi-lingual information archiving and retrieval of governmental repositories

Puerto Rico has two official languages: Spanish and English. Therefore, governments in Puerto Rico at all levels are bilingual by statute. Although experimentation with multi-lingual information retrieval dates back at least thirty years, the advent of the Internet and the World Wide Web has catalyzed a renewed interest in these types of information retrieval (IR) systems. The most frequently studied model of a multi-lingual IR system is the cross-lingual IR (CLIR) system. CLIR systems typically assume that users are unequally familiar with different languages: they may know one source language well enough to express their information needs effectively, but know one or more additional target languages only well enough to extract information from documents written in those languages. As a result of this emphasis on CLIR, much effort has been devoted to the query translation problem [e.g. 3, 4, 17]. In particular, several projects have attempted to deal with the term disambiguation problem that arises when translating terms that have multiple meanings. In these studies, the most common performance metric compares the performance of a CLIR system with different source and target languages against the performance of the same system with identical source and target languages. The performance of the cross-lingual system is thus expressed as a percentage of that of its monolingual counterpart. This metric is designed to measure the effectiveness of the translation, not necessarily the effectiveness of the retrieval itself.
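The relative-effectiveness metric described above can be sketched as follows. The text does not name the underlying effectiveness measure, so this minimal sketch assumes mean average precision (MAP); the query and document identifiers are hypothetical toy data.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list of document ids."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at each relevant document
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs, qrels):
    """MAP over all judged queries; runs maps query id -> ranking, qrels maps query id -> relevant set."""
    return sum(average_precision(runs[q], qrels[q]) for q in qrels) / len(qrels)


def relative_effectiveness(cross_runs, mono_runs, qrels):
    """Cross-lingual MAP expressed as a percentage of monolingual MAP (assumed measure)."""
    return 100.0 * mean_average_precision(cross_runs, qrels) / mean_average_precision(mono_runs, qrels)


# Hypothetical toy judgments and runs for a single query.
qrels = {"q1": {"d1", "d3"}}
mono = {"q1": ["d1", "d3", "d2"]}   # same source and target language
cross = {"q1": ["d2", "d1", "d3"]}  # translated query
print(round(relative_effectiveness(cross, mono, qrels), 2))  # prints 58.33
```

Note that the score only reflects how much the translated run lags behind the monolingual one; a poor monolingual baseline would still yield a high percentage, which is why the metric says little about retrieval quality itself.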

This project addresses the needs of a segment of the population that has sufficient language proficiency both to express information needs effectively and to read documents in multiple languages. The emphasis is thus on fully multi-lingual information retrieval (MLIR): multi-lingual users writing multi-lingual queries in order to retrieve multi-lingual data. This change of assumptions has an important impact on how the research issues must be prioritized. It is well known that most users tend to express their information needs using short (2-3 word) queries. Anecdotal experience indicates that users can often translate such queries manually in very little time. This suggests that the most important research problem for MLIR lies not in the query translation stage, but in later stages of the retrieval process, including document ranking, document categorization, and result set visualization.
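To illustrate why ranking becomes the central problem, here is a minimal sketch of ranking a mixed Spanish/English collection against a query in which the user has already supplied terms in both languages, so no translation step is involved. It assumes a plain TF-IDF vector-space ranker with cosine similarity; this is a stand-in for illustration, not the project's ranking method, and the documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on word characters; Python 3's \w also matches accented Spanish letters.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Return TF-IDF vectors per document and the IDF table for a mixed-language collection."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log(n / d) + 1.0 for term, d in df.items()}
    vectors = {doc_id: {t: tf * idf[t] for t, tf in c.items()} for doc_id, c in counts.items()}
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, vectors, idf):
    """Rank all documents against a query that may mix Spanish and English terms."""
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = {doc_id: cosine(q_vec, vec) for doc_id, vec in vectors.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical bilingual collection.
docs = {
    "es1": "Solicitud de permiso de construcción en Mayagüez",
    "en1": "Building permit application for Mayagüez residents",
    "en2": "Property tax payment schedule",
}
vectors, idf = build_index(docs)
for doc_id, score in rank("permiso permit construcción building", vectors, idf):
    print(doc_id, round(score, 3))
```

Because a term-matching ranker treats "permiso" and "permit" as unrelated terms, scores for the Spanish and English documents are computed over disjoint vocabularies and are not obviously comparable, which is exactly the kind of ranking issue the paragraph above places at the center of MLIR.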

Automatic representation and extraction of semantic information from government documents

Governments are collectors, interpreters, and maintainers of very large public data sets. Often, these data sets must be maintained in perpetuity. An interesting example of a government agency that must play all these roles simultaneously is the land registry. The land registry office is a key government agency for all societies, as it is the repository for the titles to real estate properties. In Puerto Rico, the land registry is maintained manually, and as a result the system is said to have a backlog of six years. Every year about 250,000 documents are presented to the Registro de la Propiedad de Puerto Rico. Each of these documents must be analyzed and summarized by an expert into a document called a minuta, and then revised and approved before it is entered into the land registry archives. As a consequence, private agents must be hired to perform title searches, a property may be sold several times before the land registry catches up with the paperwork, land taxes are difficult to collect, and these difficulties may increase the opportunities for fraud and theft. Most striking of all, most of the documents currently being presented to the Registro de la Propiedad existed at one time in electronic form within a computer, and yet they must go through a manual process before their acceptance into the archives.

In Puerto Rico, property taxes constitute the main economic foundation of the autonomous finances of towns and cities. Effective management of land registry data is therefore essential for the financial health of regional governments. The document semantics component of the project will address the issues involved in automating the land registry in Puerto Rico; however, since this system is steeped in tradition and legal requirements, the automation must proceed in steps. The first requirement we must fulfill is to make the new system backward compatible with the existing one. Further requirements may concern the longevity and characteristics of the storage media, as well as the encryption and legal issues behind electronic signatures and the transmittal of sensitive documents.

The proposal hinges on using the eXtensible Markup Language (XML) to represent semantic information automatically extracted from documents submitted to the registry. XML has become the lingua franca for e-commerce and heterogeneous databases. XML is also the vehicle of choice for the Semantic Web put forth by Tim Berners-Lee. By modeling the narrow knowledge domain of contracts and deeds written in Spanish, our team will build the tools needed to interoperate with other, similar domains, and at the same time open new avenues for electronic access to government databases and services. XML allows the original documents presented to the Registro de la Propiedad to be annotated in such a way that the original text is preserved, while the electronic version contains all the information required for electronic search, analysis, retrieval, and manipulation.
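The property that annotation preserves the original text can be shown with Python's standard `xml.etree.ElementTree`. In this minimal sketch the tag names (`escritura`, `vendedor`, `comprador`, `finca`, `precio`) are hypothetical and not the project's actual schema, and the hard-coded character spans stand in for the output of an automatic extractor, which is the real research problem.

```python
import xml.etree.ElementTree as ET


def annotate(text, spans):
    """Wrap sorted, non-overlapping (start, end, tag) character spans of text in XML elements.

    Text outside the spans is kept as plain character data (element text/tails),
    so concatenating all text nodes reproduces the original document verbatim.
    """
    root = ET.Element("escritura")  # hypothetical root tag for a deed
    root.text = ""
    previous = None
    cursor = 0
    for start, end, tag in sorted(spans):
        gap = text[cursor:start]
        if previous is None:
            root.text += gap
        else:
            previous.tail = gap
        previous = ET.SubElement(root, tag)
        previous.text = text[start:end]
        cursor = end
    remainder = text[cursor:]
    if previous is None:
        root.text += remainder
    else:
        previous.tail = remainder
    return root


def span(text, phrase, tag):
    # Stand-in for an automatic extractor: locate a known phrase and label it.
    start = text.index(phrase)
    return (start, start + len(phrase), tag)


deed = "Juan Pérez vende a María Rivera la finca número 1234 por $85,000."
spans = [
    span(deed, "Juan Pérez", "vendedor"),
    span(deed, "María Rivera", "comprador"),
    span(deed, "1234", "finca"),
    span(deed, "$85,000", "precio"),
]
doc = annotate(deed, spans)
print(ET.tostring(doc, encoding="unicode"))
print("".join(doc.itertext()) == deed)  # the original wording survives annotation
print(doc.find("comprador").text)       # structured access to an extracted field
```

Keeping unannotated text in element tails, rather than discarding it, is the design choice that lets the same XML file serve both as the legally meaningful original and as a searchable structured record.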

Wide-area secure collaborative government-government databases

The emergence of wide-area networks such as the Internet has given users access to vast amounts of rich data sets located on data sources distributed over these networks. For the past twenty years, researchers in the area of Distributed Database Systems have concentrated on the problems of heterogeneous data integration [8, 62, 74], distributed query processing [20, 30, 44], and distributed transaction processing. Database Middleware Systems were developed to integrate heterogeneous collections of data sources distributed over a computer network. Note that Database Middleware differs from Network Middleware such as CORBA, RMI, .NET, and RPC, since the latter serve as an infrastructure layer that gives applications access to the network. Database Middleware sits at a higher layer and can leverage Network Middleware for connectivity. However, neither CORBA, RMI, nor .NET provides services such as SQL query execution and schema mapping, which are routinely found in Database Middleware. Typically, Database Middleware Systems follow an architecture centered on a data integration server, which provides client applications with a uniform view of, and a uniform access mechanism to, the data available in each source. Database gateways [33, 51] and Mediator systems [8, 60] are the best-known examples of database middleware.
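The integration-server architecture described above is often organized as a mediator on top of per-source wrappers. The following minimal sketch assumes that pattern: each wrapper maps one source's native schema to a global schema (schema mapping), and the mediator answers a selection over the uniform view. The agency names, attributes, and records are hypothetical, and the naive plan (scan everything, filter centrally) is a simplification; real systems push selections down to the sources during distributed query processing.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Wrapper:
    """Adapts one source's native records to the mediator's global schema."""
    name: str
    fetch: Callable[[], Iterable[dict]]  # returns records in the source's native schema
    mapping: Dict[str, str]              # global attribute -> native attribute

    def scan(self) -> List[dict]:
        # Rename native attributes to global ones and tag each record with its origin.
        return [
            {**{g: row[n] for g, n in self.mapping.items()}, "source": self.name}
            for row in self.fetch()
        ]


class Mediator:
    """Integration server offering one uniform view over several wrapped sources."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def select(self, predicate: Callable[[dict], bool]) -> List[dict]:
        # Naive plan: scan every source, then filter centrally in the mediator.
        return [r for w in self.wrappers for r in w.scan() if predicate(r)]


# Hypothetical agencies storing the same concept under different native schemas.
municipio = Wrapper(
    "municipio",
    lambda: [{"catastro": "P-001", "dueno": "Ana Ruiz", "deuda": 120.0},
             {"catastro": "P-002", "dueno": "Luis Vega", "deuda": 0.0}],
    {"parcel_id": "catastro", "owner": "dueno", "tax_due": "deuda"},
)
hacienda = Wrapper(
    "hacienda",
    lambda: [{"parcel": "P-003", "holder": "Eva Soto", "balance": 75.5}],
    {"parcel_id": "parcel", "owner": "holder", "tax_due": "balance"},
)

mediator = Mediator([municipio, hacienda])
owing = mediator.select(lambda r: r["tax_due"] > 0)
for r in owing:
    print(r["source"], r["parcel_id"], r["owner"], r["tax_due"])
```

The client only ever sees `parcel_id`, `owner`, and `tax_due`; the fact that one agency calls the owner `dueno` and another `holder` stays hidden inside the wrappers.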

Next-generation Government Information Systems will incorporate hundreds, perhaps thousands, of diverse data sources located on geographically distributed networks like the Internet. In such large-scale distributed environments, heterogeneity in hardware devices, software components, network connectivity, and system configuration will be a fundamental characteristic of the data sources. Each government agency will have its own set of applications, server equipment, and policies for accessing the data. The data sources might reside on high-end servers, desktop computers, mobile laptop computers, hand-held devices, intelligent sensors and appliances, or embedded computer systems. All of this data must be made accessible from where it resides and combined into new abstractions that are useful to end users.

Data integration and interoperation between these data sources will be a critical requirement for harvesting the vast amounts of valuable information stored and maintained in government databases. Information could be extracted from any available data source, whether it is a satellite image from a geographic database or a phone book list, encoded in XML, extracted from a PalmPilot. Therefore, a government data source site cannot be defined by the size of its stored data sets or by the software environment it runs, but rather by whether other sites (government or private) in the system retrieve the information it holds. In other words, a data source is any site that provides a service to access some kind of data. Clearly, the distinction between a client site and a server site will be blurred, since any site can act as a client of, or as a service provider to, another site in the system. Moreover, the sheer number and diversity of data sources implies that no single authority can effectively coordinate and control access to the data or to the computational services in the system. These observations point us toward a dynamic peer-to-peer environment in which any government site can request or serve data, and must engage in a cooperative effort to satisfy the requests for data and services associated with the queries posed by interested end users. We call this type of scheme collaborative government-government database interactions.
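One way to picture sites that both request and serve data without a central coordinator is query flooding with a hop limit (TTL), the search technique used by unstructured peer-to-peer networks such as Gnutella. The minimal sketch below assumes that technique; it is not the project's protocol. The site names and records are hypothetical, and the propagation is simulated in a single process with a breadth-first traversal, whereas real peers would forward messages to each other over the network.

```python
from collections import deque
from typing import Callable, List, Tuple


class Peer:
    """A government site that can both serve its own data and forward requests."""

    def __init__(self, name: str, records: List[dict]):
        self.name = name
        self.records = records
        self.neighbors: List["Peer"] = []

    def connect(self, other: "Peer") -> None:
        # Symmetric link: either site may forward queries to the other.
        self.neighbors.append(other)
        other.neighbors.append(self)


def flood_query(origin: Peer, predicate: Callable[[dict], bool], ttl: int) -> List[Tuple[str, dict]]:
    """Simulate TTL-limited flooding: every peer within `ttl` hops answers from its local data."""
    answers: List[Tuple[str, dict]] = []
    visited = {origin.name}          # suppress duplicate deliveries of the same query
    frontier = deque([(origin, 0)])  # (peer, hops from origin), explored breadth-first
    while frontier:
        peer, hops = frontier.popleft()
        answers.extend((peer.name, r) for r in peer.records if predicate(r))
        if hops < ttl:
            for neighbor in peer.neighbors:
                if neighbor.name not in visited:
                    visited.add(neighbor.name)
                    frontier.append((neighbor, hops + 1))
    return answers


# Hypothetical sites linked in a chain: municipio - registro - hacienda - transportes.
municipio = Peer("municipio", [{"parcel": "P-001", "town": "Mayagüez"}])
registro = Peer("registro", [{"parcel": "P-002", "town": "Mayagüez"}])
hacienda = Peer("hacienda", [{"parcel": "P-003", "town": "Ponce"}])
transportes = Peer("transportes", [{"parcel": "P-004", "town": "Mayagüez"}])
municipio.connect(registro)
registro.connect(hacienda)
hacienda.connect(transportes)


def en_mayaguez(record: dict) -> bool:
    return record["town"] == "Mayagüez"


for site, record in flood_query(municipio, en_mayaguez, ttl=2):
    print(site, record["parcel"])
```

The TTL trades completeness for cost: with `ttl=2` the query never reaches `transportes`, so its matching record is missed, while a larger TTL floods more of the network. Any site can be the `origin`, which reflects the blurred client/server distinction described above.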

Economic and social barriers to technology adoption by common citizens

During the last five years, information technology innovations have revolutionized our political and social institutions and the role of citizens within them. Puerto Rico has the highest per capita incidence of Internet use in Latin America, with one out of every six Puerto Ricans being an Internet user (Lama Bonilla, 2001). The development of digital government programs could transform how Puerto Ricans obtain government services and information and how they become involved as citizens. Within this context, it is essential to evaluate the sociopolitical implications of electronic government programs and interventions.

The Social Thrust component of this project has five main goals. The first is to determine the current extent of Internet access and use among the population of the research area. This involves the following:

1. Estimation of the extent of Internet access among the population of the research municipality. This includes estimating true phone coverage (including intermittent service), computer ownership, and other forms of Internet access, including government-provided electronic libraries.
2. Estimation of actual Internet usage, including time spent online and the nature of online activity, with particular emphasis on identifying access to governmental resources.
3. Description of the extent and nature of barriers to Internet access and usage.
4. Measurement of changes in the above as the project products become available.
5. Evaluation of the effects of age, education, gender, and income on Internet access.

All of these will be analyzed taking into account geographic and socio-economic factors that may help explain variance in the observed phenomena, so that we improve our understanding of whether e-government innovation improves democracy by making government more accessible, or increases biases in political participation by magnifying the advantages that privileged social groups enjoy.
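Estimating Internet access and comparing it across socio-economic groups amounts to estimating a proportion per subgroup. This minimal sketch assumes simple random sampling and a normal-approximation (Wald) 95% interval; the field names `income` and `has_access` are hypothetical, and the toy records are purely illustrative, not survey data or results. A real survey with stratified or clustered sampling would need design weights, and for small groups or proportions near 0 or 1 a Wilson interval is usually preferable to the Wald interval.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def access_rates(responses: Iterable[dict], group_key: str, z: float = 1.96) -> Dict[str, Tuple[float, float, float]]:
    """Per-group share of respondents with Internet access and a Wald confidence interval.

    Assumes simple random sampling; z = 1.96 gives an approximate 95% interval.
    Returns group -> (proportion, lower bound, upper bound).
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [respondents with access, respondents]
    for r in responses:
        tally = counts[r[group_key]]
        tally[0] += int(r["has_access"])
        tally[1] += 1
    rates = {}
    for group, (with_access, n) in counts.items():
        p = with_access / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        # Clip to [0, 1]; the Wald interval can spill outside it for small n.
        rates[group] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return rates


# Illustrative toy records only (not survey data).
responses = (
    [{"income": "low", "has_access": True}] * 2
    + [{"income": "low", "has_access": False}] * 6
    + [{"income": "high", "has_access": True}] * 6
    + [{"income": "high", "has_access": False}] * 2
)
for group, (p, lo, hi) in access_rates(responses, "income").items():
    print(f"{group}: {p:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With only eight respondents per group the intervals are very wide, which illustrates why the sample size in each geographic and socio-economic subgroup matters when comparing access rates.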

Second, we will describe and explain how Internet access and usage relate to different forms of political participation, such as voting, attending meetings, and belonging to organizations (particularly community-based groups).

Third, we will study which government services citizens would like to see provided electronically, as well as their expectations and concerns regarding digital government.

Fourth, based on the above, we will make policy recommendations concerning the location and design of public electronic libraries, and the services they should provide, to maximize access to and use of the new technologies by underprivileged groups.

Fifth, we will carry out an ongoing process and outcome evaluation of the project, considering:

1. Changes in Internet usage.
2. Usage of project products.
3. User satisfaction with these products.


Advanced Data Management Group - University of Puerto Rico at Mayagüez
Research & Development Center. Office 208, Road 108 Km. 1.0, Miradero, Mayagüez, PR 00680
Copyright 2004 ADMG