The main project contains a series of research and education
activities expected to generate significant original
contributions in Electronic Government Systems, Distributed
Database Systems, Information Retrieval Systems and Social Trust
of Technology. More specifically, the project will conduct
research in the following areas:
• Multi-lingual information archiving and retrieval of
governmental repositories
• Automatic representation and extraction of semantic
information from government documents
• Wide-area secure collaborative government-government databases
• Economic and social barriers to technology adoption by common
citizens
These contributions will have impact on the current research
agenda in these fields, improving the visibility of the
University of Puerto Rico, Mayagüez (UPRM) in these research
communities. Moreover, the associated educational activities
will strengthen the academic programs at UPRM, helping our
institution grow into an important center for computing research
and education in Latin America and the Caribbean.
Multi-lingual information archiving and retrieval of
governmental repositories
Puerto Rico has two official languages: Spanish and English.
Therefore, governments in Puerto Rico at all levels are
bilingual by statute. Although experimentation with
multi-lingual information retrieval dates back at least thirty
years, the advent of the Internet and the World Wide Web has
catalyzed a renewed interest in these types of information
retrieval (IR) systems. The most frequently studied model of a
multi-lingual IR system is the cross-lingual IR system (CLIR).
CLIR systems are typically based on the assumption that users
are unequally familiar with different languages; they may know
one source language well enough to express their information
needs effectively, but may also know one or more additional
target languages just well enough to be able to extract
information from documents authored in such languages. As a
result of this emphasis on CLIR, much effort has been devoted to
the study of the query translation problem [e.g. 3, 4, 17]. In
particular, several projects have attempted to deal with the
term disambiguation problem that results from translating terms
that have multiple meanings. In these studies the most common
performance metric compares the performance of a CLIR system
with different source and target languages, versus the
performance of the system with equal source and target
languages. The performance of the cross-lingual system is
measured as a percentage of its monolingual counterpart. This
metric is designed to measure the effectiveness of the
translation, and not necessarily the effectiveness of the
retrieval itself.
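The relative metric described above can be sketched as follows. This is a minimal illustration, assuming that both systems are scored with the same retrieval measure (for example, mean average precision); the function name and the sample scores are hypothetical and do not come from any of the cited studies.

```python
def relative_effectiveness(cross_lingual_score: float,
                           monolingual_score: float) -> float:
    """Return the cross-lingual score as a percentage of the monolingual baseline.

    Both scores are assumed to come from the same retrieval measure
    computed over the same queries and document collection.
    """
    if monolingual_score <= 0:
        raise ValueError("monolingual baseline must be positive")
    return 100.0 * cross_lingual_score / monolingual_score


# Hypothetical scores, for illustration only.
example = relative_effectiveness(cross_lingual_score=0.24, monolingual_score=0.32)
print(f"Cross-lingual run reaches {example:.1f}% of the monolingual baseline")
```

Because the baseline is the same retrieval engine run monolingually, a high percentage indicates only that little was lost in translation; it says nothing about whether the baseline itself retrieves well.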
This project addresses the need of a sector of the population
that has sufficient language proficiency to both effectively
express information needs and read documents using multiple
languages. The emphasis is thus on fully multi-lingual
information retrieval (MLIR); multi-lingual users writing
multi-lingual queries in order to retrieve multi-lingual data.
This change of assumptions has an important impact on the
prioritization of the research issues that must be addressed. It
is well known that most users tend to express their
information needs using short (2-3 word) queries. Anecdotal
experience indicates that users can often translate such
queries manually in very little time. This suggests that the
most important research problem for MLIR lies not in
the query translation stage, but rather in later stages of the
retrieval process, including document ranking, document
categorization and result set visualization.
Automatic representation and extraction of semantic information
from government documents
Governments are collectors, interpreters and maintainers of very
large public data sets. Often, these data sets must be
maintained in perpetuity. An interesting example of a government
agency that must play all these roles simultaneously is the
land registry. The land registry office is a key government
agency for all societies, as it is the repository for the titles
to real estate properties. In Puerto Rico, the land registry is
manually maintained, with the result that the system is said to
have a backlog of six years. Every year about 250,000 documents
are presented to the Registro de la Propiedad de Puerto Rico.
Each of these documents must be analyzed and summarized by an
expert into a document called a minuta, and then revised and
approved before it is entered into the land registry archives.
As a result, private parties must be hired to perform title
searches, a property may be sold several times before the land
registry catches up with the paperwork, land taxes are difficult
to collect, and these difficulties may increase the
opportunities for fraud and theft. Most striking is that most of
the documents currently being presented to the Registro de la
Propiedad were at one time in electronic form within a computer,
and yet they must go through a manual process before their
acceptance into the archives.
In Puerto Rico, property taxes constitute the main economic
foundation upon which the autonomous finances of towns and
cities are based. Effective management of land registry data is
essential for the financial health of regional governments. The
document semantics component of the project will address the
issues involved in the automation of the land registry in Puerto
Rico, but since this is a system steeped in tradition and legal
requirements, the process must be carried out in steps. The first
requirement that we must fulfill is to make the new system
backwards compatible with the existing one. We may place other
requirements on the longevity and characteristics of the storage
media, and on the encryption and legal issues behind electronic
signatures and the transmittal of sensitive documents.
The proposal hinges on using the eXtensible Markup Language
(XML) to represent semantic information automatically extracted
from documents submitted to the registry. XML has become the
lingua franca for e-commerce and heterogeneous databases. XML is
also the vehicle of choice for the Semantic Web put forth by Tim
Berners-Lee. By developing the narrow knowledge domain of
contracts and deeds in Spanish, our team will be expanding
the tools needed to interoperate with other similar domains and,
at the same time, opening new avenues for electronic access to
government databases and services. XML allows annotation of the
original documents presented to the Registro de la Propiedad,
but in such a way that the original text is preserved while the
electronic version contains all the information that is required
for electronic search, analysis, retrieval, and manipulation.
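The idea of annotating a document while preserving its original text can be sketched as follows. This is only an illustration under stated assumptions: the sentence, the names in it, and the tag names (deed, seller, buyer, parcel, municipality) are all hypothetical, since no schema for registry documents is defined here.

```python
import xml.etree.ElementTree as ET

# Hypothetical deed sentence with invented names and parcel number.
original = "Juan Perez vende a Ana Ruiz la finca 1234 de Mayaguez."

# The same sentence with hypothetical inline annotations. The tags wrap
# spans of the original text without altering any of its characters.
annotated = (
    "<deed><seller>Juan Perez</seller> vende a <buyer>Ana Ruiz</buyer>"
    " la finca <parcel>1234</parcel>"
    " de <municipality>Mayaguez</municipality>.</deed>"
)

root = ET.fromstring(annotated)

# Stripping the markup recovers the original text unchanged.
recovered = "".join(root.itertext())

# The same markup supports structured lookup of the annotated fields.
fields = {child.tag: child.text for child in root}

print(recovered == original)
print(fields["seller"], "->", fields["buyer"], "parcel", fields["parcel"])
```

Inline markup of this kind keeps a single artifact that serves both as the legal text and as a searchable record, which is one way to approach the backwards-compatibility requirement.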
Wide-area secure collaborative government-government databases
The emergence of wide-area networks such as the Internet has
provided users with access to vast amounts of rich data sets
that are located on data sources distributed over these
networks. For the past twenty years, researchers in the area of
Distributed Database Systems have concentrated on the problems
of heterogeneous data integration [8, 62, 74], distributed query
processing [20, 30, 44], and distributed transaction processing.
Database Middleware Systems were developed to integrate
heterogeneous collections of data sources distributed over a
computer network. Note that Database Middleware differs from
Network Middleware such as CORBA, RMI, .NET and RPC, since the
latter is used as an infrastructure layer that gives
applications access to the network. Database Middleware sits at a
higher layer and can leverage Network Middleware for
connectivity purposes. However, CORBA, RMI and .NET do not provide
services such as SQL query execution and schema mapping, which are
routinely found in Database Middleware. Typically, Database
Middleware Systems follow an architecture centered on a data
integration server, which provides client applications with a
uniform view and a uniform access mechanism to the data
available in each source. Database gateways [33, 51] and
Mediator systems [8, 60] are the best-known examples of database
middleware.
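The architecture described above, in which an integration server gives clients a uniform view over heterogeneous sources, can be sketched as follows. This is a toy illustration rather than the design of any cited system: the class names, the common schema (parcel, owner), and the sample records are all hypothetical.

```python
import sqlite3
from typing import Dict, Iterator, List


class SqlSource:
    """Wrapper for a relational source (an in-memory SQLite database)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE parcels (parcel_id TEXT, owner_name TEXT)")
        self.conn.execute("INSERT INTO parcels VALUES ('P-1', 'Ana Ruiz')")

    def records(self) -> Iterator[Dict[str, str]]:
        # Map the source schema (parcel_id, owner_name) to the common schema.
        query = "SELECT parcel_id, owner_name FROM parcels"
        for parcel_id, owner_name in self.conn.execute(query):
            yield {"parcel": parcel_id, "owner": owner_name}


class ListSource:
    """Wrapper for a non-relational source with different field names."""

    def __init__(self) -> None:
        self.rows = [{"id": "P-2", "titular": "Juan Perez"}]

    def records(self) -> Iterator[Dict[str, str]]:
        # Map the source fields (id, titular) to the common schema.
        for row in self.rows:
            yield {"parcel": row["id"], "owner": row["titular"]}


class Mediator:
    """Integration server: one query interface over all wrapped sources."""

    def __init__(self, sources: list) -> None:
        self.sources = list(sources)

    def find_by_owner(self, owner: str) -> List[Dict[str, str]]:
        return [rec for src in self.sources
                for rec in src.records() if rec["owner"] == owner]


mediator = Mediator([SqlSource(), ListSource()])
print(mediator.find_by_owner("Juan Perez"))
```

The client sees only the common schema; each wrapper hides the access mechanism and field names of its own source, which is the schema mapping service that network middleware alone does not provide.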
Next-generation Government Information Systems will incorporate
hundreds, perhaps thousands, of diverse data sources located on
geographically distributed networks like the Internet. In these
types of large-scale distributed environments, heterogeneity in
terms of hardware devices, software components, network
connectivity and system configuration will be a fundamental
characteristic of the data sources. Each government agency will
have its own set of applications, server equipment and policies
to access the data. The data sources might reside on high-end
servers, desktop computers, mobile laptop computers, hand-held
devices, intelligent sensors and appliances, or embedded
computer systems. All of these data have to be made accessible
from their locations, and the distributed data must be combined
into new abstractions to be useful to end-users.
Data integration and interoperation between these data sources
will be a critical requirement to harvest the vast amounts of
valuable information stored and maintained in government
databases. Information could be extracted from any available
data source, whether it is a satellite image from a geographic
database, or a phone book list, encoded in XML, that is
extracted from a Palm-Pilot. Therefore, a government data source
site cannot be defined based on the size of stored data sets, or
on the software environment being run, but rather, on whether
other sites (government or private) in the system retrieve the
information held by the data source. In other words, a data
source is any site that provides a service to access some kind
of data. Clearly, the distinction between what constitutes a
client site and what constitutes a server site will be blurred,
since any site can act as a client or as a service provider to
another site in the system. Moreover, the sheer number and
diversity of data sources implies that there cannot be a single
authority that effectively coordinates and controls the access
to data, or to the computational services in the system. These
observations point us in the direction of a peer-to-peer
dynamic environment in which any government site can request or
serve data, and must engage in a cooperative effort aimed at
satisfying the requests for data and services associated with
the queries posed by interested end-users. We call this type of
scheme collaborative government-government database
interactions.
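A peer-to-peer arrangement of this kind can be sketched as follows. This is a deliberately simplified illustration, assuming that every site holds some local data and forwards requests it cannot answer to its neighbors; the class, the site names, and the keys are hypothetical, and a real system would also need security, query processing, and failure handling.

```python
from typing import Dict, List, Optional, Set


class Peer:
    """A site that can both serve its own data and request data from others."""

    def __init__(self, name: str, local_data: Dict[str, str]) -> None:
        self.name = name
        self.local_data = dict(local_data)
        self.neighbors: List["Peer"] = []

    def connect(self, other: "Peer") -> None:
        # Links are symmetric: either site may act as client or as server.
        self.neighbors.append(other)
        other.neighbors.append(self)

    def serve(self, key: str) -> Optional[str]:
        return self.local_data.get(key)

    def request(self, key: str, visited: Optional[Set[str]] = None) -> Optional[str]:
        # Answer locally if possible; otherwise forward to unvisited neighbors.
        visited = visited if visited is not None else set()
        visited.add(self.name)
        value = self.serve(key)
        if value is not None:
            return value
        for neighbor in self.neighbors:
            if neighbor.name not in visited:
                found = neighbor.request(key, visited)
                if found is not None:
                    return found
        return None


# Hypothetical agencies and keys, for illustration only.
registry = Peer("registry", {"deed:P-1": "Ana Ruiz"})
municipality = Peer("municipality", {})
treasury = Peer("treasury", {"tax:P-1": "paid"})
registry.connect(municipality)
municipality.connect(treasury)

print(registry.request("tax:P-1"))
print(treasury.request("deed:P-1"))
```

No site in the sketch acts as a central coordinator: the registry obtains tax data through the municipality, and the treasury obtains deed data along the same path in the opposite direction.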
Economic and social barriers to technology adoption by common
citizens
During the last five years, information technology innovations
have revolutionized our political and social institutions and
the role of citizens within them. Puerto Rico has the highest
per capita incidence of Internet use in Latin America, with one
out of every six Puerto Ricans being an Internet user (Lama
Bonilla, 2001). The development of digital government programs
could transform how Puerto Ricans obtain government services and
information and become involved as citizens. Within this
context, it is essential to evaluate the sociopolitical
implications of electronic government programs and
interventions.
The Social Thrust component of this project has five main goals.
The first is to determine the current extent of Internet access
and use among the population of the research area. This involves
the following:
1. Estimation of the extent of Internet access among the
population of the research municipality. This includes
estimating true phone coverage (including intermittent service),
computer ownership and other forms of Internet access, including
government-provided electronic libraries.
2. Estimation of actual Internet usage, including time spent
online and the nature of online activity with particular
emphasis on identifying access to governmental resources.
3. Description of the extent and nature of barriers to Internet
access and usage.
4. Measurement of changes in the above as the project products
become available.
5. Evaluation of the effects of age, education, gender, and
income on Internet access.
All of these will be analyzed taking into account geographic and
socio-economic factors that may help explain variance in the
observed phenomena, so that we improve our
understanding of how e-government innovation may either improve
democracy by making government more accessible, or increase
biases in political participation by magnifying the advantages
that privileged social groups enjoy.
Second, we will describe and explain the relationship between
Internet access and usage and different forms of political
participation, such as voting, attending meetings and belonging
to organizations (particularly community-based groups).
Third, we will study which government services citizens would
want to be provided electronically, as well as their
expectations and concerns about digital government.
Fourth, based on the above, we will make policy recommendations
concerning the location, design and services that should be
provided in public electronic libraries to maximize access to and
use of the new technologies by underprivileged groups.
Fifth, we will carry out an ongoing process and outcome
evaluation of the project, considering:
1. Changes in Internet usage.
2. Usage of project products.
3. User satisfaction with said products.