Recently there has been much research on summarizing news streams and on detecting the boundaries of new events in a news stream. In these tasks, however, every article is assumed to carry a timestamp (temporal information), so news articles without timestamps cannot contribute to them. In this investigation, we propose a new technique for estimating the timestamps of news articles using a small, incomplete news corpus. We learn temporal information and topic information by means of both the EM algorithm and incremental clustering, and then estimate the timestamp of a news article based on the events discussed in the news corpus. We examine the TDT2 corpus and show through experiments how well our approach works.
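The abstract does not spell out its clustering step, so the following is only a minimal sketch of the general idea under stated assumptions: dated articles are grouped into events by single-pass (incremental) clustering with cosine similarity, and an undated article receives the median timestamp of its most similar event. The EM step is omitted, and the whitespace tokenizer, the `threshold` value, and the day-number timestamps are hypothetical choices.

```python
import math
from collections import Counter
from statistics import median


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EventCluster:
    def __init__(self, terms: Counter, day: int):
        self.centroid = Counter(terms)  # summed term counts of member articles
        self.days = [day]               # timestamps (day numbers) of member articles

    def add(self, terms: Counter, day: int) -> None:
        self.centroid.update(terms)
        self.days.append(day)


def incremental_cluster(dated_docs, threshold=0.3):
    """Single pass over (text, day) pairs: each article joins the most similar
    existing event if similarity >= threshold, otherwise it starts a new event."""
    clusters = []
    for text, day in dated_docs:
        terms = Counter(text.lower().split())  # hypothetical whitespace tokenizer
        best = max(clusters, key=lambda c: cosine(terms, c.centroid), default=None)
        if best is not None and cosine(terms, best.centroid) >= threshold:
            best.add(terms, day)
        else:
            clusters.append(EventCluster(terms, day))
    return clusters


def estimate_day(text, clusters):
    """Assign an undated article to its most similar event and return the
    median timestamp of that event's articles."""
    terms = Counter(text.lower().split())
    best = max(clusters, key=lambda c: cosine(terms, c.centroid))
    return median(best.days)
```

With a toy corpus of two earthquake articles (days 10 and 11) and two election articles (days 40 and 42), `estimate_day("earthquake rubble rescue", clusters)` returns 10.5. Using the median rather than the mean keeps a single outlying article from dragging an event's estimated date.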
Databases hold a great deal of important data that must be protected from attack, and cryptographic support is an important mechanism for securing such data. However, users must trade off performance for security, because encryption and decryption greatly degrade query performance. To address this problem, we propose an approach for executing SQL queries over encrypted character data. When character data are stored as ciphertext, we store not only the encrypted character data but also characteristic values, computed from the character data by a characteristic function, in an additional field. Queries over the encrypted character data follow a two-phase principle. First, a coarse query over the encrypted data filters out records unrelated to the query conditions. Second, the remaining records are decrypted and a refined query is executed over them. A set of experiments validates the functionality and usability of our approach.
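The two-phase idea can be sketched as follows. This is an illustration under loud assumptions, not the authors' scheme: the characteristic function here is a hypothetical 64-bit mask with one bit per hashed character bigram, the query is a substring match (`LIKE '%pattern%'`), and a toy XOR function stands in for a real cipher and is NOT secure.

```python
import zlib

MASK_BITS = 64  # hypothetical width of the characteristic field


def characteristic(text: str) -> int:
    """Hypothetical characteristic function: set one bit per character bigram
    (hashed with CRC-32), giving a value stored alongside the ciphertext."""
    value = 0
    for i in range(len(text) - 1):
        bigram = text[i:i + 2].encode("utf-8")
        value |= 1 << (zlib.crc32(bigram) % MASK_BITS)
    return value


def toy_encrypt(text: str, key: int) -> bytes:
    """Placeholder XOR 'cipher' (key in 0..255) for illustration only; NOT secure."""
    return bytes(b ^ key for b in text.encode("utf-8"))


def toy_decrypt(cipher: bytes, key: int) -> str:
    return bytes(b ^ key for b in cipher).decode("utf-8")


def store(values, key):
    """Store each value as a (ciphertext, characteristic value) pair."""
    return [(toy_encrypt(v, key), characteristic(v)) for v in values]


def like_query(table, pattern: str, key: int):
    """Two-phase evaluation of  WHERE col LIKE '%pattern%'."""
    q = characteristic(pattern)
    # Phase 1 (coarse): keep rows whose characteristic value contains every bit
    # of the pattern's value; no decryption is needed in this phase.
    candidates = [c for c, ch in table if (ch & q) == q]
    # Phase 2 (refined): decrypt only the candidates and test the predicate exactly.
    plain = (toy_decrypt(c, key) for c in candidates)
    return [p for p in plain if pattern in p]
```

Because every bigram of a matching pattern also occurs in the stored value, the coarse phase never discards a true match; hash collisions can let false positives through, and the refined phase removes them after decryption.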
A new challenge in Web usage analysis is how to manage Web data of various types, stored in structured or unstructured databases, and how to discover informative patterns in them for system monitoring and decision making. In this paper, we introduce a novel integrated data warehousing and data mining framework for Website management and pattern discovery to analyze Web user behavior. The merit of the framework is that it combines multidimensional Web databases to support online analytical processing for improving Web services. Based on this model, we propose several statistical indexes and practical solutions for intelligently discovering interesting user access patterns for Website optimization, Web personalization, recommendation, and similar applications. We evaluate the effectiveness of the model using Web data from a sports Website as the data source. The results show that this integrated data warehousing and mining model is effective and efficient in practical Web applications.
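As a rough illustration of the online analytical processing side, the sketch below rolls up a hypothetical page-view fact table along any subset of its dimensions, computing two simple measures per cell (page views and distinct users). The field names (`user`, `section`, `hour`) and the measures are assumptions; the abstract does not define its statistical indexes.

```python
from collections import defaultdict


def roll_up(facts, dims):
    """Group fact rows (dicts) by the given dimension names, like a GROUP BY
    over a subset of the cube's dimensions, and compute two measures per cell:
    total page views and number of distinct users."""
    views = defaultdict(int)
    users = defaultdict(set)
    for row in facts:
        cell = tuple(row[d] for d in dims)
        views[cell] += 1
        users[cell].add(row["user"])
    return {cell: (views[cell], len(users[cell])) for cell in views}


# Hypothetical page-view log from a sports Website: one row per request.
facts = [
    {"user": "u1", "section": "football", "hour": 9},
    {"user": "u1", "section": "football", "hour": 9},
    {"user": "u2", "section": "football", "hour": 20},
    {"user": "u2", "section": "tennis", "hour": 20},
]
```

Passing an empty tuple of dimensions yields the grand total, `roll_up(facts, ("section",))` gives per-section totals, and adding `"hour"` drills down into finer cells.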
The condition number of a matrix is an important measure in numerical analysis and linear algebra. It is usually obtained by direct computation or by estimation, but the time and memory costs of these approaches are very high, especially for large matrices. We propose a totally different approach to estimating the condition number of a sparse matrix: after computing a set of features of the matrix, we use support vector regression (SVR) to predict its condition number. We also use feature selection strategies to further reduce the response time and improve accuracy. Our feature selection criterion combines the weights from SVR with the weights obtained by comparing matrices with their preconditioned counterparts. Our experiments show that the prediction method is on average 15 times faster than direct computation, which makes it suitable for online condition number queries. The prediction is less precise than general direct computation methods. However, many users care only about whether a matrix is well-conditioned or ill-conditioned, or about the order of magnitude of its condition number, not its exact value. For such users, a rough prediction with a quick response time is probably a better choice than a precise value obtained after waiting hours or days.
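The condition number is κ(A) = ‖A‖·‖A⁻¹‖. The abstract does not list the matrix features it uses, so the sketch below computes a few hypothetical, cheap structural features of a sparse matrix stored as a dictionary mapping (row, column) to a nonzero value. In a setup like the one described, such feature vectors would be the inputs to an SVR (for example scikit-learn's `sklearn.svm.SVR`) trained on condition numbers computed directly; regressing on log10 κ(A) is one natural target, given that users mainly want the order of magnitude, but that choice is also an assumption.

```python
def matrix_features(entries, n):
    """Hypothetical cheap features of an n x n sparse matrix.

    entries: dict mapping (row, col) -> nonzero value; assumed to hold at
    least one entry and no explicit zeros.
    """
    nnz = len(entries)
    diag = [abs(entries.get((i, i), 0.0)) for i in range(n)]
    offdiag_row_sums = [0.0] * n
    for (i, j), v in entries.items():
        if i != j:
            offdiag_row_sums[i] += abs(v)
    # Structural symmetry: fraction of nonzeros (i, j) whose mirror (j, i) is also stored.
    symmetric = sum(1 for (i, j) in entries if (j, i) in entries)
    # Smallest ratio |a_ii| / sum_{j != i} |a_ij|; a value > 1 means strict
    # diagonal dominance (rows with no off-diagonal entries count as infinite).
    dominance = min(
        (d / s if s else float("inf")) for d, s in zip(diag, offdiag_row_sums)
    )
    magnitudes = [abs(v) for v in entries.values()]
    return {
        "n": n,
        "nnz": nnz,
        "density": nnz / (n * n),
        "structural_symmetry": symmetric / nnz,
        "zero_diagonals": sum(1 for d in diag if d == 0),
        "min_diag_dominance": dominance,
        "value_range": max(magnitudes) / min(magnitudes),
    }
```

Every feature here needs only one pass over the nonzeros, which is the property that makes a feature-based predictor cheap compared with computing ‖A⁻¹‖ directly.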