Some really interesting data mining, text mining, visualization, and NLP work follows. Quite addictive. We think it will be of interest to many of you.
We've just started to demo TextMap. The service debuted about 10 months ago, but last week it added many improvements.
Look for a more in-depth review of these interesting and useful services soon. TextMap is a research project from the computer science department at Stony Brook University.
TextMap tracks references to people, places, and things appearing in news text, so as to identify meaningful relationships between them. TextMap monitors the state of the world by analyzing both the temporal and spatial distribution of these entities. We currently analyze over 1000 domestic and international news sources every day. TextMap uses natural language processing techniques to identify entity references and a variety of statistical techniques to analyze the juxtapositions between them.
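To make the idea of analyzing juxtapositions concrete, here is a minimal sketch of one common approach: count how often two entities appear in the same article and score each pair with pointwise mutual information (PMI). TextMap does not say which statistics it uses, so PMI, the function name cooccurrence_scores, and the toy article data are all assumptions made for illustration. Entity extraction itself is taken as already done.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_scores(articles):
    """Score entity pairs by pointwise mutual information (PMI).

    `articles` is a list of sets, each holding the entity names already
    extracted from one article. This is an illustrative stand-in, not
    TextMap's actual method.
    """
    n = len(articles)
    entity_counts = Counter()
    pair_counts = Counter()
    for entities in articles:
        entity_counts.update(entities)
        # Sort so each unordered pair always has the same key.
        pair_counts.update(combinations(sorted(entities), 2))

    scores = {}
    for (a, b), together in pair_counts.items():
        # PMI = log( P(a, b) / (P(a) * P(b)) ), with probabilities
        # estimated as fractions of articles.
        scores[(a, b)] = math.log(
            together * n / (entity_counts[a] * entity_counts[b])
        )
    return scores


# Made-up toy data: four articles and the entities found in each.
articles = [
    {"John Edwards", "Barack Obama"},
    {"John Edwards", "Chapel Hill, NC"},
    {"John Edwards", "Barack Obama", "Chapel Hill, NC"},
    {"Barack Obama"},
]
scores = cooccurrence_scores(articles)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the pair appears together more often than chance would predict from their individual frequencies; a negative score means less often.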
Quick Notes:
TextMap is a search engine for entities: the important (and not so important) people, places, and things in the news. Our news analysis system automatically identifies and monitors these entities, and identifies meaningful relationships between them.
++ Alphabetical lists of entities in the following categories:
+ Person
+ City
+ Agency
+ Company
+ University
+ Drug
+ Website
+ Title
+ Country
+ Disease
+ State
A Results Page includes:
1. Favorite Things:
Which entities in each semantic category are most strongly associated with the given entity.
Example: John Edwards's Favorite Things include:
Barack Obama
Chapel Hill, NC
2. Relational Networks:
See the most important associates of the given entity and the relationships between them. Clusters are highlighted by color, and strength of association by line type. Click on a vertex to go to the corresponding entity page.
Three categories: Historic, Year, 30 Days.
3. Articles:
We present a brief selection of recent articles referring to the entity, with more articles available.
4. Entity Juxtapositions:
Which entities are most strongly associated with our target, and how does that vary over three time scales? The size of the colored bars reflects the popularity of the given entity (orange) as well as the strength of association with the target (blue). Clicking on the bars brings you to recent articles linking both entities.
5. Relations:
How can we explain the nature of the relationship between two entities? A larger selection is available on demand.
6. Aliases:
What other names is the entity known by?
7. Popularity Time Series:
Tracks fluctuations in the reference frequency of the target entity. References are partitioned by article type (news, sports, business or entertainment) to highlight the nature of the entity.
8. Sentiment Index:
How well is the entity thought of within various communities and contexts? Our sentiment index gives a percentile score of the entity's standing along several dimensions as well as whether it is increasing or decreasing.
9. Sentiment Graphs:
Here we plot changes in the general sentiment toward our target entity in terms of favorable (red) versus unfavorable (blue) and passionate versus apathetic.
10. Heatmap:
Shows the relative interest in an entity at every locale in the United States. We have developed a geographic model of news influence which enables us to gauge the relative amount of exposure a given entity has received as a function of location.
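As an illustration of the percentile score mentioned under the Sentiment Index (item 8), the sketch below ranks one entity's raw sentiment score against the scores of all tracked entities. How TextMap derives its raw scores is not described, so the function percentile_rank, the entity names, and the numbers are hypothetical.

```python
from bisect import bisect_left


def percentile_rank(value, population):
    """Return the percentage of scores in `population` strictly below `value`."""
    ordered = sorted(population)
    below = bisect_left(ordered, value)
    return 100.0 * below / len(ordered)


# Hypothetical raw sentiment scores (higher means more favorable).
raw_scores = {
    "entity_a": -0.4,
    "entity_b": 0.1,
    "entity_c": 0.3,
    "entity_d": 0.8,
}

ranks = {
    name: percentile_rank(score, raw_scores.values())
    for name, score in raw_scores.items()
}
for name, rank in ranks.items():
    print(f"{name}: {rank:.0f}th percentile")
```

Comparing the same entity's rank across two time windows would show whether its standing is increasing or decreasing.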