In this paper, we observe that many Web pages contain geolocation information (address, zip code, and telephone area code), and that many of these geolocation items are directly related to the locations of the IP addresses that host the pages. Based on this observation, we design Structon, a system that mines Web pages for IP address geolocations. Structon first extracts geolocation information from every crawled Web page. It then applies a series of information clustering, false-information filtering, error-correction, and location-inference algorithms to map IP addresses to geolocations. We ran our algorithms on a set of 74M Chinese Web pages, from which we identified the geolocations of 8.2M IP addresses, covering not only Web servers but also client hosts. We verified our results against the IP address location table of a major Chinese ISP. The verification shows that Structon's accuracy is 94.4% at the province level.
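The extract-then-infer idea can be illustrated with a minimal sketch. This is not Structon's actual algorithm, whose details the abstract does not give. It assumes that a telephone area code is the only geolocation hint, that a tiny hand-written lookup table stands in for a full gazetteer, and that a simple majority vote with two hypothetical thresholds (`min_votes`, `min_share`) stands in for the clustering and filtering stages. All function and variable names are illustrative.

```python
import re
from collections import Counter, defaultdict

# Illustrative lookup table (assumption): a real system would need complete
# area-code, zip-code, and address tables.
AREA_CODE_TO_PROVINCE = {
    "010": "Beijing",
    "021": "Shanghai",
    "020": "Guangdong",
    "0755": "Guangdong",
}

# Matches numbers written as "<area code>-<7 or 8 digit number>", e.g. 010-12345678.
PHONE_RE = re.compile(r"\b(0\d{2,3})-\d{7,8}\b")


def extract_provinces(page_text):
    """Return the provinces hinted at by area codes found in one page."""
    provinces = []
    for code in PHONE_RE.findall(page_text):
        province = AREA_CODE_TO_PROVINCE.get(code)
        if province is not None:
            provinces.append(province)
    return provinces


def infer_locations(pages, min_votes=2, min_share=0.6):
    """Map each hosting IP to a province by majority vote over its pages.

    pages: iterable of (ip, page_text) pairs.
    An IP is kept only if the winning province has at least min_votes hints
    and at least min_share of all hints for that IP (a crude noise filter).
    """
    votes = defaultdict(Counter)
    for ip, text in pages:
        votes[ip].update(extract_provinces(text))

    result = {}
    for ip, counter in votes.items():
        if not counter:
            continue  # no usable hints on any page hosted at this IP
        province, count = counter.most_common(1)[0]
        total = sum(counter.values())
        if count >= min_votes and count / total >= min_share:
            result[ip] = province
    return result
```

With these defaults, an IP whose pages yield two Beijing hints and one Shanghai hint is mapped to Beijing, while an IP with a single hint is dropped as too weakly supported.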