This is something I’ve always wanted to do. A data set such as every pingable IP, every server with port 80 exposed, or lookups on every domain name or IP would be very useful, but the bandwidth and computation needed have often been out of reach. With services such as EC2, they are now within reach. Figure 1 KB of data sent and 1 KB received per IP, and a class A scan (a /8, about 16.8 million addresses) would take only about 32 gigabytes in total. Figure one month of Small Linux instance machine time at Amazon and you’re looking at $63.60 (cheaper if you use a reserved instance or a spot instance!). So for a few thousand dollars you can now scan the entire Internet or create other similarly large data sets.
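Here is a rough sketch of the arithmetic behind those numbers. It assumes 1 KB sent plus 1 KB received per address, which is the reading that gives 32 gigabytes for a class A. The function names and the $3,000 budget are made up for illustration, and bandwidth charges are not included.

```python
# Back-of-the-envelope estimate of scan traffic and instance time.
# Assumption: 1 KB sent and 1 KB received per address (2 KB round trip).
BYTES_SENT_PER_IP = 1024
BYTES_RECEIVED_PER_IP = 1024

ADDRESSES_PER_CLASS_A = 2 ** 24      # a /8 block
CLASS_A_BLOCKS_IN_IPV4 = 2 ** 8      # 256 /8 blocks in the IPv4 space

# Monthly price of a Small Linux instance, as quoted in the post.
SMALL_INSTANCE_MONTHLY_USD = 63.60


def scan_traffic_gib(addresses: int) -> float:
    """Total traffic in GiB for scanning the given number of addresses."""
    bytes_per_ip = BYTES_SENT_PER_IP + BYTES_RECEIVED_PER_IP
    return addresses * bytes_per_ip / 2 ** 30


def instance_months(budget_usd: float) -> float:
    """How many Small instance-months a budget buys (instance time only)."""
    return budget_usd / SMALL_INSTANCE_MONTHLY_USD


class_a_gib = scan_traffic_gib(ADDRESSES_PER_CLASS_A)
ipv4_tib = class_a_gib * CLASS_A_BLOCKS_IN_IPV4 / 1024

print(f"One class A: {class_a_gib:.0f} GiB")                  # 32 GiB
print(f"All of IPv4: {ipv4_tib:.0f} TiB")                     # 8 TiB
# Hypothetical budget standing in for "a few thousand dollars".
print(f"$3000 buys {instance_months(3000):.1f} instance-months")
```

The full IPv4 figure counts all 256 class A blocks, including reserved and private ranges you would skip in practice, so treat it as an upper bound.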