Jul. 6, 2009 - Search Engines and the MLS Data Scraping Question revisited
This is a continuation of Search Engines and the MLS Data Scraping Question (http://www.realtown.com/mattcohen/blog/scraping-and-search-engines). That blog post started a lot of conversation, and Brian Larson did an excellent job of filling in much of the detail on his blog, starting here (http://www.mlstesseract.com/2009/06/search-engines-indexing-idx-sites.html). So far, he and I seem to be substantially in agreement, and after his five-part series we have generally arrived at the same points:
Our industry has a long way to go in discussing how it relates to the Internet - and topics such as how data is used by search engines, syndicators and others are ripe, perhaps over-ripe, for discussion. Through that discussion, policy on the use of data should be developed. How one broker uses another broker's listings online deserves particular consideration. This policy needs to be reflected on web sites in Terms of Use, anti-scraping measures, and other technical details. That said, we must ensure that policy enacted in our own industry does not disadvantage brokers online relative to sites that the policy does not or cannot affect.
When a search engine blurs the line between its traditional role as a "conduit" site and a role as a "destination" in its own right, indexing may become more controversial. See how Google is using real estate data in Australia: http://maps.google.com.au/help/maps/realestate/. Google is now also heading in the same direction in select U.S. cities.
There has been an expectation that, when a search engine crawls your site, its purpose is to let the public enter search terms, get back a link with a small amount of text under it, and be encouraged to click on that link and visit your site. This is referred to as the search engine being a "conduit". When a search engine crawls your site and not only indexes your content, but also stores a copy of your data and presents that content - perhaps in conjunction with other content - it can become a "destination" site in its own right.
Although Google still links out to an original source of the data in the example URL provided, getting users to that source may not be the primary focus of the page. What would you think of Google if the focus were on the "More info" link and you only saw a list of traditional destination sites when you clicked on the "Web Pages" tab? How about if the design changed further and there was a LOT more content on Google - public records data, demographics, etc.? Or what if Google added functionality - what if users could bookmark their favorite listings and share them with friends? What if they could get email updates or RSS updates via Google Reader when new matches to their criteria were found?
Where does a site cross the line from being a search engine and start seeming like any other 'scraper'?
As per my original posting on this subject, I still believe usage is at the heart of the IDX / search engine policy question. Ideally, there should be rigorous strategic discussions of how the listings are used by various parties today - and how they might be used tomorrow.