Search Engines and the MLS Data Scraping Question revisited : Matt's Real Estate Technology Blog
Clareity Consulting - Real Estate Information Technology Consultants

Matt's Real Estate Technology Blog

Jul. 6, 2009 - Search Engines and the MLS Data Scraping Question revisited

This is a continuation of Search Engines and the MLS Data Scraping Question (http://www.realtown.com/mattcohen/blog/scraping-and-search-engines). That blog post started a lot of conversation, and Brian Larson did an excellent job of filling in much of the detail on his blog, starting here (http://www.mlstesseract.com/2009/06/search-engines-indexing-idx-sites.html). So far, he and I seem to be substantially in agreement, and after his five-part series we have generally arrived at the same point:

Our industry has a long way to go in discussing how it relates to the Internet - and topics such as how data is used by search engines, syndicators and others are ripe, perhaps over-ripe, for discussion. Through that discussion, policy around the use of data should be developed. How one broker uses another broker's listings online deserves particular consideration. This policy needs to be reflected on web sites in Terms of Use, anti-scraping measures, and other technical details. That said, we must ensure that policy enacted in our own industry does not disadvantage brokers online relative to sites that the policy does not or cannot affect.

When a search engine blurs the line between its traditional role as a "conduit" site and a role as a "destination" in its own right, indexing may be more controversial. See how Google is using real estate data in Australia: http://maps.google.com.au/help/maps/realestate/. Google is now also heading in the same direction in select U.S. cities.

There has been an expectation that, when a search engine crawls your site, its purpose is to allow the public to enter search terms, get back a link with a small amount of text under it, and encourage the public user to click on the link and visit your site. This is referred to as the search engine being a "conduit". When a search engine crawls your site and not only indexes your content, but stores a copy of your data and presents that content - perhaps in conjunction with other content - it can become a "destination" site in its own right.

Though in the example URL provided Google still links out to an original source of the data, getting users to that source may not be the primary focus of the page. What would you think of Google if the focus was on the "More info" link and you only saw a list of traditional destination sites when you clicked on the "Web Pages" tab? How about if the design changed further and there was a LOT more content on Google - public records data, demographics, etc.? Or what if Google added additional functionality - what if users could bookmark their favorite listings and share them with friends? What if they could get email updates or RSS updates via Google Reader when new matches to their criteria were found?

Where does a site cross the line from being a search engine and start seeming like any other 'scraper'?

As per my original posting on this subject, I still believe usage is at the heart of the IDX / search engine policy question. Ideally, there should be rigorous strategic discussions of how the listings are used by various parties today - and how they might be used tomorrow.  



Jul. 7, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by victor lund

Here is how anyone can pretend to be Googlebot and scrape data.

 

http://www.addictivetips.com/internet-tips/access-any-website-or-forum-without-registering/



Jul. 7, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by Matt Cohen

You don't need to pretend - over 90% of industry sites have NO protection against scraping at all.

"User Agent" information - supplied by the client rather than the server - has never been a mechanism for providing security.
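
To make that concrete, here is a minimal Python sketch showing that the User-Agent header is simply whatever the client chooses to send. It only builds a request and does not send one. The URL is a placeholder, the helper name build_request is hypothetical, and the Googlebot string is one commonly published form, included here as an assumption for illustration.

```python
from urllib.request import Request

# One commonly published Googlebot User-Agent string (assumed for illustration).
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) an HTTP request with a caller-chosen User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


# example.com is a placeholder, not a real listing site.
req = build_request("http://example.com/listings", GOOGLEBOT_UA)

# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Because the server only sees the string the client supplies, a check based on User-Agent alone cannot tell a real crawler from anything else that sends the same text.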



Jul. 7, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by Matt Cohen

BTW, Victor, I've used that plugin for ages ... if you ever see Apple ][+ as a platform in your web logs, it might be me!



Jul. 7, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by Lin McIntosh

There is no MLS in Australia, and the auction is the primary method of selling property. If a property does not sell at auction, it is considered "passed over" and is tarnished. It is a perfect place for Google Maps to set up business. Because there is no MLS, each broker has to get a separate signed commission agreement, resulting in 10-15 separate brokerage signs in the yards of the available property. As a result, most sales are made without a broker. Commissions are set by the state government and were a maximum of 2% the last time I was there. It's amazing what the MLS and the associations do to increase the professionalism of the industry ... you have to go somewhere there is none before you can appreciate what a professional association does for the industry.



Jul. 14, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by Paul Trippett

One option could be to use a meta tag to stop Google from archiving the page.

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"> 



Jul. 14, 2009 - RE: Search Engines and the MLS Data Scraping Question revisited

Posted by Matt Cohen

Yes, Paul, or something like this:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

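In my previous post, where we discussed anti-scraping methods, the commenters and I made various references to NOINDEX. Do keep in mind that this does not prevent robots from crawling your site - it is only a request that well-behaved robots not index the page or follow its links, and a robot has to fetch the page before it can even see the request.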
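
As a rough illustration of why these tags are only a request, here is a minimal Python sketch of how a cooperating crawler might read them from a page it has already downloaded. The class and function names are hypothetical, and real crawlers will differ; the point is that honoring the directives is entirely up to the code on the robot's side.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in an already-fetched page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
found = robots_directives(page)

# A cooperating robot would skip indexing here; a scraper can simply ignore this.
print("noindex" in found, "nofollow" in found)
```

Note that the page has to be fetched before the directives can be read at all, so the tag cannot keep the content away from a robot that chooses not to cooperate.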

The point of this post was less to recap those mechanisms and more to ask the question, "When does a search engine become a destination site, and when do its activities become scraping because they are used for a purpose other than what one expects from a search engine?"

 





Matt Cohen
Matt Cohen has consulted to MLSs, Associations, franchises, brokerages, and many real estate industry software companies for over 12 years. Matt is a well-regarded real estate industry expert on industry trends, software design, product management, project management, and information security. Matt speaks at conferences, workshops and leadership retreats around the country on a wide variety of MLS-related topics.


Disclaimer: The opinions expressed on this blog are the responsibility of the author and do not necessarily reflect the opinion of the author's employer.