Real estate aggregators (like Zillow, Redfin, or local MLS platforms) are notoriously difficult to scrape.
They rely on heavy map integrations, anti-bot mechanisms, and highly obfuscated DOM structures (e.g. class="Text-c11n-8-100-2__sc-aipi13-0").
Scrapers built on CSS selectors tend to break as soon as a site redeploys and its generated class names change. Extracto instead uses AI vision models and Playwright to extract property details by reading the rendered page visually rather than relying on its markup.
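To make the brittleness concrete, here is a minimal sketch using only the Python standard library. The two HTML snapshots, the second class name, and the helper names are hypothetical, and the regex-based lookup is only a stand-in for content-driven extraction; it is not how Extracto's vision models work.

```python
import re
from html.parser import HTMLParser

# Two hypothetical snapshots of the same listing. Only the generated class name differs.
SNAPSHOT_BEFORE = '<span class="Text-c11n-8-100-2__sc-aipi13-0">$450,000</span>'
SNAPSHOT_AFTER = '<span class="Text-c11n-8-101-0__sc-9x2k1f-0">$450,000</span>'


class ClassTextFinder(HTMLParser):
    """Collects the text of elements whose class attribute matches exactly."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._inside = False
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == self.target_class:
            self._inside = True

    def handle_endtag(self, tag):
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.matches.append(data)


def price_by_class(html, class_name):
    """Selector-style lookup: depends on the exact generated class name."""
    finder = ClassTextFinder(class_name)
    finder.feed(html)
    return finder.matches[0] if finder.matches else None


def price_by_content(html):
    """Content-style lookup: looks for something shaped like a dollar amount."""
    match = re.search(r"\$\d[\d,]*", html)
    return match.group(0) if match else None
```

With the class name hard-coded from the first snapshot, `price_by_class` finds the price before the redeploy and returns `None` afterwards, while `price_by_content` keeps working on both because it never looks at the class name.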
Provide a single schema definition, and Extracto uses Mistral or GPT-4o to map its fields onto the data shown on any property listing website:
```python
import asyncio

from extracto.config import CrawlerConfig
from extracto.crawler_engine import CrawlerEngine
from extracto.schema import PropertyDefinition, DataType

# One entry per field to extract; the description tells the model what to look for.
schema = [
    PropertyDefinition(name="address", type=DataType.STRING, description="Full street address"),
    PropertyDefinition(name="price", type=DataType.NUMBER, description="Listed property price"),
    PropertyDefinition(name="bedrooms", type=DataType.INTEGER, description="Number of bedrooms"),
    PropertyDefinition(name="bathrooms", type=DataType.NUMBER, description="Number of bathrooms"),
    PropertyDefinition(name="sqft", type=DataType.INTEGER, description="Square footage of the home"),
    PropertyDefinition(name="agent_phone", type=DataType.STRING, description="The listed agent's contact number"),
]

config = CrawlerConfig(
    start_urls=["https://www.realestate-example.com/listing/1"],
    properties=schema,
    llm_provider="openai",
)

# Run the crawl
asyncio.run(CrawlerEngine(config).run())
```
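Listing pages often show values such as "$450,000" or "1,850 sqft", so it can be useful to normalize records against the declared types after a crawl. The following is a rough sketch in plain Python, not part of Extracto: it assumes each extracted record is available as a dict of raw values, and `FIELD_TYPES`, `coerce_value`, and `coerce_record` are hypothetical names that simply mirror the schema above.

```python
import re

# Hypothetical field-to-type mapping mirroring the schema above (assumption, not Extracto's API).
FIELD_TYPES = {
    "address": str,
    "price": float,
    "bedrooms": int,
    "bathrooms": float,
    "sqft": int,
    "agent_phone": str,
}


def coerce_value(raw, target_type):
    """Convert a raw value to target_type, or return None if it cannot be parsed."""
    if raw is None:
        return None
    if target_type is str:
        return str(raw).strip()
    # Keep digits, the decimal point, and the minus sign; drop "$", "," and unit labels.
    cleaned = re.sub(r"[^\d.\-]", "", str(raw))
    try:
        number = float(cleaned)
    except ValueError:
        return None
    return int(number) if target_type is int else number


def coerce_record(record):
    """Return a dict with every schema field present and coerced to its declared type."""
    return {name: coerce_value(record.get(name), t) for name, t in FIELD_TYPES.items()}
```

Fields that are missing or unparseable come back as `None`, which makes gaps easy to spot before the data is stored.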
Real estate platforms enforce strict rate limits. Extracto integrates Crawlee's evasion utilities under the hood: it injects stealth behaviors, manages proxy rotation, and varies browser fingerprints to help your crawler avoid detection.
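For readers who want to see what coping with rate limits looks like at the request level, here is a generic sketch of retrying with exponential backoff and jitter. It is an illustration under stated assumptions, not Extracto's or Crawlee's implementation: `fetch` is a hypothetical async callable returning a `(status, body)` tuple, and HTTP 429 is assumed to be the rate-limit signal.

```python
import asyncio
import random


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))


async def fetch_with_backoff(fetch, url, max_attempts=5, sleep=asyncio.sleep):
    """Call fetch(url), retrying while it reports HTTP 429 and pausing between attempts.

    `fetch` is a hypothetical async callable returning (status, body).
    The last response is returned even if it is still a 429.
    """
    for attempt in range(max_attempts):
        status, body = await fetch(url)
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        await sleep(backoff_delay(attempt))
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `asyncio.sleep` applies.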