Embedding-Based Retrieval for Airbnb Search | by Huiji Gao | The Airbnb Tech Blog | Mar, 2025

Theautonewspaper.com by Theautonewspaper.com
26 March 2025
in Software Development & Engineering


Huiji Gao

The Airbnb Tech Blog

Our journey in applying embedding-based retrieval techniques to build an accurate and scalable candidate retrieval system for Airbnb Homes search

Authors: Mustafa (Moose) Abdool, Soumyadip Banerjee, Karen Ouyang, Do-Kyum Kim, Moutupsi Paul, Xiaowei Liu, Bin Xu, Tracy Yu, Hui Gao, Yangbo Zhu, Huiji Gao, Liwei He, Sanjeev Katariya

Search plays a crucial role in helping Airbnb guests find the perfect stay. The goal of Airbnb Search is to surface the most relevant listings for each user’s query, but with millions of available homes, that’s no easy task. It’s especially difficult when searches include large geographic areas (like California or France) or high-demand destinations (like Paris or London). Recent innovations, such as flexible date search, which allows guests to explore stays without fixed check-in and check-out dates, have added yet another layer of complexity to ranking and finding the right results.

To tackle these challenges, we need a system that can retrieve relevant homes while also being scalable enough (in terms of latency and compute) to handle queries with a large candidate count. In this blog post, we share our journey in building Airbnb’s first-ever Embedding-Based Retrieval (EBR) search system. The goal of this system is to narrow down the initial set of eligible homes into a smaller pool, which can then be scored by more compute-intensive machine learning models later in the search ranking process.

Figure 1: The general stages and scale for the various types of ranking models used in Airbnb Search

We’ll explore three key challenges in building this EBR system: (1) constructing training data, (2) designing the model architecture, and (3) developing an online serving strategy using Approximate Nearest Neighbor (ANN) solutions.

The first step in building our EBR system was training a machine learning model to map both homes and de-identified search queries into numerical vectors. To achieve this, we built a training data pipeline (Figure 3) that leveraged contrastive learning, a technique that involves identifying pairs of positive- and negative-labeled homes for a given query. During training, the model learns to map a query, a positive home, and a negative home into numerical vectors, such that the similarity between the query and the positive home is much higher than the similarity between the query and the negative home.

To construct these pairs, we devised a sampling method based on user journeys. This was an important design decision, since users on Airbnb generally undergo a multi-stage search journey. Data shows that before making a final booking, users tend to perform multiple searches and take various actions, such as clicking into a home’s details, reading reviews, or adding a home to a wishlist. As such, it was crucial to develop a strategy that captures this entire multi-stage journey and accounts for the diverse types of listings a user might explore.

Diving deeper, we first grouped all historical queries of users who made bookings, using key query parameters such as location, number of guests, and length of stay (our definition of a “journey”). For each journey, we analyzed all searches performed by the user, with the final booked listing as the positive label. To construct (positive, negative) pairs, we paired this booked listing with other homes the user had seen but not booked. Negative labels were selected from homes the user encountered in search results, along with those they’d interacted with more intentfully (such as by wishlisting) but ultimately did not book. This choice of negative labels was key: randomly sampling homes made the problem too easy and resulted in poor model performance.

Figure 2: Example of constructing (positive, negative) pairs for a given user journey. The booked home is always treated as a positive. Negatives are selected from homes that appeared in the search result (and were potentially interacted with) but that the user did not end up booking.
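The pair construction described above can be sketched roughly as follows. The event schema (`journey_id`, `listing_id`, `action`), the action names, and the `build_pairs` helper are hypothetical and chosen only for illustration; the real pipeline is not described at this level of detail.

```python
from collections import defaultdict


def build_pairs(events):
    """Build (journey, positive, negative) triples from logged events.

    Assumed schema: each event is a dict with a journey key, a listing id,
    and an action such as "impression", "click", "wishlist", or "book".
    """
    by_journey = defaultdict(list)
    for event in events:
        by_journey[event["journey_id"]].append(event)

    pairs = []
    for journey_id, journey_events in by_journey.items():
        booked = {e["listing_id"] for e in journey_events if e["action"] == "book"}
        if len(booked) != 1:
            continue  # keep only journeys with exactly one booked home
        positive = next(iter(booked))
        seen = {e["listing_id"] for e in journey_events}
        # Negatives: homes seen or interacted with in this journey, never booked.
        for negative in sorted(seen - booked):
            pairs.append((journey_id, positive, negative))
    return pairs


paris = "paris|2 guests|3 nights"
london = "london|1 guest|2 nights"
events = [
    {"journey_id": paris, "listing_id": "A", "action": "impression"},
    {"journey_id": paris, "listing_id": "B", "action": "click"},
    {"journey_id": paris, "listing_id": "C", "action": "wishlist"},
    {"journey_id": paris, "listing_id": "B", "action": "book"},
    {"journey_id": london, "listing_id": "D", "action": "impression"},
]
pairs = build_pairs(events)
```

Here the wishlisted home `C` becomes a negative even though the user showed interest in it, which mirrors the harder negatives the paragraph above describes.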

Figure 3: Example of the overall data pipeline used to construct training data for the EBR model.

The model architecture followed a traditional two-tower network design. One tower (the listing tower) processes features about the home listing itself, such as historical engagement, amenities, and guest capacity. The other tower (the query tower) processes features related to the search query, such as the geographic search location, number of guests, and length of stay. Together, these towers generate the embeddings for home listings and search queries, respectively.

A key design decision here was choosing features such that the listing tower could be computed offline on a daily basis. This enabled us to pre-compute the home embeddings in a daily batch job, significantly reducing online latency, since only the query tower had to be evaluated in real time for incoming search requests.

Figure 4: Two-tower architecture as used in the EBR model. Note that the listing tower is computed offline daily for all homes.
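The offline/online split between the two towers might look roughly like the sketch below. Each tower is reduced to a single linear layer with made-up weights, and all names (`listing_tower`, `query_tower`, `precompute_listing_embeddings`, `retrieve`) and feature values are assumptions for illustration, not the production model.

```python
def linear(weights, features):
    """One dense layer without bias, standing in for a full tower."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Hypothetical weights; a real model would learn these during training.
LISTING_WEIGHTS = [[0.5, 0.0, 0.1], [0.0, 0.3, 0.2]]
QUERY_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]


def listing_tower(features):
    return linear(LISTING_WEIGHTS, features)


def query_tower(features):
    return linear(QUERY_WEIGHTS, features)


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def precompute_listing_embeddings(listing_features):
    """Offline batch step: run the listing tower once per home."""
    return {lid: listing_tower(f) for lid, f in listing_features.items()}


def retrieve(query_features, listing_embeddings, k):
    """Online step: only the query tower runs for each request."""
    q = query_tower(query_features)
    ranked = sorted(
        listing_embeddings,
        key=lambda lid: squared_distance(q, listing_embeddings[lid]),
    )
    return ranked[:k]


listing_features = {
    "home_a": [2.0, 0.0, 0.0],
    "home_b": [0.0, 2.0, 0.0],
    "home_c": [1.0, 1.0, 1.0],
}
listing_embeddings = precompute_listing_embeddings(listing_features)  # daily job
top_homes = retrieve([1.0, 0.0], listing_embeddings, k=2)  # per request
```

Because the listing embeddings are a plain lookup table at request time, the per-query cost is one pass through the query tower plus the nearest-neighbor search.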

The final step in building our EBR system was choosing the infrastructure for online serving. We explored a number of approximate nearest neighbor (ANN) solutions and narrowed them down to two main candidates: inverted file index (IVF) and hierarchical navigable small worlds (HNSW). While HNSW performed slightly better in terms of evaluation metrics (using recall as our main evaluation metric), we ultimately found that IVF offered the best trade-off between speed and performance.

The core reason for this is the high volume of real-time updates per second for Airbnb home listings, as pricing and availability data is frequently updated. This caused the memory footprint of the HNSW index to grow too large. In addition, most Airbnb searches include filters, especially geographic filters. We found that parallel retrieval with HNSW alongside filters resulted in poor latency performance.

In contrast, the IVF solution, where listings are clustered beforehand, only required storing cluster centroids and cluster assignments within our search index. At serving time, we simply retrieve listings from the top clusters by treating the cluster assignments as a standard search filter, making integration with our existing search system quite straightforward.

Figure 5: Overall serving flow using IVF. Homes are clustered beforehand and, during online serving, homes are retrieved from the closest clusters to the query embedding.
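A minimal sketch of the IVF serving flow, assuming the centroids already come from an offline clustering step such as k-means. The function names, the `n_probe` parameter, and the `passes_filters` callback are hypothetical; they only illustrate how a cluster assignment can act as one more search filter.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_centroids(vector, centroids, n):
    """Indices of the n centroids closest to the vector."""
    order = sorted(
        range(len(centroids)),
        key=lambda c: squared_distance(vector, centroids[c]),
    )
    return order[:n]


def assign_clusters(listing_embeddings, centroids):
    """Offline step: store one cluster id per home in the search index."""
    return {
        lid: nearest_centroids(emb, centroids, 1)[0]
        for lid, emb in listing_embeddings.items()
    }


def ivf_retrieve(query_embedding, centroids, assignments, n_probe,
                 passes_filters=lambda lid: True):
    """Online step: the cluster id is applied like any other filter."""
    probe = set(nearest_centroids(query_embedding, centroids, n_probe))
    return sorted(
        lid for lid, cluster in assignments.items()
        if cluster in probe and passes_filters(lid)
    )


# Toy data; real centroids would be learned from the listing embeddings.
centroids = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
listing_embeddings = {
    "home_a": [1.0, 1.0],
    "home_b": [9.0, 9.0],
    "home_c": [0.0, 9.0],
    "home_d": [1.0, 0.0],
}
assignments = assign_clusters(listing_embeddings, centroids)
one_cluster = ivf_retrieve([0.5, 0.5], centroids, assignments, n_probe=1)
two_clusters = ivf_retrieve([0.5, 0.5], centroids, assignments, n_probe=2)
filtered = ivf_retrieve([0.5, 0.5], centroids, assignments, n_probe=1,
                        passes_filters=lambda lid: lid != "home_d")
```

Only the centroids and the per-home cluster id need to be stored, so a frequent price or availability update to a home leaves its retrieval entry unchanged until the next clustering run.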

In this approach, our choice of similarity function in the EBR model itself ended up having interesting implications. We explored both dot product and Euclidean distance; while both performed similarly from a model perspective, using Euclidean distance produced much more balanced clusters on average. This was a key insight, as the quality of IVF retrieval is highly sensitive to cluster size uniformity: if one cluster had too many homes, it would greatly reduce the discriminative power of our retrieval system.

We hypothesize that this imbalance arises with dot product similarity because it inherently only considers the direction of feature vectors while ignoring their magnitudes, whereas many of our underlying features are based on historical counts, making magnitude an important factor.

Figure 6: Example of the distribution of cluster sizes when using dot product vs. Euclidean distance as a similarity measure. We found that Euclidean distance produced much more balanced cluster sizes.
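One simple way to put a number on the cluster balance discussed above is to compare cluster sizes directly. The sketch below is an assumption about how such a check could be done, not the measurement used in the post; `cluster_balance` and both toy assignment maps are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def cluster_balance(assignments, n_clusters):
    """Summarize how evenly homes are spread across clusters.

    Returns the per-cluster sizes, the share of homes in the largest
    cluster, and the coefficient of variation of the sizes (0 means
    perfectly balanced).
    """
    counts = Counter(assignments.values())
    sizes = [counts.get(c, 0) for c in range(n_clusters)]
    total = sum(sizes)
    if total == 0:
        return {"sizes": sizes, "largest_share": 0.0, "cv": 0.0}
    return {
        "sizes": sizes,
        "largest_share": max(sizes) / total,
        "cv": pstdev(sizes) / mean(sizes),
    }


# Toy assignments: 12 homes over 4 clusters.
balanced = {f"home_{i}": i % 4 for i in range(12)}
skewed = {f"home_{i}": (0 if i < 9 else i - 8) for i in range(12)}

balanced_stats = cluster_balance(balanced, n_clusters=4)
skewed_stats = cluster_balance(skewed, n_clusters=4)
```

In the skewed case a query that probes the largest cluster retrieves three quarters of all homes, which is the loss of discriminative power described above.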

The EBR system described in this post was fully launched in both Search and Email Marketing production and led to a statistically significant gain in overall bookings when A/B tested. Notably, the bookings lift from this new retrieval system was on par with some of the largest machine learning improvements to our search ranking in the past two years.

The key improvement over the baseline was that our EBR system effectively incorporated query context, allowing homes to be ranked more accurately during retrieval. This ultimately helped us display more relevant results to users, especially for queries with a high number of eligible results.

We would like to especially thank the entire Search and Information Infrastructure & ML Infrastructure org (led by Yi Li) and Marketing Technology org (led by Michael Kinoti) for their great collaborations throughout this project!
