Introduction

Scraping business listings from Yelp can provide useful data about local businesses, their reviews, price ranges, locations, and more. This information can power business intelligence tools, market analysis, lead generation, and other applications.

In this comprehensive guide, we'll walk through a full Objective-C scraper to extract key details on Chinese restaurant listings in San Francisco from the Yelp website.

Here's the exact data we'll pull from each listing:

Business Name

Rating

Number of Reviews

Price Range

Location

We'll use the proxies API from ProxiesAPI to bypass Yelp's anti-scraper protections. As we'll see, premium proxies that rotate IP addresses are essential for scraping sites like Yelp without quickly getting blocked.

Install Dependencies

Let's quickly cover installing the dependencies we'll need:

TFHpple

This Objective-C library parses HTML/XML documents and supports XPath queries for extracting data. Add it to your Podfile and run pod install:

pod 'TFHpple'

The scraper also relies on Foundation and other standard Objective-C libraries.

With the imports and dependencies handled, let's get to the data extraction!

Encode the Target URL

We first construct the target URL pointing to Yelp listings in San Francisco:

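NSString *urlString = @"https://www.yelp.com/search?find_desc=chinese&find_loc=San+Francisco%2C+CA";

Next we percent-encode this string so it can travel as a single query parameter. URLQueryAllowedCharacterSet is not strict enough here: it leaves & and = unescaped, so everything after the first & in the Yelp URL would be read as a separate ProxiesAPI parameter. Instead we allow only unreserved characters:

NSMutableCharacterSet *allowed = [NSMutableCharacterSet alphanumericCharacterSet];
[allowed addCharactersInString:@"-._~"];
NSString *encodedURLString = [urlString stringByAddingPercentEncodingWithAllowedCharacters:allowed];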

This encoded URL will be embedded in the request to ProxiesAPI.
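The choice of allowed characters matters when a whole URL is embedded as one parameter value. The following is a minimal sketch of a reusable encoder, assuming the receiving API decodes the value exactly once; the helper name EncodeQueryValue is hypothetical and not part of any library.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: percent-encode a string so it can be used as the value
// of a single query parameter. Only RFC 3986 "unreserved" characters are left
// as-is; everything else, including & = ? / : + and %, is escaped.
static NSString *EncodeQueryValue(NSString *value) {
    NSCharacterSet *unreserved = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];
    return [value stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
}
```

For example, EncodeQueryValue(@"a&b=c") yields a%26b%3Dc, whereas URLQueryAllowedCharacterSet would return the input unchanged because & and = are legal inside a query string.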

Use Premium Proxies

To avoid immediately getting blocked by Yelp's bot detection, we'll use the premium proxy API from ProxiesAPI:

NSString *apiURLString = [NSString stringWithFormat:@"http://api.proxiesapi.com/?premium=true&auth_key=YOUR_AUTH_KEY&url=%@", encodedURLString];

Key things to note:

Authenticate with your own auth_key

The premium=true parameter gives us access to IP-rotating residential proxies that mimic real users

Our target Yelp URL is appended to the end

So each request will go through a different proxy IP, fooling Yelp into thinking it's organic user traffic. Sneaky! 😉
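Hardcoding the auth key in source is easy to get wrong, so here is one possible way to assemble the API URL from its parts. This is a sketch under assumptions: the helper names BuildProxyAPIURL and AuthKeyFromEnvironment and the environment variable PROXIESAPI_AUTH_KEY are made up for illustration, and only the three query parameters shown in this article are used.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: build the ProxiesAPI request URL for a target page.
// Returns nil if either input is missing.
static NSURL *BuildProxyAPIURL(NSString *authKey, NSString *targetURL) {
    if (authKey.length == 0 || targetURL.length == 0) {
        return nil;
    }
    // Escape everything except unreserved characters so the target URL's own
    // "&" and "=" are not read as extra API parameters.
    NSCharacterSet *unreserved = [NSCharacterSet characterSetWithCharactersInString:
        @"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"];
    NSString *encodedKey =
        [authKey stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
    NSString *encodedTarget =
        [targetURL stringByAddingPercentEncodingWithAllowedCharacters:unreserved];
    NSString *apiURLString = [NSString stringWithFormat:
        @"http://api.proxiesapi.com/?premium=true&auth_key=%@&url=%@",
        encodedKey, encodedTarget];
    return [NSURL URLWithString:apiURLString];
}

// Hypothetical helper: read the key from the environment instead of the source.
// The variable name PROXIESAPI_AUTH_KEY is an assumption of this sketch.
static NSString *AuthKeyFromEnvironment(void) {
    return [NSProcessInfo processInfo].environment[@"PROXIESAPI_AUTH_KEY"];
}
```

A caller would pass AuthKeyFromEnvironment() and the unencoded Yelp URL to BuildProxyAPIURL and treat a nil result as a configuration error.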

Set HTTP Headers

We next construct a dictionary of request headers that mimic a real Chrome browser:

NSDictionary *headers = @{
  @"User-Agent": @"Mozilla/5.0...",
  @"Accept-Language": @"en-US,en;q=0.5",
  @"Accept-Encoding": @"gzip, deflate, br",
  @"Referer": @"https://www.google.com/"
};

No conversion step is needed: NSMutableURLRequest accepts the headers directly as an NSDictionary of strings through its allHTTPHeaderFields property.

Mimicking a real browser via headers decreases the chances of getting flagged as a bot.

Construct NSURLRequest

We assemble all the pieces into an NSMutableURLRequest object:

NSURLComponents *components = [NSURLComponents componentsWithString:apiURLString];

NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:components.URL];
request.allHTTPHeaderFields = headers;
request.HTTPMethod = @"GET";

This request points to the ProxiesAPI URL, includes our mimic-browser headers, and performs a GET.
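If the request setup is needed in more than one place, it could be wrapped in a small function. The sketch below is one way to do that; the name BuildRequest and the 30-second timeout are assumptions made for illustration, not values taken from the scraper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: build a GET request carrying browser-like headers.
static NSURLRequest *BuildRequest(NSURL *url,
                                  NSDictionary<NSString *, NSString *> *headers) {
    NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:url];
    request.HTTPMethod = @"GET";
    // A 30-second timeout is an arbitrary choice for this sketch.
    request.timeoutInterval = 30.0;
    // Copy each header onto the request one by one.
    [headers enumerateKeysAndObjectsUsingBlock:^(NSString *name, NSString *value, BOOL *stop) {
        [request setValue:value forHTTPHeaderField:name];
    }];
    return [request copy];
}
```

Setting headers one at a time with setValue:forHTTPHeaderField: has the same effect as assigning allHTTPHeaderFields, but it makes it easy to skip or override individual headers later.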

Make the HTTP Request

With our request prepped, we kick it off:

NSURLSession *session = [NSURLSession sharedSession];
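NSURLSessionDataTask *task = [session dataTaskWithRequest:request
                                          completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
  // Handle the response here (the full code at the end shows the complete handler)
}];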

[task resume];

The code handles the async response in the completion block:

Parsing response data

Checking status code

Extracting HTML

Passing HTML to TFHpple parser
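The checks listed above can be sketched as a single function that the completion block calls before parsing. This is an illustrative sketch only; the name HTMLFromResponse is hypothetical, and treating anything other than status 200 as a failure is an assumption carried over from the full code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: return the response body as a string, or nil when the
// request failed, the status is not 200, or the body is not valid UTF-8.
static NSString *HTMLFromResponse(NSData *data, NSURLResponse *response, NSError *error) {
    if (error != nil || data == nil) {
        return nil;
    }
    // Only HTTP responses carry a status code.
    if (![response isKindOfClass:[NSHTTPURLResponse class]]) {
        return nil;
    }
    NSInteger status = ((NSHTTPURLResponse *)response).statusCode;
    if (status != 200) {
        return nil;
    }
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```

Inside the completion block, a non-nil result means it is safe to hand the same data object to TFHpple with hppleWithHTMLData:.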

Now the fun begins - using XPath to extract fields!

Extract Business Listings

With the HTML loaded into a TFHpple parser object, we can query elements using XPath syntax.

Inspecting the page

When we inspect the page in the browser's developer tools, we can see that each listing's div carries the classes arrange-unit__09f24__rqHTg, arrange-unit-fill__09f24__CUubG and css-1qn0b6x.

First we grab all the listings containers:

NSArray *listings = [parser searchWithXPathQuery:@"//div[contains(@class,'arrange-unit__09f24__rqHTg')]"];

Key things to note:

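The double slash // matches the element anywhere in the document

contains(@class, 'arrange-unit__09f24__rqHTg') is a substring match on the class attribute, so it still works when the div carries several classes

searchWithXPathQuery: returns all matching elements in an NSArray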

Then we loop through each listing:

for (TFHppleElement *listing in listings) {

  // Extract data for this listing

}

Inside the loop, we use very specific XPath queries to extract each data field!

Extract Business Name

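For the business name, we look up the element carrying the class css-19v1rkv anywhere inside the listing. TFHpple's firstChildWithClassName: only inspects direct children and compares the whole class attribute, so a scoped XPath query is more reliable for nested markup:

TFHppleElement *businessNameElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'css-19v1rkv')]"] firstObject];
NSString *businessName = [businessNameElement text];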

This neatly returns just the business name string!

Extract Rating, Reviews, Price, Location

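The other fields follow the same pattern, with a fallback for values that may be missing:

// Rating
TFHppleElement *ratingElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'css-gutk1c')]"] firstObject];
NSString *rating = [ratingElement text];

// Price Range
TFHppleElement *priceRangeElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'priceRange__09f24__mmOuH')]"] firstObject];
NSString *priceRange = [priceRangeElement text];

// Number of Reviews and Location share the same span class
NSArray *spanElements = [listing searchWithXPathQuery:@"//span[contains(@class,'css-chan6m')]"];
NSString *numReviews = @"N/A";
NSString *location = @"N/A";

if ([spanElements count] >= 2) {
  numReviews = [[spanElements[0] text] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
  location = [[spanElements[1] text] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}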

We have to handle cases where fields are missing or contain unpredictable whitespace in the HTML.
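Because messaging a nil element in Objective-C simply returns nil, a missing field and an empty one can be normalized in the same place. A minimal sketch, assuming "N/A" is the desired placeholder (as in the snippet above); the helper name CleanField is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trim surrounding whitespace and substitute "N/A" when
// the field is missing (nil) or contains nothing but whitespace.
static NSString *CleanField(NSString *raw) {
    NSString *trimmed = [raw stringByTrimmingCharactersInSet:
        [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    return trimmed.length > 0 ? trimmed : @"N/A";
}
```

With this in place, each extraction can be written as CleanField([element text]) without a separate nil check.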

But ultimately we extract and print all the pieces we need:

NSLog(@"Business Name: %@", businessName);
NSLog(@"Rating: %@", rating);
// etc...

The full code handles edge cases and surfaces everything in an easy-to-process structure.
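The scraper in this article prints each field with NSLog. If a machine-readable output is preferred, one option is to collect a dictionary per listing and serialize the array to JSON. The sketch below assumes that approach; the function name JSONFromListings and the dictionary keys a caller would choose are illustrative and do not appear in the original scraper.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: serialize an array of listing dictionaries to JSON text.
// Returns nil if the input cannot be represented as JSON.
static NSString *JSONFromListings(NSArray<NSDictionary *> *listings) {
    // dataWithJSONObject: raises an exception on invalid input, so check first.
    if (![NSJSONSerialization isValidJSONObject:listings]) {
        return nil;
    }
    NSData *json = [NSJSONSerialization dataWithJSONObject:listings
                                                   options:NSJSONWritingPrettyPrinted
                                                     error:NULL];
    if (json == nil) {
        return nil;
    }
    return [[NSString alloc] initWithData:json encoding:NSUTF8StringEncoding];
}
```

Inside the loop, each listing would be appended to an NSMutableArray as a dictionary of its extracted strings, and JSONFromListings would be called once after the loop finishes.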

Key Takeaways

Scraping Yelp listings relies heavily on:

Rotating Proxies - Avoid bot blocking by mimicking organic traffic

Custom Headers - Masquerade requests as a real browser

XPath Selectors - Carefully target DOM elements to extract fields

With these key ingredients, you can build robust Yelp scrapers in Objective-C and other languages.

Next Steps

To expand on this project:

Build a pipeline to store data in databases

Expand to scrape other business info from Yelp

Containerize the scraper for server deployment

Hopefully this gives you a firm handle on tackling third-party sites like Yelp. Happy scraping!

Full Objective-C Code

Here again is the full scraper code:

#import <Foundation/Foundation.h>
#import "TFHpple.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSString *urlString = @"https://www.yelp.com/search?find_desc=chinese&find_loc=San+Francisco%2C+CA";
        
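        // Percent-encode everything except unreserved characters so that the
        // target URL's own & and = are not read as extra API parameters
        NSMutableCharacterSet *allowed = [NSMutableCharacterSet alphanumericCharacterSet];
        [allowed addCharactersInString:@"-._~"];
        NSString *encodedURLString = [urlString stringByAddingPercentEncodingWithAllowedCharacters:allowed];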
        
        // API URL with the encoded URL
        NSString *apiURLString = [NSString stringWithFormat:@"http://api.proxiesapi.com/?premium=true&auth_key=YOUR_AUTH_KEY&url=%@", encodedURLString];
        
        // Define user-agent header and other headers
        NSDictionary *headers = @{
            @"User-Agent": @"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
            @"Accept-Language": @"en-US,en;q=0.5",
            @"Accept-Encoding": @"gzip, deflate, br",
            @"Referer": @"https://www.google.com/"
        };
        
        // Create an NSURLComponents object to build the URL
        NSURLComponents *components = [NSURLComponents componentsWithString:apiURLString];
        
        // Create the request; NSMutableURLRequest takes the header dictionary directly
        NSMutableURLRequest *request = [NSMutableURLRequest requestWithURL:components.URL];
        request.allHTTPHeaderFields = headers;
        request.HTTPMethod = @"GET";
        
        // Send an HTTP GET request
        NSURLSession *session = [NSURLSession sharedSession];
        NSURLSessionDataTask *task = [session dataTaskWithRequest:request completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
            if (error) {
                NSLog(@"Failed to retrieve data. Error: %@", error.localizedDescription);
            } else {
                NSHTTPURLResponse *httpResponse = (NSHTTPURLResponse *)response;
                if (httpResponse.statusCode == 200) {
                    NSString *htmlString = [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
                    
                    // Save the HTML to a file (optional)
                    [htmlString writeToFile:@"yelp_html.html" atomically:YES encoding:NSUTF8StringEncoding error:nil];
                    
                    // Parse the HTML content using TFHpple
                    TFHpple *parser = [TFHpple hppleWithHTMLData:data];
                    
                    // Find all the listings
                    NSArray *listings = [parser searchWithXPathQuery:@"//div[contains(@class,'arrange-unit__09f24__rqHTg') and contains(@class,'arrange-unit-fill__09f24__CUubG') and contains(@class,'css-1qn0b6x')]"];
                    
                    NSLog(@"Number of Listings: %ld", (long)[listings count]);
                    
                    // Loop through each listing and extract information
                    for (TFHppleElement *listing in listings) {
                        // Extract information here
                        
                        // Scoped XPath queries search all descendants of the listing;
                        // firstChildWithClassName: would only check direct children
                        
                        // Extract business name
                        TFHppleElement *businessNameElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'css-19v1rkv')]"] firstObject];
                        NSString *businessName = [businessNameElement text];
                        
                        // Extract rating
                        TFHppleElement *ratingElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'css-gutk1c')]"] firstObject];
                        NSString *rating = [ratingElement text];
                        
                        // Extract price range
                        TFHppleElement *priceRangeElement = [[listing searchWithXPathQuery:@"//*[contains(@class,'priceRange__09f24__mmOuH')]"] firstObject];
                        NSString *priceRange = [priceRangeElement text];
                        
                        // Extract number of reviews and location
                        NSArray *spanElements = [listing searchWithXPathQuery:@"//span[contains(@class,'css-chan6m')]"];
                        NSString *numReviews = @"N/A";
                        NSString *location = @"N/A";
                        
                        if ([spanElements count] >= 2) {
                            numReviews = [[spanElements[0] text] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
                            location = [[spanElements[1] text] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
                        } else if ([spanElements count] == 1) {
                            NSString *text = [[spanElements[0] text] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
                            if ([text integerValue] > 0) {
                                numReviews = text;
                            } else {
                                location = text;
                            }
                        }
                        
                        // Print the extracted information
                        NSLog(@"Business Name: %@", businessName);
                        NSLog(@"Rating: %@", rating);
                        NSLog(@"Number of Reviews: %@", numReviews);
                        NSLog(@"Price Range: %@", priceRange);
                        NSLog(@"Location: %@", location);
                        NSLog(@"===========================");
                    }
                } else {
                    NSLog(@"Failed to retrieve data. Status Code: %ld", (long)httpResponse.statusCode);
                }
            }
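            // All done: end the process so the tool exits once the response is handled
            exit(EXIT_SUCCESS);
        }];
        
        [task resume];
        
        // Keep the process alive until the completion handler calls exit()
        dispatch_main();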
    }
    return 0;
}

Insert your own ProxiesAPI auth key before running the code. Yelp's generated class names change over time, so if no listings come back, re-inspect the page and update the selectors.
