Programmatic Conversion Funnels w/ Apache Druid

Published on June 09, 2017 by Miguel Morales

In our last blog post, we explained what programmatic advertising is, how to build a basic bidder, and how to analyze bid requests using Apache Druid.

In this blog post we’ll be discussing how to extend our basic bidder to track and analyze wins, impressions, clicks, alongside bid requests.

The goal is to build a reporting dashboard to visualizes the conversion funnels between bid requests->bids->wins->impressions->clicks across publishers, locations, device model, etc.

By visualizing and analyzing conversion funnels, programmatic marketers can optimize infrastructure costs, block non-performing publishers, and spend more on inventory that performs well.

All code mentioned here, along with automated specs and tools to generate sample data, is included in the simple-bidder repo:



Setting Up a Campaign

We first need to extend our basic bidder to actually return a bid alongside a tag. We’ll only support rich media creatives for the purposes of this tutorial. However, these concepts would also apply to VAST or other types of creatives.

For this purpose we need to create three new MVC Models: Seat, Campaign, & Advertisement. An advertisement contains attributes such as its width and height , html/xml tag, domain, etc. We’ll keep it simple for this example.



Making a Bid

We’re going to update our bid_requests controller to add targeting and return a bid if an advertisement matches the incoming bid request.

advertisement = Targeter.call(bid_request: bid_request)

In our case, we’ll keep it simple and simply return the last advertisement in the database: return Advertisement.last.

If there’s a matching advertisement, we’ll run it through a service class that determines how much should be bid. If the bidder determines that a bid needs to be made, the bid alongside the ad tag and bid price is returned to the client.



Analyzing the Bid & Other Events

Now that we have a way to process bid requests and return a bid with a creative we need to setup Druid to track and analyze wins, impressions, clicks, etc.

We’re going to make our system generic by building a schema that’s similar to Rail’s Single Table Inheritance feature, where the row’s type field value determines the class that’s instantiated. Let’s create a general events SQL table:



Notice how this schema is very similar that we created before to analyze bid requests.

We’re going to take our old schema and add a few new metrics. We’re going to track counts grouped by type. This keeps the tracking system generic and extendable. These new javascript metrics are defined by providing an aggregate, combine, & reset function for the metric in Druid.



Storing Events in PostgreSQL

Because events like bids, wins, impressions, clicks, etc are small data and don’t happen nearly as often as bid requests, we can use PostgreSQL to store them. In production, Redshift would work well to handle this small event data. However Redshift would not be able to handle the volume that bid requests generate.

Druid, however, is perfectly capable to analyzing bid requests because it can handle a large number of events. So, a bid request is just another type of event, but they are not stored in PostgreSQL.

Our events table is also generic, as we mentioned before. We use the type field to denote the event’s class.



New Tracking Routes

Now that Druid and PostgreSQL is setup we can go back to the bidder. Going back to the point of being able to return a bid we’re going to store the bid in PostgreSQL as an event and send it to Druid for analysis.

We’re also going to add two new routes to our bidder web server. In production you might want to break out the concepts of tracking and bidding into different micro services. However, for our purposes, using the same project helps visualize how bid requests and wins are related.

One new route is /v1/wins which is part of the url that’s returned in the bid as its nurl. The nurl is the url that’s hit when the client selects our particular bidder as the winner. Here’s an example win url:

http://localhost:9292/v1/wins?bid_id=ABC123&price=1.50

Our simple bidder uses localhost because that’s where we’ll be running all of our tests from. Obviously, this would be different in a production setting.

The first thing the win route does is it looks up the bid from the events table by the bid_id.

It uses all the information from the bid to generate a new win event. The new win event is also stored in PostgreSQL and sent to Druid for analysis.

Our other new route, /v1/events, is a generic route for tracking non-win events. Things like impressions, clicks, etc. fall into this category. We use this generic mechanism to be able to track any kind of event in the future such as installs, video_views, video_25%, etc.

Here’s a sample of an impression event:

http://localhost:9292/v1/events?bid_id=ABC123&type=impression We use two different routes because validation is different for wins and events. For example, to generate a valid win, a winning price must be provided.



Generating Sample Data

Our new simple bidder is ready to go, we’re able to bid and track all events. Druid is listening for data and configured with a new schema. All tables in PostgreSQL have been created so we’re ready to go.

First we need to launch our web server in localhost to listen for http requests. We’ve updated this process from our previous blog post.

If you haven’t already, run the migrations and seed the sample campaign using the following:



Then launch the web server using:

export TRANQUILITY_URI=https://shaman-proxy-staging.zero-x.co/v1/index/593a1a4ad68c54057090b38b/fa7cdcc2a478af82680070f61a3779a1
bundle exec puma



You’ll need to change the TRANQUILITY_URI to point to your tranquility host.

Next, run the tester which generates some sample bid_requests, wins, & impressions.

ruby bin/tester

Now you’ll have data in both Druid and PostgreSQL.



Visualizing the Funnels

As before, we’re going to use Superset to build a dashboard to visualize performance across various dimensions such as location, gender, etc.

Our setup is basically the same as before, except we add new metrics for bid_requests, bid, wins, and impressions. These are very simple and their JSON definition looks like:

{"type": "longSum", "name": "impressions", "fieldName": "impressions"}

We’re going to build some simple dashboards that shows the conversion funnels based on various dimensions.





In Conclusion

We’ve now built a simple system to make bids and efficiently analyze conversion funnels in programmatic advertising.

With these real-time insights teams can now generate more profits from their campaigns by pro-actively blocking non-performing traffic and rewarding placements that generate more conversions.

We can also start to plan out campaigns by visualizing how they would perform based on their targeting criteria. Teams can then make better decisions by analyzing not just the available audience, but historically how well those types of campaigns convert.



Shaman: Cloud Hosted Druid



If you’d like to tap into your bid requests streams in real-time and build powerful dashboards for your team without breaking your wallet then Shaman is for you.

Shaman is a self-serve platform for deploying single node and multi-node clusters to the cloud.

Avoid the pain and time of setting up Druid yourself, stop wasting developers’ time building internal reporting dashboards, and stop paying for solutions that simply cost way too much.

Visit https://zero-x.co to find out more and to create an account.