Case Study: Reducing Bot Traffic in a Large Consortial Implementation

Institution

A statewide collaboration led by a large R1 institution to preserve and make accessible the digital assets of cultural heritage institutions, using DG Starter/Islandora.

Challenge

As a long-standing Islandora user, this R1 institution hosts millions of digital objects and serves a diverse community of researchers, students, and the public. However, its on-premise Islandora installation had recently seen an explosion in bot traffic, much of it from malicious sources. This spike caused:

  • Performance slowdowns
  • Inflated infrastructure usage
  • False analytics signals
  • Increased strain on Drupal, Postgres, and Solr services

Discovery Garden previously used a bot blocker (created and maintained by Mitchell Krog) to filter out unwanted traffic. To bring that same protection to modern environments, we built a new plugin for Traefik that adds support for CIDR-based blocking and custom blocklists, alongside the original blacklists. This enhanced approach helps defend against increasingly sophisticated bots that rely on distributed proxy networks.

In short, Discovery Garden built a new tool to better block unwanted web traffic, including sneaky bots that try to hide where they’re coming from.
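
To illustrate what CIDR-based blocking means in practice, here is a minimal Python sketch. It is not the Traefik plugin's code (Traefik plugins are written in Go). The function names and the example ranges are hypothetical, and the ranges are taken from address blocks reserved for documentation.

```python
import ipaddress

# Hypothetical custom blocklist: each entry covers a whole address range,
# so one line can block every IP a proxy network rotates through.
EXAMPLE_BLOCKLIST = ["203.0.113.0/24", "2001:db8::/32"]


def parse_blocklist(cidrs):
    """Turn CIDR strings into network objects once, at startup."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_blocked(client_ip, networks):
    """Return True if the client address falls inside any blocked range."""
    address = ipaddress.ip_address(client_ip)
    # An IPv4 address is never "in" an IPv6 network (and vice versa),
    # so mixed lists are safe to check in a single pass.
    return any(address in network for network in networks)


networks = parse_blocklist(EXAMPLE_BLOCKLIST)
```

With this list, `is_blocked("203.0.113.7", networks)` is true and `is_blocked("198.51.100.1", networks)` is false. Blocking by range rather than by single address is what makes this useful against bots that spread requests across many addresses in the same network.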

Solution: Captcha Project

As part of Discovery Garden’s ongoing support for this consortium, we proposed and implemented the Captcha Project (created and maintained by Joe Corall) to address the growing bot traffic, especially anonymous (non-logged-in) requests hitting the Drupal application layer.

Key Goals

  • Introduce a challenge mechanism to filter out low-effort and automated requests
  • Reduce the number of suspicious hits reaching the web application
  • Layer this solution behind the existing bot blocker, forming a multi-tier defence

Implementation

The Captcha Project was deployed with default settings in the Kubernetes environment. The bad bot blocker was applied at the HTTPS entry point in Traefik, while the Captcha Project was configured at the ingress level. This setup required all incoming traffic to pass through both protection layers before reaching Islandora. 
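
The two-layer arrangement can be pictured with a small conceptual model. This Python sketch is only an illustration under stated assumptions: the `Request` fields, the layer functions, and the `examplebadbot` agent string are hypothetical, and neither tool's real matching rules or configuration are shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Request:
    user_agent: str
    passed_challenge: bool = False  # assumed flag; real tools track this differently


# A layer returns a verdict to stop the request, or None to pass it along.
Layer = Callable[[Request], Optional[str]]


def bot_blocker_layer(request: Request) -> Optional[str]:
    """First tier (entry point): reject agents on a known-bad list."""
    blocked_agents = ("examplebadbot",)  # hypothetical list entry
    agent = request.user_agent.lower()
    if any(bad in agent for bad in blocked_agents):
        return "blocked"
    return None


def captcha_layer(request: Request) -> Optional[str]:
    """Second tier (ingress): challenge clients that have not yet passed."""
    if not request.passed_challenge:
        return "challenge"
    return None


def route(request: Request, layers: List[Layer]) -> str:
    """Run the layers in order; only requests that clear all of them get through."""
    for layer in layers:
        verdict = layer(request)
        if verdict is not None:
            return verdict
    return "forward to Islandora"


LAYERS = [bot_blocker_layer, captcha_layer]
```

In this model a known bad agent is stopped at the first tier and never reaches the challenge, while an unknown client is challenged before anything is forwarded to the application.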

The deployment was assessed with a pre/post test comparing two 11-day periods, one before and one after the Captcha Project was enabled.

Results

Pre/post test results

The Captcha Project implementation resulted in a 50% reduction in traffic reaching the Islandora site between the pre- and post-deployment periods. While this was less than the reduction achieved with a similar test using a cloud-based Web Application Firewall (WAF), it demonstrated the value of layered protection for on-premise deployments. Institutions running on-premise systems can also consider WAF appliances with managed rules from vendors, which can offer effective mitigation without relying on cloud-based services.
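
For readers who want to run the same comparison on their own logs, the calculation behind a pre/post figure is straightforward. The sketch below is a hypothetical helper, and the request counts in the usage note are placeholders, not the consortium's data.

```python
def percent_reduction(pre_requests: int, post_requests: int) -> float:
    """Percentage drop in requests from the pre period to the post period.

    Both counts should cover periods of equal length (for example, 11 days
    each), otherwise the comparison is skewed.
    """
    if pre_requests <= 0:
        raise ValueError("pre_requests must be a positive count")
    return (pre_requests - post_requests) / pre_requests * 100.0
```

As an illustration with made-up numbers, `percent_reduction(1_000_000, 500_000)` gives `50.0`. A negative result would mean traffic grew in the post period.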

Complementary Insights

  • Bad Bot Blocker (running in front of the Captcha Project) still blocked over 6.26 million requests in 30 days (~209k/day), especially during large-scale bursts from known malicious sources.
  • Bad Bot Blocker worked well for basic, low-effort bots but was less effective against bots leveraging residential proxy IPs or headless browsers.

Takeaways

  • Captcha Project is a low-cost, effective first step for on-prem environments that can’t implement a full WAF
  • Bot defence works best in layers: combining known agent blocking, CAPTCHA challenges, and managed WAF solutions when possible
  • Institutions with high digital traffic should proactively invest in anti-bot strategies to protect performance and reduce infrastructure overhead

Ready to see what Islandora can do for your organization?

Whether you’re managing a single repository or supporting a multi-institution consortium, Islandora offers the flexibility, scalability, and support you need. Contact Discovery Garden to schedule a demo, start a project discovery session, or learn more about how we can help you build a future-proof digital repository.