Overview

Cloud-based search engine developed in 2011 for Singapore’s blogshop ecosystem. Aggregated RSS feeds from hundreds of independent fashion blogshops and provided centralized tag-based search, solving the problem of fragmented product discovery, where shoppers otherwise had to browse each blog individually.

Tagline: “Stop searching, start finding.”

Architecture

RSS Aggregation Engine

  • Automated crawler for blogshop RSS feeds
  • Scheduled polling (hourly/daily based on update frequency)
  • Feed parsing using SyndicationFeed API
  • Duplicate detection and content normalization
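
Choosing between hourly and daily polling "based on update frequency" could work roughly like this minimal Python sketch (the original was C#). The seven-day look-back window, the three-post threshold, and the function names are hypothetical; the source only says the interval depended on how often a feed updated.

```python
from datetime import datetime, timedelta

HOURLY = timedelta(hours=1)
DAILY = timedelta(days=1)
ACTIVE_WINDOW = timedelta(days=7)  # hypothetical look-back window
ACTIVE_MIN_POSTS = 3               # hypothetical threshold for an "active" blogshop


def choose_poll_interval(post_times: list[datetime], now: datetime) -> timedelta:
    """Poll hourly if the feed posted at least ACTIVE_MIN_POSTS times in the window, else daily."""
    recent = sum(1 for t in post_times if timedelta(0) <= now - t <= ACTIVE_WINDOW)
    return HOURLY if recent >= ACTIVE_MIN_POSTS else DAILY


def is_due(last_polled: datetime, post_times: list[datetime], now: datetime) -> bool:
    """A feed is due once its chosen interval has elapsed since the last poll."""
    return now - last_polled >= choose_poll_interval(post_times, now)


if __name__ == "__main__":
    now = datetime(2011, 6, 1, 12, 0)
    busy = [now - timedelta(days=d) for d in (0, 1, 2)]
    quiet = [now - timedelta(days=30)]
    print(choose_poll_interval(busy, now))   # 1:00:00
    print(choose_poll_interval(quiet, now))  # 1 day, 0:00:00
```

A scheduler would then loop over registered feeds and crawl only those for which is_due returns True.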

Tagging & Indexing System

  • Rule-based product title parsing (keyword and regex matching)
  • Automatic tag extraction (category, color, size, brand)
  • Manual tag correction interface for blogshop owners
  • Full-text search index for fast queries

Web Application

  • ASP.NET Web Forms hosted on Azure WebRole
  • Responsive layout for mobile/tablet (2011-era responsive design)
  • Facebook integration for social discovery
  • Google Analytics for user behavior tracking

Cloud Infrastructure

  • Windows Azure Cloud Services (WebRole)
  • Azure SQL Database for product/tag storage
  • Azure Storage for cached images
  • CDN for static asset delivery

Technical Implementation

RSS Feed Handling: Used System.ServiceModel.Syndication.SyndicationFeed class to parse Atom and RSS 2.0 feeds. Implemented retry logic for unreliable blogshop servers (many hosted on shared hosting with frequent downtime).
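
The retry logic for unreliable shared-hosting servers might look like the sketch below, written in Python rather than the original C#. Exponential backoff, the default of three attempts, and the name fetch_with_retry are assumptions; the source does not say how retries were spaced. The fetch function and sleep are injected so the sketch can be exercised without a network.

```python
import time
from typing import Callable


def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     attempts: int = 3, base_delay: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Call fetch(url), retrying on OSError with exponential backoff.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts and
    re-raises the last error so the caller can skip this feed until the next poll.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be >= 1")
```

With the standard library, fetch could be `lambda u: urllib.request.urlopen(u, timeout=10).read()`; urllib's URLError and socket timeouts are subclasses of OSError, so both trigger a retry.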

Tag Extraction Algorithm: Simple keyword matching against predefined fashion taxonomy (tops, bottoms, dresses, accessories, etc.). Extracted color names using regex matching for common fashion color terms. Size detection looked for S/M/L/XL patterns in post titles.
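
A minimal Python sketch of the keyword, color, and size matching described above. The taxonomy keywords and color list are hypothetical samples, not the original lists. Sizes are matched only as standalone, case-sensitive tokens, so a word like "Slim" does not produce a false "S".

```python
import re

# Hypothetical sample taxonomy; the real one covered tops, bottoms, dresses, accessories, etc.
CATEGORY_KEYWORDS = {
    "tops": {"top", "blouse", "tee", "shirt"},
    "bottoms": {"skirt", "shorts", "pants", "jeans"},
    "dresses": {"dress", "romper"},
    "accessories": {"necklace", "bag", "earrings", "belt"},
}
# Hypothetical sample of common fashion color terms
COLOR_RE = re.compile(r"\b(black|white|red|navy|blue|pink|beige|grey|mint)\b", re.IGNORECASE)
# Standalone, case-sensitive size tokens, so "Slim" or "Mom's" never yield a size
SIZE_ORDER = ["XS", "S", "M", "L", "XL"]
SIZE_RE = re.compile(r"\b(XS|S|M|L|XL)\b")


def extract_tags(title: str) -> dict[str, list[str]]:
    """Return category, color, and size tags found in a product title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    categories = sorted(cat for cat, keywords in CATEGORY_KEYWORDS.items() if words & keywords)
    colors = sorted({m.lower() for m in COLOR_RE.findall(title)})
    sizes = sorted(set(SIZE_RE.findall(title)), key=SIZE_ORDER.index)
    return {"category": categories, "color": colors, "size": sizes}
```

For example, `extract_tags("Navy Crochet Top (S/M/L)")` returns `{"category": ["tops"], "color": ["navy"], "size": ["S", "M", "L"]}`.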

Search Performance: Azure SQL Database full-text search with CONTAINS queries. Product table had ~50,000 items at peak. Query response time < 500ms for most searches. Implemented result caching for common queries (top 100 searches).
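
Caching results for the most common queries might look like the Python sketch below. The class name, the frequency-count policy, and the query normalization are assumptions, since the source does not describe the cache mechanism; the injected search callable stands in for the CONTAINS query against the database.

```python
from collections import Counter
from typing import Callable


class TopQueryCache:
    """Caches results only for the top_n most frequent queries (hypothetical policy)."""

    def __init__(self, search: Callable[[str], list], top_n: int = 100):
        self.search = search          # stand-in for the full-text query against the database
        self.top_n = top_n
        self.counts: Counter = Counter()
        self.cache: dict[str, list] = {}

    def query(self, raw_query: str) -> list:
        key = " ".join(raw_query.lower().split())  # normalize case and whitespace
        self.counts[key] += 1
        if key in self.cache:
            return self.cache[key]
        results = self.search(key)
        top = {q for q, _ in self.counts.most_common(self.top_n)}
        if key in top:
            self.cache[key] = results
        for stale in set(self.cache) - top:        # evict queries that fell out of the top N
            del self.cache[stale]
        return results

    def invalidate(self) -> None:
        """Drop all cached results, e.g. after feeds are re-crawled."""
        self.cache.clear()
```

Without something like invalidate(), newly crawled products would not show up in cached searches until the cache was rebuilt.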

Azure Deployment: Used a Cloud Service WebRole configuration. ServiceDefinition.csdef defined a single WebRole, which normally ran as one instance (instance counts are set in ServiceConfiguration.cscfg). Scaled to 2-3 instances during peak traffic (weekends, payday periods). Auto-scaling was not available in 2011, so scaling was done manually through the Azure Management Portal.
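
In Cloud Services the instance count lives in the `<Instances count="...">` element of ServiceConfiguration.cscfg, so a manual scale-out amounts to changing one attribute there. The Python helper below rewrites that value; it is a hypothetical convenience, not something the source says was used, and the serviceName and role name in SAMPLE are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace used by Azure Cloud Services ServiceConfiguration.cscfg files
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"


def set_instance_count(cscfg_xml: str, role_name: str, count: int) -> str:
    """Return the .cscfg XML with <Instances count="..."> updated for one role."""
    if count < 1:
        raise ValueError("a role needs at least one instance")
    ET.register_namespace("", NS)  # keep NS as the default namespace on output
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            role.find(f"{{{NS}}}Instances").set("count", str(count))
            return ET.tostring(root, encoding="unicode")
    raise KeyError(f"role {role_name!r} not found")


# Placeholder configuration: serviceName and role name are hypothetical
SAMPLE = f"""<ServiceConfiguration serviceName="BlogTagg" xmlns="{NS}">
  <Role name="WebRole1">
    <Instances count="1" />
  </Role>
</ServiceConfiguration>"""
```

For a weekend peak, `set_instance_count(SAMPLE, "WebRole1", 3)` produces the same document with the count raised to 3.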

Technical Challenges

Feed Reliability: Blogshop RSS feeds were inconsistent: some used Blogger’s format, others WordPress’s, and some were custom. Implemented defensive parsing with fallback extraction from raw HTML when RSS parsing failed.
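
The fallback can be sketched in Python with the standard library: try RSS 2.0 or Atom first, then scrape headings from the raw HTML. Treating `<h1>`-`<h3>` text as post titles is a hypothetical heuristic and the function names are illustrative; the original C# code used SyndicationFeed for the first step.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "{http://www.w3.org/2005/Atom}"
HEADINGS = ("h1", "h2", "h3")


class _HeadingCollector(HTMLParser):
    """Collects the text of <h1>-<h3> elements (hypothetical stand-in for post titles)."""

    def __init__(self) -> None:
        super().__init__()
        self.inside = False
        self.parts: list[str] = []
        self.titles: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.inside, self.parts = True, []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.inside:
            self.inside = False
            text = " ".join("".join(self.parts).split())
            if text:
                self.titles.append(text)

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def html_heading_titles(raw: str) -> list[str]:
    collector = _HeadingCollector()
    collector.feed(raw)
    collector.close()
    return collector.titles


def parse_feed_titles(raw: str) -> list[str]:
    """Item titles from RSS 2.0 or Atom; falls back to HTML headings if that yields nothing."""
    try:
        root = ET.fromstring(raw)
    except ET.ParseError:
        return html_heading_titles(raw)
    if root.tag == "rss":                      # RSS 2.0: rss/channel/item/title
        nodes = root.iterfind("channel/item/title")
    elif root.tag == ATOM + "feed":            # Atom: feed/entry/title
        nodes = root.iterfind(f"{ATOM}entry/{ATOM}title")
    else:
        nodes = []
    titles = [n.text.strip() for n in nodes if n.text and n.text.strip()]
    return titles or html_heading_titles(raw)
```

Blogger serves Atom and WordPress serves RSS 2.0 by default, so the two XML branches cover the common cases, while malformed HTML pages (unclosed tags, `&nbsp;` entities) raise ParseError and go straight to the heading scraper.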

Image Hosting: Blogshops often hotlink-protected their images. Implemented image proxy that downloaded and re-hosted images on Azure Blob Storage to ensure consistent availability and faster loading.
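
The proxy's core idea, download once and then serve from storage, is sketched below in Python. A plain dict stands in for Azure Blob Storage and `fetch` is any callable returning image bytes; both, along with the SHA-1-based blob naming, are assumptions for illustration.

```python
import hashlib
from typing import Callable
from urllib.parse import urlsplit


class ImageProxy:
    """Downloads each blogshop image once and serves it from a store afterwards.

    `store` is a dict standing in for Azure Blob Storage (hypothetical);
    `fetch` is any callable that returns the image bytes for a URL.
    """

    def __init__(self, fetch: Callable[[str], bytes], store: dict[str, bytes]):
        self.fetch = fetch
        self.store = store

    @staticmethod
    def blob_name(url: str) -> str:
        """Deterministic blob name: SHA-1 of the source URL plus its file extension."""
        last_segment = urlsplit(url).path.rsplit("/", 1)[-1]
        ext = last_segment[last_segment.rfind("."):].lower() if "." in last_segment else ""
        return hashlib.sha1(url.encode("utf-8")).hexdigest() + ext

    def get(self, url: str) -> str:
        """Return the blob name for url, downloading only on the first request."""
        name = self.blob_name(url)
        if name not in self.store:
            self.store[name] = self.fetch(url)
        return name
```

Because the name is derived from the source URL, repeated crawls of the same post reuse the stored copy, and pages can link to the blob instead of the blogshop's server.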

Duplicate Detection: Same product often posted on multiple platforms (blog, Facebook, forum). Used perceptual image hashing and title similarity (Levenshtein distance) to detect duplicates.
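
A Python sketch of the two signals follows. It uses average hash (aHash), one of the simplest perceptual hashes; the source does not say which variant was used. The image is assumed to be already shrunk to an 8x8 grayscale grid, and the thresholds and the rule requiring both signals to agree are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def title_similarity(a: str, b: str) -> float:
    """1.0 for identical titles (case-insensitive), lower as edit distance grows."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b)) or 1
    return 1.0 - levenshtein(a, b) / longest


def average_hash(pixels: list[list[int]]) -> int:
    """aHash over a pre-shrunk grayscale grid: one bit per pixel, 1 where pixel >= mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p >= mean)
    return bits


def hamming(x: int, y: int) -> int:
    return bin(x ^ y).count("1")


def is_duplicate(title_a: str, title_b: str, hash_a: int, hash_b: int,
                 max_bits: int = 5, min_title_sim: float = 0.8) -> bool:
    """Hypothetical rule: near-identical images AND similar titles."""
    return hamming(hash_a, hash_b) <= max_bits and title_similarity(title_a, title_b) >= min_title_sim
```

Since aHash compares each pixel against the image's own mean, a uniformly brightened copy of a photo hashes identically, which suits reposts of the same product shot on different platforms.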

SEO Competition: Competing with established e-commerce platforms for “Singapore blogshop” keywords. Implemented on-page SEO (canonical URLs, meta descriptions, schema.org markup) and built a sitemap for Google indexing.
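
Sitemap generation following the sitemaps.org protocol can be sketched in Python as below. The protocol caps each sitemap file at 50,000 URLs, roughly the size of the peak product index, so the sketch splits larger lists into several files; the sitemap index file that would reference them is omitted, and the example URLs in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(entries: list[tuple[str, date]]) -> list[bytes]:
    """Return UTF-8 <urlset> documents, each holding at most 50,000 (loc, lastmod) entries."""
    ET.register_namespace("", SITEMAP_NS)
    docs = []
    for start in range(0, len(entries), MAX_URLS_PER_SITEMAP):
        urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
        for loc, lastmod in entries[start:start + MAX_URLS_PER_SITEMAP]:
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc  # ET escapes & and <
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod.isoformat()
        docs.append(ET.tostring(urlset, encoding="utf-8", xml_declaration=True))
    return docs
```

For instance, `build_sitemaps([("https://example.com/p/1", date(2012, 1, 1))])` yields one document whose single `<url>` has `<lastmod>2012-01-01</lastmod>`.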

Results

Successfully launched in 2011 with 100+ registered blogshops. Indexed ~50,000 fashion products at peak. Google Analytics showed 5,000-10,000 unique visitors per month during active operation. Facebook page gained 2,000+ followers.

Service operated for 2-3 years before being discontinued due to a shift in Singapore’s fashion e-commerce landscape: dedicated platforms like Carousell emerged, and blogshops migrated to Instagram for product discovery.

Tech Stack

  • Platform: Windows Azure Cloud Services
  • Backend: ASP.NET Web Forms, C#, .NET Framework 4.0
  • Database: Azure SQL Database
  • Storage: Azure Blob Storage
  • APIs: System.ServiceModel.Syndication (RSS), Google Analytics, Facebook Graph API
  • Frontend: HTML, CSS, JavaScript, jQuery

Source Code

Code will be available on GitHub at: https://github.com/tanchunsiong/blogtagg

Project Created: 2011

