Blogtagg: Search Engine for Singapore Blogshops
Overview
Cloud-based search engine built in 2011 for Singapore’s blogshop ecosystem. Aggregated RSS feeds from more than 100 independent fashion blogshops and provided centralized tag-based search, so shoppers no longer had to browse each blog individually to find products.
Tagline: “Stop searching, start finding.”
Architecture
RSS Aggregation Engine
- Automated crawler for blogshop RSS feeds
- Scheduled polling (hourly/daily based on update frequency)
- Feed parsing using SyndicationFeed API
- Duplicate detection and content normalization
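The hourly/daily polling choice could be sketched as below. This is a minimal Python illustration rather than the original C# code; the function name `polling_interval` and the one-day threshold are assumptions, since the actual rule is not documented here.

```python
from datetime import datetime, timedelta

def polling_interval(post_times, busy_gap=timedelta(days=1)):
    """Pick hourly polling for feeds that post often, daily otherwise.

    post_times: datetimes of a feed's recent posts.
    busy_gap: assumed threshold; an average gap at or below it counts as "busy".
    """
    times = sorted(post_times)
    if len(times) < 2:
        # Not enough history to estimate frequency: fall back to daily.
        return timedelta(days=1)
    average_gap = (times[-1] - times[0]) / (len(times) - 1)
    return timedelta(hours=1) if average_gap <= busy_gap else timedelta(days=1)

recent = [datetime(2011, 6, 1, 9), datetime(2011, 6, 1, 15), datetime(2011, 6, 2, 9)]
print(polling_interval(recent))
```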
Tagging & Indexing System
- Natural language processing for product title parsing
- Automatic tag extraction (category, color, size, brand)
- Manual tag correction interface for blogshop owners
- Full-text search index for fast queries
Web Application
- ASP.NET Web Forms hosted on Azure WebRole
- Responsive layout for mobile and tablet, using 2011-era techniques
- Facebook integration for social discovery
- Google Analytics for user behavior tracking
Cloud Infrastructure
- Windows Azure Cloud Services (WebRole)
- Azure SQL Database for product/tag storage
- Azure Storage for cached images
- CDN for static asset delivery
Technical Implementation
RSS Feed Handling: Used the System.ServiceModel.Syndication.SyndicationFeed class to parse Atom 1.0 and RSS 2.0 feeds. Implemented retry logic for unreliable blogshop servers, many of which ran on shared hosting with frequent downtime.
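A rough sketch of the fetch-with-retry and parse steps, written in Python for illustration (the original used C# and SyndicationFeed). The names `fetch_with_retry` and `parse_feed`, the exponential backoff, and the retry count are assumptions; the write-up does not say which retry policy was used.

```python
import time
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

def fetch_with_retry(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on OSError.

    fetch is injected so the sketch runs without network access.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)

def parse_feed(xml_text):
    """Return (title, link) pairs from an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    if root.tag == "rss":
        for item in root.iter("item"):
            items.append((item.findtext("title", ""), item.findtext("link", "")))
    elif root.tag == ATOM_NS + "feed":
        for entry in root.iter(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            href = link.get("href", "") if link is not None else ""
            items.append((entry.findtext(ATOM_NS + "title", ""), href))
    return items
```

In practice `fetch` would wrap an HTTP client call; passing it in keeps the retry logic testable against a fake that fails a fixed number of times.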
Tag Extraction Algorithm: Simple keyword matching against predefined fashion taxonomy (tops, bottoms, dresses, accessories, etc.). Extracted color names using regex matching for common fashion color terms. Size detection looked for S/M/L/XL patterns in post titles.
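The keyword, color, and size matching described above might look roughly like this in Python. The keyword lists, color terms, and the function name `extract_tags` are hypothetical; the original taxonomy is not reproduced in this write-up.

```python
import re

# Hypothetical taxonomy for illustration only.
CATEGORY_KEYWORDS = {
    "tops": ["top", "blouse", "tee", "shirt"],
    "bottoms": ["skirt", "shorts", "pants", "jeans"],
    "dresses": ["dress", "romper", "maxi"],
    "accessories": ["bag", "necklace", "belt", "scarf"],
}
# Assumed list of common color terms, matched case-insensitively as whole words.
COLOR_RE = re.compile(r"\b(black|white|red|blue|navy|pink|green|beige|grey|gray)\b", re.I)
# Sizes are matched case-sensitively so that ordinary words are not picked up.
SIZE_RE = re.compile(r"\b(XXL|XL|XS|S|M|L)\b")

def extract_tags(title):
    """Return category, color, and size tags found in a post title."""
    words = set(re.findall(r"[a-z]+", title.lower()))
    categories = sorted(c for c, kws in CATEGORY_KEYWORDS.items() if words & set(kws))
    colors = sorted({m.lower() for m in COLOR_RE.findall(title)})
    sizes = sorted(set(SIZE_RE.findall(title)))
    return {"category": categories, "color": colors, "size": sizes}

print(extract_tags("Navy Lace Dress (S/M/L)"))
```

Whole-word matching keeps the sketch simple but misses plurals and misspellings, which is one reason a manual tag correction interface is useful alongside it.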
Search Performance: Azure SQL Database full-text search with CONTAINS queries. Product table had ~50,000 items at peak. Query response time was under 500 ms for most searches. Cached results for the 100 most common queries.
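One way to sketch the result caching is a small LRU cache with a time-to-live, shown here in Python. This is an approximation under stated assumptions: the write-up does not say how the top searches were selected or when entries expired, so the `QueryCache` class, the LRU eviction, and the 300-second TTL are all illustrative.

```python
import time
from collections import OrderedDict

class QueryCache:
    """LRU cache with a time-to-live, keyed by normalized query text."""

    def __init__(self, search, max_entries=100, ttl_seconds=300.0, clock=time.monotonic):
        self._search = search          # function that runs the real query
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock            # injectable for testing
        self._entries = OrderedDict()  # key -> (stored_at, results)

    def get(self, query):
        # Normalize case and whitespace so equivalent queries share an entry.
        key = " ".join(query.lower().split())
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            self._entries.move_to_end(key)  # mark as recently used
            return entry[1]
        results = self._search(key)
        self._entries[key] = (now, results)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)  # evict least recently used
        return results
```

Keeping only the most recently used entries tends to retain the popular queries without needing a separate popularity counter.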
Azure Deployment: Deployed as a Cloud Service with a single WebRole declared in ServiceDefinition.csdef; the instance count was set in ServiceConfiguration.cscfg. Ran one instance by default and scaled to 2-3 instances during peak traffic (weekends, payday periods). Auto-scaling was not available in 2011, so scaling was done manually through the Azure Management Portal.
Technical Challenges
Feed Reliability: Blogshop RSS feeds were inconsistent: some used Blogger’s format, others WordPress, and some custom formats. Implemented defensive parsing with fallback extraction from raw HTML when RSS parsing failed.
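The fallback idea can be illustrated with a short Python sketch: try to parse the response as a feed, and if that fails, scrape post links out of the raw HTML. The assumption that post titles appear as links inside `<h2>` or `<h3>` headings is hypothetical, as are the names `extract_items` and `_PostLinkParser`; the original extraction rules are not documented here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

class _PostLinkParser(HTMLParser):
    """Collects (text, href) for links found inside <h2>/<h3> headings."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_heading = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "h3"):
            self._in_heading = True
        elif tag == "a" and self._in_heading:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(("".join(self._text).strip(), self._href))
            self._href = None
        elif tag in ("h2", "h3"):
            self._in_heading = False

def extract_items(document):
    """Return (title, link) pairs, preferring RSS and falling back to HTML."""
    try:
        root = ET.fromstring(document)
        items = [(i.findtext("title", ""), i.findtext("link", "")) for i in root.iter("item")]
        if items:
            return items
    except ET.ParseError:
        pass  # not well-formed XML: fall through to the HTML scraper
    parser = _PostLinkParser()
    parser.feed(document)
    return parser.items
```

The HTML path is deliberately tolerant of unclosed tags, which strict XML parsing rejects.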
Image Hosting: Blogshops often hotlink-protected their images. Implemented an image proxy that downloaded and re-hosted images on Azure Blob Storage to ensure consistent availability and faster loading.
Duplicate Detection: The same product was often posted on multiple platforms (blog, Facebook, forum). Used perceptual image hashing and title similarity (Levenshtein distance) to detect duplicates.
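A minimal Python sketch of the two signals follows. The write-up does not name the perceptual hash used, so an average hash over a pre-downscaled grayscale grid stands in for it here; the thresholds, the rule that both signals must agree, and all function names are assumptions.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def title_similarity(a, b):
    """Similarity in [0, 1]; 1.0 means identical after lowercasing and trimming."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def average_hash(pixels):
    """Average hash of a 2D list of grayscale values (already downscaled, e.g. 8x8)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    # One bit per pixel: set when the pixel is brighter than the mean.
    return sum(1 << i for i, p in enumerate(flat) if p > mean)

def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return bin(h1 ^ h2).count("1")

def is_duplicate(title_a, pixels_a, title_b, pixels_b,
                 min_title_sim=0.8, max_hash_bits=5):
    """Assumed rule: titles must be similar AND image hashes must be close."""
    return (title_similarity(title_a, title_b) >= min_title_sim
            and hamming(average_hash(pixels_a), average_hash(pixels_b)) <= max_hash_bits)
```

Decoding and downscaling the image is left out; any imaging library that yields a small grayscale grid would feed `average_hash`.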
SEO Competition: Competed with established e-commerce platforms for “Singapore blogshop” keywords. Implemented on-page SEO (canonical URLs, meta descriptions, schema.org markup) and built a sitemap for Google indexing.
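Sitemap generation is simple enough to sketch in a few lines of Python. The function name `build_sitemap` and the example URL are hypothetical; only the sitemaps.org namespace and element names come from the public sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Build sitemap XML from (absolute_url, last_modified 'YYYY-MM-DD') pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc        # special characters are escaped on output
        ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")

# Hypothetical tag page URL, for illustration only.
print(build_sitemap([("http://www.example.com/tag/dresses", "2011-08-01")]))
```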
Results
Successfully launched in 2011 with 100+ registered blogshops. Indexed ~50,000 fashion products at peak. Google Analytics showed 5,000-10,000 unique visitors per month during active operation. Facebook page gained 2,000+ followers.
The service operated for 2-3 years before being discontinued as Singapore’s fashion e-commerce landscape shifted: dedicated platforms like Carousell emerged, and blogshops migrated to Instagram for product discovery.
Tech Stack
- Platform: Windows Azure Cloud Services
- Backend: ASP.NET Web Forms, C#, .NET Framework 4.0
- Database: Azure SQL Database
- Storage: Azure Blob Storage
- APIs: System.ServiceModel.Syndication (RSS), Google Analytics, Facebook Graph API
- Frontend: HTML, CSS, JavaScript, jQuery
Source Code
Code will be available on GitHub at: https://github.com/tanchunsiong/blogtagg
Project Created: 2011
Connect:
- Blog: www.tanchunsiong.com
- LinkedIn: linkedin.com/in/tanchunsiong
- X: x.com/tanchunsiong