A stock analyst (FA-002) needs to know the following every time they start an analysis:
- Current US market trends (VIX, SPX)
- US Dollar strength (DXY)
- Global risk appetite (BTC)
- Several key A-share indices
If the analyst had to manually fetch data from Yahoo Finance, Sina, and AkShare before every analysis, it would be terribly inefficient.
So we built an automated macro data collection system to let analysts just query the database, without worrying about where the data comes from.
Requirements Analysis
Why Real-Time Data
Macro indicators are the barometer of market sentiment.
When the VIX (fear index) jumps from 15 to 25, what happens to tech stocks? The analytical logic for the very same stock can change completely.
That’s why data timeliness is critical. Our targets:
- Collection latency < 3 seconds (from data source to our database)
- Update frequency every 30 minutes (sufficient to cover trading hours)
- Data completeness > 99.5% (at least one data source succeeds)
Markets Covered
The global markets we track:
| Market | Code | Data Source Priority | Target Latency |
|---|---|---|---|
| US Equities | SPX, IXIC, DJI | yfinance | < 3s |
| US Bonds | US10Y | yfinance | < 3s |
| US Dollar | DXY | yfinance | < 3s |
| Volatility (Fear Index) | VIX | yfinance | < 3s |
| A-shares | SH, SZ, CYB | AkShare/Sina | < 2s |
| Hong Kong | HSI, HSTECH | AkShare/Sina | < 5s |
| Forex | USDCNY, USDJPY | yfinance | < 3s |
| Crypto | BTC, ETH | yfinance/Binance | < 2s |
| Commodities | GOLD, OIL | yfinance/AkShare | < 3s |
Multi-Source Fallback Design
The biggest challenge isn’t any single data source—it’s that none of them are completely reliable.
AkShare might timeout at certain times. yfinance occasionally returns 400 errors. Binance API rate-limits during peak hours.
Our strategy is: don’t rely on any single source, but use elegant multi-layer fallback.
Five-Layer Fallback for A-Share Data
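A minimal sketch of the fallback idea: try each source in priority order and only fall back to stale cache as a last resort. The function and layer names here are illustrative, not the production code; the real system chains five layers (e.g., two AkShare interfaces, Sina, a backup, then cache).

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("macro.a_share")

def fetch_with_fallback(symbol: str,
                        fetchers: list[tuple[str, Callable[[str], dict]]],
                        cache: Optional[dict] = None) -> dict:
    """Try each (name, fetcher) pair in priority order; cache is the last layer."""
    for name, fetch in fetchers:
        try:
            quote = fetch(symbol)
            if quote:  # some sources return an empty payload instead of raising
                return {"source": name, **quote}
        except Exception as exc:  # timeouts, HTTP errors, parse errors
            logger.warning("source %s failed for %s: %s", name, symbol, exc)
    # last resort: stale cache, explicitly flagged as such
    if cache and symbol in cache:
        return {"source": "cache", "stale": True, **cache[symbol]}
    raise RuntimeError(f"all sources failed for {symbol}")
```

Because each layer is just a callable, adding or reordering sources is a one-line change to the priority list.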
Global Data Fetching Routing Logic
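The routing can be sketched as a static priority table keyed by market, mirroring the matrix above. The adapter names and symbol-to-market mapping are illustrative assumptions:

```python
# Priority routing table mirroring the markets matrix above.
ROUTES = {
    "US_EQUITY": ["yfinance"],
    "A_SHARE":   ["akshare", "sina"],
    "HK":        ["akshare", "sina"],
    "CRYPTO":    ["yfinance", "binance"],
    "COMMODITY": ["yfinance", "akshare"],
}

SYMBOL_MARKET = {
    "SPX": "US_EQUITY", "SH": "A_SHARE", "HSI": "HK",
    "BTC": "CRYPTO", "GOLD": "COMMODITY",
}

def sources_for(symbol: str) -> list[str]:
    """Return the ordered list of sources to try for a symbol."""
    market = SYMBOL_MARKET.get(symbol)
    if market is None:
        raise KeyError(f"unknown symbol: {symbol}")
    return ROUTES[market]
```

Keeping routing as data rather than branching logic means the fallback chain for any market is visible at a glance.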
Caching and Deduplication
Repeatedly hitting data sources is wasteful. So we use 30-second TTL cache:
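A 30-second TTL cache needs only a dict of `(expires_at, value)` pairs. This is a minimal sketch, with an injectable clock so expiry is testable:

```python
import time

class TTLCache:
    """Minimal TTL cache; `clock` is injectable for testing."""
    def __init__(self, ttl: float = 30.0, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:  # expired: drop and report a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

A miss triggers a real fetch followed by `set`, so expiry doubles as the auto-refresh mentioned below.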
The beauty of this design:
- During peak times (e.g., 5 minutes after US market open), 10 different analysts might request data simultaneously
- With caching, only the first request actually hits the data source; the other 9 use cache (saving 90% of network requests)
- Cache auto-refreshes on expiry—no manual intervention needed
Retry Mechanism and Exponential Backoff
Network requests occasionally fail. The key is how to retry.
Immediate retries might worsen congestion. So we use exponential backoff:
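A sketch of the backoff wrapper: wait `base * 2**attempt` plus jitter between attempts. The function name and defaults are assumptions; `sleep` is injectable so tests need not actually wait.

```python
import random
import time

def with_retry(fn, retries: int = 3, base_delay: float = 1.0, sleep=time.sleep):
    """Call fn(); on failure, back off exponentially with jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of attempts: surface the last error
            # jitter desynchronizes concurrent callers
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))
```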
Randomization (`random.uniform(0, 0.5)`) is crucial. If all retries fire at the same moment, you get a "thundering herd" problem.
Storage Architecture: Why Supabase
Why not PostgreSQL? Why not DuckDB? Why Supabase?
Three reasons:
1. Real-Time Sync
Supabase has built-in PostgREST, enabling simple HTTP requests for CRUD operations without SQL connections.
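With PostgREST, an upsert is a plain HTTP POST carrying an `on_conflict` query parameter and a `Prefer: resolution=merge-duplicates` header. A sketch that builds such a request (the table and column names are illustrative):

```python
def build_upsert(base_url: str, api_key: str, table: str,
                 row: dict, conflict_cols: str):
    """Build the pieces of a PostgREST upsert request."""
    url = f"{base_url}/rest/v1/{table}"
    headers = {
        "apikey": api_key,
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # merge-duplicates turns the insert into an upsert on conflict_cols
        "Prefer": "resolution=merge-duplicates",
    }
    params = {"on_conflict": conflict_cols}
    return url, headers, params, row

# Sending it is one line with requests:
# requests.post(url, headers=headers, params=params, json=row, timeout=15)
```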
The `on_conflict` clause means: if an SPX record already exists for today, update it; otherwise insert. This is called an upsert, a common pattern for time-series data.
2. Permission Management
Supabase includes Row-Level Security (RLS). We can define:
- FA-002 (analyst) can read `market_quotes` but cannot write
- Luna can read and write
- External applications can only read specific columns
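In Supabase, RLS policies are plain PostgreSQL. A sketch of what such rules could look like; the role, table, and column names are assumptions, not the actual schema:

```sql
-- Illustrative only: role and table names are assumptions.
alter table market_quotes enable row level security;

-- Analysts may read every row but never write.
create policy analyst_read on market_quotes
  for select to analyst_role using (true);

-- Luna (the collector) may read and write.
create policy luna_write on market_quotes
  for all to luna_role using (true) with check (true);

-- External apps get column-level read access only.
grant select (symbol, trade_date, close) on market_quotes to external_app;
```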
3. Sufficient Free Tier
Supabase’s free tier offers 500MB database + 1GB file storage. For macro data, this is more than enough.
Five years of A-share + US stock data fits in under 100MB.
Operations and Monitoring
Cron Task Configuration
On bwg:
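A cron entry matching the 30-minute update cycle could look like the following; the paths and script name are illustrative assumptions:

```shell
# Illustrative crontab entry: collect every 30 minutes, append to a log,
# and let the script's own retry/fallback logic handle transient failures.
*/30 * * * * /usr/bin/python3 /opt/macro/collect_macro.py >> /var/log/macro.log 2>&1
```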
Monitoring Rules
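The core monitoring rule is freshness: alert when data is older than one missed collection cycle. A sketch with an illustrative threshold derived from the 30-minute cadence:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# One missed 30-minute cycle plus slack; the exact threshold is an assumption.
MAX_STALENESS = timedelta(minutes=45)

def staleness_alert(last_update: datetime, now: datetime) -> Optional[str]:
    """Return an alert message if data is older than one missed cycle."""
    age = now - last_update
    if age > MAX_STALENESS:
        return f"macro data stale: last update {age.total_seconds():.0f}s ago"
    return None
```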
Permission Whitelist
Initially, remote nodes needed approval for every Python script execution; under high concurrency, those approvals would time out.
Solution: Add macro data collection to the system whitelist:
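The whitelist entry could look something like the following; the file format, keys, and paths are all hypothetical, since the actual approval system's configuration is not shown:

```json
{
  "exec_whitelist": [
    {
      "command": "/usr/bin/python3 /opt/macro/collect_macro.py",
      "reason": "scheduled macro data collection",
      "approval": "none"
    }
  ]
}
```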
Now cron tasks execute directly without approval.
Failure Cases and Lessons Learned
Case 1: Single Data Source Outage
Problem: AkShare went down for 30 minutes on 2026-02-23 at 4 PM.
Impact: A-share indices didn’t update.
Resolution: System automatically fell back to Sina API; data kept flowing. FA-002 didn’t even notice.
Lesson: Multi-source redundancy beats “99.9% SLA” from a single source.
Case 2: Supabase Connection Timeout
Problem: Python process timed out writing to Supabase during peak hours.
Root cause: Remote node Python + Supabase SDK default timeout too short (6 seconds).
Solution:
- Migrated macro collection from remote node to local bwg
- Increased Supabase SDK timeout to 15 seconds
- Added retry logic (see the `_retry` method above)
Lesson: Network timeouts should be conservative (better to wait 15 seconds than fail immediately).
Case 3: Cache-Induced Data Staleness
Problem: During testing, cache TTL was set to 1 hour.
After first collection failure, system returned data from 1 hour ago. FA-002 analyzed based on stale data.
Resolution: Lowered TTL to 30 seconds and added “warning: using N-seconds-old cache” flag when returning cached data.
Lesson: Caching is for resilience, not hiding failures. Always explicitly mark cached data.
Performance Metrics (Real Data)
After 24 days of operation:
- Average collection latency: 2.3 seconds (P99: 3.8 seconds)
- Data completeness: 99.87% (only 3 total failures)
- Supabase write success rate: 99.94%
- Cache hit rate: 73% (driven by heavy concurrent multi-agent requests)
- System availability: 99.96%
For an “open-source + free tier” solution, that’s solid.
Impact on the Analyst
From FA-002’s perspective, the workflow now looks like:
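From the analyst's side, the whole pipeline collapses to "read the latest row per symbol from the database." A sketch of that consumer step; `rows` stands in for a Supabase query result, and all field names are illustrative:

```python
def latest_quotes(rows: list[dict]) -> dict[str, dict]:
    """Keep only the most recent record per symbol (ISO timestamps sort lexically)."""
    latest: dict[str, dict] = {}
    for row in rows:
        current = latest.get(row["symbol"])
        if current is None or row["ts"] > current["ts"]:
            latest[row["symbol"]] = row
    return latest
```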
No need to care where data comes from, how many fallback layers exist, or fear any data source suddenly going offline.
That’s the power of abstraction.
Final thought:
Elegant systems aren’t about “nothing breaks”—they’re about “keep running when things break”.
Multi-source fallback, caching, retries, monitoring—these aren’t failure-prevention measures. They’re failure-recovery measures.
In a distributed, inherently unreliable network, that’s reality.