Reason 1,000,001 why OpenAI sucks
This is one of those ‘I cannot believe it!’ moments:
For days I have been debugging why Sentry kept using up our entire transaction span quota. I had reduced our traces_sampler rate from 0.1 to 0.01 and then to 0.00001, and seen no difference in span consumption, which told me that Sentry was ignoring the rate we had configured.
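For context: in Sentry's Python SDK, traces_sampler is a function that returns a sample rate for each new transaction. It receives a sampling_context dict that can include parent_sampled, the decision inherited from incoming trace headers. Here is a minimal sketch, assuming the Python SDK (SAMPLE_RATE is a hypothetical value), of a sampler that only inherits a parent's "don't sample" and applies its own rate otherwise:

```python
# Assumption: Sentry's Python SDK, whose traces_sampler receives a dict-like
# sampling_context that may contain "parent_sampled" (True, False or None).
SAMPLE_RATE = 0.01  # hypothetical target rate, tune to your quota


def traces_sampler(sampling_context: dict) -> float:
    """Return the probability (0.0 to 1.0) of sampling a new transaction."""
    parent_sampled = sampling_context.get("parent_sampled")
    if parent_sampled is False:
        # Upstream decided NOT to sample: agreeing costs nothing.
        return 0.0
    # Upstream said "sample" (or there is no parent): apply our own rate
    # instead of letting an arbitrary caller force sampling.
    return SAMPLE_RATE


# Wiring it up requires the sentry-sdk package, so it is only shown here:
# sentry_sdk.init(dsn="...", traces_sampler=traces_sampler)
```

Whether this helps depends on the SDK and its version. Some setups may apply an incoming header's decision before the sampler is consulted, so check your SDK's sampling docs. The trade-off is that traces started by your own frontend will also be cut to the sampler's rate on the backend.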
Strange! Well, I finally opened a ticket, and Sentry (kudos to their support team! They gave a human, helpful answer within 2 hours!) provided a good pointer:
I checked one of your transactions and noticed that it has the following header in the incoming request:
Traceparent 00-[redacted]-01
The header tells the transaction should be sampled and takes preference over the sample rate you set in your SDK configuration to keep the trace consistent. Are you using tracing with other libraries/OTEL?
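The header Sentry quoted follows the W3C Trace Context format: version, trace ID, parent ID and trace flags, separated by hyphens. The trailing 01 sets the "sampled" bit, and that is the decision the SDK honours. The small parser below makes this visible. It handles only the version-00 layout, and parse_traceparent and TraceParent are hypothetical names, not part of any library:

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context (version 00): version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag.
        return bool(self.flags & 0x01)


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Parse a traceparent header value; return None if it is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the spec.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp.sampled)  # True
```

With a trailing 00 instead of 01, sampled comes back False.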
Huh, strange. As far as I know we aren’t using any other tracing library. Where could this come from?
To be sure, I checked our Cloudflare configuration and found nothing, so I decided to log incoming request headers in nginx. Honestly, at this stage I was absolutely convinced Sentry had sent me on a wild goose chase.
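The logging here was done in nginx. If you would rather do it in the application, the following is a rough equivalent for a Python WSGI app (an assumption; TraceHeaderLogger and the header list are hypothetical, not part of any library). It logs only requests that carry trace-propagation headers, together with the user agent:

```python
import logging
from typing import Callable, Iterable

logger = logging.getLogger("trace-headers")

# WSGI exposes request headers as HTTP_<NAME>, upper-cased, with "-" -> "_".
TRACE_HEADER_KEYS = (
    "HTTP_TRACEPARENT",
    "HTTP_TRACESTATE",
    "HTTP_SENTRY_TRACE",
    "HTTP_X_DATADOG_SAMPLING_PRIORITY",
)


class TraceHeaderLogger:
    """WSGI middleware that logs requests carrying trace-propagation headers."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        found = {key: environ[key] for key in TRACE_HEADER_KEYS if key in environ}
        if found:
            logger.info(
                "trace headers on %s %s (UA=%r): %s",
                environ.get("REQUEST_METHOD"),
                environ.get("PATH_INFO"),
                environ.get("HTTP_USER_AGENT"),
                found,
            )
        # Pass the request through untouched.
        return self.app(environ, start_response)
```

Logging the user agent alongside the headers is what makes it possible to tell who is sending them.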
I watched my logs in real time for a few minutes and saw no traceparent headers come in.
Whenever I hit a point where I don’t know what to do, I make a coffee (or another drink), because I learned long ago that my best revelations come when I’m away from the screen and the problem. So off I go for my coffee.
This time, the coffee machine offered no enlightenment. But when I came back a few minutes later (log tail still running), I saw it: traceparent headers lighting up the screen.
Here’s one (IDs redacted, just in case):
GET /
Host: example.com
Connection: close
cf-ray: [redacted]
cdn-loop: cloudflare; loops=1
tracestate: dd=p:[redacted];s:2;t.dm:-4;t.tid:[redacted]
x-forwarded-for: 68.221.75.25
accept-encoding: gzip, br
x-request-id: [redacted]
x-forwarded-proto: https
x-openai-internal-caller: browse
x-openai-product-sku: unknown
cf-visitor: {"scheme":"https"}
cf-ipcountry: ES
x-openai-originator-env: prod
cf-connecting-ip: 68.221.75.25
x-openai-originator: browse
x-openai-traffic-source: user
x-envoy-expected-rq-timeout-ms: 15000
x-datadog-origin: rum
user-agent: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot
accept-language: en-US,en;q=0.9
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
x-datadog-trace-id: [redacted]
x-datadog-parent-id: [redacted]
x-datadog-sampling-priority: 2
x-datadog-tags: _dd.p.tid=[redacted],_dd.p.dm=-4
traceparent: 00-[redacted]-01
And who’s the culprit? Of course it’s OpenAI. Again. Because it’s always AI 😡
It appears that OpenAI’s crawler is browsing the web with tracing headers that mark every request as sampled, wasting even more resources than it already does by instructing every tracing library out there to trace all of OpenAI’s requests…
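Until that changes, one possible workaround is to drop the W3C trace-context headers before the Sentry SDK ever sees them. This is a minimal sketch, assuming a Python WSGI app and assuming your own frontend propagates traces via Sentry's sentry-trace and baggage headers, which it leaves alone. StripForeignTraceHeaders is a hypothetical name:

```python
from typing import Callable, Iterable

# W3C trace-context headers as WSGI environ keys. Assumption: your own
# frontend propagates via Sentry's sentry-trace/baggage headers, which are
# deliberately NOT stripped here.
STRIPPED_KEYS = frozenset({"HTTP_TRACEPARENT", "HTTP_TRACESTATE"})


class StripForeignTraceHeaders:
    """WSGI middleware that drops W3C trace-context headers before the app sees them."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        for key in STRIPPED_KEYS:
            environ.pop(key, None)  # no-op if the header is absent
        return self.app(environ, start_response)


# Wrap outermost so this runs before Sentry's own WSGI instrumentation, e.g.:
# application = StripForeignTraceHeaders(application)
```

If you legitimately receive traceparent from your own services, this will break those traces, so narrow it (for example by user agent) as needed.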