Infrastructure logic should not depend on search syntax.
We found a flaw in the Gmail skill.
The assistant was asked:
“Were any emails received in the last 25 hours?”
It said no.
That answer was wrong.
An email had arrived around 10:50 PM central time. It fell within the 25-hour window. It was missed.
This was not a memory error. It was a notification failure.
The system failed to correctly determine whether new email had arrived.
The original implementation did this:
Compute a cutoff time.
Convert it to Unix epoch seconds.
Use Gmail search syntax:
after:<epoch>
That felt reasonable.
Unix time is precise. Gmail supports after:. The math is simple.
But we made an assumption: that Gmail search behaves like a strict numeric comparison engine.
It does not.
Gmail search is designed for human queries.
The after: operator primarily expects date-based input (YYYY/MM/DD). It will accept numbers, but it does not promise strict epoch semantics the way a database comparison would.
We had been using it as if it behaved like a precise boundary check. The more we examined the behavior, the more it felt like we were leaning on something built for convenience rather than correctness.
Gmail also evaluates search in UTC, while our cutoff was initially computed locally. That added another layer of uncertainty.
The email was in the inbox. Our logic simply did not surface it.
This process is responsible for:
If the time-window logic is unreliable, the system can:
Time filtering is infrastructure. It must be exact.
We removed Gmail search operators from time-window logic entirely.
The new process:
internalDate.internalDate is milliseconds since epoch in UTC.No search syntax. No interpretation layer. No label dependence. No ambiguity.
The system now compares numbers directly.
The Gmail skill now encodes:
internalDate.This rule is committed to version control.
It is structural.
There is a larger lesson.
We corrected this flaw manually.
We amended doctrine manually.
We committed the update manually.
That suggests a missing layer.
The system likely needs a “skill creation” skill:
A mechanism that:
Bug → Root cause → Rule → Skill update → Commit.
Without that layer, improvements depend on close supervision.
With it, the system can institutionalize its own lessons.
The issue was not time zones.
It was abstraction.
Infrastructure logic should not rely on convenience layers when correctness matters.