This page covers my role, how I got here, and the programs I have owned. Each tab tells the story of a specific initiative.
$31M+
Risk-adj. revenue (12mo)
What I do
As Technical Product Manager for Credit Cards at PNC, I own the technology that determines which customers get credit line increases and when. The stack includes the data pipelines, orchestration layers, decision engine, vendor integrations, and governance controls that keep the program both effective and compliant. At roughly 1.5 billion decisions per year, there is no room for downtime or inaccuracy.
What makes the role unusual is the range it demands. On any given week I might be reviewing model architecture with engineers in the morning and presenting a business case to a credit risk committee in the afternoon. I came up through data science rather than an MBA program, which means I can engage with the technical work at a level most TPMs cannot.
Career Arc
May 2019 – May 2020
Data Scientist · ConnectOne Bank
A summer internship that turned into a part-time role during undergrad. My first exposure to production financial data and model deployment in a regulated environment. I worked on analyzing the performance of SBA loans during the COVID-19 pandemic.
May 2020 – May 2022
Data Scientist · Global X ETFs
Another summer internship that became a part-time role. I built and maintained quantitative models for ETF analysis, and worked with the sales team to develop an unsupervised machine learning client recommender to identify prime candidates for new product marketing.
May 2022 – January 2023
Research Data Scientist · MIT Lincoln Laboratory
Built an end-to-end NLP intelligence platform for a national security research program. The system processed 20M+ posts, cut analyst review time by 60%, and gave analysts a way to understand both what users were discussing and who was amplifying whom. The tool became a standing capability after I left.
May 2023 – January 2024
Data Scientist · PNC Bank
Joined the credit cards decisioning team. Built monitoring infrastructure that cut manual validation by 80%, and started bridging gaps between data science and program delivery.
January 2024 – Present
Technical Product Manager · PNC Bank, Consumer & Business Credit Cards
Promoted after eight months when the previous TPM left the company. Took ownership of the full decisioning stack across 25+ engineers. Over the next twelve months: modernized the infrastructure from batch to real-time, shipped an AI income model that unlocked 215K customers, and created data integrity guardrails ensuring decisioning accuracy.
The gap I found
While digging through campaign performance data in early 2024, I noticed that the single most common reason customers were being excluded from proactive credit line increases had nothing to do with credit risk. It was missing income. These customers had clean profiles, low utilization, and years of on-time payments. The only problem was that the engine couldn't verify what they earned, so it had no choice but to leave them out. That felt like a solvable problem.
The incremental revenue opportunity was large enough to justify a dedicated program. This became the AI Income Encoder.
Getting it approved
Pitching a novel AI model inside a regulated lending environment requires more than a good idea. I wrote the business case from the ground up, established the risk guardrails, and personally walked the proposal through credit risk review, model governance, and legal. The argument I kept returning to was straightforward: a well-calibrated income estimate, even an imperfect one, is categorically better than no estimate at all. We had a segment of creditworthy customers being excluded not because they posed a risk but because of a data gap we had the tools to close. One compromise we agreed on was not to use the income approximation for adverse actions such as line decreases or closures — it felt unfair to base a negative customer experience on a potentially imperfect estimate, even one accurate 99% of the time.
How the model works
The Income Encoder is a fine-tuned BERT model designed to extract a clear signal from the often messy and unstructured world of transaction data. While traditional systems rely on rigid keyword matching for payroll labels, this approach uses a transformer architecture to understand the semantic patterns of stable earnings. By treating transaction histories as sequences, the model captures the temporal regularity of deposits, distinguishing recurring income from one-time credits even when descriptions are heavily abbreviated or inconsistent.
The central technical challenge was ensuring the model generalized well to the unverified segment we actually cared about. To manage the bias-variance tradeoff, I implemented a validation strategy centered on macroeconomic stress testing and cohort-based cross-validation. Rather than a single aggregate accuracy score, we evaluated performance using log-loss and within-tolerance accuracy across specific income bands — because a model that fluctuates in reliability at lower income tiers could easily trigger the kind of disparate impact that leads to Fair Lending scrutiny.
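The banded evaluation can be sketched in a few lines. This is a minimal illustration, not the production metric code: the band edges and the 20% tolerance here are assumptions for the example, not PNC's actual values.

```python
from collections import defaultdict

# Illustrative band edges and tolerance -- not the production values.
BANDS = [(0, 30_000), (30_000, 60_000), (60_000, 100_000), (100_000, float("inf"))]
TOLERANCE = 0.20  # a prediction "hits" if it lands within 20% of the actual

def band_of(income):
    """Return the index of the income band a value falls into."""
    for i, (lo, hi) in enumerate(BANDS):
        if lo <= income < hi:
            return i
    return len(BANDS) - 1

def within_tolerance_by_band(actuals, preds):
    """Fraction of predictions within TOLERANCE of actual, per income band.

    Reporting per band (rather than one aggregate score) is what surfaces
    a model that degrades quietly at lower income tiers.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for actual, pred in zip(actuals, preds):
        b = band_of(actual)
        totals[b] += 1
        if abs(pred - actual) <= TOLERANCE * actual:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in totals}
```

The point of the per-band breakdown is that an aggregate 90% accuracy can hide a 60% accuracy in the lowest band, which is exactly the pattern that invites Fair Lending scrutiny.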
In production, the model achieved a mean absolute percentage error of 12% on a held-out verification set. We established a confidence threshold of 0.72 — predictions exceeding this value fed directly into the automated decision engine, while outputs below it defaulted to manual review or existing exclusion logic.
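The confidence gate described above reduces to a simple routing rule. A minimal sketch, with the function and return labels as illustrative names:

```python
CONFIDENCE_THRESHOLD = 0.72  # the production threshold described above

def route_income_estimate(estimate, confidence):
    """Route a model output: predictions exceeding the threshold feed the
    automated decision engine; everything else falls back to manual review
    or the existing exclusion logic."""
    if confidence > CONFIDENCE_THRESHOLD:
        return ("automated_decisioning", estimate)
    return ("manual_review_or_exclusion", None)
```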
I led this initiative from the initial architecture through to the API integration and the Fair Lending monitoring runbook, ensuring testing protocols for protected classes were in place before the first automated decision was ever made. One insight post-launch: because the model analyzes direct deposits, it estimates post-tax liquidity rather than gross income. Reconciling this delta with customer self-reported figures has since become a key refinement focus.
The problem with batch
When I joined the team, our credit decisioning infrastructure relied entirely on overnight batch processing. Depending on when upstream tables refreshed, we were often making critical credit decisions based on data that was up to two weeks old. This wasn't just a technical bottleneck — it was a fundamental barrier to the product experience. When a customer requests a credit line increase in-app, they expect an immediate answer, not a letter in the mail three days later. The next-generation FICO models we wanted to adopt lost their predictive power when fed stale data, and our manual, state-level disaster exclusion rules created significant operational risk during rapid weather events.
The mandate was clear: migrate four deeply interconnected systems from batch to real-time processing while maintaining zero compliance exposure and ensuring we never had to perform a cold cutover.
How I ran it
I managed this as a four-track parallel program involving over 25 engineers. Given the distinct technical complexities and regulatory requirements of each workstream, I avoided treating it as a monolithic project. I established weekly architecture reviews and a shared dependency map to prevent teams from blocking one another. A non-negotiable requirement for every migration was a shadow mode phase — we ran the legacy batch pipeline and the new real-time system in parallel, automatically comparing their outputs. We only flipped the switch once both systems produced identical results across a full campaign cycle.
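The shadow-mode comparison boils down to diffing two decision sets keyed by account. A simplified sketch of that check (the real comparison ran over full campaign outputs, not in-memory dicts):

```python
def shadow_mode_diff(legacy, realtime):
    """Compare legacy batch decisions with real-time decisions, keyed by
    account ID. Returns every account where the two systems disagree,
    including accounts present in only one output."""
    mismatches = {}
    for acct in legacy.keys() | realtime.keys():
        old, new = legacy.get(acct), realtime.get(acct)
        if old != new:
            mismatches[acct] = (old, new)
    return mismatches

def safe_to_cut_over(legacy, realtime):
    """The switch only flips when a full cycle produces zero mismatches."""
    return not shadow_mode_diff(legacy, realtime)
```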
The most complex hurdle wasn't the infrastructure itself but ensuring audit trail continuity. While batch systems naturally log decisions at the end of a cycle, real-time systems require atomic, event-based logging robust enough for regulatory reconstruction without adding perceptible latency to the customer experience. I designed the event schema with hard guarantees: every decision was written to the audit log before a response was ever returned to the caller.
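The write-before-respond guarantee is a small but strict ordering constraint. A toy version of the pattern (the real system wrote to a durable event store, not a Python list):

```python
import json
import time

def decide_with_audit(account_id, decision_fn, audit_log):
    """Run a decision and append it to the audit log BEFORE returning the
    response. If the log write raises, the exception propagates and no
    response is returned -- a decision can never exist 'un-audited'."""
    decision = decision_fn(account_id)
    audit_log.append(json.dumps({
        "account_id": account_id,
        "decision": decision,
        "ts": time.time(),
    }))
    return decision
```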
To ensure resilience, we built circuit breaker logic into every vendor integration. If an API timeout rate crossed a specific threshold, the system would automatically reroute to a backup endpoint. This architecture was put to the test sooner than expected — a story detailed in the next tab.
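A minimal sketch of that circuit-breaker logic, tracking timeout rate over a sliding window of recent calls. The window size and 20% threshold here are illustrative assumptions, not the production values:

```python
from collections import deque

class CircuitBreaker:
    """Reroute to a backup endpoint when the timeout rate over a sliding
    window of recent calls crosses a threshold."""

    def __init__(self, threshold=0.20, window=100):
        self.threshold = threshold
        self.results = deque(maxlen=window)  # True = the call timed out

    def record(self, timed_out):
        """Record the outcome of one vendor call."""
        self.results.append(timed_out)

    @property
    def timeout_rate(self):
        return sum(self.results) / len(self.results) if self.results else 0.0

    def endpoint(self, primary, backup):
        """Pick the endpoint for the next call based on recent health."""
        return backup if self.timeout_rate >= self.threshold else primary
```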
What went wrong
In early 2024, a failure in an upstream table caused a student segmentation flag to drop before a decisioning cycle. Because the field arrived as null, the engine defaulted to treating the affected segment as non-students. This bypassed the conservative protections we specifically apply to student populations to foster long-term banking relationships.
The result was a cascade of unearned credit line decreases. Remediation took weeks of outreach letters, manual reversals, and reconciliation across three different systems of record. Beyond the six-figure operational cost, the hit to customer trust and our regulatory standing was significant. It was a sobering reminder that our engine was only as reliable as the data feeding it.
Building the Floodgate
In the wake of that incident, I designed a pre-campaign validation layer the team named the Floodgate. The logic was straightforward: if the data cannot be trusted, the engine does not run. Before any decisioning cycle begins, the system queries every critical upstream field and validates it against configurable thresholds. If a check fails, the campaign stops immediately. There are no exceptions and no overrides without documented approval from a named stakeholder.
The Floodgate monitors three failure categories: completeness, referential integrity, and distributional drift. While a missing field is easy to spot, drift is more insidious. We track the statistical shape of every field against a 90-day rolling baseline to catch instances where a value is present but has quietly changed what it represents.
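The drift category in particular is just a z-score of the current batch against the rolling baseline. A stdlib-only sketch, with the halt threshold as an illustrative assumption rather than the production setting:

```python
import statistics

def drift_zscore(baseline_values, current_mean):
    """Z-score of the current batch mean against a rolling baseline.
    A large absolute value means the field is present but its meaning
    has likely shifted (e.g. a monthly figure becoming annual)."""
    mu = statistics.mean(baseline_values)
    sigma = statistics.stdev(baseline_values)
    return (current_mean - mu) / sigma

def floodgate_drift_check(baseline_values, current_mean, max_z=4.0):
    """Return True if the field passes; False halts the campaign.
    The max_z threshold here is illustrative."""
    return abs(drift_zscore(baseline_values, current_mean)) <= max_z
```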
This drift check proved its worth six months after launch. An upstream provider changed a monthly income field to an annual figure without notice. Our annualization logic, built for monthly values, would have multiplied that figure by twelve, artificially inflating customer incomes and potentially leading to catastrophic credit increases across the portfolio.
The system flagged a z-score of 11.8 and halted the campaign before a single decision was made. What would have been a major regulatory disclosure event became a near-miss known only to a handful of people on the technical team.
When everyone's project is Priority 1
In Q2 2024, three senior stakeholders escalated Priority 1 requests at the same time. Risk needed a FEMA API integration before hurricane season. Marketing needed an email delivery channel to bypass a physical mail capacity cap. Analytics needed a complete overhaul of the waterfall decisioning logging architecture. Every team had a legitimate case. None of them wanted to wait. I had one engineering team.
To score them objectively I applied Weighted Shortest Job First, which weighs business value, time criticality, and risk reduction against implementation effort. The goal was to get a number on the table before anyone's preference entered the room.
| Initiative | Biz Value | Time Crit. | Risk Red. | Effort | WSJF |
| --- | --- | --- | --- | --- | --- |
| FEMA API ✓ | 8 | 9 | 7 | 3 | 8.0 |
| Email Integration | 4 | 6 | 3 | 5 | 2.6 |
| Waterfall Logging | 7 | 3 | 5 | 8 | 1.9 |
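The scores in the table reduce to a one-line formula, cost of delay over job size:

```python
def wsjf(business_value, time_criticality, risk_reduction, effort):
    """Weighted Shortest Job First: cost of delay divided by job size."""
    return (business_value + time_criticality + risk_reduction) / effort
```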
FEMA API won by a wide margin. Hurricane season was weeks away, moving from state-level exclusions to ZIP-code granularity would unlock thousands of previously blocked accounts, and it was a single stable endpoint with low implementation effort. The email integration had a short-term workaround available. The logging overhaul was genuinely valuable work with no hard deadline.
I personally wanted the logging overhaul most. I'd been frustrated by the analytical gap it created for months and already knew exactly what I'd build. That's precisely why the scoring mattered. It removed my preference from the equation and gave me something objective to show the teams that didn't get what they asked for. When stakeholders know the process is consistent and transparent, a no lands differently than when it feels arbitrary.
My approach to prioritization
WSJF works well when you have multiple competing initiatives and need a defensible ranking. But it's one tool in a larger set. For roadmap planning I use a combination of OKR alignment and opportunity scoring to evaluate whether a project moves a metric that actually matters to the business. For shorter-horizon trade-offs, like sprint-level decisions, I rely on impact-effort mapping to quickly visualize where the team's time creates the most value relative to cost.
For stakeholder communication I find that the most important prioritization skill isn't the scoring framework itself. It's having a consistent north star metric that everyone agrees on in advance. At PNC that metric is risk-adjusted revenue. When every team's request gets measured against the same ruler, the conversation shifts from whose project is more important to which project moves the number furthest. That's a much more productive argument to have.
I also use MoSCoW prioritization for scope decisions within a project, especially when a deadline is fixed and trade-offs need to be made fast. Separating must-haves from should-haves from won't-haves forces the product team to get explicit about what a launch actually requires, rather than letting every feature be implicitly critical until someone has to cut something at the last minute.
Enriching the model with behavioral signals
Running alongside the infrastructure work, I identified an opportunity to improve the quality of the decisioning model's inputs. Standard credit features draw heavily on FICO and bureau signals, which are lagging indicators by design. I worked with data science to scope an LLM-based transaction classifier that normalizes raw transaction descriptions into a consistent category taxonomy across 30, 60, and 90-day windows. The result was 400+ distinct behavioral signals feeding the feature store at decision time, giving the model visibility into spending trends rather than just credit history.
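The windowed aggregation step can be sketched as follows. This is an illustration only: the `classify` callable stands in for the LLM-based normalizer (mocked here as any function mapping a description to a category), and the signal naming scheme is an assumption.

```python
from collections import Counter
from datetime import date, timedelta

WINDOWS = (30, 60, 90)  # day windows from the description above

def behavioral_signals(transactions, as_of, classify):
    """Aggregate classified transactions into per-category count and spend
    signals over 30/60/90-day windows.

    transactions: iterable of (date, raw_description, amount)
    classify:     callable mapping a raw description to a category label
                  (stands in for the LLM-based classifier)
    """
    signals = {}
    for days in WINDOWS:
        cutoff = as_of - timedelta(days=days)
        counts, spend = Counter(), Counter()
        for txn_date, description, amount in transactions:
            if txn_date >= cutoff:
                category = classify(description)
                counts[category] += 1
                spend[category] += amount
        for category in counts:
            signals[f"{category}_count_{days}d"] = counts[category]
            signals[f"{category}_spend_{days}d"] = spend[category]
    return signals
```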
The campaign starts breaking
On a Tuesday morning in Q4 2024, the decision engine started throwing failures during a live credit line increase campaign. Within twenty minutes we were at a 25% failure rate across hundreds of thousands of accounts. Each failed account represented roughly $40 in Risk-Adjusted Return that would be permanently lost if the campaign timed out. At full throughput, the campaign was generating over a million dollars an hour.
The instinct from my leadership was to shut it down immediately. I asked for thirty minutes to understand what was actually happening before we pulled the plug.
Figuring out what we actually had
I pulled together the Risk lead, the Credit Card Director, and tech leadership into a war room and ran two analyses in parallel. Infrastructure logs showed 25% of API requests timing out at the FICO vendor's load balancer. The decision logic was processing correctly. The engine wasn't broken — the highway was congested.
At the same time, I pulled the demographic and credit profile distribution of failed accounts against successful ones in real time. There was no correlation with credit score band, geographic cluster, income proxy, or any protected class attribute. The failures were statistically random.
That distinction matters more than it sounds. A logic failure means the engine is making wrong decisions and you stop immediately. An infrastructure failure means some decisions aren't completing, but the ones that do are correct. These situations require completely different responses, and confusing them is expensive.
The case for staying live
I presented three facts. Seventy-five percent of decisions were still processing correctly. The failure distribution showed no demographic clustering, which meant no Fair Lending exposure. And we were still inside the pre-notification window, meaning every failed account could be requeued without legal consequence.
My proposal was to stay live with hourly checkpoints and a hard automatic kill switch if failure rates crossed 30%. If the vendor resolved the issue, we recovered the full revenue. If they didn't, we had a clean exit at a defined threshold. FICO found and fixed the load balancer issue within eight hours. We stayed live the entire time.
MIT Lincoln Laboratory
Research Data Scientist · Lexington, MA · May 2022 – January 2023
The assignment
MIT Lincoln Laboratory does classified applied research for the Department of Defense. The program I joined was monitoring a targeted extremist online community — roughly 40,000 active users generating millions of posts a month. The analyst team was drowning in manual content review. My job was to build the system that automated the noise and surfaced the signal, so they could focus on the work that actually required human judgment.
The platform
I built an end-to-end NLP pipeline from raw ingestion to analyst-facing intelligence outputs. The ingestion layer was a multi-threaded scraper with bot detection avoidance and rate limiting that collected 20M+ posts while preserving full metadata. I designed the storage schema around the queries the team actually needed to run — not just around getting the data down.
The modeling layer used BERT embeddings with HDBSCAN clustering to surface thematic communities and coordinated messaging patterns that a keyword search would never find. Topic labeling ran through a zero-shot NLI classifier that didn't require labeled training data, which mattered because the topic space shifted fast and the source material was too sensitive to annotate at scale.
The piece that made the biggest difference was the social network graph. I modeled the full 40,000-node network and used betweenness and eigenvector centrality to identify high-influence actors and coordination clusters. Individual posts are noisy and easy to manufacture. Network structure is much harder to fake. Modeling who was amplifying whom surfaced patterns that no content-based approach would have found.
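Eigenvector centrality, the second of those measures, can be illustrated with a toy pure-Python power iteration (the production system would lean on a proper graph library; the shift by the node's own score is a standard trick to avoid oscillation on bipartite structures):

```python
def eigenvector_centrality(adj, iters=100):
    """Eigenvector centrality by shifted power iteration on an adjacency
    dict {node: set(neighbors)}. A node scores highly when its neighbors
    score highly -- the property that makes amplification networks hard
    to fake with throwaway accounts."""
    nodes = list(adj)
    score = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # Shifted update (A + I) prevents oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in nodes}
        norm = max(new.values()) or 1.0
        score = {n: v / norm for n, v in new.items()}
    return score
```

On a star-shaped amplification cluster, the hub account dominates even though every account posts the same volume, which is exactly the signal content-based analysis misses.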
The platform reduced analyst manual review time by roughly 60%, which meant the same team could cover substantially more ground or redirect their attention to higher-judgment work that the system couldn't automate. After I left, the system remained in production as a standing capability for the research program. It was the first time I built something that outlasted my tenure, and it shaped how I think about what good platform work looks like.