Is Privacy-First Underwriting for NTC Users Achievable in India?
4 min read
Indian statute does not define underwriting in the context of retail or MSME lending, but in practice, it refers to the structured assessment of repayment capacity, probability of default, loss severity, and portfolio concentration prior to sanction and pricing.
Historically, underwriting was documentation-centric and verification-based. Decision-making relied on credit bureau reports from entities such as TransUnion CIBIL, income proof and bank statements, physical verification and field investigation, and committee-level credit sanctioning. This framework favoured formally employed and collateral-backed borrowers, while New-to-Credit (‘NTC’) borrowers, including gig workers, new salaried entrants, and MSMEs outside formal credit channels, remained largely excluded, because the absence of a formal credit history translated into unquantified risk.
With fintech scale and smartphone penetration, underwriting has shifted toward extracting signals from transactional and behavioural data. New inputs now include bank statement parsing, GST returns for merchants, platform performance data, cash flow regularity, and device metadata and behavioural proxies. Alternative data functions as a complement to traditional inputs and helps bridge information gaps for thin-file or NTC consumers, while also reducing acquisition costs in digital consumer lending.
Lenders increasingly deploy machine learning models that analyse bank transactions, UPI flows, and GST data, typically obtained through the Account Aggregator framework, to derive insights about income consistency and repayment capacity. However, these approaches depend on the existence of meaningful digital financial activity, and their effectiveness is constrained by the availability and quality of the underlying data.
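As a rough illustration of what an income consistency signal might look like, the sketch below computes the coefficient of variation of monthly inflows from a list of credit transactions. The function name, the input format and the three-month minimum are assumptions made for illustration only; they do not describe any lender's actual model. The function deliberately returns no signal when the history is too short, which mirrors the dependence on data availability described above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev
from typing import List, Optional, Tuple


def income_variability(
    credits: List[Tuple[date, float]], min_months: int = 3
) -> Optional[float]:
    """Coefficient of variation of monthly inflows (lower means steadier income).

    Hypothetical helper: returns None when fewer than `min_months` distinct
    months contain credits, because the ratio is not meaningful on sparse data.
    """
    monthly = defaultdict(float)
    for day, amount in credits:
        # Bucket each credit into its calendar month.
        monthly[(day.year, day.month)] += amount

    totals = list(monthly.values())
    if len(totals) < min_months:
        return None  # thin file: no usable signal

    avg = mean(totals)
    if avg <= 0:
        return None
    return pstdev(totals) / avg
```

A borrower with identical inflows every month would score 0.0, while a single month of data yields None rather than a misleading number.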
Underwriting where formal financial activity is minimal
The primary limitation arises where data is sparse or entirely absent, including individuals with no formal credit history, irregular or cash-based income, and limited interaction with formal financial systems. In such cases, even well-designed consent frameworks and advanced models do not produce reliable underwriting outcomes, as the constraint lies in the absence of usable inputs rather than analytical capability.
User behaviour in this segment reflects these constraints. Individuals may be willing to share additional forms of data where it increases their chances of accessing credit, including microfinance or small-ticket loans, but such data is often inconsistently available and collected through fragmented or informal mechanisms. As a result, even expanded data strategies do not fully resolve the underlying problem of data absence.
Privacy-preserving approaches such as federated learning and differential privacy offer a different direction by enabling analysis without direct data sharing. However, their effectiveness depends on the existence of relevant data and financial footprints within the ecosystem. Where such a footprint is weak or non-existent, these systems have limited utility.
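To make the differential privacy idea concrete, the following is a minimal sketch of the Laplace mechanism applied to an average over bounded values. The function name, the clipping bounds and the epsilon parameter are illustrative assumptions, and the sketch assumes the number of records is not itself sensitive. It is not a production-grade implementation, and it says nothing about how any particular lender applies these techniques.

```python
import random
from typing import List, Optional


def dp_mean(
    values: List[float],
    lower: float,
    upper: float,
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Release a noisy mean of `values` using the Laplace mechanism.

    Hypothetical helper. Values are clipped to [lower, upper] so that changing
    one record moves the mean by at most (upper - lower) / n.
    """
    if not values:
        # With no underlying data there is nothing to protect or to learn.
        raise ValueError("no data: nothing to aggregate")
    if epsilon <= 0 or upper <= lower:
        raise ValueError("epsilon must be positive and upper must exceed lower")

    rng = rng or random.Random()
    clipped = [min(max(v, lower), upper) for v in values]
    true_mean = sum(clipped) / len(clipped)

    sensitivity = (upper - lower) / len(clipped)
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with that scale.
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return true_mean + noise
```

The noise shrinks as the number of records grows, so the released figure is only useful when enough data exists. An empty input is rejected outright, which is the practical form of the limitation noted above.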
Consent driven by the need for credit
In a layered lending stack involving multiple data sources and third parties, operationalising and monitoring compliance under the Digital Personal Data Protection Act, 2023 (‘DPDP Act’) presents practical challenges.
The World Bank’s Study on Alternative Data in Credit Risk Assessment recognises that alternative data may include transactional records, utility payments, app usage, mobile money transactions, and e-commerce participation, and recommends a risk-based approach combined with consumer-permissioned, secure data-sharing. It also highlights that consent in alternative credit models is often difficult to interpret and may become effectively coerced where access to credit is contingent on agreement.
This becomes particularly relevant in the NTC context, where borrowers may prioritise access to credit over negotiating data use terms. As a result, consent for behavioural or non-essential data cannot be treated as fully voluntary in the conventional sense, especially where it is embedded within onboarding flows.
In this context, a defensible approach for NTC microfinance products is a structured implementation of purpose limitation, as behavioural data is continuous and does not map neatly to a single purpose in the way financial data does. Addressing this requires system-level design choices:
Segregation of behavioural data by use case (for example, fraud detection versus credit assessment), each with a defined purpose
Prohibition on repurposing data collected for one function toward another without fresh consent
Restriction to behaviour-based signals that are demonstrably necessary for credit risk, with exclusion of weak or proxy indicators, particularly those correlated with socio-economic or personal traits, at the feature engineering stage
Replacement of continuous tracking with time-bound or event-based data use
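The design choices listed above could be enforced in code along the following lines. This is a minimal sketch under stated assumptions: the class names, purpose labels and feature names are hypothetical, and a real system would also need consent records, audit logging and deletion workflows. Each signal is tagged with one purpose and an expiry, collection is limited to an allow-list per purpose, and reads are filtered by the declared purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Set


class PurposeMismatch(Exception):
    """Raised when a signal is not on the allow-list for the stated purpose."""


@dataclass(frozen=True)
class Signal:
    name: str
    value: float
    purpose: str          # e.g. "fraud_detection" or "credit_assessment"
    expires_at: datetime  # time-bound use instead of open-ended retention


class PurposeBoundStore:
    """Hypothetical store that keeps behavioural signals segregated by purpose."""

    def __init__(self, allowed_features: Dict[str, Set[str]]):
        # Allow-list of signal names per purpose, fixed at feature-design time.
        self._allowed = allowed_features
        self._signals: List[Signal] = []

    def collect(self, name: str, value: float, purpose: str,
                now: datetime, ttl: timedelta) -> None:
        if name not in self._allowed.get(purpose, set()):
            raise PurposeMismatch(f"{name!r} is not approved for {purpose!r}")
        self._signals.append(Signal(name, value, purpose, now + ttl))

    def read(self, purpose: str, now: datetime) -> List[Signal]:
        # Only unexpired signals collected for this exact purpose are returned,
        # so data gathered for fraud checks never reaches the credit model.
        return [s for s in self._signals
                if s.purpose == purpose and s.expires_at > now]
```

Under this arrangement, using a fraud signal for credit assessment would require a fresh `collect` call against the credit allow-list, which is the natural point at which to obtain fresh consent.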