JH Blog

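httpx is a modern HTTP client library for Python. It is similar to requests but offers more features:

HTTP/2 protocol support (optional; requires installing the httpx[http2] extra and passing http2=True)
Asynchronous requests with async/await
Both synchronous and asynchronous APIs
Fine-grained timeout and connection management

Basic usage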

python

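import asyncio

import httpx

# Synchronous
response = httpx.get('https://api.example.com/data')

# Asynchronous: `async with` and `await` are only valid inside a coroutine
async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://api.example.com/data')

asyncio.run(main())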

1. AsyncClient and Response objects

What the objects are

python

# AsyncClient: asynchronous HTTP client object
client = httpx.AsyncClient()  # returns an AsyncClient instance
print(type(client))  # <class 'httpx.AsyncClient'>

# Response: HTTP response object (the await must run inside an async function)
response = await client.get('https://example.com')  # returns a Response object
print(type(response))  # <class 'httpx.Response'>

Relationship:

httpx.AsyncClient() → AsyncClient object (the HTTP client)
client.get(), client.post(), etc. → Response object (the HTTP response)

Response object attributes

python

response = await client.get('https://api.example.com/data')

print(response.status_code)  # e.g. 200
print(response.headers)      # response headers
print(response.text)         # response body (decoded text)
print(response.json())       # body parsed as JSON (raises if the body is not valid JSON)
print(response.content)      # response body (raw bytes)

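2. The key to client reuse: socket connection reuse

What is a TCP socket pair?

A TCP connection is uniquely identified by the following four values (a 4-tuple):

text

(client IP, client port, server IP, server port)

A socket connection in practice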

python

async with httpx.AsyncClient() as client:
    # First request: a new TCP connection is created
    # TCP handshake: SYN → SYN-ACK → ACK (followed by a TLS handshake for https)
    # Socket pair: client (192.168.1.10:54321) ↔ server (93.184.216.34:443)
    response1 = await client.get('https://example.com/api/users')

    # Second request: the same socket connection is reused
    # No handshake; the HTTP request goes out on the existing connection right away
    # Same socket: client (192.168.1.10:54321) ↔ server (93.184.216.34:443)
    response2 = await client.get('https://example.com/api/posts')

    # Third request: still the same socket
    response3 = await client.get('https://example.com/api/comments')

# Leaving the context closes the socket connection
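The reuse described above can be observed locally without any external service. The following is a minimal sketch using only the standard library: http.server and http.client stand in for a real server and for httpx, and the names KeepAliveHandler and local_ports_for_two_requests are made up for this illustration. If the connection is kept alive, the client's local port should be the same for both requests.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class KeepAliveHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 keeps connections open by default

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the body ends,
        # which is what makes reusing the connection possible.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def local_ports_for_two_requests():
    """Send two requests over one connection; return the client port seen each time."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), KeepAliveHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
        ports = []
        for path in ("/users", "/posts"):
            conn.request("GET", path)
            conn.getresponse().read()  # drain the body so the socket can be reused
            ports.append(conn.sock.getsockname()[1])
        conn.close()
        return ports[0], ports[1]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    first, second = local_ports_for_two_requests()
    print(f"request 1 used client port {first}, request 2 used client port {second}")
```

Because the client port is part of the 4-tuple, an unchanged port means both requests travelled over the same TCP connection.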

Performance difference

Creating a new client every time (inefficient):

python

# ❌ Bad example
async def bad_approach():
    async with httpx.AsyncClient() as client:
        response1 = await client.get('https://api.example.com/data1')
    # connection closed

    async with httpx.AsyncClient() as client:
        response2 = await client.get('https://api.example.com/data2')
    # connection closed

# What happens:
# Request 1: TCP handshake → HTTP request → response → connection closed
# Request 2: TCP handshake → HTTP request → response → connection closed

Reusing the client (efficient):

python

# ✅ Good example
async def good_approach():
    async with httpx.AsyncClient() as client:
        response1 = await client.get('https://api.example.com/data1')
        response2 = await client.get('https://api.example.com/data2')
        response3 = await client.get('https://api.example.com/data3')
    # the connection is closed only once, after all requests

# What happens:
# Request 1: TCP handshake → HTTP request → response
# Request 2: HTTP request → response (no handshake)
# Request 3: HTTP request → response (no handshake)
# Finally: connection closed

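3. Connection limit settings

max_connections vs max_keepalive_connections

python

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=10,           # maximum number of connections
        max_keepalive_connections=3   # number of idle connections to keep
    )
)

max_connections (maximum number of connections)

Definition: the total number of sockets that may be open at the same moment (httpx default: 100)

Includes:

Connections currently in active use
Idle connections waiting after a response has completed
The two combined can never exceed this number

max_keepalive_connections (number of connections to keep)

Definition: the number of connections kept alive for reuse after their response has been received, instead of being closed (httpx default: 20)

Behavior:

After a request completes, at most this many connections stay in the keep-alive state
Connections beyond that number are closed right away to free resources
Idle connections are also closed once keepalive_expiry elapses (5 seconds by default)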

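Concrete scenario

python

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=10,
        max_keepalive_connections=3
    )
)

# === Situation 1: 10 concurrent requests ===
# (the await expressions below must run inside an async function)
tasks = [client.get(f'https://api{i}.com') for i in range(10)]
await asyncio.gather(*tasks)

# State while in flight: all 10 sockets open (max_connections limit reached)
# [socket1-busy] [socket2-busy] [socket3-busy] ... [socket10-busy]
#        ← connections open at once: 10 →


# === Situation 2: all requests complete ===
# The pool cleans up automatically
# - only 3 stay idle in keep-alive (waiting to be reused)
# - the other 7 are closed right away (FIN)
#
# [socket1-idle] [socket2-idle] [socket3-idle] [closed] [closed] ... [closed]
#    ← keep-alive: 3 →


# === Situation 3: a new request ===
response = await client.get('https://api1.com')
# If the pool still holds an idle connection to api1.com, it is reused (socket1)
# Otherwise a new one is created (within max_connections)


# === Situation 4: exceeding the limit ===
# When all 10 connections are already in use
response = await client.get('https://api11.com')
# The request waits (without blocking the event loop) until a connection is released;
# if the pool timeout elapses first, httpx.PoolTimeout is raised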

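Visualization

text

During the requests (10 in use):
[socket1] [socket2] [socket3] [socket4] [socket5] [socket6] [socket7] [socket8] [socket9] [socket10]
←────────────────── max_connections = 10 ──────────────────→

After the requests complete:
[socket1-idle] [socket2-idle] [socket3-idle] [closed] [closed] [closed] [closed] [closed] [closed] [closed]
←─ max_keepalive = 3 ─→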
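The interaction of the two limits can also be simulated without any network traffic. The sketch below is a toy model, not httpx's actual implementation: ToyPool, its counters, and simulate are hypothetical names introduced here only to reproduce the numbers from the scenario above (10 open at the peak, 3 kept idle, 7 closed).

```python
import asyncio


class ToyPool:
    """Toy model of the two limits. This is NOT how httpx is implemented."""

    def __init__(self, max_connections: int, max_keepalive_connections: int):
        self.max_connections = max_connections
        self.max_keepalive = max_keepalive_connections
        self.active = 0      # connections currently serving a request
        self.idle = 0        # keep-alive connections waiting for reuse
        self.closed = 0      # connections closed because the idle quota was full
        self.peak_open = 0   # highest value of active + idle seen so far
        self._cond = asyncio.Condition()

    async def request(self, duration: float = 0.01) -> None:
        async with self._cond:
            # Wait until an idle connection can be reused or a new one may be opened.
            await self._cond.wait_for(
                lambda: self.idle > 0
                or self.active + self.idle < self.max_connections
            )
            if self.idle > 0:
                self.idle -= 1  # reuse an idle connection
            self.active += 1
            self.peak_open = max(self.peak_open, self.active + self.idle)

        await asyncio.sleep(duration)  # stand-in for the network round trip

        async with self._cond:
            self.active -= 1
            if self.idle < self.max_keepalive:
                self.idle += 1    # keep the connection for reuse
            else:
                self.closed += 1  # over the keep-alive quota: close it
            self._cond.notify_all()


async def simulate():
    """Run 10 concurrent requests and report (peak open, idle after, closed)."""
    pool = ToyPool(max_connections=10, max_keepalive_connections=3)
    await asyncio.gather(*(pool.request() for _ in range(10)))
    return pool.peak_open, pool.idle, pool.closed


if __name__ == "__main__":
    print(asyncio.run(simulate()))  # (10, 3, 7)
```

The model ignores per-host matching and keep-alive expiry; it only shows that max_connections caps what is open at once, while max_keepalive_connections caps what survives afterwards.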

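5. Process vs. socket connection

Key concept

A socket connection (socket pair) is not a process

A single process can own many socket connections.

Hierarchy

text

Process (PID: 12345)  ← one Python program
├── Memory space
├── Running code
├── Open files
└── Socket connections  ← resources inside the process
    ├── Socket 1 (file descriptor: 3) → api1.com:443
    ├── Socket 2 (file descriptor: 4) → api2.com:443
    ├── Socket 3 (file descriptor: 5) → api3.com:443
    └── Socket 4 (file descriptor: 6) → api4.com:443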
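The hierarchy above can be checked in a few lines. This is a small sketch using only the standard library; open_socket_pairs is a hypothetical helper written for this post, and socket.socketpair() is used so that no network access is needed.

```python
import os
import socket


def open_socket_pairs(count: int = 5):
    """Open `count` connected socket pairs inside the current process."""
    pairs = [socket.socketpair() for _ in range(count)]
    fds = [s.fileno() for pair in pairs for s in pair]
    return os.getpid(), fds, pairs


if __name__ == "__main__":
    pid, fds, pairs = open_socket_pairs()
    # One PID, many sockets, each with its own file descriptor number.
    print(f"PID {pid} owns {len(fds)} sockets with descriptors {fds}")
    for a, b in pairs:
        a.close()
        b.close()
```

Every socket gets a distinct descriptor, while the PID stays the same throughout.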

Example

python

# One Python process (PID: 12345)
import httpx
import asyncio

async def main():
    limits = httpx.Limits(max_connections=100)
    async with httpx.AsyncClient(limits=limits) as client:
        # 100 concurrent requests to 100 hosts = up to 100 socket connections
        tasks = [client.get(f'https://api{i}.com') for i in range(100)]
        await asyncio.gather(*tasks)

asyncio.run(main())

# Processes: 1
# Socket connections: up to 100
# File descriptors: one per socket

Checking on the system

bash

# 1. Find the process
ps aux | grep python
# Output: user 12345 ... python script.py  ← a single process

# 2. List that process's socket connections
lsof -p 12345 | grep TCP
# Output:
# python 12345 user 3u IPv4 TCP 192.168.1.10:54321->93.184.216.34:443 (ESTABLISHED)
# python 12345 user 4u IPv4 TCP 192.168.1.10:54322->93.184.216.35:443 (ESTABLISHED)
# python 12345 user 5u IPv4 TCP 192.168.1.10:54323->93.184.216.36:443 (ESTABLISHED)
# ...

# One PID (12345) holds several socket connections

File descriptors

Unix/Linux follows the "everything is a file" philosophy:

python

import socket

# A socket is also represented by a file descriptor
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(sock.fileno())  # e.g. 3 (file descriptor number)

# A regular file has a file descriptor too
with open('file.txt', 'r') as f:
    print(f.fileno())  # e.g. 4

# Per-process file descriptor limit (the resource module is Unix-only)
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
# e.g. soft limit: 1024, hard limit: 4096

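Comparison table

Aspect	Process	Socket connection
Definition	A running program	A network communication endpoint
Relationship	Container	Resource owned by a process
Count	Usually 1 per program	Hundreds to thousands per process
OS identifier	PID (process ID)	File descriptor
Creation cost	Heavy (uses a lot of memory and CPU)	Relatively light
Isolation	Separate memory space	Shares the owning process's memory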

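6. Ports and socket connections

Key fact

Even though a socket connection is not a process, each one occupies one port on the client side

Unique identification of a TCP connection

A TCP connection is identified by a 4-tuple:

text

(client IP, client port, server IP, server port)

Every connection must have a unique 4-tuple, so opening several connections to the same server requires a different client port for each.

Example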

python

import httpx
import asyncio

async def main():
    async with httpx.AsyncClient() as client:
        # Three concurrent requests to the same server
        # (over HTTP/1.1 each in-flight request needs its own connection)
        tasks = [
            client.get('https://example.com/page1'),
            client.get('https://example.com/page2'),
            client.get('https://example.com/page3')
        ]
        await asyncio.gather(*tasks)

Resulting socket state:

text

Connection 1: (client 192.168.1.10:54321) ↔ (server 93.184.216.34:443)
Connection 2: (client 192.168.1.10:54322) ↔ (server 93.184.216.34:443)
Connection 3: (client 192.168.1.10:54323) ↔ (server 93.184.216.34:443)

Analysis:

Server IP and port: identical (93.184.216.34:443)
Client IP: identical (192.168.1.10)
Client port: different for each (54321, 54322, 54323)
One process, but three client ports in use
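The same pattern can be reproduced on the loopback interface. The sketch below uses plain sockets from the standard library instead of httpx, and open_connections is a hypothetical helper; the addresses will be 127.0.0.1 with kernel-assigned ports rather than the example IPs above.

```python
import socket


def open_connections(n: int = 3):
    """Open n connections to one local server and return their 4-tuples."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free server port
    server.listen(n)
    clients, accepted, four_tuples = [], [], []
    try:
        for _ in range(n):
            client = socket.create_connection(server.getsockname())
            clients.append(client)
            accepted.append(server.accept()[0])
            # (client IP, client port, server IP, server port)
            four_tuples.append((*client.getsockname(), *client.getpeername()))
        return four_tuples
    finally:
        for sock in clients + accepted + [server]:
            sock.close()


if __name__ == "__main__":
    for four_tuple in open_connections():
        print(four_tuple)
```

Each printed tuple shares the client IP, server IP, and server port; only the client port differs, which is what keeps the 4-tuples unique.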

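Client ephemeral port range

bash

# Check the ephemeral port range on Linux
cat /proc/sys/net/ipv4/ip_local_port_range
# Output: 32768   60999

# Usable ports: 60999 - 32768 + 1 = 28,232

# macOS has no /proc; query sysctl instead
sysctl net.inet.ip.portrange.first net.inet.ip.portrange.last

Windows:

text

49152 ~ 65535 (16,384 ports)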
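The port counts above are simple inclusive-range arithmetic. As a small sketch (port_count and read_linux_port_range are hypothetical helpers; the file path is the Linux one shown above and does not exist on other systems):

```python
from pathlib import Path
from typing import Optional, Tuple

LINUX_RANGE_FILE = Path("/proc/sys/net/ipv4/ip_local_port_range")


def port_count(low: int, high: int) -> int:
    """Number of ports in the inclusive range [low, high]."""
    return high - low + 1


def read_linux_port_range() -> Optional[Tuple[int, int]]:
    """Return (low, high) on Linux, or None where the file is unavailable."""
    try:
        low, high = LINUX_RANGE_FILE.read_text().split()
        return int(low), int(high)
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print(port_count(32768, 60999))  # 28232 (common Linux default)
    print(port_count(49152, 65535))  # 16384 (Windows range)
    print(read_linux_port_range())   # current range on this machine, or None
```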

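Port exhaustion scenario

python

# Attempting far too many concurrent connections
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=50000))

# Try to open 50,000 connections to the same server
tasks = [client.get('https://api.example.com') for _ in range(50000)]
await asyncio.gather(*tasks)

# Result (assuming no other limit, such as file descriptors, is hit first):
# First ~28,000: succeed
# After that: OSError: [Errno 99] Cannot assign requested address
# (httpx typically surfaces this as httpx.ConnectError)

Why it fails:

text

Port range: 32768 ~ 60999 (28,232 ports)

Connection 1:  client:32768 → server:443  ✅
Connection 2:  client:32769 → server:443  ✅
Connection 3:  client:32770 → server:443  ✅
...
Connection 28232: client:60999 → server:443  ✅
Connection 28233: client:????? → server:443  ❌ no port available!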

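Connections to different servers

Important: when the server IP differs, the same client port can be reused!

python

# 1,000 connections to each of 10 different servers
# (assumes max_connections and the file descriptor limit allow 10,000 sockets)
tasks = []
for i in range(10):
    for j in range(1000):
        tasks.append(client.get(f'https://api{i}.example.com/data'))

await asyncio.gather(*tasks)
# 10,000 connections in total, yet no port exhaustion

Why this works:

text

Connections to server 1:
(client:32768) ↔ (server1 IP:443)
(client:32769) ↔ (server1 IP:443)
...

Connections to server 2:
(client:32768) ↔ (server2 IP:443)  ← same port 32768 reused
(client:32769) ↔ (server2 IP:443)  ← same port 32769 reused
...

The 4-tuples differ, so every connection is still unique:
- (client:32768, server1:443) ≠ (client:32768, server2:443)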

7. Why limit max_connections

1. Client-side port exhaustion

python

# Problem
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=50000))

# Try to open 50,000 concurrent connections to the same server
tasks = [client.get('https://api.example.com/item/1') for _ in range(50000)]
await asyncio.gather(*tasks)

# Result: OSError, no ports left

Why it happens:

Available ephemeral ports: ~28,000
Connections requested: 50,000
Shortfall: ~22,000

Fix:

python

# ✅ A sensible limit
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=100))

2. Server-side load

python

# Problem
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=10000))

# 10,000 concurrent connections to one server
tasks = [client.get('https://small-api.com/data') for _ in range(10000)]
await asyncio.gather(*tasks)

# Result: the server cannot keep up (503 Service Unavailable or timeouts)

From the server's point of view:

Connection-handling capacity exceeded
Out of memory
CPU overload
Other users cannot be served

Fix:

python

# ✅ A server-friendly approach
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=50))

# Process in batches
async def fetch_in_batches(urls, batch_size=50):
    results = []
    for i in range(0, len(urls), batch_size):
        batch = urls[i:i+batch_size]
        batch_results = await asyncio.gather(*[client.get(url) for url in batch])
        results.extend(batch_results)
    return results
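Batching makes every batch wait for its slowest request. An alternative way to cap concurrency is a semaphore, which keeps up to the limit in flight continuously. The sketch below shows the idea with only asyncio; bounded_gather, demo, and fake_fetch are hypothetical names, and fake_fetch merely sleeps in place of a real client.get() call.

```python
import asyncio


async def bounded_gather(factories, limit: int):
    """Run coroutine factories concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(factory):
        async with semaphore:
            return await factory()

    return await asyncio.gather(*(run(factory) for factory in factories))


async def demo(total: int = 200, limit: int = 50):
    """Return the results plus the highest number of calls seen in flight."""
    in_flight = 0
    peak = 0

    async def fake_fetch(i: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # stand-in for an HTTP request
        in_flight -= 1
        return i

    factories = [lambda i=i: fake_fetch(i) for i in range(total)]
    results = await bounded_gather(factories, limit)
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(f"{len(results)} results, peak concurrency {peak}")
```

Factories (callables that create the coroutine) are passed instead of coroutines so that no request starts before it holds a semaphore slot.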

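3. Client-side resources

python

# Problem: file descriptor exhaustion
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=5000))

# System limit: 1024
import resource
soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"file descriptor limit: {soft}")  # e.g. 1024

# Result: OSError: [Errno 24] Too many open files

Why it happens:

Each socket = 1 file descriptor
System limit: 1024
Requested: 5,000 connections

Fix:

python

# ✅ Option 1: limit the number of connections
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=500))

# ✅ Option 2: raise the soft limit (use with care)
# Leave the hard limit untouched: raising it needs root, lowering it is irreversible
import resource
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
target = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
if soft != resource.RLIM_INFINITY and soft < target:
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))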
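One way to tie the two options together is to derive max_connections from the descriptor limit instead of hard-coding it. The following is only a sketch: safe_max_connections is a hypothetical helper, and the reserve of 64 descriptors (for files, logs, stdin/stdout and so on) and the cap of 500 are assumptions chosen for illustration, not recommendations from httpx.

```python
def safe_max_connections(soft_limit: int, reserve: int = 64, cap: int = 500) -> int:
    """Pick a max_connections value that stays below the descriptor limit.

    soft_limit: the RLIMIT_NOFILE soft limit (negative means "unlimited")
    reserve:    descriptors left for everything that is not an HTTP socket
    cap:        upper bound applied regardless of the limit
    """
    if soft_limit < 0:  # RLIM_INFINITY is reported as -1 on Linux
        return cap
    return max(1, min(cap, soft_limit - reserve))


if __name__ == "__main__":
    import resource  # Unix-only module

    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit {soft} -> max_connections {safe_max_connections(soft)}")
```

With the common soft limit of 1024 this returns 500; with a limit of 256 it returns 192.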

4. Memory usage

python

# Problem
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=10000))

# Assume each connection uses about 50 KB of memory on average
# 10,000 × 50 KB = 500 MB

# This causes trouble on small systems (e.g. a container with 1 GB of RAM)

Fix:

python

# ✅ Resource-constrained environment
client = httpx.AsyncClient(limits=httpx.Limits(max_connections=100))
# 100 × 50 KB = 5 MB (manageable)

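8. The TIME_WAIT state and port reuse

What is TIME_WAIT?

After a TCP connection is closed normally (FIN), the socket on the side that closed first does not disappear immediately. It stays in this state for a fixed period (60 seconds on Linux).

How it arises

python

# Creating a new client every time
for i in range(1000):
    async with httpx.AsyncClient() as client:
        response = await client.get('https://api.example.com/data')
    # connection closed normally → enters TIME_WAIT (60 s)

# Result:
# - 1,000 connections created
# - 1,000 ports tied up in TIME_WAIT for 60 seconds each
# - fewer ports available

Checking TIME_WAIT

bash

netstat -an | grep TIME_WAIT | wc -l
# e.g. 800  ← 800 ports are in TIME_WAIT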
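On Linux the same count can be taken from Python by reading /proc/net/tcp, where the fourth column holds the connection state as a hexadecimal code (01 = ESTABLISHED, 06 = TIME_WAIT, 0A = LISTEN). This is a minimal sketch: count_tcp_states is a hypothetical helper, it covers IPv4 only (IPv6 sockets are listed in /proc/net/tcp6), and the file does not exist on macOS or Windows.

```python
from collections import Counter
from typing import Iterable

# State codes used in /proc/net/tcp (hexadecimal); others are grouped as OTHER
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_tcp_states(lines: Iterable[str]) -> Counter:
    """Count sockets per state from /proc/net/tcp-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Data rows start with an index such as "0:"; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        counts[TCP_STATES.get(fields[3], "OTHER")] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            print(count_tcp_states(f))
    except OSError:
        print("/proc/net/tcp is not available on this system")
```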

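Problem scenario

python

# Many connections opened and closed in a short time
async def bad_pattern():
    for i in range(30000):
        async with httpx.AsyncClient() as client:
            await client.get('https://api.example.com/data')
        # connection closed every time

# Result (if connections are closed faster than TIME_WAIT sockets expire):
# Early requests: succeed
# Later: port exhaustion (the remaining ports are stuck in TIME_WAIT)
# OSError: Cannot assign requested address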
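A rough upper bound on how fast connections can be opened and closed before this happens follows from dividing the port range by the TIME_WAIT duration. The sketch below assumes a simplified model: every connection goes to the same server IP and port, the client is always the side that closes first, and each port stays unavailable for the full 60 seconds. max_sustained_connection_rate is a hypothetical helper.

```python
def max_sustained_connection_rate(port_low: int, port_high: int,
                                  time_wait_seconds: float = 60.0) -> float:
    """New connections per second to one (server IP, port) before ports run out."""
    usable_ports = port_high - port_low + 1
    return usable_ports / time_wait_seconds


if __name__ == "__main__":
    rate = max_sustained_connection_rate(32768, 60999)
    print(f"{rate:.0f} connections/second")  # 471 connections/second
```

Under these assumptions, sustaining more than roughly 470 short-lived connections per second to a single server would eventually exhaust the default Linux port range.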

Solutions

Option 1: reuse the client (recommended)

python

# ✅ The right way
async def good_pattern():
    async with httpx.AsyncClient() as client:
        for i in range(30000):
            await client.get('https://api.example.com/data')
    # closed only once → TIME_WAIT only once

# Ports used: 1-2
# TIME_WAIT: minimal

Option 2: SO_REUSEADDR (not recommended)

python

# ⚠️ Advanced use (has side effects)
import socket

transport = httpx.AsyncHTTPTransport(
    socket_options=[(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)]
)
client = httpx.AsyncClient(transport=transport)

# Caution: SO_REUSEADDR mainly affects bind(), so it helps little with outbound
# connections whose ports the kernel picks; reusing an address that is still in
# TIME_WAIT can also compromise data integrity

Option 3: SO_LINGER=0 (strongly discouraged)

python

# ❌ Do not use
# Closes with RST → no TIME_WAIT
# but risks losing data
import socket
import struct

transport = httpx.AsyncHTTPTransport(
    socket_options=[(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 0))]
)

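Why TIME_WAIT is needed

text

Client                              Server
    |                                 |
    |--- FIN (close) ---------------->|
    |<-- ACK (acknowledge) -----------|
    |<-- FIN (server closes too) -----|
    |--- ACK (acknowledge) ---------->|
    |                                 |
    |   TIME_WAIT (60 s)              |
    |   - covers a lost final ACK     |
    |   - absorbs delayed packets     |
    |                                 |
  closed                            closed

If the port were reused immediately, without TIME_WAIT:

Delayed packets from the old connection could arrive on the new one
Data from the two connections would get mixed up
This is also a security risk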

9. Recommended settings

General use

python

# Most cases
client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=100,
        max_keepalive_connections=20
    ),
    timeout=httpx.Timeout(30.0)
)

High throughput

python

# API servers, crawlers, etc.
client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=500,
        max_keepalive_connections=50
    ),
    timeout=httpx.Timeout(10.0)
)

Resource-constrained environments

python

# Embedded systems, containers
client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=20,
        max_keepalive_connections=5
    ),
    timeout=httpx.Timeout(5.0)
)

max_keepalive ratio

python

# Rule of thumb: max_keepalive = max_connections // 5
# (the same ratio as httpx's defaults of 100 and 20)

max_conn = 100
max_keepalive = max_conn // 5  # 20

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=max_conn,
        max_keepalive_connections=max_keepalive
    )
)

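10. Practical scenarios

Scenario 1: many requests to a single API server

python

# Situation: 10,000 requests to one API server
# Goal: finish quickly while avoiding port exhaustion

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=200,          # large enough
        max_keepalive_connections=50  # maximize reuse
    ),
    # pool=None: requests queued for a free connection wait without a deadline
    timeout=httpx.Timeout(10.0, pool=None)
)

async def fetch_all_data():
    urls = [f'https://api.example.com/item/{i}' for i in range(10000)]
    tasks = [client.get(url) for url in urls]
    results = await asyncio.gather(*tasks)
    return results

# Behavior:
# - at most 200 connections are open at once
# - the other 9,800 requests queue and reuse connections as they become free
# - with the default 5-second pool timeout, long-queued requests would raise httpx.PoolTimeout
# - once everything has finished, up to 50 idle connections are kept; the rest are closed → TIME_WAIT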

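Scenario 2: requests spread across many servers

python

# Situation: 100 requests to each of 100 servers
# Goal: optimize overall throughput

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=500,          # room for many servers
        max_keepalive_connections=50  # keep only frequently used servers
    ),
    # pool=None: queued requests do not fail with httpx.PoolTimeout
    timeout=httpx.Timeout(10.0, pool=None)
)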

servers = [f'https://api{i}.example.com' for i in range(100)]

async def fetch_from_all_servers():
    tasks = []
    for server in servers:
        for item_id in range(100):
            tasks.append(client.get(f'{server}/item/{item_id}'))
    
    results = await asyncio.gather(*tasks)
    return results

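# Behavior:
# - connections are created to many different servers
# - connections are reused per server
# - low risk of port exhaustion (server IPs differ, so client ports can be reused)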

Scenario 3: web crawling

python

# Situation: crawling thousands of different websites
# Goal: stable crawling and controlled resource use

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=100,          # conservative
        max_keepalive_connections=20  # moderate
    ),
    timeout=httpx.Timeout(10.0),      # short timeout
    follow_redirects=True,
    headers={
        'User-Agent': 'MyBot/1.0'
    }
)

async def crawl_urls(urls):
    semaphore = asyncio.Semaphore(50)  # additional limit

    async def fetch_with_limit(url):
        async with semaphore:
            try:
                response = await client.get(url)
                return response.text
            except httpx.HTTPError:
                return None  # skip URLs that fail with a network or HTTP error

    tasks = [fetch_with_limit(url) for url in urls]
    results = await asyncio.gather(*tasks)
    return results

# Behavior:
# - concurrency is limited twice, by max_connections and by the Semaphore
# - stable resource usage
# - error handling lets the crawl continue when some requests fail

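Scenario 4: resource-constrained environments (embedded, containers)

python

# Situation: memory and file descriptors are limited
# Goal: run with minimal resources

client = httpx.AsyncClient(
    limits=httpx.Limits(
        max_connections=20,   # very conservative
        max_keepalive_connections=5
    ),
    timeout=httpx.Timeout(5.0)
)

# Process in batches to keep memory usage low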
async def process_in_batches(items, batch_size=20):
    results = []
    for i in range(0, len(items), batch_size):
        batch = items[i:i+batch_size]
        batch_results = await asyncio.gather(*[
            client.get(f'https://api.example.com/item/{item}')
            for item in batch
        ])
        results.extend(batch_results)
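        # a short pause after each batch eases the load on the system
        await asyncio.sleep(0.1)
    return results

# Behavior:
# - only 20 requests are processed at a time
# - memory usage is kept low
# - pauses between batches keep the system stable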

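11. Summary

Key concepts

Concept	Description
AsyncClient	Asynchronous HTTP client object
Response	HTTP response object
Socket connection	TCP connection identified by a 4-tuple (client IP:port ↔ server IP:port)
Connection reuse	Sending multiple HTTP requests over the same socket (keep-alive)
max_connections	Maximum number of sockets open at the same time
max_keepalive_connections	Number of idle connections kept after a response
Process vs. socket	One process can own thousands of sockets
Port usage	Each socket connection needs one client port
TIME_WAIT	State after a normal close in which the port cannot be reused (60 seconds on Linux)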

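Optimization checklist

✅ Must do:

Reuse AsyncClient: take advantage of connection pooling and keep-alive
Set an appropriate max_connections: match it to your resources and requirements
Handle errors: be prepared for network failures
Set timeouts: avoid waiting indefinitely

✅ Recommended:

Tune max_keepalive_connections: keep connections only to frequently used servers
Batch processing: split large request volumes into chunks
Add a Semaphore: extra concurrency control
Monitoring: watch TIME_WAIT and file descriptor counts

❌ Avoid:

Creating a new client every time: TIME_WAIT and handshake overhead
Using SO_LINGER=0: risk of data loss
Unlimited max_connections: resource exhaustion
The Connection: close header: disables keep-alive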

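Debugging commands

bash

# Check current connection state
netstat -an | grep ESTABLISHED | wc -l  # active connections
netstat -an | grep TIME_WAIT | wc -l     # connections in TIME_WAIT

# Connections of a specific process
lsof -p <PID> | grep TCP

# File descriptor usage (Linux; plain ls avoids counting the "total" line of ls -l)
ls /proc/<PID>/fd | wc -l

# System limits
ulimit -n                                 # file descriptors
cat /proc/sys/net/ipv4/ip_local_port_range  # port range (Linux)

# Live monitoring
watch -n 1 'ss -s'                       # socket statistics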
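The file descriptor check can also be done from inside the running Python process, which is handy for logging it next to request metrics. This is a Linux-only sketch (it relies on /proc, like the command above), and open_fd_count is a hypothetical helper.

```python
import os
from typing import Optional


def open_fd_count(pid: str = "self") -> Optional[int]:
    """Count open file descriptors via /proc (Linux); None where unavailable."""
    try:
        return len(os.listdir(f"/proc/{pid}/fd"))
    except OSError:
        return None


if __name__ == "__main__":
    count = open_fd_count()
    if count is None:
        print("/proc is not available on this system")
    else:
        print(f"open file descriptors: {count}")
```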

Complete example code

python

import httpx
import asyncio
from typing import List

class OptimizedHTTPClient:
    def __init__(self, max_connections: int = 100):
        self.client = httpx.AsyncClient(
            limits=httpx.Limits(
                max_connections=max_connections,
                max_keepalive_connections=max_connections // 5
            ),
            timeout=httpx.Timeout(30.0),
            follow_redirects=True
        )
    
    async def fetch_one(self, url: str) -> dict:
        """Fetch a single URL."""
        try:
            response = await self.client.get(url)
            response.raise_for_status()
            return {
                'url': url,
                'status': response.status_code,
                'data': response.text
            }
        except httpx.HTTPError as e:
            # covers network errors and the HTTPStatusError from raise_for_status()
            return {
                'url': url,
                'error': str(e)
            }

    async def fetch_many(self, urls: List[str]) -> List[dict]:
        """Fetch several URLs concurrently."""
        tasks = [self.fetch_one(url) for url in urls]
        results = await asyncio.gather(*tasks)
        return results

    async def fetch_in_batches(self, urls: List[str], batch_size: int = 50) -> List[dict]:
        """Fetch in batches (limits how many requests are in flight at once)."""
        results = []
        for i in range(0, len(urls), batch_size):
            batch = urls[i:i+batch_size]
            batch_results = await self.fetch_many(batch)
            results.extend(batch_results)
            print(f"processed: {len(results)}/{len(urls)}")
        return results

    async def close(self):
        """Clean up the client."""
        await self.client.aclose()
    
    async def __aenter__(self):
        return self
    
    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.close()

# Usage example
async def main():
    urls = [f'https://httpbin.org/delay/1?id={i}' for i in range(100)]

    async with OptimizedHTTPClient(max_connections=20) as client:
        # Option 1: run everything concurrently
        results = await client.fetch_many(urls[:10])

        # Option 2: process in batches
        results = await client.fetch_in_batches(urls, batch_size=10)

    print(f"processed {len(results)} URLs in total")

if __name__ == '__main__':
    asyncio.run(main())