
[Python] MediaPipe Pose Detection Config Options

InFinity_Dev 2023. 2. 9. 03:07

https://google.github.io/mediapipe/solutions/pose.html#static_image_mode

STATIC_IMAGE_MODE = True / False
Default = False
If False, the solution treats the input images as a video stream:
it localizes the most prominent person in the first image and, in subsequent images,
simply tracks those landmarks until it loses the person, which reduces computation and latency.
If True, person detection runs on every input frame, which is suited to a batch of static, possibly unrelated images.
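To see the latency difference the tracking path makes, here is a minimal sketch that times the same frames in both modes (the clip path 'dance.mp4' and the 60-frame sample size are assumptions for illustration):

import time

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def avg_latency(static_mode, frames):
    # Mean per-frame processing time for one mode.
    with mp_pose.Pose(static_image_mode=static_mode) as pose:
        pose.process(cv2.cvtColor(frames[0], cv2.COLOR_BGR2RGB))  # warm-up call
        start = time.perf_counter()
        for frame in frames:
            pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        return (time.perf_counter() - start) / len(frames)

cap = cv2.VideoCapture('dance.mp4')  # hypothetical test clip
frames = []
while len(frames) < 60:
    ok, frame = cap.read()
    if not ok:
        break
    frames.append(frame)
cap.release()

# Video mode re-uses the first detection and only tracks afterwards;
# static mode re-runs person detection on every frame.
print('video  mode:', avg_latency(False, frames))
print('static mode:', avg_latency(True, frames))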

MODEL_COMPLEXITY = 0 / 1 / 2
Default = 1
Complexity of the pose landmark model.
Landmark accuracy and inference latency both increase with model complexity.
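A quick way to feel this trade-off is to time one image across all three settings (a minimal sketch; 'person.jpg' is an assumed test image):

import time

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

image = cv2.imread('person.jpg')  # hypothetical test image
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

for complexity in (0, 1, 2):
    with mp_pose.Pose(static_image_mode=True,
                      model_complexity=complexity) as pose:
        pose.process(rgb)  # warm-up: the first call initializes the graph
        start = time.perf_counter()
        pose.process(rgb)
        elapsed = time.perf_counter() - start
    print(f'model_complexity={complexity}: {elapsed * 1000:.1f} ms')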

SMOOTH_LANDMARKS = True / False
Default = True
If True, the solution filters the landmarks across input frames to reduce jitter.
Ignored when STATIC_IMAGE_MODE is True, even if explicitly set.
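One way to observe the effect is to record a landmark coordinate per frame with and without the filter (a minimal sketch; 'dance.mp4' is again an assumed clip, and mean frame-to-frame change is used as a crude jitter measure):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose
NOSE = mp_pose.PoseLandmark.NOSE

def nose_xs(smooth):
    # Normalized nose x-coordinate per frame, with or without smoothing.
    xs = []
    cap = cv2.VideoCapture('dance.mp4')  # hypothetical test clip
    with mp_pose.Pose(smooth_landmarks=smooth) as pose:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                xs.append(results.pose_landmarks.landmark[NOSE].x)
    cap.release()
    return xs

def jitter(xs):
    # Mean absolute change between consecutive frames.
    return sum(abs(a - b) for a, b in zip(xs, xs[1:])) / max(len(xs) - 1, 1)

print('raw jitter:     ', jitter(nose_xs(False)))
print('smoothed jitter:', jitter(nose_xs(True)))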

MIN_DETECTION_CONFIDENCE = [0.0 ~ 1.0]
Default = 0.5
Minimum confidence value from the person-detection model for the detection to be considered successful (the threshold for deciding that a person is present in the frame; exercised together with MIN_TRACKING_CONFIDENCE in the sketch below).

MIN_TRACKING_CONFIDENCE = [0.0 ~ 1.0]
Default = 0.5
Minimum confidence value from the landmark-tracking model for the pose landmarks to be considered tracked successfully (the threshold for landmark tracking).
If the confidence drops below this value, tracking is considered lost and person detection is re-run on that frame.
A higher value makes the solution more robust at the cost of higher latency.
Ignored when STATIC_IMAGE_MODE is True, because every frame is then assumed to be an unrelated image and person detection runs on each one anyway.
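Both confidence options are plain constructor arguments. A minimal sketch that raises both thresholds and counts how often the pose is rejected (webcam index 0, the 0.8 values, and the 100-frame sample are assumptions for illustration):

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Stricter thresholds reject low-confidence detections/tracks,
# so results.pose_landmarks comes back as None more often.
dropped = 0
cap = cv2.VideoCapture(0)
with mp_pose.Pose(min_detection_confidence=0.8,
                  min_tracking_confidence=0.8) as pose:
    for _ in range(100):  # sample 100 frames
        ok, frame = cap.read()
        if not ok:
            break
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks is None:
            dropped += 1  # no confident pose this frame
cap.release()
print(f'frames without a confident pose: {dropped}/100')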

Sample Code for Config Options

import cv2
import mediapipe as mp
import numpy as np  # used below for the segmentation background
mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose

# For static images:
IMAGE_FILES = []
BG_COLOR = (192, 192, 192) # gray
with mp_pose.Pose(
    static_image_mode=True,
    model_complexity=2,
    enable_segmentation=True,
    min_detection_confidence=0.5) as pose:
  for idx, file in enumerate(IMAGE_FILES):
    image = cv2.imread(file)
    image_height, image_width, _ = image.shape
    # Convert the BGR image to RGB before processing.
    results = pose.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    if not results.pose_landmarks:
      continue
    print(
        f'Nose coordinates: ('
        f'{results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE].x * image_width}, '
        f'{results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE].y * image_height})'
    )

    annotated_image = image.copy()
    # Draw segmentation on the image.
    # To improve segmentation around boundaries, consider applying a joint
    # bilateral filter to "results.segmentation_mask" with "image".
    condition = np.stack((results.segmentation_mask,) * 3, axis=-1) > 0.1
    bg_image = np.zeros(image.shape, dtype=np.uint8)
    bg_image[:] = BG_COLOR
    annotated_image = np.where(condition, annotated_image, bg_image)
    # Draw pose landmarks on the image.
    mp_drawing.draw_landmarks(
        annotated_image,
        results.pose_landmarks,
        mp_pose.POSE_CONNECTIONS,
        landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style())
    cv2.imwrite('/tmp/annotated_image' + str(idx) + '.png', annotated_image)
    # Plot pose world landmarks.
    mp_drawing.plot_landmarks(
        results.pose_world_landmarks, mp_pose.POSE_CONNECTIONS)

# For webcam input:
cap = cv2.VideoCapture(0)
with mp_pose.Pose(
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5) as pose:
  while cap.isOpened():
    success, image = cap.read()
    if not success:
      print("Ignoring empty camera frame.")
      # If loading a video, use 'break' instead of 'continue'.
      continue

    # To improve performance, optionally mark the image as not writeable to
    # pass by reference.
    image.flags.writeable = False
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    results = pose.process(image)

    # Draw the pose annotation on the image.
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
    mp_drawing.draw_landmarks(
        image,
        results.pose_landmarks,
        mp_pose.POSE_CONNECTIONS,
        landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style())
    # Flip the image horizontally for a selfie-view display.
    cv2.imshow('MediaPipe Pose', cv2.flip(image, 1))
    if cv2.waitKey(5) & 0xFF == 27:
      break
cap.release()