【1】What I want to do

I installed CUDA 12.8 on Ubuntu 24.04, so I want to use it to display information about the GPU in my PC.

The program for this is the usual deviceQuery,
one of NVIDIA's official CUDA sample programs.

【2】Trying it out

Step 1/3: Get and extract the CUDA samples.

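Download the zip file of NVIDIA's cuda-samples repository from GitHub.

It is easiest to create the destination directory first and download straight into it.
The example below installs into /home/{user}/cuda-samples.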

$ mkdir -p ~/cuda-samples
$ cd ~/cuda-samples
$ wget https://github.com/NVIDIA/cuda-samples/archive/refs/heads/master.zip

Extract the downloaded file.

$ unzip master.zip
$ ls -l
drwxrwxr-x 6 hoge hoge      4096 May 23 03:43 cuda-samples-master/

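A directory named cuda-samples-master has appeared. All the samples are inside it.

Step 2/3: Build the deviceQuery program.

Just extracting the ZIP file does not make deviceQuery usable.

Only the source code is provided,
so it has to be compiled and linked in your own environment to produce the deviceQuery program.

Below, I go through that one step at a time.

First, move to the Samples directory and look at its contents.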

$ cd cuda-samples-master/Samples/
$ ls -l
total 48
drwxrwxr-x 11 hoge hoge 4096 May 23 03:43 ./
drwxrwxr-x  6 hoge hoge 4096 May 23 03:43 ../
drwxrwxr-x 48 hoge hoge 4096 May 23 03:43 0_Introduction/
drwxrwxr-x  5 hoge hoge 4096 May 23 03:43 1_Utilities/
drwxrwxr-x 34 hoge hoge 4096 May 23 03:43 2_Concepts_and_Techniques/
drwxrwxr-x 26 hoge hoge 4096 May 23 03:43 3_CUDA_Features/
drwxrwxr-x 36 hoge hoge 4096 May 23 03:43 4_CUDA_Libraries/
drwxrwxr-x 38 hoge hoge 4096 May 23 03:43 5_Domain_Specific/
drwxrwxr-x  7 hoge hoge 4096 May 23 03:43 6_Performance/
drwxrwxr-x 11 hoge hoge 4096 May 23 03:43 7_libNVVM/
drwxrwxr-x  3 hoge hoge 4096 May 23 03:43 8_Platform_Specific/
-rw-rw-r--  1 hoge hoge  867 May 23 03:43 CMakeLists.txt

The samples are organized by category.
Let's find where deviceQuery lives.

$ find . -type d -name deviceQuery
./1_Utilities/deviceQuery

It is in the directory 1_Utilities/deviceQuery, so move there.

$ cd 1_Utilities/deviceQuery
$ ls -l
total 36
drwxrwxr-x 3 hoge hoge  4096 May 23 03:43 ./
drwxrwxr-x 5 hoge hoge  4096 May 23 03:43 ../
-rw-rw-r-- 1 hoge hoge  1239 May 23 03:43 CMakeLists.txt
-rw-rw-r-- 1 hoge hoge 14774 May 23 03:43 deviceQuery.cpp
-rw-rw-r-- 1 hoge hoge  1503 May 23 03:43 README.md
drwxrwxr-x 2 hoge hoge  4096 May 23 03:43 .vscode/

The sample is set up to be built with CMake, so first install the build tools.

$ sudo apt update
$ sudo apt install build-essential
$ sudo apt install cmake

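With CMake it is customary to keep the build directory separate (so the source directory stays clean),
so create a build directory directly under the sample directory and work from there.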

$ mkdir build
$ cd build
$ cmake ..
-- The C compiler identification is GNU 13.3.0
-- The CXX compiler identification is GNU 13.3.0
-- The CUDA compiler identification is NVIDIA 12.8.93
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.8/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.8/targets/x86_64-linux/include (found version "12.8.93")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Configuring done (1.9s)
-- Generating done (0.0s)
-- Build files have been written to: /home/hoge/cuda-samples/cuda-samples-master/Samples/1_Utilities/deviceQuery/build

Running cmake generated the build environment.
Now for the real thing: run make.

$ make
[ 50%] Building CXX object CMakeFiles/deviceQuery.dir/deviceQuery.cpp.o
[100%] Linking CXX executable deviceQuery
[100%] Built target deviceQuery
$
$ ls -l
total 88
drwxrwxr-x 3 hoge hoge  4096 May 23 06:23 ./
drwxrwxr-x 4 hoge hoge  4096 May 23 06:19 ../
-rw-rw-r-- 1 hoge hoge 32119 May 23 06:19 CMakeCache.txt
drwxrwxr-x 6 hoge hoge  4096 May 23 06:23 CMakeFiles/
-rw-rw-r-- 1 hoge hoge  1754 May 23 06:19 cmake_install.cmake
-rwxrwxr-x 1 hoge hoge 32616 May 23 06:23 deviceQuery*
-rw-rw-r-- 1 hoge hoge  5654 May 23 06:19 Makefile

The deviceQuery executable has been produced.

Step 3/3: Run deviceQuery.

All that is left is to run the deviceQuery program built in Step 2 above.

$ ./deviceQuery
./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 5070 Ti"
  CUDA Driver Version / Runtime Version          12.8 / 12.8
  CUDA Capability Major/Minor version number:    12.0
  Total amount of global memory:                 15842 MBytes (16611999744 bytes)
  (070) Multiprocessors, (128) CUDA Cores/MP:    8960 CUDA Cores
  GPU Max Clock rate:                            2482 MHz (2.48 GHz)
  Memory Clock rate:                             14001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 50331648 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 2 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.8, CUDA Runtime Version = 12.8, NumDevs = 1
Result = PASS

Information about my GeForce RTX 5070 Ti was displayed.

CUDA Capability	12.0
Total amount of global memory	15842 MBytes
CUDA Cores	8960 CUDA Cores
GPU Max Clock rate	2482 MHz (2.48 GHz)

And so on.
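As a quick sanity check on those figures, here is a small Python sketch (not part of the CUDA samples) that recomputes the derived values from the raw ones. The variable names are made up for illustration; the numbers are copied from the deviceQuery output above.

```python
# Raw values copied from the deviceQuery output above.
multiprocessors = 70            # "(070) Multiprocessors"
cores_per_mp = 128              # "(128) CUDA Cores/MP"
global_mem_bytes = 16611999744  # "Total amount of global memory" in bytes
l2_cache_bytes = 50331648       # "L2 Cache Size"
max_clock_mhz = 2482            # "GPU Max Clock rate"

MIB = 1024 * 1024

# Total CUDA cores = multiprocessors x cores per multiprocessor.
total_cores = multiprocessors * cores_per_mp

# Whole mebibytes (1024 * 1024 bytes each) in global memory and L2 cache.
global_mem_mib = global_mem_bytes // MIB
l2_cache_mib = l2_cache_bytes // MIB

# MHz to GHz.
max_clock_ghz = max_clock_mhz / 1000

print(total_cores)
print(global_mem_mib)
print(l2_cache_mib)
print(f"{max_clock_ghz:.2f}")
```

The recomputed core count and memory size agree with the report, and the "MBytes" figure matches a division by 1024 × 1024 rather than by 1,000,000.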

In a post from 11 years ago, "(22) Can't try out cuda-convnet2",
I have the bitter memory of being unable to run the program I wanted because my GPU's CUDA Compute Capability was below 3.5.

Back then, the GTX 780 was far out of my reach.
And now Compute Capability is at 12.0...
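A requirement like "Compute Capability 3.5 or higher" comes down to comparing major/minor version pairs. Below is a minimal Python sketch of that comparison. The function names are hypothetical, and the "3.0" device is only an illustrative value, not the GPU from the old post.

```python
def parse_capability(text: str) -> tuple[int, int]:
    """Split a string such as "12.0" into (major, minor) integers."""
    major, minor = text.split(".")
    return int(major), int(minor)


def meets_requirement(device: str, required: str) -> bool:
    """Return True if the device capability is at least the required one.

    Tuples compare element by element, so (12, 0) >= (3, 5) is True,
    whereas comparing the raw strings "12.0" and "3.5" would get it wrong.
    """
    return parse_capability(device) >= parse_capability(required)


print(meets_requirement("12.0", "3.5"))  # capability reported above
print(meets_requirement("3.0", "3.5"))   # illustrative older device
```

This only sketches the version comparison; whether a given old program actually builds and runs on a newer GPU is a separate question.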

These days I really feel how much time has passed...

【3】Trying other samples

1) Mandelbrot

This sample renders the fractal known as the Mandelbrot set at high speed with CUDA.
You can zoom in and enjoy geometric patterns that seem to go on forever.

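When you zoom in, the sample recomputes and redraws the image in real time.
Computing and drawing it in an instant is just what you would expect from a GPU, which excels at parallel computation.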
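To see why this workload suits a GPU, here is a minimal CPU-side Python sketch of the classic escape-time method for the Mandelbrot set. It is not the code of the CUDA sample; the function names, grid size, and iteration limits are assumptions chosen for illustration. The key point is that each pixel's value depends only on its own coordinate, so all pixels can be computed independently.

```python
def escape_time(c: complex, max_iter: int = 256) -> int:
    """Count iterations of z = z*z + c before |z| exceeds 2.

    Returns max_iter if the point never escapes within the limit,
    which is treated as "inside the set".
    """
    z = 0j
    for i in range(max_iter):
        if abs(z) > 2.0:
            return i
        z = z * z + c
    return max_iter


def render(width: int, height: int, center: complex, scale: float,
           max_iter: int = 64) -> list[list[int]]:
    """Compute an escape-time grid; `scale` is the size of one pixel.

    Zooming in corresponds to calling this again with a smaller scale.
    """
    rows = []
    for py in range(height):
        row = []
        for px in range(width):
            # Map the pixel position to a point on the complex plane.
            c = center + complex((px - width / 2) * scale,
                                 (py - height / 2) * scale)
            row.append(escape_time(c, max_iter))
        rows.append(row)
    return rows


# Coarse ASCII preview: '#' marks points that never escaped.
for row in render(60, 24, center=-0.5 + 0j, scale=0.05, max_iter=64):
    print("".join("#" if n == 64 else "." for n in row))
```

On a CPU the two nested loops run one pixel at a time. A GPU implementation can hand each pixel to its own thread, which is presumably why the redraw feels instantaneous.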

The steps to get it running are as follows.

Install the OpenGL + GLUT development libraries if you do not already have them.

$ sudo apt update
$ sudo apt install libgl1-mesa-dev freeglut3-dev

As with deviceQuery earlier, run cmake → make.

$ cd ~/cuda-samples/cuda-samples-master/Samples/5_Domain_Specific/Mandelbrot
$ mkdir build
$ cd build
$ cmake ..
$ make

The Mandelbrot executable has been built.

$ ls -l
total 4492
drwxrwxr-x 4 hoge hoge    4096 May 23 07:55 ./
drwxrwxr-x 6 hoge hoge    4096 May 23 07:51 ../
-rw-rw-r-- 1 hoge hoge   37568 May 23 07:55 CMakeCache.txt
drwxrwxr-x 5 hoge hoge    4096 May 23 07:55 CMakeFiles/
-rw-rw-r-- 1 hoge hoge    1764 May 23 07:51 cmake_install.cmake
drwxrwxr-x 2 hoge hoge    4096 May 23 07:55 data/
-rw-rw-r-- 1 hoge hoge    7449 May 23 07:55 Makefile
-rwxrwxr-x 1 hoge hoge 4526328 May 23 07:55 Mandelbrot*

Run it!

$ ./Mandelbrot

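In an instant, in under 0.1 seconds, the Mandelbrot image was displayed.

Zoom in. (Hold down the mouse wheel over the image and move the mouse upward.)

Zoom in further.

The geometric patterns just keep going, however far you zoom in.

But...
I am around flashy 3D games every day, so none of this surprises me at all...

Getting used to things is scary.
You stop noticing the stimulation.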
