一、背景
心血来潮,想组装一台可以用于 deep learning 的电脑,于是开始了以下的探索之路。
二、装机
配件:
- CPU i7-8700
- CPU 散热器
- 技嘉 z370 主板
- 海盗船 16G x 2
- 技嘉 1060 GPU
- 500G SSD
- 3T HDD
- 海盗船 550w 电源
- 机箱
问题:
- 组装之后电脑屏幕不显示?
主板有检测各个模块是否装机正确的功能,我的在主板的右下角有四个灯,分别显示 CPU、内存、显卡、风扇是否安装正确的提示灯,根据说明书重新安装即可。
三、系统
1. 安装问题
本来想安装 Ubuntu 19,但是安装之后黑屏,无法打开 grub,被迫装回 18.04,真是坑爹。
有网友说 安装Linux之前先关闭Security Boot。
安装过程中,由于使用的是 NVIDIA 的显卡,装机和开机过程中会出现黑屏无法进入系统:解决方案:进入 GRUB,按 e 进行编辑,找到 quiet splash
,在之前添加 nomodeset
, ctrl + x
重新进行安装,在第一次开机过程中同样需要设置,进入系统后,打开 terminal,编辑 /etc/default/grub
文件,进行同样的修改,更新 grub
,可执行 sudo update-grub
。
更换国内镜像源:
sudo vim /etc/apt/sources.list 或者
sudo apt edit-sources
deb http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse
sudo apt update
切记这里不要执行 sudo apt upgrade
2. 台式机网卡问题
由于台式机无法用网线连接网络,所以购买了无线网卡,安装发现缺少 make
、 gcc
等一堆东西,折腾之后心态爆炸,突然发现可以用与手机用 USB 连接共享网络的方式解决这个问题。
- 用 USB 与手机连接后,共享网络
- 由于我是用的是 rtl8812au,可以参考这篇文章
3. 安装 NVIDIA
设置 root 密码:
$ sudo passwd
安装
$ sudo apt purge nvidia-* //删除可能存在的已有驱动
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ ubuntu-drivers devices
$ sudo ubuntu-drivers autoinstall
$ sudo reboot
$ nvidia-smi // 测试是否安装正确
4. 安装依赖库
$ sudo apt install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-devlibgl1-mesa-glx
$ sudo apt install libglu1-mesa libglu1-mesa-dev
5. 降低 GCC 版本
cuda 9.0 编译安装只支持gcc、g++ 6.0 及以下的版本,Ubuntu18.04系统默认安装了gcc7.0以上的版本,所以,就需要降级了。
$ sudo apt install gcc-5
$ sudo apt install g++-5
$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50
$ g++ -v
$ gcc -v
6. 安装 CUDA9.0
注意:在安装过程中会提示是否需要安装显卡驱动,在这里要选择n,其他的选择y或者回车键进行安装。
$ sudo sh cuda_9.0.176_384.81_linux.run
$ sudo sh cuda_9.0.176.1_linux.run
$ sudo sh cuda_9.0.176.2_linux.run
$ sudo sh cuda_9.0.176.3_linux.run
$ sudo sh cuda_9.0.176.4_linux.run
设置环境变量
$ sudo vi ~/.bashrc
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda
# 运行命令
$ source ~/.bashrc
# 重新系统
$ sudo reboot
# 测试是否安装成功
$ cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
$ make -j4
$ sudo ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 10.2 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6075 MBytes (6370295808 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1810 MHz (1.81 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS
7. 安装 CUDNN7.4.1
$ tar -zxvf cudnn-9.0-linux-x64-v7.4.1.5.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
8. 安装 pip
更新 pip 源
$ mkdir ~/.pip
$ vim pip.conf
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/
9. 安装 tensorflow-gpu
$ pip install tensorflow-gpu==1.12.0 一定要安装 1.12.0,目前不支持最新的版本
# 测试
$ vim test.py
import tensorflow as tf
hello = tf.constant('first tensorflow')
sess = tf.Session()
print(sess.run(hello))
# 运行脚本,如果输出以下信息,则安装成功
$ python3 test.py
2019-06-09 22:01:38.163583: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 22:01:38.253557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 22:01:38.253785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8095
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.15GiB
2019-06-09 22:01:38.253800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-09 22:01:39.102096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 22:01:39.102124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-06-09 22:01:39.102129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-06-09 22:01:39.102196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4906 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
b'first tensorflow'
10. 安装 pytorch
$ pip install torch
$ pip install torchvision
# 测试
$ python3
>>> import torch
>>> print(torch.cuda.is_available())
其余可安装的软件
- vscode
- jetbrains-toolbox 套件
- Shadowsocks
- albert
11. 如何将 Shadowsocks 转为 http_proxy
$ sudo apt install polipo
$ sudo vim /etc/polipo/config
socksParentProxy = "127.0.0.1:1080"
socksProxyType = "socks5"
proxyAddress = "127.0.0.1"
proxyPort = 8123
$ sudo service polipo start
$ http_proxy=http://localhost:8123 curl www.google.com