Ubuntu 深度学习装机过程

从装机开始 Ubuntu

Posted by ZYT on June 5, 2019

一、背景

心血来潮,想组装一台可以用于 deep learning 的电脑,于是开始了以下的探索之路。

二、装机

配件:

  • CPU i7-8700
  • CPU 散热器
  • 技嘉 z370 主板
  • 海盗船 16G x 2
  • 技嘉 1060 GPU
  • 500G SSD
  • 3T HDD
  • 海盗船 550w 电源
  • 机箱

问题:

  1. 组装之后电脑屏幕不显示?

主板有检测各个模块是否装机正确的功能,我的在主板的右下角有四个灯,分别显示 CPU、内存、显卡、风扇是否安装正确的提示灯,根据说明书重新安装即可。

三、系统

1. 安装问题

本来想安装 Ubuntu 19,但是安装之后黑屏,无法打开 grub,被迫装回 18.04,真是坑爹。

有网友说 安装Linux之前先关闭Security Boot

安装过程中,由于使用的是 NVIDIA 的显卡,装机和开机过程中会出现黑屏无法进入系统:解决方案:进入 GRUB,按 e 进行编辑,找到 quiet splash,在之前添加 nomodesetctrl + x 重新进行安装,在第一次开机过程中同样需要设置,进入系统后,打开 terminal,编辑 /etc/default/grub 文件,进行同样的修改,更新 grub,可执行 sudo update-grub

更换国内镜像源:

sudo vim /etc/apt/sources.list 或者
sudo apt edit-sources

deb http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse  
deb http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ trusty main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-security main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-updates main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-proposed main restricted universe multiverse  
deb-src http://mirrors.aliyun.com/ubuntu/ trusty-backports main restricted universe multiverse

sudo apt update

切记这里不要执行 sudo apt upgrade

2. 台式机网卡问题

由于台式机无法用网线连接网络,所以购买了无线网卡,安装发现缺少 makegcc 等一堆东西,折腾之后心态爆炸,突然发现可以用与手机用 USB 连接共享网络的方式解决这个问题。

  1. 用 USB 与手机连接后,共享网络
  2. 由于我是用的是 rtl8812au,可以参考这篇文章

3. 安装 NVIDIA

设置 root 密码:

$ sudo passwd

安装

$ sudo apt purge nvidia-*  //删除可能存在的已有驱动
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ ubuntu-drivers devices
$ sudo ubuntu-drivers autoinstall

$ sudo reboot

$ nvidia-smi   // 测试是否安装正确

4. 安装依赖库

$ sudo apt install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-devlibgl1-mesa-glx 
$ sudo apt install libglu1-mesa libglu1-mesa-dev 

5. 降低 GCC 版本

cuda 9.0 编译安装只支持gcc、g++ 6.0 及以下的版本,Ubuntu18.04系统默认安装了gcc7.0以上的版本,所以,就需要降级了。

$ sudo apt install gcc-5 
$ sudo apt install g++-5

$ sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 50
$ sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-5 50

$ g++ -v
$ gcc -v

6. 安装 CUDA9.0

NVIDIA 官网

注意:在安装过程中会提示是否需要安装显卡驱动,在这里要选择n,其他的选择y或者回车键进行安装。

$ sudo sh cuda_9.0.176_384.81_linux.run
$ sudo sh cuda_9.0.176.1_linux.run
$ sudo sh cuda_9.0.176.2_linux.run
$ sudo sh cuda_9.0.176.3_linux.run
$ sudo sh cuda_9.0.176.4_linux.run

设置环境变量

$ sudo vi ~/.bashrc
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

# 运行命令
$ source ~/.bashrc

# 重新系统
$ sudo reboot

# 测试是否安装成功
$ cd ~/NVIDIA_CUDA-9.0_Samples/1_Utilities/deviceQuery
$ make -j4
$ sudo ./deviceQuery
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 1060 6GB"
  CUDA Driver Version / Runtime Version          10.2 / 9.0
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 6075 MBytes (6370295808 bytes)
  (10) Multiprocessors, (128) CUDA Cores/MP:     1280 CUDA Cores
  GPU Max Clock rate:                            1810 MHz (1.81 GHz)
  Memory Clock rate:                             4004 Mhz
  Memory Bus Width:                              192-bit
  L2 Cache Size:                                 1572864 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 1
Result = PASS

7. 安装 CUDNN7.4.1

NVIDIA CUDNN 官网

$ tar -zxvf cudnn-9.0-linux-x64-v7.4.1.5.tgz
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

8. 安装 pip

更新 pip 源

$ mkdir ~/.pip
$ vim pip.conf
[global]
trusted-host=mirrors.aliyun.com
index-url=https://mirrors.aliyun.com/pypi/simple/

9. 安装 tensorflow-gpu

$ pip install tensorflow-gpu==1.12.0 一定要安装 1.12.0,目前不支持最新的版本

# 测试
$ vim test.py
import tensorflow as tf

hello = tf.constant('first tensorflow')
sess = tf.Session()
print(sess.run(hello))

# 运行脚本,如果输出以下信息,则安装成功
$ python3 test.py
2019-06-09 22:01:38.163583: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-09 22:01:38.253557: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-09 22:01:38.253785: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8095
pciBusID: 0000:01:00.0
totalMemory: 5.93GiB freeMemory: 5.15GiB
2019-06-09 22:01:38.253800: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-06-09 22:01:39.102096: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-09 22:01:39.102124: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-06-09 22:01:39.102129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-06-09 22:01:39.102196: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4906 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
b'first tensorflow'

10. 安装 pytorch

$ pip install torch
$ pip install torchvision

# 测试
$ python3
>>> import torch
>>> print(torch.cuda.is_available())

其余可安装的软件

  • vscode
  • jetbrains-toolbox 套件
  • Shadowsocks
  • albert

11. 如何将 Shadowsocks 转为 http_proxy

$ sudo apt install polipo
$ sudo vim /etc/polipo/config
socksParentProxy = "127.0.0.1:1080"
socksProxyType = "socks5"
proxyAddress = "127.0.0.1"
proxyPort = 8123

$ sudo service polipo start
$ http_proxy=http://localhost:8123 curl www.google.com