slider1:
- url: /img/private-ml-server/server/server5.jpg
- url: /img/private-ml-server/server/server6.jpg

slider2:
- url: /img/private-ml-server/horse-astronaut-good.png
- url: /img/private-ml-server/horse-astronaut-good-2.png

---

In an attempt to better understand machine learning (ML) software, it seemed like a good idea to set up a private server to run software like **PyTorch**, **TensorFlow**, and even **large language models** or **text-to-image generators**. These are among the most prevalent applications of machine learning, so it made sense to build a computer capable of executing such tasks to gain hands-on experience and deeper knowledge of the subject.

To achieve this, a computer with sufficient computational power was necessary to run these programs in a reasonable timeframe. While the majority of modern computers can run these types of programs, their **performance** depends on the available processing power. The processor can be used to perform machine learning computations, but it will be relatively slow. Some of this can be circumvented by using a processor with multiple cores and threads to enhance the ability to perform parallel computations.

See: [GPU vs CPU](https://www.intel.com/content/www/us/en/products/docs/processors/what-is-a-gpu.html)

However, the primary component involved in running machine learning software is a **graphics card** (GPU). These cards handle the overwhelming bulk of the ML computations because they excel at matrix multiplications. Consequently, besides the raw computational power of a GPU, the video memory (**VRAM**) of the graphics card becomes a crucial factor. This dedicated memory, separate from system **RAM**, is specifically designed for the graphics card's operation.

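As a rough sanity check on VRAM requirements (a back-of-the-envelope approximation of my own, not a precise rule), a model's memory footprint is roughly its parameter count times the bytes used per parameter, plus some overhead for activations:

```bash
# Approximate VRAM needed: parameters x bytes-per-parameter + ~1GB overhead.
# 7B parameters at fp16 (2 bytes each): 7*2 + 1 = ~15GB -> far too big for a 6GB card.
# 7B parameters at 4-bit (0.5 bytes):   7*0.5 + 1 = ~4.5GB -> fits in 6GB.
python3 -c "p = 7e9; b = 0.5; print(f'{(p * b) / 1e9 + 1:.1f} GB')"
```

This is roughly why the quantized 7B/8B models discussed later run comfortably on a 6GB card.
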
Having ample VRAM allows more complex programs to be loaded into the graphics card.

Based on these considerations, should sufficient VRAM be available, the computational prowess of the graphics card becomes the primary concern. Newer graphics cards generally require less power for equivalent performance, while the performance of an individual graphics card is directly tied to the power it uses; more power draw typically means better performance. This results in a compromise between desired computational power and the computer's physical ability to deliver power.

### Selecting Hardware
I opted for an affordable desktop computer from a surplus store, specifically a used **Dell T-1700**. It featured a **4th-gen Xeon E3-1271 v3** CPU with 4 cores, 8 threads, and 8MB of cache, plus **32GB of DDR3 RAM**. By 2025 standards these specifications are modest: the Xeon CPU, released in ~2013, is roughly comparable to the 10th-generation Core i3-10100F released in 2020. Given these specifications, the $50 price for the entire computer was well worth it.

- [Dell Precision T-1700](https://i.dell.com/sites/doccontent/shared-content/data-sheets/en/Documents/Dell-Precision-T1700-Spec-Sheet-tab.pdf)
- [Xeon E3-1271 v3](https://www.intel.com/content/www/us/en/products/sku/80908/intel-xeon-processor-e31271-v3-8m-cache-3-60-ghz/specifications.html)
- [Core i3-10100F](https://www.intel.com/content/www/us/en/products/sku/203473/intel-core-i310100f-processor-6m-cache-up-to-4-30-ghz/specifications.html)
- [Comparison: E3-1271 v3 vs. i3-10100F](https://www.cpubenchmark.net/compare/Intel-Core-i3-10100F-vs-Intel-Core-i7-4770K/3863vs1919)

Since machine learning algorithms rely heavily on graphics cards, processor and system RAM limitations become less critical compared to the graphics card's capabilities. To this end, the proper choice of graphics card was essential. The Dell T-1700's power supply could output _325 watts_, with other components consuming around 150 watts (including a maximum of 85 watts for the processor). This left approximately _175 watts_ of usable excess power for the graphics card, setting a threshold for its maximum power consumption.

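Once a card is installed, its actual draw can be checked against this power budget. `nvidia-smi` can report both the momentary draw and the card's configured limit (a quick check I would suggest; the output format varies slightly between driver versions):

```bash
# Report the GPU's name, current power draw, and configured power limit.
nvidia-smi --query-gpu=name,power.draw,power.limit --format=csv
```
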
Having considered all factors, I bought a used GTX 1660 online for approximately …

#### Hard drives
Next, I acquired a solid-state drive (SSD) to replace the noisy 250GB hard drive. I opted for a **Samsung 850 Evo 250GB SATA SSD** with **DRAM**, which ensures consistent read/write speeds superior to DRAM-less alternatives. Additionally, I purchased a **Micron MTFDDAK512TBN** 512GB SATA SSD, also with DRAM, intended to work alongside the smaller SSD. The 250GB drive stored the operating system, while the 512GB drive held user data (mainly in the home folder). The old hard drive served as a backup for system restoration if needed.

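The resulting drive layout looked roughly like the sketch below (device names such as sda/sdb are illustrative; they depend on which SATA ports the drives occupy):

```bash
# List block devices with sizes and mount points to confirm the layout.
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT
# Approximate arrangement:
#   sda  250G  disk  /      <- Samsung 850 Evo: operating system
#   sdb  512G  disk  /home  <- Micron MTFDDAK512TBN: user data
#   sdc  250G  disk         <- old hard drive, kept for backups
```
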
See:
- [Samsung 850 Evo 250GB](https://www.harddrivebenchmark.net/hdd.php?hdd=Samsung+SSD+850+EVO+250GB)
- [Micron MTFDDAK512TBN](https://www.harddrivebenchmark.net/hdd.php?hdd=Micron%201100%20MTFDDAK512TBN)
- [DRAM vs DRAM-less solid-state drives](https://computercity.com/hardware/storage/dram-vs-dram-less-ssd)


To enable wireless connectivity, I installed a **TP-Link T6E PCIe Wi-Fi card** compatible with both 2.4GHz and 5GHz connections. This adapter was inserted into one of the remaining PCIe slots in the Dell T-1700, offering better performance and stronger connections compared to USB alternatives.

The total cost for the computer, including components and software, was approximately $180-$190. The final configuration had an 8-thread processor with 32GB RAM, a 6GB-VRAM GPU, and about 1TB of storage. I chose **Linux Mint** with the **XFCE** desktop environment to minimize resource usage by the graphical interface. Essentially, this means the computer runs a lightweight derivative of **Ubuntu 24.04 LTS**.

### Python AI packages
With the hardware assembled, I proceeded to install the machine learning / AI software:

1. [Miniconda](https://www.anaconda.com/docs/getting-started/miniconda/main): To manage Python environments effectively.
2. Proprietary **Nvidia drivers**: For optimal graphics card performance on Linux.
3. [CUDA](https://developer.nvidia.com/cuda-toolkit) (per-environment): Instead of a global installation, I installed CUDA individually for each Python environment to ensure compatibility with specific AI software like TensorFlow.
4. [Jupyter Notebook](https://jupyter.org/install): For running sessions involving TensorFlow and PyTorch.
5. TensorFlow environment: Created a separate conda environment to run the latest TensorFlow version, automatically pulling the required CUDA version for Turing GPUs.
6. PyTorch environment: Established a dedicated conda environment for PyTorch installation and usage, which was straightforward and worked out of the box.
7. [Ollama](https://ollama.com/download) and [Open-WebUI](https://docs.openwebui.com/): I set up a separate conda environment for running large language models locally and accessing a graphical user interface through Open-WebUI (a Python package); Ollama itself was installed at the user level and handled running the LLMs. A sketch of these environments follows below.

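As a concrete sketch of steps 5-7 (package names and versions are approximate; the exact ones depend on what the installers resolve at the time):

```bash
# TensorFlow environment; the [and-cuda] extra pulls matching CUDA libraries.
conda create -n tf python=3.11 -y
conda activate tf
pip install "tensorflow[and-cuda]" jupyter

# PyTorch environment; the default pip wheel bundles its own CUDA runtime.
conda create -n torch python=3.11 -y
conda activate torch
pip install torch torchvision jupyter

# Ollama at the user level, plus a conda environment for Open-WebUI.
curl -fsSL https://ollama.com/install.sh | sh
conda create -n webui python=3.11 -y
conda activate webui
pip install open-webui
open-webui serve   # web GUI on http://localhost:8080 by default
```
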
This setup allowed me to utilize both TensorFlow and PyTorch efficiently, with PyTorch's simpler installation and code being particularly advantageous. The local execution of large language models using Ollama was facilitated by Open-WebUI, encapsulated within a conda environment to keep each configuration organized.

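To confirm that each framework actually sees the GPU instead of silently falling back to the CPU, a one-liner in each environment is enough (my own quick checks, not part of the original install steps):

```bash
# PyTorch: should print True plus the GPU name when CUDA works.
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"

# TensorFlow: should list at least one physical GPU device.
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```
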
### Stable diffusion
As the final software installation, I added **Automatic1111's WebUI** for Stable Diffusion, although it was less critical to my primary interest in PyTorch-based application development. Despite occasional instability and the need for additional parameter tweaking, it was fascinating to watch the computer generate images from text prompts.

{% include image-slider.html list=page.slider2 aspect_ratio="1/1" %}
<p align="center"><i>Images generated by the PC with the prompt: "Astronaut riding a horse"</i></p>

See:
- [Stable Diffusion WebUI](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
- [Using conda for automatic1111](https://www.reddit.com/r/StableDiffusion/comments/11z9wmk/managing_with_your_python_environment_using_conda/)

To run the WebUI server from a conda environment, I used the following commands:
```bash
# Create the Python environment
conda create -n sdwebui python=3.10.6
conda activate sdwebui

# Install the WebUI
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
pip install -r requirements.txt

# Run the WebUI (required each time to start the server)
python launch.py
```

Credit goes to [DeepSeek](https://www.deepseek.com/) for the commands and the conda reference above.
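
On a 6GB card, some of that instability can reportedly be mitigated with the WebUI's lower-memory launch flags; the options below exist in Automatic1111's launcher, though how much they help varies by model and settings:

```bash
# Trade generation speed for lower VRAM use, and listen on all interfaces
# so the WebUI is reachable over the VPN described in the next section.
python launch.py --medvram --listen
```
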
### Remote access
To enable remote access, I installed **Tailscale**, which provides an extremely simple-to-use VPN connection. The result was a server accessible from anywhere with internet, capable of running heavy computational tasks via JupyterLab or serving private large language models for information retrieval and document analysis. These are tasks that would otherwise be impossible on a much less powerful laptop due to insufficient computational resources.

See: [Tailscale](https://tailscale.com/)
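
The setup amounts to Tailscale's standard install script plus one command on the server (the user and machine names below are placeholders):

```bash
# Install and start Tailscale on the server, then authenticate the node.
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# From any machine on the same tailnet, reach the server by its node name.
ssh user@ml-server
```
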
### LLM Models
Despite the 6GB VRAM limitation of the graphics card, the large language models proved to be impressive, with models like **mistral:7b** and **deepseek-r1:8b**. Response times ranged from a few seconds to around 20 seconds for very long prompts. Moreover, these models could be configured to access the internet for additional information, which greatly enhanced their capabilities. Ollama's flexibility made it easy to experiment with models beyond popular ones like DeepSeek or ChatGPT: tailored models, such as **qwen2.5-coder:7b** for code generation, could be selected using Ollama's CLI or Open-WebUI's GUI. Another notable model was IBM's **granite3.3:8b**, which also ran well on the GPU and consistently provided concise, precise replies. This multi-model setup created a generally useful AI system when the right models were given appropriate prompts.
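
Pulling a model and switching between them takes only a couple of commands with Ollama (the same installed models then appear in Open-WebUI's model dropdown):

```bash
ollama pull mistral:7b        # download a model once
ollama run qwen2.5-coder:7b   # chat with a model from the terminal
ollama list                   # show the models installed locally
```
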
See:
- [deepseek-r1:8b](https://ollama.com/library/deepseek-r1:8b)
- [mistral:7b](https://ollama.com/library/mistral:7b)
- [granite3.3:8b](https://ollama.com/library/granite3.3:8b)
- [qwen2.5-coder:7b](https://ollama.com/library/qwen2.5-coder:7b)

### Conclusion
This project was highly educational and provided insight into the capabilities of small-scale large language models. Some of the main takeaways include:

1. Familiarity with Linux is essential, as it's the primary environment for configuring and using the AI/ML packages. Although other operating systems might support these tools, Linux offers a more straightforward setup.
2. Prior research is crucial for making informed decisions when building a system capable of running AI software.
3. Nvidia graphics cards are recommended to avoid compatibility issues with CUDA.
4. The power supply is the main bottleneck when choosing a graphics card.
5. The CPU is mostly idle when computations are offloaded to the GPU.
6. One can use a relatively weak CPU to run AI models, so long as the graphics card is powerful enough and has enough VRAM.

Overall, this entire experience was an eye-opener and demonstrated the importance of understanding hardware limitations when working with AI/ML packages.

### References