Projects & Learnings

Technical notes, experiments and updates

lanseg@proton.me - github.com/lanseg - plans and projects


A note on analyzing Epstein files

epstein
face recognition
pdf

So, the FBI released another batch of the Epstein files: hundreds of PDF documents with emails, flight logs, chat extracts and photos. Too much for me to handle manually, especially since I don't have a powerful GPU that can run advanced multimodal LLMs.

But I have plenty of disk space, and these preparation steps were really helpful:

  1. Split PDF documents into text and pictures.
  2. Detect faces: find photos with faces.
  3. Generate vectors for faces and cluster them.
Step             | Number of files                    | Duration
Splitting PDFs   | 346485 PDFs → 557466 PNGs          | ~10 minutes
Finding faces    | 557466 PNGs → 1022 PNGs with faces | ~16 hours
Clustering faces | 1022 PNGs → 92 clusters            | ~3 minutes

Both face detection and clustering are straightforward: face_recognition (from PyPI) handles the faces, and DBSCAN from scikit-learn does the clustering. The only tuning I did was adding n_jobs=-1 to let DBSCAN use all the CPU cores.
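The clustering step can be sketched like this. It assumes the 128-dimensional face vectors from face_recognition.face_encodings() were already computed; the eps value, min_samples and the grouping helper are illustrative guesses, not the exact code from the scripts:

```python
# Sketch of the clustering step, assuming 128-d face vectors were
# already produced by face_recognition.face_encodings().
# eps and min_samples here are illustrative values to tune.
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_faces(encodings, sources, eps=0.4):
    """Group face vectors; returns {cluster_id: set of source files}."""
    labels = DBSCAN(eps=eps, min_samples=2, metric="euclidean",
                    n_jobs=-1).fit(np.asarray(encodings)).labels_
    clusters = {}
    for label, path in zip(labels, sources):
        if label != -1:  # -1 marks "noise": faces without a cluster
            clusters.setdefault(int(label), set()).add(path)
    return clusters

# Tiny synthetic demo: two tight groups of vectors far apart
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.01, size=(3, 128))
b = rng.normal(5.0, 0.01, size=(3, 128))
demo = cluster_faces(np.vstack([a, b]),
                     ["a1.png", "a2.png", "a3.png",
                      "b1.png", "b2.png", "b3.png"])
```

With real data, the interesting part is picking eps: too small and the same person splits into several clusters, too large and different people merge.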

Initial issues with the scripts

The first version of the script was not very good. Some of its shortcomings were deliberate, like copying files instead of symlinking them, and I had reasons to do it that way. But there were also things I simply didn't notice.

So now I have a better version: instead of copying files and writing the metadata to JSON files, it stores everything in an SQLite database, without copying or symlinking the files. I use the database to answer queries like "which people appear on the same photos as this person?"
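A co-occurrence query of that kind can be sketched with a self-join. The schema here (a single faces table linking a photo file to a person's cluster id) is my assumption for illustration, not necessarily what db.py actually creates:

```python
# Sketch of the "who appears together with person X" query.
# Assumed schema: faces(file, cluster), where cluster identifies a person.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE faces (file TEXT, cluster INTEGER)")
con.executemany("INSERT INTO faces VALUES (?, ?)", [
    ("photo1.png", 1), ("photo1.png", 2),
    ("photo2.png", 1), ("photo2.png", 3),
    ("photo3.png", 2),
])

def appears_with(con, person):
    """Clusters that share at least one photo with the given cluster."""
    rows = con.execute("""
        SELECT DISTINCT other.cluster
        FROM faces AS person
        JOIN faces AS other
          ON other.file = person.file AND other.cluster != person.cluster
        WHERE person.cluster = ?
        ORDER BY other.cluster""", (person,)).fetchall()
    return [cluster for (cluster,) in rows]

print(appears_with(con, 1))  # person 1 shares photos with clusters 2 and 3
```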

The scripts: split_pdf.py, find_faces.py, db.py.

A note on integrating self-hosted AI models

ollama
firejail
docker
selfhosted

Prev: Note on self-hosted isolated AI

It's quite easy to isolate an Ollama instance: just run it inside a firejail sandbox with networking disabled. If you need a script that interacts with the model, you can run it in the same sandbox with the "--join" flag. But what if you want to make Ollama available to other applications like IDEs or agents, while keeping it isolated from the internet?

Docker

The simplest way to achieve this is to run Ollama inside a docker container with an internal network (that will block internet access by default) and create a socat gateway which can see both the internal and host networks.
networks:
  ollama-local:
    driver: bridge
    # internal: true
  external:
    driver: bridge

services:
  ollama:
    image: ollama/ollama:rocm
    environment:
      - OLLAMA_DEBUG=1
      - OLLAMA_NUM_THREADS=15  # nproc - 1
    volumes:
      - ./ollama_home:/root/.ollama
    devices:
      - /dev/kfd
      - /dev/dri
    networks:
      - ollama-local
  gateway:
    image: alpine/socat:latest
    command: "TCP-LISTEN:11434,fork,reuseaddr TCP:ollama:11434"
    depends_on:
      - ollama
    ports:
      - "11434:11434"
    networks:
      - ollama-local
      - external
docker-compose.yaml Docker compose to run an isolated Ollama instance.

Firejail with a limited network access

Less overhead, but more manual setup, and it also requires extra permissions to apply the firewall configuration. While the firejail profile stays simple, you will also need a netfilter ruleset and a script that sets up and removes a bridge interface.

OLLAMA_HOST=10.10.20.2 firejail --profile=./ollama.profile ./ollama serve
Starting ollama: same as before, but with the custom host

The script is not universal, so pay attention to what you paste and check if it interferes with your existing network.

#!/bin/bash
set -euo pipefail

IFNAME="enp1s0f0"
BRNAME="firebridge"
BRADDR="10.10.20.2"

if [[ "$1" == "up" ]]; then
  brctl addbr $BRNAME
  ip addr add 10.10.20.1/24 dev $BRNAME
  ip link set $BRNAME up
  iptables -t nat -A POSTROUTING -o $IFNAME -s 10.10.20.0/24 -j MASQUERADE
  iptables -t nat -A OUTPUT -m addrtype --src-type LOCAL --dst-type LOCAL -p tcp --dport 11434 -j DNAT --to-destination $BRADDR:11434
  iptables -t nat -A POSTROUTING -m addrtype --src-type LOCAL --dst-type UNICAST -p tcp -d $BRADDR --dport 11434 -j MASQUERADE
  sysctl -w net.ipv4.conf.all.route_localnet=1
elif [[ "$1" == "down" ]]; then
  iptables -t nat -D POSTROUTING -o $IFNAME -s 10.10.20.0/24 -j MASQUERADE
  iptables -t nat -D OUTPUT -m addrtype --src-type LOCAL --dst-type LOCAL -p tcp --dport 11434 -j DNAT --to-destination $BRADDR:11434 2>/dev/null
  iptables -t nat -D POSTROUTING -m addrtype --src-type LOCAL --dst-type UNICAST -p tcp -d $BRADDR --dport 11434 -j MASQUERADE 2>/dev/null
  sysctl -w net.ipv4.conf.all.route_localnet=0
  ip link set $BRNAME down
  brctl delbr $BRNAME
else
  echo "Usage: $0 {up|down}"
  exit 1
fi
bridge.sh Script to set up/clean up a bridge and packet forwarding.
A strict firejail profile: use a custom home folder and the custom network configuration:
name ollama
net firebridge
ip 10.10.20.2
netfilter ./ollama.netfilter
private /home/arusakov/devel/c2c/local-agent/local-agent-firejail/ollama_home
ollama.profile Firejail profile to run an isolated Ollama instance.
Netfilter ruleset: block everything except the default Ollama port:
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT DROP [0:0]
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -s 10.10.20.1 -p tcp --dport 11434 -m conntrack --ctstate NEW -j ACCEPT
-A OUTPUT -o lo -j ACCEPT
-A OUTPUT -d 10.10.20.1 -j ACCEPT
-A OUTPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
COMMIT
ollama.netfilter Netfilter ruleset for the firejail profile.

A note on self-hosted isolated AI models

ollama
firejail
selfhosted

Next: Note on self-hosted AI integration

It's always better to have a machine that is completely disconnected from the internet when you work on sensitive data. Configuring a firewall ruleset or a virtual machine with proper passthrough settings is also a good option. But if it's not top secret intelligence data, there is a simpler option with an acceptable overhead and privacy level: firejail with a shared network namespace.

Quick example: firejail with network disabled

The Ollama sandbox and the client share a network namespace with networking disabled, so they can only talk to each other:
# Let ollama save models into our home directory
export OLLAMA_MODELS=/path/to/our/ml-models/ollama/

# Installing models (if needed):
path/to/ollama pull llava3  # Or llama3.2-vision, gemma3, etc

# Start ollama in a sandbox with a custom sandbox name "ollama"
firejail --noprofile --net=none --name=ollama path/to/ollama serve

# Join the existing sandbox and send a command to ollama
firejail --noprofile --net=none --join=ollama \
  curl -X POST http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -d '{...}'
ollama-run.txt Installing ollama, no network restrictions here.

A note on deobfuscation

java
android
reverse engineering

Doing an autopsy on the MAX messenger is a kind of special thing now, and while the messenger itself may be boring and unoriginal, the process can still be educational.

Comparing two versions

The defpackage folder produced by jadx still contains small files with random garbage names, and you typically cannot do much about that. Identifying files or classes that are identical except for their names is simple, but the sheer volume makes a manual or semi-automated approach (diffing every pair of files) impractical. My approach, while still straightforward, was quite effective:

  1. Load all files from both versions into memory: it was still less than a few gigabytes. This step can be skipped if you have a fast enough SSD.
  2. Calculate a locality sensitive hash for each file and group candidates by hash similarity.
  3. Use a more precise similarity metric to confirm the matches.

Comparing the two defpackage folders with ~30000 files now takes roughly a minute; without the LSH filtering and parallelization, the same task takes more than an hour.
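The steps above can be sketched like this. As an illustration I use a simple banded MinHash over character shingles as the locality sensitive hash and difflib as the precise confirmation metric; the shingle length, band/row counts and threshold are made-up values, not the real script's parameters:

```python
# Sketch of steps 2-3: banded MinHash as the LSH candidate filter,
# difflib.SequenceMatcher as the precise confirmation metric.
import difflib
import hashlib

def minhash_bands(text, bands=8, rows=4):
    """MinHash signature split into bands; files sharing any band
    become candidates for the precise comparison."""
    shingles = {text[i:i + 16] for i in range(max(len(text) - 15, 1))}
    sig = []
    for seed in range(bands * rows):  # one seeded hash per signature slot
        sig.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8,
                                salt=seed.to_bytes(2, "big")).digest(),
                "big")
            for s in shingles))
    return [tuple(sig[b * rows:(b + 1) * rows]) for b in range(bands)]

def similar_pairs(files, threshold=0.9):
    """files: {name: content}. Returns confirmed near-duplicate pairs."""
    buckets = {}
    for name, text in files.items():
        for band in minhash_bands(text):
            buckets.setdefault(band, []).append(name)
    candidates = {(a, b)
                  for names in buckets.values()
                  for i, a in enumerate(names) for b in names[i + 1:]}
    # Step 3: confirm candidates with the precise similarity metric
    return sorted((a, b) for a, b in candidates
                  if difflib.SequenceMatcher(
                      None, files[a], files[b]).ratio() >= threshold)

demo = similar_pairs({
    "ab1.java": "public final class a { private final int b = 7; }",
    "xy2.java": "public final class a { private final int b = 7; }",
    "zz3.java": "interface Totally { void different(); }",
})
```

The point of the banding is that only files landing in the same bucket ever reach the expensive pairwise comparison, which is what turns hours into a minute.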

Hints on real class names

Obfuscators can shuffle the code and randomize the names, yet subtle clues remain that reveal the true class and field names:

public final String toString() {
    return "NetworkState(isConnected=" + this.a
        + ", isValidated=" + this.b
        + ", isMetered=" + this.c
        + ", isNotRoaming=" + this.d + ')';
}
toString method of the class "xn9" of Max 25.8.1

So "xn9.java" becomes "NetworkState.java", "a" becomes "isConnected", and so on. And don't forget about name collisions when classes from different packages end up in the same defpackage.
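Recovering names from such toString methods is easy to automate. A minimal sketch with regexes, tuned only for this Kotlin-data-class-style toString shape (the patterns and function are mine, not from any published tool):

```python
# Sketch: recover the real class and field names from a decompiled
# toString(), like the NetworkState example above.
import re

def recover_names(java_source):
    """Return (real class name, {obfuscated field: printed name})."""
    class_match = re.search(r'return "(\w+)\(', java_source)
    mapping = {}
    # Each '<label>=" + this.<field>' pair reveals one real field name
    for label, field in re.findall(r'(\w+)=" \+ this\.(\w+)', java_source):
        mapping[field] = label
    return (class_match.group(1) if class_match else None), mapping

src = """public final String toString() { return "NetworkState(isConnected=" + this.a + ", isValidated=" + this.b + ", isMetered=" + this.c + ", isNotRoaming=" + this.d + ')'; }"""
name, fields = recover_names(src)
print(name, fields)
```

Applied to the xn9 example this yields NetworkState and the a→isConnected, b→isValidated, c→isMetered, d→isNotRoaming mapping; real code would of course need more patterns for other toString shapes.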

Newer Python and yt-dlp for the Sailfish OS

Sailfish OS
Python
yt-dlp
Jolla

I moved to the Redeer C2, a community edition device with Sailfish OS. Not 100% voluntarily, but not against my will either: my phone's touch screen just broke, and I needed a replacement.

I still think that Sailfish OS is a nice system, and its user interface is the only mobile Linux UI that is consumer-ready. Unfortunately, the system lacks many common applications, such as a video player (the music player exists and is okay for my needs) or a YouTube client. So my plan was to download videos with yt-dlp and view them locally.

Part 1: yt-dlp Python

What could be simpler than downloading yt-dlp from GitHub or installing it from pip? But my phone only had Python 3.8 (python3-base-3.8.18), and yt-dlp dropped support for it almost a year ago (#1132).

I was not looking for an easy way, so I decided to cross-compile Python myself. But I was not looking for a dirty way either, so I decided to create an RPM package that could be installed independently of the original Python and removed later if needed.

Building and packaging

Let's assume you have already followed the Sailfish SDK installation instructions and have sfdk installed. Make sure to do everything within the SDK workspace, because the Sailfish SDK uses a VirtualBox VM or a Docker container as a build host and mounts the workspace directory there.

~/devel/sailfish $ sfdk tools list
SailfishOS-5.0.0.62 sdk-provided,latest
├── SailfishOS-5.0.0.62-aarch64 sdk-provided,latest
├── SailfishOS-5.0.0.62-armv7hl sdk-provided,latest
└── SailfishOS-5.0.0.62-i486 sdk-provided,latest
~/devel/sailfish $ sfdk config --push target SailfishOS-5.0.0.62-aarch64
~/devel/sailfish $ curl https://www.python.org/ftp/python/3.13.6/Python-3.13.6.tgz -O
~/devel/sailfish $ curl https://lanseg.github.io/2025-08-07/Python-3.13.6.spec -O
~/devel/sailfish $ sfdk --specfile ./Python-3.13.6.spec build --prepare
... Lots of output ...
~/devel/sailfish $ md5sum RPMS/*
4465b8ccfcd11838c93a2671d645f5e7  RPMS/Python-3.13.6-1.aarch64.rpm
ab2ace4188f8e5fe5d0d9a6406392abb  RPMS/Python-debuginfo-3.13.6-1.aarch64.rpm
b73d814fec8e3cb3af708126f5220534  RPMS/Python-debugsource-3.13.6-1.aarch64.rpm

That should be enough: a somewhat optimized, somewhat stripped, but complete Python distribution, ready for common tasks including downloading YouTube videos with yt-dlp.

Name: Python
Summary: Version 3.13 of the python interpreter
Version: 3.13.6
Release: 1
License: Python-2.0.1
Source0: %{name}-%{version}.tgz
URL: https://www.python.org
Requires: openssl readline libuuid xz
BuildRequires: openssl-devel readline-devel libuuid-devel xz-devel

%description
Python is an interpreted, interactive, object-oriented programming language.
This package contains the interpreter and most of the standard Python modules.

%prep
tar -xvf %{name}-%{version}.tgz

%build
mkdir build
cd build
../%{name}-%{version}/configure --prefix=/usr/local --enable-optimizations --with-openssl=/usr/
make %{?_smp_mflags}

%install
cd build
DESTDIR=$RPM_BUILD_ROOT make install

%files
%defattr(-,root,root,-)
/usr/local/
Python-3.13.6.spec The RPM spec that unpacks, compiles, and packages the distribution.

Part 2: yt-dlp

With the latest stable Python in place, installing yt-dlp from pip is enough:

defaultuser $ python3 -m pip install -U "yt-dlp[default]"

Since pip works fine here, there is no need to create a separate RPM package for yt-dlp itself.

Part 3: VLC

Next time - VLC, probably. It works, but requires some styling to look like other Sailfish apps.

Terminal with a long error log and vlc window without video
Terminal where I started VLC with lots of error logs and broken app window

Exploring MAX messenger

messengers
android
reverse engineering
MAX

MAX messenger is a government-forced Russian messenger that is planned to replace WhatsApp, Facebook Messenger and other Western propaganda-spreading machines. And Telegram too. While there is nothing interesting about the interface, there could be something interesting inside, so I installed the app on my Android emulator and started to explore it.

TL;DR

Nothing unexpected: yet another messenger with an unclear future. Not a KGB trojan surveillance app, but just as private as any other messenger that is not focused on protecting your data. It also needs a working phone number to register, so you can forget about anonymity.

The only somewhat interesting thing is that it has the TamTam messenger inside, so it's probably based on the TamTam codebase.

Package details

Version information
Origin RuStore
Package ru.oneme.app
Version 25.7.1
Package content
md5:d88c78a92d75f0319af1a95b59e3867e 23M base.apk
md5:7ef573467d338b6411ceb80cd278f9cb 2.8M split_config.mdpi.apk
md5:0d978dc58e071136cddc8a815311154f 209K split_config.ru.apk
md5:b3c75d266a1f1a55833cb9f052b9e075 26M split_config.x86_64.apk

Permissions

Like any other messenger, it can read and write media and storage, use the camera and microphone, prevent device lock, use vibration, show full screen intents, and update app badges and settings on different Android-based platforms.

I compared Max 25.7.1, Signal 7.45.3, Telegram 11.13.3, Threema 6.1.1, WeChat 8.0.68 and WhatsApp 2.25.21.4; the detailed comparison table is here: comparison.txt.

Dependencies

The messenger uses open source libraries, some of which are well known and widely used in the Java world (Apache Commons, Apache HTTP, FasterXML, org/JSON, LZ4-java, OkHttp3, WebRTC, etc). The list below contains the more specific or less famous ones:

Library Description Sources
Odnoklassniki (ru/ok)
android A somewhat lower-level code (api, http, compression, etc) N/A
messages Reused code from OK.ru messenger
onechat Utility classes for the reactions view
tamtam Some Russian messenger
tracer OK-Tech service for profiling and failure reporting, closed source.
util LZ4 compression support
Analytics
tracker.my.com Ads and analytics framework GitHub
Facebook
fresco System for displaying images in Android GitHub
System and GUI libraries
BoltsFramework Somewhat low-level async task management GitHub
Conductor BlueLine Labs' Framework for building View-based Android applications. GitHub
GPUImageNativeLibrary CATS OSS Something as similar to iOS GPUImage as possible. GitHub
FastScroll FutureMind's Scroll and section indexer for recycler view. GitHub
ProcessPhoenix Simplifies restarting application process GitHub
ShortcutBadger Show the count of unread messages as a badge on the app shortcut GitHub
libphonenumber Android port of Google's libphonenumber. GitHub
Common libraries
MessagePack Binary serialization format (like Google's protobuf) GitHub
ReactiveX Reactive programming for Java GitHub

Used service web links

Debugging

There is not much more I can say without debugging, so I will need to set up this messenger on my phone to try it out.

Making OpenConnect work on Sailfish OS, Part 2: Workaround

OpenConnect
Sailfish OS
VPN

At the very least, I need to make a note on how to use openconnect even without a GUI. Sailfish ships an openconnect client by default, but it doesn't include the default vpnc scripts:

# openconnect https://somehost.com/?somekey
...
/bin/sh: /etc/openconnect/vpnc-script: not found
Script '/etc/openconnect/vpnc-script' returned error 127
/bin/sh: /etc/openconnect/vpnc-script: not found
Script '/etc/openconnect/vpnc-script' returned error 127
...

But the ones from the OpenConnect git repository work well, and copying them to the /etc/openconnect directory solves the issue. Of course, if you don't want to put such files into system directories, you can set the script location explicitly:

# openconnect -s /path/to/scripts/vpnc-script https://somehost.com/?somekey

The good thing is that it works by itself; the bad thing is that it is not integrated with the UI, so I had to try other options.

Automatic cookie

I expected that it would fetch the cookie and certificate from the stored credentials, but it didn't work: I kept getting those getaddrinfo error messages.

Manual cookie

If connman cannot fetch the cookie, I should do it myself:

# openconnect --authenticate "https://somevpn.somedomain.com/?somesecretkey"
POST https://somevpn.somedomain.com/?somesecretkey
Connected to 23.32.165.3:443
SSL negotiation with somevpn.somedomain.com
Connected to HTTPS on somevpn.somedomain.com with ciphersuite TLSv1.2-ECKGB-ECFSB-AESMI6-WTF-OMG384
XML POST enabled
Please enter your username.
Username: POST https://somevpn.somedomain.com/auth
Please enter your password.
Password: POST https://somevpn.somedomain.com/auth
COOKIE='openconnect_strapkey=SGV5ISBBcmVudCB5b3Ugc25lYWt5IGJhc3RhcmQ/IEhlbGxvIHRoZXJlCg==; webvpn=cHdnZW4gLTEgb3Igc29tZXRoaW5nCg=='
HOST='147.237.7.27'
CONNECT_URL='https://somevpn.somedomain.com/'
FINGERPRINT='pin-sha256:WWVzLiBBIGZpbmdlcnByaW50Cg=='

And it worked! Just pay attention when copy-pasting text from fingerterm, as it adds spaces where the text was wrapped at the edge of the screen.

Sailfish OS OpenConnect configuration with cookie and certificate fingerprint
Sailfish OS OpenConnect configuration

Next time I'll try to automate the cookie update.

Making OpenConnect work on Sailfish OS

OpenConnect
Sailfish OS
VPN
connman

I'm one of the rare owners of a Sailfish OS device. I had an original Jolla, a Sony Xperia device with Sailfish, and now I use the Redeer C2. For me it is the only end-user-ready mobile Linux OS, but one thing kept bothering me: the OpenConnect (ocserv) connection to my home network.

The problem was unclear: the VPN connection kept flashing while trying to connect, but each attempt ended with a "Connection problem" error. No notifications, no detailed error messages. What is wrong with you? It worked on Android with the AnyConnect client, and it worked on generic Linux with the openconnect terminal client. The terminal client even worked on the same Sailfish OS device. What could be wrong?

Ok. Let's look in the logs:

[root@JollaC2 defaultuser]# journalctl -r | grep -i vpn
...
JollaC2 connman-vpnd[2317]: Failed to open HTTPS connection to my-vpn.somevds.ch/?somesecretkey
JollaC2 lipstick[2684]: [D] unknown:0 - VPN connection property changed: "State" QVariant(QString, "configuration") "/net/connman/vpn/connection/https___my_vpn_somevds_ch__somesecretkey_Sailfish OS_org" "Home"
JollaC2 connman-vpnd[2317]: getaddrinfo failed for host 'my-vpn.somevds.ch/?somesecretkey': Name or service not known
JollaC2 connman-vpnd[2317]: POST https://my-vpn.somevds.ch/?somesecretkey
...

The culprit! connman treats my whole URL, path and parameter included, as a hostname. A bit weird, because my OpenConnect server uses camouflage mode and I don't want to disable it, but connman doesn't know how to handle it. I'll try to find a workaround next time.

Configuring Green.ch DHCP on an OpenWRT router

OpenWRT
green.ch
DHCP

I bought a Banana Pi BPI-R3 router and installed the OpenWRT firmware, but couldn't get an IP address from my provider, green.ch. I tried using the same MAC address as the old router, but it didn't work, so I had to dig deeper. What is the simplest way to debug such an issue? Comparing packet dumps, of course. So I connected the WAN port of the working router to my laptop and looked at the DHCP packets, then did the same for my non-working router and saw a difference:

Wireshark output dump, working DHCP request
Working DHCP request
Wireshark output dump, DHCP request that was ignored by provider
Ignored DHCP request

So, the only difference was that the provider expected the packet to be sent from an 802.1Q VLAN with ID 10. I added it to the network config (/etc/config/network) and everything worked:

config device
    option type '8021q'
    option ifname 'wan'
    option vid '10'
    option name 'vlan10'

config interface 'wan'
    option proto 'dhcp'
    option device 'vlan10'
    option hostname '*'