16 December 2021
Nikolay Bogdanov, software engineer

Our experience with migrating from Gitea to GitLab. Challenging but successful

There are many different source code hosting systems in the world. They differ both in the supported VCSs (Git, Mercurial, Bazaar) and in how they are deployed (cloud, self-hosted). But there is another important point when deciding which one to choose: the degree of integration with complementary tools such as issue trackers, CI/CD, wiki, etc. At Flant, we prefer GitLab (the on-premise variant) and offer this platform to our customers by default (some exceptions persist, though). This article will cover our customer’s migration from Gitea/Jenkins to GitLab as well as the challenges encountered along the way. I will also share some Python scripts that came in handy during the process.

Caution! This article discusses Gitea 1.13.4 and GitLab 13.8. There may be some improvements in the new versions that will make the migration easier, but these versions were the latest at the time of migration.

A brief intro to Gitea

Gitea is a lightweight Open Source system for managing Git repositories. It is a community fork of another lightweight and popular system called Gogs. One notable feature of Gitea is its ability to switch context between multiple organizations within a single instance supplemented by a fairly wide range of permission settings and a GitHub-like API.

The project boasts over 27K stars on GitHub. It is sponsored by DigitalOcean, Equinix, and other companies. You can also support Gitea on the Open Collective website (and even track where your money goes).

The Gitea web interface

On the plus side, Gitea is very easy to install, configure, and back up. You can run it on any reasonably capable computer and take advantage of some built-in services such as wiki pages, tasks, projects, etc.

However, such simplicity comes with its drawbacks. Gitea lacks ready-to-use CI/CD, and you have to use third-party tools to implement the whole workflow. In our case, Jenkins was such a tool (there is a special plugin that provides Jenkins integrations for Gitea). However, this choice had more to do with historical reasons than technical ones: the Jenkins-based CI/CD process was by no means easy or straightforward… So, we decided to switch to GitLab to optimize the deployment process*, which also meant replacing Gitea because its features were no longer needed. On top of that, we discovered some minor issues that interfered with the migration.

* In this article, we won’t delve into the reasoning why you could possibly want to migrate. However, these comparisons from Wikipedia, Gitea, and GitLab might help in making the right decision for your specific case.

The starting point

In the beginning, we had a Gitea 1.13.4 installation with 165 repositories and 94 users. They were spread between 18 organizations, while some repositories were private.

In addition, the client wanted to preserve the history of pull requests (and there were quite a few — over 5,000 in some repositories). Our Gitea stats before the migration included 94 users, 18 organizations, 36 teams, 153 public keys, 165 repos, 2.6m total actions.

Although there were not many users, organizations, and groups, a manual migration would likely have been a painful and time-consuming task (and it had to be error-free!).

After all, we are engineers and automation is our way. Below is a summary of the challenges we had to overcome in the process and a few tips for using both Gitea and GitLab’s REST APIs. As a result, our experience can be useful not only for migration but for everyday tasks as well.

Execution: import, export, and migration

Now, let’s get to the migration. Both projects have a sophisticated API. First, let’s choose API clients for both systems:

  • In the case of Gitea, we will use giteapy, a Python-based SDK. Unfortunately, it’s not as good as the one for GitLab. (Get ready for more notes as we proceed with the story.)
  • In the case of GitLab, we will use python-gitlab.

Part 1. Users

A single class is used to connect to GitLab:

import gitlab

gl = gitlab.Gitlab('https://gitlab.example.com', private_token='secret')

gl_users = gl.users.list(page=1, per_page=1000)

In giteapy, each API section has its own subclass. There are 6 of them in total, but we only need four: AdminApi, OrganizationApi, RepositoryApi, and UserApi. Here is the script for copying users (see the flant/examples repository for complete listings):

import giteapy
configuration = giteapy.Configuration()
configuration.api_key['access_token'] = 'secret'
configuration.host = 'https://git.example.com/api/v1'

admin_api_instance = giteapy.AdminApi(giteapy.ApiClient(configuration))
user_api_instance = giteapy.UserApi(giteapy.ApiClient(configuration))
org_api_instance = giteapy.OrganizationApi(giteapy.ApiClient(configuration))
repo_api_instance = giteapy.RepositoryApi(giteapy.ApiClient(configuration))

gt_users = admin_api_instance.admin_get_all_users()

Take a look at the output below. Doesn’t it look suspicious?

{'avatar_url': 'https://gitea.example.com/user/avatar/user1/-1',
 'created': datetime.datetime(2018, 10, 11, 19, 0, 0, tzinfo=tzutc()),
 'email': 'user1@example.com',
 'full_name': 'User Name',
 'id': 2,
 'is_admin': False,
 'language': 'en-US',
 'last_login': datetime.datetime(2020, 10, 19, 8, 0, 0, tzinfo=tzutc()),
 'login': 'user1'}

The thing is, there is no indication of whether the user is blocked or not in the API output. I looked into every possible way and found out that this information has to be queried from the Gitea database. Fortunately, that is not a problem: use the following simple query to get it:

SELECT is_active FROM "user" WHERE id = <user_id>

There were only a few blocked users (about a dozen), and we migrated them manually.
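For a larger installation, the flags from the database could be merged with the API output automatically. Here is a minimal sketch of that step. It assumes the query results have already been loaded into a dictionary keyed by user ID; the helper name and the sample records are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def split_by_active(users: Iterable[dict],
                    active_by_id: Dict[int, bool]) -> Tuple[List[dict], List[dict]]:
    """Split API user records into (active, blocked) using the DB flags.

    Users missing from active_by_id are treated as blocked so that they get
    reviewed by hand rather than migrated silently.
    """
    active, blocked = [], []
    for user in users:
        if active_by_id.get(user["id"], False):
            active.append(user)
        else:
            blocked.append(user)
    return active, blocked


# Illustrative data: the first record mirrors the API output above,
# the second record and both flags are made up.
api_users = [{"id": 2, "login": "user1"}, {"id": 3, "login": "user2"}]
db_flags = {2: True, 3: False}  # e.g. rows of: SELECT id, is_active FROM "user"

active_users, blocked_users = split_by_active(api_users, db_flags)
print([u["login"] for u in blocked_users])
```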

SSH keys

Obviously, as part of the migration, we will also have to transfer the user SSH keys. The Gitea API client library provides the user_current_get_key method, but it works weirdly:

  • if the user has many keys, it returns only one key;
  • if there are no keys, it returns a 404 error.

We didn’t take this into account during the initial migration and used an unmodified API call. As a result, some keys were missing after the import. That is why I strongly recommend using the user_list_keys method. However, there was another pitfall: it turned out that the fingerprint index in the database is not unique, so the same key can be registered more than once.

Indexes:
    "public_key_pkey" PRIMARY KEY, btree (id)
    "IDX_public_key_fingerprint" btree (fingerprint)
    "IDX_public_key_owner_id" btree (owner_id)

Because of this, we had to solve conflicts when importing keys into GitLab during the migration. Fortunately, these keys mostly belonged to the blocked users. So we decided to delete all the blocked users’ keys that were migrated. Here is how you can do it in GitLab:

# clean blocked users keys
for block_gl_user in gl.users.list(blocked=True, page=1, per_page=10000):
    print("Blocked user", block_gl_user.username)
    for block_gl_user_key in block_gl_user.keys.list():
        print("Found key", block_gl_user_key.title)
        block_gl_user_key.delete()
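The fingerprint conflicts mentioned above could also be detected before the import. The sketch below computes the OpenSSH-style SHA256 fingerprint of each public key and reports the fingerprints that occur more than once. The function names and the sample keys are made up for illustration, and this is not the approach we used during the migration.

```python
import base64
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def ssh_fingerprint(public_key: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line."""
    # The second field of the line is the base64-encoded key blob.
    blob = base64.b64decode(public_key.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def find_duplicate_keys(keys: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each fingerprint seen more than once to the logins that hold it.

    keys is an iterable of (owner_login, public_key_line) pairs.
    """
    owners = defaultdict(list)
    for login, key in keys:
        owners[ssh_fingerprint(key)].append(login)
    return {fp: logins for fp, logins in owners.items() if len(logins) > 1}


# Made-up key blobs: not real keys, only valid base64.
blob_a = base64.b64encode(b"example-key-material-a").decode("ascii")
blob_b = base64.b64encode(b"example-key-material-b").decode("ascii")
sample = [
    ("user1", f"ssh-ed25519 {blob_a} user1@laptop"),
    ("user2", f"ssh-ed25519 {blob_b} user2@laptop"),
    ("user3", f"ssh-ed25519 {blob_a} old@laptop"),
]
duplicates = find_duplicate_keys(sample)
print(duplicates)
```

With such a report in hand, you can decide which owner keeps the key before anything is sent to GitLab.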

Permissions

The next step is to grant the correct permissions. Gitea organizations are converted into GitLab groups, while teams are converted into access rules.

After retrieving all the teams using the API, we asked the client to approve the right mapping matrix:

# map access rules
map_access = {'Owners': gitlab.OWNER_ACCESS,
              'Developers': gitlab.DEVELOPER_ACCESS,
              'QA': gitlab.DEVELOPER_ACCESS,
              'Manager': gitlab.REPORTER_ACCESS,
              'Managers': gitlab.REPORTER_ACCESS,
              'Dev': gitlab.DEVELOPER_ACCESS,
              'Services': gitlab.REPORTER_ACCESS,
              'services': gitlab.REPORTER_ACCESS}

# inspect Gitea orgs and create Gitlab groups
# GitLab groups by Gitea organization name
dict_gl_groups = {}
# get all orgs
gt_all_orgs = admin_api_instance.admin_get_all_orgs()
for gt_org in gt_all_orgs:
    # does the group exist?
    res = None
    try:
        res = gl.groups.get(gt_org.username)
    except gitlab.exceptions.GitlabGetError:
        # the group does not exist yet
        pass

    if res:
        # append existing groups to dictionary 
        dict_gl_groups[gt_org.username] = res
    else:
        # create the missing group
        gl_group = gl.groups.create({'name': gt_org.username, 'path': gt_org.username})
        if len(gt_org.description) > 0:
            gl_group.description = gt_org.description
        if len(gt_org.full_name) > 0:
            gl_group.full_name = gt_org.full_name
        gl_group.save()
        dict_gl_groups[gt_org.username] = gl_group
    # list teams for the Gitea org
    gt_org_teams = org_api_instance.org_list_teams(gt_org.username)
    for team in gt_org_teams:
        # get all team members
        members = org_api_instance.org_list_team_members(team.id)
        for user in members:
            # add members to groups with their access level
            # dict_gl_users was created on user creation step
            member = dict_gl_groups[gt_org.username].members.create({'user_id': dict_gl_users[user.login].id, 'access_level': map_access.get(team.name, gitlab.REPORTER_ACCESS)})
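One thing to watch for in the loop above: a user who sits in several teams of the same organization would be added to the same group more than once, while a group holds a single access level per member. A possible way around this is to collapse the team memberships to one level per user first. The sketch below assumes GitLab's numeric access levels (Reporter 20, Developer 30, Owner 50) and uses a shortened, illustrative mapping; the helper and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# GitLab's numeric access levels.
REPORTER, DEVELOPER, OWNER = 20, 30, 50

# Shortened version of the mapping matrix, for illustration only.
map_access = {"Owners": OWNER, "Developers": DEVELOPER, "Managers": REPORTER}


def effective_levels(memberships: Iterable[Tuple[str, str]],
                     default: int = REPORTER) -> Dict[str, int]:
    """Collapse (login, team_name) pairs to one access level per login,
    keeping the highest level when a user belongs to several teams."""
    levels: Dict[str, int] = {}
    for login, team_name in memberships:
        level = map_access.get(team_name, default)
        levels[login] = max(level, levels.get(login, 0))
    return levels


# Made-up memberships of a single organization.
org_memberships = [
    ("user1", "Developers"),
    ("user1", "Owners"),
    ("user2", "Managers"),
    ("user3", "Unknown team"),
]
print(effective_levels(org_memberships))
```

Each login then needs exactly one members.create call per group.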

Email notifications

Transferring users will generate a ton of emails about creating a user, granting access, etc. So you have to decide in advance whether to leave this stuff intact or redirect it to a “black hole”.

Below is the method for creating a user the easy way (no emails, temporary passwords, etc.):

gl_user = gl.users.create({'email': gt_user.email,
                           'password': password,
                           'username': gt_user.login,
                           'name': gt_user.full_name if len(gt_user.full_name) > 0 else gt_user.login,
                           'admin': gt_user.is_admin,
                           'skip_confirmation': True})
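The password variable is not defined in the snippet above. One possible way to fill it is to generate a random temporary password for each user, as in the sketch below. The helper name, the length, and the alphabet are assumptions; adjust them to the password policy of your GitLab instance.

```python
import secrets
import string


def generate_password(length: int = 24) -> str:
    """Return a random alphanumeric password built with the secrets module."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


password = generate_password()
print(len(password))
```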

In addition, you will need a mail server that will route all emails to /dev/null. You can use the following Postfix config:

relayhost = 
relay_transport = relay
relay_domains = static:ALL
smtpd_end_of_data_restrictions = check_client_access static:discard

In our case, the client first decided to route all emails to a “black hole” but later changed his mind and asked us to confirm all email addresses. Note that if you omit the skip_confirmation parameter, you will have to confirm users manually. Unfortunately, this is a well-known GitLab bug: you have to use the Rails console to confirm users.

Interim results

Below are some pitfalls I discovered in the process:

  • absence of essential information about the user’s properties in the API;
  • non-obvious API methods for working with keys;
  • duplicate keys.

GitLab, on the other hand, has a problem with email confirmation. The resulting user migration script can be found in our example repository.

Part 2. Repositories

Now that the user tree is ready, we can migrate the repositories. GitLab can import Gitea projects starting with version 8.15. However, things are not as simple as we would like them to be.

First of all, we need to add our Gitea migration user to all the repositories. For that, we’ll use the following script:

all_orgs = admin_api_instance.admin_get_all_orgs()
# default collaborator options, reused for every call below
body = giteapy.AddCollaboratorOption()
for org in all_orgs:
    for repo in org_api_instance.org_list_repos(org.username):
        repo_api_instance.repo_add_collaborator(repo.owner.login, repo.name, 'import_user', body=body)

    teams = org_api_instance.org_list_teams(org.username)
    for team in teams:
        members = org_api_instance.org_list_team_members(team.id)
        for user in members:
            for repo in user_api_instance.user_list_repos(user.login):
                repo_api_instance.repo_add_collaborator(repo.owner.login, repo.name, 'import_user', body=body)

The section with the API connection is omitted here because we already listed a similar example above. You may notice another problem with the Gitea client: the User ID and User Login are used simultaneously. This is inconvenient because it leads to endless searching through documentation.

Now that we’ve got information about all the repositories via the API, we can import them into GitLab. At first sight, the process should not be complicated: create a new project and click the import button. However, nothing gets imported that way. In practice, problems occur at every step.

Fixing API flaws with NGINX

It’s worth noting that GitLab is aware that the Gitea API is similar to that of GitHub and uses the Octokit gem to talk to it. However, Gitea’s implementation of that API is not complete, which is why the import periodically stumbles. There are two major issues:

  1. Absence of rate_limit in the API;
  2. Paths are messed up due to Octokit adding a garbage prefix to requests.

Fortunately, the source Gitea instance was sitting behind an NGINX reverse proxy, so I was able to get around these problems by modifying the proxy configuration.

First of all, let’s deal with the rate limit. Octokit uses this built-in method to find out how often it can send requests to the API. However, when requests are sent to the root methods, Gitea returns a 404 error, and the client treats it as Unimplemented:

[13/Apr/2021:01:08:15 +0000] "GET /api/v1/rate_limit HTTP/1.1" 404 152 "-" "Octokit Ruby Gem 4.15.0"

At the same time, a request to RepoApi returns a 401 error, which causes the import to stop:

[13/Apr/2021:01:26:25 +0000] "GET /org1/project1.git/api/v1/rate_limit HTTP/1.1" 401 152 "-" "Octokit Ruby Gem 4.15.0"

To get around this, let’s create the following location in NGINX:

location ~* "\/api\/v1\/rate_limit$" {
  return 404;
}

Now, all requests will return a 404 error — the migration should go smoothly.

The second issue is more interesting: Octokit adds a garbage prefix to requests. Suppose we have org1 with project1, and there is a user g.smith with project2 in a personal namespace. In this case, you may find some strange requests in the NGINX log:

/org1/project1/api/v1/repos/org1/project1/labels?page=1&per_page=100
/org1/project1/api/v1/repos/org1/project1/milestones?page=1&per_page=100&state=all
/org1/project1/api/v1/users/g.smith
/g./api/v1/repos/g.smith/project2/labels?page=1&per_page=100
/g.smith/project2/api/v1/rate_limit

As you can see, Octokit adds the /org1/project1 prefix to the org request. As for the user repository, it adds the following two prefixes:

  • /g.
  • /g.smith/project2

The rewrite below corrects the invalid requests:

rewrite '^\/([^\/]+)\/([^\/]+\.git)\/api\/v1\/repos\/([^\/]+)\/([^\/]+)\/([^\/]+)$' /api/v1/repos/$3/$4/$5;
rewrite '^\/([^\/]+)\/([^\/]+\.git)\/api\/v1\/users\/(.+)$' /api/v1/users/$3;

Finally, the import was successful!
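The effect of the two rewrite rules can be checked offline before touching the proxy. The sketch below applies the same regular expressions with Python's re module. The sample paths are made up in the style of the log entries above, with the .git suffix that the rules expect. Keep in mind that NGINX matches rewrite rules against the path without the query string.

```python
import re

# The two NGINX rewrite rules, with $N replaced by Python's \N.
REWRITES = [
    (r'^\/([^\/]+)\/([^\/]+\.git)\/api\/v1\/repos\/([^\/]+)\/([^\/]+)\/([^\/]+)$',
     r'/api/v1/repos/\3/\4/\5'),
    (r'^\/([^\/]+)\/([^\/]+\.git)\/api\/v1\/users\/(.+)$',
     r'/api/v1/users/\3'),
]


def rewrite_path(path: str) -> str:
    """Apply the rewrite rules in order and return the resulting path."""
    for pattern, replacement in REWRITES:
        path = re.sub(pattern, replacement, path)
    return path


# Made-up request paths for illustration.
for sample in (
    "/org1/project1.git/api/v1/repos/org1/project1/labels",
    "/org1/project1.git/api/v1/users/g.smith",
    "/api/v1/repos/org1/project1/labels",
):
    print(sample, "->", rewrite_path(sample))
```

Paths that already look correct pass through unchanged, and rate_limit requests are not touched by these rules, so the separate location block still handles them.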

Importing repositories

This leaves us with one last problem: 160 projects need to be imported into the correct namespaces. Unfortunately, the API can only import archives (and you cannot include merge requests, issues, and other auxiliary stuff into the archive). I had to make a script that sends import requests via the GitLab WebUI. Here it is (note that the script below is based on Selenium):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import time

GITLAB_URL="https://gitlab.example.com/"
GITLAB_USER="user"
GITLAB_PASSWORD="pa$$word"
GITED_URL="https://gitea.example.com/"
GITED_TOKEN="superSecret"


driver = webdriver.Firefox(os.getcwd()+os.path.sep)
driver.get(GITLAB_URL)

# Gitlab login
user = driver.find_element_by_id("user_login")
user.send_keys(GITLAB_USER)
pas = driver.find_element_by_id("user_password")
pas.send_keys(GITLAB_PASSWORD)
login = driver.find_element_by_name("commit").click()

Now let’s request the import page using the session created:

# Starting import process
# GITLAB_URL already ends with a slash
driver.get(GITLAB_URL+"import/gitea/new")
gitea_host = driver.find_element_by_name("gitea_host_url")
gitea_host.send_keys(GITED_URL)
gitea_token = driver.find_element_by_name("personal_access_token")
gitea_token.send_keys(GITED_TOKEN)
process = driver.find_element_by_name("commit").click()

Finally, let’s move on to importing all the repositories. Initialize the Gitea connection and start importing repos:

# iterate over table and import repos step by step
wait = WebDriverWait(driver, 10)
table = wait.until(EC.presence_of_element_located((By.XPATH, '//table')))
for row in table.find_elements_by_xpath(".//tr"):
  group=row.get_attribute("data-qa-source-project").split("/")[0]
  # clicking select button to show dropdown menu and activate buttons
  row.find_element_by_class_name("gl-dropdown-toggle").click()
  time.sleep(1)
  # Finding project group
  for btn in row.find_elements_by_class_name("dropdown-item"):
    if btn.get_attribute("data-qa-group-name") == group:
      btn.click()
  time.sleep(1)
  # starting import
  # the leading dot keeps the search inside the current row
  import_button = row.find_element(By.XPATH, ".//button[@data-qa-selector='import_button']")
  import_button.click()
  while True:
    time.sleep(10)
    # Wait until the status cell of this row reports "Complete"
    status = row.find_elements_by_class_name("gl-p-4")[-1].text
    if status == "Complete":
      break
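The while True loop above never ends if an import fails, because the status never becomes "Complete". A variant with a timeout might look like the sketch below. The helper and its parameters are hypothetical, and the intermediate status names in the usage example are made up; only "Complete" comes from the script above.

```python
import time
from typing import Callable


def wait_for_status(get_status: Callable[[], str],
                    expected: str = "Complete",
                    timeout: float = 600.0,
                    interval: float = 10.0) -> bool:
    """Poll get_status() until it returns `expected` or `timeout` seconds pass.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a fake status source instead of a Selenium row.
statuses = iter(["Scheduled", "Importing", "Complete"])
finished = wait_for_status(lambda: next(statuses), interval=0)
print(finished)
```

In the Selenium script, get_status would be a small function that reads the text of the last gl-p-4 element in the row, and a False result would be a signal to log the project and move on.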

It looks like we finally did it! Alas, no.

The final touch

Once the migration was complete, we temporarily enabled Jenkins integration and got the following error:

fatal: couldn't find remote ref refs/merge-requests/184/head

It turned out that Git references were lost during the migration. To fix this, we made empty commits in all branches with open merge requests. As a result, references were recreated.

Here is the workaround script:

import os
import shutil
import subprocess

# `gl` is the python-gitlab client created earlier
shutil.rmtree('code', ignore_errors=True)

all_orgs = gl.groups.list()
skip_orgs = ['org1','org2']
for org in all_orgs:
    if org.name in skip_orgs:
        print("Skip group", org.name)
        continue
    projects = org.projects.list(all=True)
    for project in projects:
        id=project.id
        mrs=gl.projects.get(id=id).mergerequests.list(state='opened', sort='desc',page=1, per_page=10000)
        os.mkdir('code')
        print(subprocess.run(["git", "clone", project.ssh_url_to_repo, "code"], capture_output=True))
        for mr in mrs:
            print(project.name, id, mr.title, mr.source_branch, '=>', mr.target_branch)
            print(subprocess.run(["git", "checkout", mr.source_branch], cwd='code', capture_output=True))
            print(subprocess.run(["git", "pull"], cwd='code', capture_output=True))
            print(subprocess.run(["git", "commit", "--allow-empty", "-m", "Nothing here"], cwd='code', capture_output=True))
            print(subprocess.run(["git", "push"], cwd='code', capture_output=True))
        shutil.rmtree('code',ignore_errors=True)

The script fixed the error, and the rest of the process went smoothly.
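If several open merge requests share a source branch, the loop above pushes one empty commit per merge request to that branch. Presumably a single commit per branch would be enough, so the list of branches could be deduplicated first. The helper below is a hypothetical sketch and was not part of the script we ran; it uses plain dictionaries in place of merge request objects.

```python
from typing import Iterable, List


def unique_source_branches(merge_requests: Iterable[dict]) -> List[str]:
    """Return the source branches of the merge requests without repeats,
    keeping the order in which they first appear."""
    return list(dict.fromkeys(mr["source_branch"] for mr in merge_requests))


# Made-up merge requests for illustration.
open_mrs = [
    {"title": "Fix login", "source_branch": "fix/login"},
    {"title": "Fix login (backport)", "source_branch": "fix/login"},
    {"title": "New report", "source_branch": "feature/report"},
]
print(unique_source_branches(open_mrs))
```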

You can find full listings of all the Python scripts used in the flant/examples repository.

The resulting GitLab installation. New users have been added to it, while some older projects have been deleted

Takeaways

Migrating from Gitea to GitLab, while seemingly easy, proved to be a challenge. I had to write a number of scripts and pass through a lot of unexpected complications due to the incompleteness and incompatibility of APIs to get the desired result. Nevertheless, I succeeded. I hope my experience will make your life easier and help you in solving similar problems.