January 17, 2020

Forking a Wikipedia article

Instead of putting the fork online, an approach that is easier to explain is to keep only a local copy. The advantage is that in this case no copyright problems can occur, because downloading information is always allowed; only uploading information can cause trouble.

In the following tutorial, a local fork of the AIMA article https://en.wikipedia.org/wiki/Artificial_Intelligence:_A_Modern_Approach is created. Because the article deals with Artificial Intelligence and is not very long, it's a good choice for experimenting a bit. The first step is to create a git project in a working directory:

git init

Then the AIMA article from 2004-03-31 (an early version of the stub) is copied into the folder, and the first commit is created.

cp /remote/AIMA-article2004-03-31 aima.txt

git add --all && git commit -m "init"

The file doesn't have a date in its name; it's simply called aima.txt. The different versions are tracked by git, not by the programmer. Right now, the git repository contains a single file, which was downloaded from the Wikipedia server.
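To try these first steps without the author's /remote folder, the following sketch replays them in a throwaway directory. It is a minimal illustration under two assumptions: the printf text is a made-up stand-in for the downloaded 2004-03-31 revision, and the commit identity passed with `git -c` is a placeholder so that the commit also works on a machine without a configured git user.

```shell
#!/bin/sh
# Sketch: replay "git init" and the first commit in a temporary directory.
# Assumption: the printf text is a placeholder for the real 2004-03-31
# revision that the tutorial copies from /remote.
set -eu

workdir=$(mktemp -d)      # throwaway working directory
cd "$workdir"
git init -q

# Stand-in for: cp /remote/AIMA-article2004-03-31 aima.txt
printf '%s\n' "AIMA is a textbook on AI. (placeholder stub text)" > aima.txt

git add --all
# Placeholder identity, so the commit works without a global git config
git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "init"

# The history now holds exactly one commit with one tracked file
git log --oneline
git ls-files
```

The file name stays fixed as aima.txt; every later upstream revision overwrites it, and git keeps the older texts in its history.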



It's time to fork the file. This isn't done by creating a new git branch; instead, the user manages the fork himself as a second file. This gives greater control over both the fork and the upstream version:

cp aima.txt aimafork.txt

gedit aimafork.txt

The user edits aimafork.txt and adds a new paragraph criticizing the book: he writes that the book is too expensive for the average student.



In the meantime, the users of the Wikipedia project have updated the AIMA article. They are not aware that a new chapter was added in the local fork; they follow their own strategy. The updated upstream version is copied into the working directory and overwrites the previous version. It's important to run git commit before doing so, so that it's possible to go back in the timeline to a previous point.

git add --all && git commit -m "add chapter criticism"

cp /remote/AIMA-article-newer aima.txt   # placeholder name for a later revision than 2004-03-31

git add --all && git commit -m "new upstream version"
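The point of committing before the overwrite is that the old upstream text stays reachable. The sketch below rebuilds the three commits from the tutorial in a temporary directory and then pulls the overwritten text back out of history with `git show HEAD~1:aima.txt`. The file contents and the commit identity are placeholders, and the small `commit` helper is a hypothetical convenience, not part of the original workflow.

```shell
#!/bin/sh
# Sketch: why committing before overwriting aima.txt matters.
# Assumptions: placeholder texts stand in for two upstream revisions,
# and the commit identity is a placeholder.
set -eu
cd "$(mktemp -d)"
git init -q

# Hypothetical helper: stage everything and commit with the given message
commit() {
    git add --all
    git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "$1"
}

printf 'upstream revision 1\n' > aima.txt
commit "init"

cp aima.txt aimafork.txt
printf 'Criticism: the book is too expensive for the average student.\n' >> aimafork.txt
commit "add chapter criticism"

# Overwrite the upstream copy with a newer revision (stand-in text)
printf 'upstream revision 2\n' > aima.txt
commit "new upstream version"

# Commits that touched aima.txt, newest first
git log --oneline -- aima.txt

# The overwritten upstream text is still available in the parent commit
git show HEAD~1:aima.txt
```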

The open question is how to merge the upstream version with the forked version. Here, merging simply means concatenating the two files:

cat aima.txt >> aimafork.txt

git add --all && git commit -m "cat upstream to fork"

gedit aimafork.txt

git add --all && git commit -m "clean up the fork"
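Because the fork already contains the old upstream text, appending the new upstream version duplicates every paragraph that did not change on Wikipedia. One possible aid for the clean-up step (an addition, not something the tutorial itself uses) is to list the lines that now occur more than once. The paragraph texts below are placeholders.

```shell
#!/bin/sh
# Sketch: what "cat aima.txt >> aimafork.txt" produces, and a quick way
# to spot duplicated upstream lines before cleaning up by hand.
# Assumption: the paragraph texts are placeholders.
set -eu
cd "$(mktemp -d)"

# Fork = old upstream text + the user's criticism paragraph
printf '%s\n' "AIMA is a textbook on AI." "Criticism: the book is too expensive." > aimafork.txt
# New upstream = old text + a paragraph added on Wikipedia
printf '%s\n' "AIMA is a textbook on AI." "It is used at many universities." > aima.txt

cat aima.txt >> aimafork.txt   # the "merge" step from the tutorial

# Lines occurring more than once are candidates for deletion in the editor
sort aimafork.txt | uniq -d
```

This only catches exact line matches. A paragraph that was reworded upstream shows up once in its old and once in its new form, so it still has to be spotted by reading.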



After the upstream and the fork files are combined and all the sections are put in order, aimafork.txt is in great condition. It contains all the latest information from upstream plus the extra chapter written in the fork. The procedure is repeated over and over again, which means:

1. download the upstream version into the local repository

2. cat the upstream to the fork

3. clean up the fork

4. improve the fork with new sections
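Steps 1 and 2 of this cycle are mechanical, so they could be wrapped in a small shell function, while steps 3 and 4 remain manual editing. The sketch below is hypothetical: `sync_upstream` and its argument (a file that already holds the newest upstream text) are assumptions. The tutorial copies that file from /remote; one option, not used here, might be to fetch the raw wikitext with curl from https://en.wikipedia.org/w/index.php?title=Artificial_Intelligence:_A_Modern_Approach&action=raw.

```shell
#!/bin/sh
# Sketch of steps 1 and 2 of the cycle; steps 3 and 4 stay manual.
# Assumptions: sync_upstream is a hypothetical helper, its argument is a
# file with the newest upstream text, and the commit identity is a placeholder.
set -eu

commit() {
    git add --all
    git -c user.name="Demo" -c user.email="demo@example.com" commit -q -m "$1"
}

sync_upstream() {
    cp "$1" aima.txt                 # 1. copy the upstream version into the repository
    commit "new upstream version"
    cat aima.txt >> aimafork.txt     # 2. cat the upstream to the fork
    commit "cat upstream to fork"
    # 3. and 4.: clean up and extend aimafork.txt in an editor, then commit
}

# Demo: a throwaway repository that already holds both files
cd "$(mktemp -d)"
git init -q
printf 'revision 1\n' > aima.txt
cp aima.txt aimafork.txt
commit "init"

newrev=$(mktemp)                     # stand-in for the downloaded upstream file
printf 'revision 2\n' > "$newrev"
sync_upstream "$newrev"
git log --oneline
```

If the upstream text has not changed since the last run, the first `git commit` finds nothing to commit and fails, which stops the script because of `set -e`.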

What the user is doing is maintaining two versions in parallel: he has access to the upstream version and to his local fork at the same time. The self-created fork is more advanced than the upstream version because it contains more information. It's the same article about the AIMA book, but improved by a single user.
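Since both versions sit side by side in the working directory, the extra information in the fork can be listed directly. This sketch (an illustration with placeholder texts, not a step from the tutorial) prints every line of aimafork.txt that does not appear anywhere in aima.txt.

```shell
#!/bin/sh
# Sketch: list what the fork contains beyond the upstream version.
# Assumption: placeholder texts stand in for the two real files.
set -eu
cd "$(mktemp -d)"
printf '%s\n' "AIMA is a textbook on AI." "It is used at many universities." > aima.txt
printf '%s\n' "AIMA is a textbook on AI." "It is used at many universities." \
    "Criticism: the book is too expensive for the average student." > aimafork.txt

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from aima.txt:
# prints the fork's lines that are missing from the upstream copy
grep -Fxvf aima.txt aimafork.txt || true
```

For a line-by-line view of both files, `git diff --no-index aima.txt aimafork.txt` works as well, even for files outside a repository.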
