Searching for Random

As I was "casually" surfing the web, I stumbled upon some PHP files with random names. It turns out that these files are backdoors created by a hacking tool, which might warrant a post of its own later. The problem is that you can't really search for random data the way you can search for specific files with Google Dorks. So given a bunch of files, how would you find the ones with random names? In this post, I'll outline a statistical approach (#AI, #MachineLearning, #BigData, ...) that I used with some success to find multiple active and usable backdoors online. That said, it's only a first step and far from perfect, so any input on possible improvements is appreciated!

Motivating example

To shed some more light on what I mean, here is an example.
Which of these five files has a random name?

  • readertest.php
  • fkzptcdrrj.php
  • timeserver.php
  • datatables.php
  • fileserver.php

For anyone who knows English, this is pretty obvious: it's fkzptcdrrj.php.
While it's easy to tell in this case, how would you program a general solution for finding this file?

Specification

In the specific case of this malware, there are some things we know. The filename, excluding the extension, is ten characters long and consists only of lowercase letters. This is based on observations; I still haven't found the exact generating function for the names.
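
The observed shape of the names can be used as a first filter before any statistics are applied. Below is a minimal sketch in Python, assuming the names are exactly ten lowercase ASCII letters followed by ".php"; the pattern and the `is_candidate` helper are my own illustration, not part of the malware or of any tool.

```python
import re

# Assumed shape, based only on the observations above:
# exactly ten lowercase ASCII letters, then the ".php" extension.
CANDIDATE = re.compile(r"[a-z]{10}\.php")

def is_candidate(filename: str) -> bool:
    """Return True if the filename has the observed shape of the backdoor names."""
    return CANDIDATE.fullmatch(filename) is not None

files = ["readertest.php", "fkzptcdrrj.php", "wp-login.php", "index.php", "MotDePasse.php"]
candidates = [f for f in files if is_candidate(f)]
print(candidates)  # ['readertest.php', 'fkzptcdrrj.php']
```

Note that this only narrows the search: "readertest.php" passes the filter just as well as the random name does, so the filter cannot tell them apart.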

Method

My first idea was to try some machine learning solution to detect English, but I didn't like that solution. Firstly, it would fail for non-English names like "configuraracceso.php", "MotDePasse.php", etc. Secondly, it might also break for names with special characters like "edit_user.php", "wp-login.php", etc.

Since my wife is a librarian and a professional in data retrieval, I started by asking her. Understandably, she was a bit stumped by my request to find random names. However, before long she recommended looking for names with "unlikely patterns".

Happily, I have created my own database with lots of file names that I could use to uncover these unlikely patterns. The idea was to use it to create a Markov Chain with transition probabilities between the characters in the names. Then, for each name, calculate the probability of the corresponding path in the chain, that is, the product of the transition probabilities along the name. The names with the lowest probability are the "unlikely patterns".

To generate the Markov Chain I used approximately 16,000 unique file names of variable length converted to lowercase.

Example

If we have the files "abc", "aaa", and "aac", we construct the following chain. We start with "abc", where we have one "a → b" transition and one "b → c" transition, then "aaa" with two "a → a" transitions, and finally "aac" with one "a → a" and one "a → c" transition. In total there are five transitions out of "a", so for "a" there is a 1/5 chance that the next character is "b", a 3/5 chance that it is "a", and a 1/5 chance that it is "c".
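
The same example can be reproduced in a few lines of Python. This is a minimal sketch of how such a chain could be built, not the exact code I used; `build_chain` is a name made up for the illustration.

```python
from collections import Counter, defaultdict

def build_chain(names):
    """Count character-to-character transitions, then normalise per source character."""
    counts = defaultdict(Counter)
    for name in names:
        name = name.lower()
        # zip(name, name[1:]) yields every pair of adjacent characters.
        for cur, nxt in zip(name, name[1:]):
            counts[cur][nxt] += 1
    chain = {}
    for cur, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        chain[cur] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return chain

chain = build_chain(["abc", "aaa", "aac"])
print(chain["a"])  # {'b': 0.2, 'a': 0.6, 'c': 0.2}
print(chain["b"])  # {'c': 1.0}
```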

Results

Starting with the Markov Chain, the transitions seem to make sense. "f → i" is very popular at about 18% (any file name with "file" in the name will add to this), while "f → v" is very unlikely at 0.02%. Below is the Markov Chain for the likely name "fileserver".

Figure: Markov chain for the word "fileserver"
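
The probability of a name is the product of the transition probabilities along its path. Here is a hedged sketch of that scoring step, using a tiny made-up training set instead of my real database. Two details are assumptions of the sketch rather than something described above: it sums log10 probabilities instead of multiplying (same ranking, no underflow), and it gives transitions never seen in training a small `floor` probability.

```python
import math
from collections import Counter, defaultdict

def build_chain(names):
    """Transition probabilities between adjacent characters."""
    counts = defaultdict(Counter)
    for name in names:
        for cur, nxt in zip(name, name[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }

def log_score(name, chain, floor=1e-6):
    """Sum of log10 transition probabilities along the name.

    Unseen transitions get `floor`, which is an assumption of this sketch.
    """
    total = 0.0
    for cur, nxt in zip(name, name[1:]):
        total += math.log10(chain.get(cur, {}).get(nxt, floor))
    return total

# Hypothetical toy training set; the real chain used about 16,000 names.
training = ["fileserver", "timeserver", "datatables", "readertest", "controller"]
chain = build_chain(training)

names = ["fileserver", "fkzptcdrrj", "timeserver"]
ranked = sorted(names, key=lambda n: log_score(n, chain))
for name in ranked:
    # 10 ** score would give back the plain path probability.
    print(name, round(log_score(name, chain), 2))
```

The most "unlikely" name comes first in `ranked`. With such a small training set the numbers themselves mean little, but the ordering already separates the random name from the rest.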

The five most "likely" ten-letter names were:

  1. fileserver.php (8.2e-08)
  2. datatables.php (7.1e-08)
  3. controller.php (7.0e-08)
  4. icontainer.php (1.9e-08)
  5. convention.php (1.8e-08)

And now for the key results, the most "unlikely" names (slightly changed to preserve the anonymity of the victims):

  1. u5xvnsnvdn.php (1.42e-24)
  2. wp-dk1ugc4.php (4.3e-21)
  3. fkzptcdrrj.php (1.23e-20)
  4. wp-z0czikm.php (1.6e-19)
  5. wp-05qvpwb.php (2.4e-19)

I am very happy with these results! All top five results are from hacked servers. fkzptcdrrj.php was executable, allowing for RCE. In fact, results 1, 2, 4, and 5 are all from the same hacked server.

The seventh most unlikely pattern was an active web shell from the same malware authors that anyone could use on the infected server, as shown in the figure below.

Figure: Web shell with random name

Discussion

While I think these results were surprisingly good, I'm sure there is much room for improvement. Even staying with the Markov Chain model, there are many parameters to tune. Should everything be converted to lowercase? Should only unique names be used? If the target length is 10, should only names of this length be used?

Looking beyond the top five results, there are quite a lot of false positives (non-random names) like "logoff_wtd", "notify_vtm", and "bookflight". The first two are combinations of words and other data. "bookflight" is interesting, as it is a combination of two normal English words, but the "k → f" transition is quite rare (1%). Some names arguably seem random, like "esp8266h2o", unless you know that "ESP8266" is a microchip. So perhaps including some word lists or knowledge databases could help eliminate false positives.
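
To make the word list idea a bit more concrete, here is a rough sketch of one possible filter: check whether a flagged name can be split entirely into known words. I have not tested this on my data; the `can_segment` helper and the tiny word list are invented for the illustration.

```python
def can_segment(name, words):
    """Return True if `name` is a concatenation of entries from `words`."""
    # reachable[i] is True when name[:i] can be split into known words.
    reachable = [True] + [False] * len(name)
    for end in range(1, len(name) + 1):
        for start in range(end):
            if reachable[start] and name[start:end] in words:
                reachable[end] = True
                break
    return reachable[len(name)]

# Hypothetical word list; a real one would come from a dictionary file.
words = {"book", "flight", "file", "server", "time", "data", "tables"}

print(can_segment("bookflight", words))  # True
print(can_segment("fkzptcdrrj", words))  # False
```

A name that segments cleanly, like "bookflight", could then be dropped from the list of unlikely names even though one of its transitions is rare.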

Almost everyone I've told about this has mentioned ENTROPY! "Just calculate the entropy", "find the name with the highest entropy", etc. But this is really not straightforward. Sometimes a simplified version is used for passwords, where the entropy is calculated as E = L * log2(R), where L is the length and R is the size of the character set. This doesn't really help us decide whether "admin" or "fkzpt" has the higher entropy, as they have the same length and possibly the same character set [a-z]. I believe the core problem is that we need to know the distribution of file names before entropy can be applied. Please let me know if you have a good idea on this track!
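
A quick sketch of the simplified formula shows the problem. The function name is made up, and the character set size of 26 assumes [a-z] for both names.

```python
import math

def simple_entropy_bits(name: str, charset_size: int = 26) -> float:
    """Simplified password entropy E = L * log2(R).

    Depends only on the length L and the character set size R,
    never on which characters actually appear in the name.
    """
    return len(name) * math.log2(charset_size)

e_admin = simple_entropy_bits("admin")
e_fkzpt = simple_entropy_bits("fkzpt")
print(round(e_admin, 2), round(e_fkzpt, 2))  # 23.5 23.5
print(e_admin == e_fkzpt)  # True
```

Both names get exactly the same value, so this measure cannot rank one above the other.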

Conclusion

Markov chains are pretty good at finding probable patterns and, consequently, also non-patterns or randomness. Furthermore, several of the random files I found were indeed malicious, with some even providing RCE on the infected servers.

 



Comments

AM !8567b3b233e9 No. 1419 >>1421 2022-09-19 08:55:19
> This doesn't really help us in deciding if "admin" or "fkzpt" has the highest entropy.

Not if you consider letters separately, but you could split the word into n-grams (2- or 3-grams could be enough), and then compare by their entropy instead. "ad" should be a much more common 2-gram than "fk" in most popular languages.
Benjamin ## Admin !d2782292df32 No. 1421 2022-09-24 10:49:34
>>1419
Awesome idea AM! :)

For the total probability of a filename, I'm simply taking the product of each ngram in the filename divided by the total weighted ngrams in the dataset. This seems to work but maybe there is a more correct method to do it?

Looking at the bigrams the results are very similar with pretty much the same top lists for both common and uncommon. I guess this makes sense as bigrams and transition probabilities between individual characters are similar. The most common bigrams are "er" (423/13140) and "on" (284/13140). Indeed you are correct that "ad" (86/13140) is much more common than "fk" (1/13140), nice!

Trigrams also give similar results but perhaps a few more false positives. For example, "subscriber" is the third most unlikely result. The most common trigrams are "ion", "con", and "tio", which makes sense too.

In practice, it would probably be useful to complement/filter the results using popular dictionaries.


yasu313 No. 1422 2022-10-19 20:26:09
good job bejamin :D