Opened 4 years ago

Closed 4 years ago

#2586 closed change (fixed)

Clear unused accounts from Trac database of the issue tracker at issues1

Reported by: trev Assignee: matze
Priority: P2 Milestone:
Module: Infrastructure Keywords:
Cc: philll, fhd, fred Blocked By:
Blocking: #2771, #2780, #2782, #2783 Platform: Unknown
Ready: yes Confidential: no
Tester: Unknown Verified working: no
Review URL(s):

Description (last modified by philll)

Background

We currently have beyond 26000 accounts in the Trac database, less than 500 are actually being used. That makes Trac slow, its inefficient data structures cannot deal with a large number of accounts. We should ensure staff accounts to not be removed.

What to change

Remove unused accounts from the Trac database used to authenticate users for the issue tracker at issues1, excluding accounts with email domains @eyeo.com and @adblockplus.org

Change History (15)

comment:1 Changed 4 years ago by philll

  • Description modified (diff)
  • Summary changed from Clear unused accounts from Trac database to Clear unused accounts from Trac database of the issue tracker at issues1

comment:2 Changed 4 years ago by philll

  • Cc philll added

comment:3 Changed 4 years ago by philll

  • Description modified (diff)
  • Tester set to Unknown

comment:4 Changed 4 years ago by matze

  • Priority changed from Unknown to P2
  • Ready set

comment:5 Changed 4 years ago by matze

  • Blocking 2771 added

comment:6 Changed 4 years ago by matze

  • Cc fhd fred added; matze removed
  • Owner set to matze

Beside the vast amount of machine-generated accounts and the missing scheme optimizations we are going to address today, there is also a large amount of actual session data from unauthenticated users laying around in the same tables.

Unfortunately, this data seems to not get cleaned up automatically. There's month's old records in there. Other than the spam accounts, however, these can easily be distinguished and even deleted via e.g. cron. I'll post a query later.

comment:7 Changed 4 years ago by matze

Table modifications applied to the issues (trac) database:

mysql> alter table session modify sid character varying(100);
Query OK, 190086 rows affected (15.20 sec)
Records: 190086  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify sid character varying(100);                                                        
Query OK, 895880 rows affected (1 min 6.82 sec)
Records: 895880  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify name character varying(100);                                                       
Query OK, 895880 rows affected (1 min 4.70 sec)
Records: 895880  Duplicates: 0  Warnings: 0

Table modifications to the orders (trac_orders) one:

mysql> alter table session modify sid character varying(100);
Query OK, 38 rows affected (0.40 sec)
Records: 38  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify sid character varying(100);
Query OK, 82728 rows affected (3.19 sec)
Records: 82728  Duplicates: 0  Warnings: 0
mysql> alter table session_attribute modify name character varying(100);
Query OK, 82728 rows affected (3.23 sec)
Records: 82728  Duplicates: 0  Warnings: 0
Last edited 4 years ago by matze (previous) (diff)

comment:8 Changed 4 years ago by matze

Deletion of obsolete session data (we may want to establish this as a regular cron job):

mysql> delete ms, msa from session ms left join session_attribute msa on ms.sid = msa.sid and ms.authenticated = msa.authenticated where ms.authenticated = 0 and ms.last_visit < unix_timestamp(now() - interval 10 day);
Query OK, 603069 rows affected (1 min 50.88 sec)

Note that the interval 10 day has been used because of the hard-coded cookie lifetime:

trac@issues1:~$ grep -A1 -B1 -i "delete from auth_cookie where time" /usr/local/lib/python2.7/dist-packages/trac/web/auth.py
            # Delete cookies older than 10 days
            db("DELETE FROM auth_cookie WHERE time < %s",
               (int(time.time()) - 86400 * 10,))

comment:9 Changed 4 years ago by matze

  • Blocking 2780 added

comment:10 Changed 4 years ago by matze

Before deleting the accounts, here's an overview of the backups available on hand in production (in addition to mine) - just in case we need to react quickly:

+----------------------------+
| Tables_in_trac (_backup_%) |
+----------------------------+
| _backup_session            |
| _backup_session_attribute  |
+----------------------------+

(Before any manual operation.)

+----------------------------------+
| Tables_in_trac (_intermediate_%) |
+----------------------------------+
| _intermediate_session            |
| _intermediate_session_attribute  |
+----------------------------------+

(After altering tables and deleting obsolete sessions.)

comment:11 Changed 4 years ago by matze

The following query has been used to delete the fake accounts older than 10 days:

mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join sess
ion_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last
_visit < unix_timestamp(now() - interval 10 day);                                                                              
Query OK, 346199 rows affected (33.23 sec)

Note that it seems like we get a few thousand new fake accounts every day. Thus we may want to execute the query above on a regular basis as well. Instead of investing time in a captcha solution that does not work anyway.

Last edited 4 years ago by matze (previous) (diff)

comment:12 Changed 4 years ago by matze

As discussed with @philll, I've executed the query again, this time restricting to those records older than 5 days:

mysql> delete s, sa from session s join session_attribute sc on s.sid = sc.sid and s.authenticated = sc.authenticated join session_attribute sa on s.sid = sa.sid and s.authenticated = sa.authenticated where sc.name = "email_verification_token" and s.last_visit < unix_timestamp(now() - interval 5 day);
Query OK, 9334 rows affected (0.48 sec)

This is the version that should become a cron job as well - due to the aforementioned permanent creation of new fake accounts we still have a few thousand laying around and slowing down the lookups:

mysql> select count(*) from session where authenticated = 1;
+----------+
| count(*) |
+----------+
|     5701 |
+----------+
1 row in set (0.01 sec)

comment:13 Changed 4 years ago by matze

  • Blocking 2782 added

comment:14 Changed 4 years ago by matze

  • Blocking 2783 added

comment:15 Changed 4 years ago by matze

  • Resolution set to fixed
  • Status changed from new to closed
Note: See TracTickets for help on using tickets.