Well, when a topic is this new, you naturally find almost nothing about it...
I have now searched again:
ATA Trim with Linux Software RAID
Current status as of July 2011:
Linux Software RAID (md) has no built-in discard support and therefore does not support TRIM. Mirroring data with the Device Mapper's dm-mirror target, however, is possible with discard support from roughly RHEL 6.1 onwards. Red Hat advises against using Linux Software RAID with RAID levels 1, 4, 5 and 6, because during RAID initialization some RAID management utilities such as mdadm write every block of the storage device to verify that the checksums work correctly. Since this fills the SSD completely, the SSD's lifetime and performance suffer.[7] A remedy would be generous SSD over-provisioning with hdparm.
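A rough sketch of such over-provisioning with hdparm's -N option (which reads or sets the visible max sector count; the 'p' prefix makes the new limit permanent). The device name /dev/sdX and the sector count are purely illustrative, and the drive should be empty/secure-erased before shrinking it:

  # Show the current and native max sector count of the (hypothetical) SSD:
  hdparm -N /dev/sdX
  # Permanently hide roughly the last 10% of the drive as over-provisioning area
  # (sector count below is just an example value, not a recommendation):
  hdparm -N p112540881 /dev/sdX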
Chris Caputo has written a script (raid1ext4trim.sh-1.4) that allows a manual ATA Trim for ext4 file systems on a software RAID 1.[8][9] However, the slave devices of the RAID must be whole SSDs (e.g. /dev/sda) and must not be partitions (e.g. /dev/sda1). The script ships with hdparm and is located in the subdirectory hdparm-9.37/wiper/contrib/ of the source package.
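A quick way to check whether your array meets the whole-disk requirement (the array name /dev/md0 is just an example):

  # List the member devices of the RAID 1; for the script they must be whole
  # disks such as /dev/sda, not partitions such as /dev/sda1:
  cat /proc/mdstat
  mdadm --detail /dev/md0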
Possible future developments:
In June and July 2011 there was a discussion about possible discard support in Linux Software RAID. See the Linux Software RAID Roadmap for details.
Source:
http://www.thomas-krenn.com/de/wiki/ATA_Trim
http://www.thomas-krenn.com/de/wiki/Linu…re_RAID#Roadmap
Bitmap of non-sync regions.
There are a couple of reasons for having regions of an array that are known not to contain important data and are known to not necessarily be in-sync.
1/ When an array is first created it normally contains no valid data. The normal process of a 'resync' to make all parity/copies correct is largely a waste of time.
2/ When the filesystem uses a "discard" command to report that a region of the device is no longer used, it would be good to be able to pass this down to the underlying devices. To do this safely we need to record at the md level that the region is unused, so we don't complain about inconsistencies and don't try to re-sync the region after a crash.
If we record which regions are not in-sync in a bitmap then we can meet both of these needs.
A read to a non-in-sync region would always return 0s. A 'write' to a non-in-sync region should cause that region to be resynced. Writing zeros would in some sense be ideal, but to do that we would have to block the write, which would be unfortunate. As the fs should not be reading from that area anyway, it shouldn't really matter.
The granularity of the bit is probably quite hard to get right. Having it match the block size would mean that no resync would be needed and that every discard request could be handled exactly. However it could result in a very large bitmap - 30 Megabytes for a 1 terabyte device with a 4K block size. This would need to be kept in memory and looked up for every access, which could be problematic.
Having a very coarse granularity would make storage and lookups more efficient. If we make sure the bitmap would fit in 4K, we would have about 32 megabytes per bit. This would mean that each time we triggered a resync it would resync for a second or two, which is probably a reasonable time as it wouldn't happen very often. But it would also mean that we can only service a 'discard' request if it covers whole blocks of 32 megabytes, and I really don't know how likely that is. Actually I'm not sure if anyone knows; the jury seems to still be out on how 'discard' will work long-term.
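Rough arithmetic behind those two numbers, as a sketch (binary units, so the results land in the same ballpark as the "30 megabytes" and "32 megabytes" above):

  # Fine granularity: 1 TiB device, one bit per 4 KiB block -> bitmap size in MiB
  echo $(( (1024**4 / 4096) / 8 / 1024**2 ))     # prints 32 (MiB of bitmap)
  # Coarse granularity: bitmap limited to 4 KiB = 32768 bits for the same 1 TiB
  echo $(( 1024**4 / (4096 * 8) / 1024**2 ))     # prints 32 (MiB covered per bit)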
So probably aiming for a few K to a few hundred K seems reasonable. That means that the in-memory representation will have to be a two-level array. A page of pointers to other pages can cover (on a 64bit system) 512 pages or 2Meg of bitmap space which should be enough.
As always we need a way to:
- record the location and size of the bitmap in the metadata
- allow the granularity to be set via sysfs
- allow bits to be set via sysfs, and allow the current bitmap to be read via sysfs.
For v0.90 metadata we won't support this as there is no room. We could possibly store about 32 bytes directly in the superblock allowing for 4Gig sections but this is unlikely to be really useful.
For v1.x metadata we use 8 bytes from the 'array state info'. 4 bytes give an offset from the metadata of the start of the bitmap, 2 bytes give the space reserved for the bitmap (max 32Meg) and 2 bytes give a shift value from sectors to in-sync chunks. The actual size of the bitmap must be computed from the known size of the array and the size of the chunks.
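For illustration of the shift value (assuming 512-byte sectors): a 32 MiB in-sync chunk spans 65536 sectors, i.e. a shift of 16:

  # 32 MiB chunk / 512-byte sectors = 65536 sectors = 2^16 -> shift value 16
  echo $(( 32 * 1024 * 1024 / 512 ))   # prints 65536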
We present the bitmap in sysfs similar to the way we present the bad block list. A file 'non-sync/regions' contains start and size of regions (measured in sectors) that are known to not be in-sync. A file 'non-sync/now-in-sync' lists ranges that actually are in sync but are still recorded as non-in-sync. User-space reads 'now-in-sync', updates the metadata, and writes to 'regions'.
Another file 'non-sync/to-discard' lists ranges for which a discard request has been made. These need to be recorded in the metadata. They are then written back to the file, which allows the discard request to complete.
The granularity can be set via sysfs by writing to 'non-sync/chunksize'.
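Purely hypothetical sketch of how user space might talk to these proposed files; none of this exists in mainline md, and the paths and values are made up for illustration only:

  SYS=/sys/block/md0/md/non-sync           # hypothetical sysfs directory
  cat "$SYS/regions"        # start + size (sectors) of regions known not to be in sync
  cat "$SYS/now-in-sync"    # regions that became in-sync but are still recorded as non-in-sync
  # ... user space updates the on-disk metadata, then acknowledges the ranges:
  echo "1024 2048" > "$SYS/regions"        # example range, purely illustrative
  cat "$SYS/to-discard"     # ranges with a pending discard request
  echo "4096 2048" > "$SYS/to-discard"     # write back to let the discard complete
  echo 65536 > "$SYS/chunksize"            # set granularity (unit not specified in the proposal)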
http://neil.brown.name/blog/20110216044002
Summing up, one has to say... wait and see.
The question remains who will be faster... the MDRAID developers or the manufacturers of RAID controllers.
Apart from that... there is also the question of how important this will be in the end, since current GC (garbage collection) implementations are already very good.
You should simply take care to fill your SSDs to at most 90% of their capacity.
#Marc