Forum: Transit support
Topic: Find/replace thousand separators and decimal points using regex within Transit
Poster: Steveling2
Post title: Some progress with thousand separators
Thank you for your quick reply. I've made some progress on the thousand separators. A small change from
#([0-9]+)0\.#([0-9]+)1
to
#([0-9])0\.#([0-9])1
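finds every dot (or comma, as required) between two digits. With the + quantifiers, the first match swallowed the whole of 1.000, so the second dot had no digit left in front of it and was skipped. Using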
#0,#1
in Replace changes 1.000.000 to 1,000,000 rather than 1,000.000. However, it will also find and change dates in the xx.xx.xxxx format. Your suggestion of exclusion (or Negation, as Transit calls it) sounded promising. Based on your string I can use
([0-9][0-9]\.[0-9][0-9]\.[0-9]+)
to find the format xx.xx.xxxx (I tweaked your code so that it will find any year). But I haven't found a way of using this to exclude dates in this format, e.g.
#([0-9]+)0\.#([0-9])1(![0-9][0-9]\.[0-9][0-9]\.[0-9]+)
or
(![0-9][0-9]\.[0-9][0-9]\.[0-9]+)#([0-9]+)0\.#([0-9])1
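For comparison, here is a rough sketch that reproduces these results with Python's `re` module on a made-up sample string. Python writes groups as `(...)` and back-references as `\1`, so this is not Transit syntax. The last pattern is one way to express the date exclusion: it only replaces a dot that has a digit before it and exactly three digits after it. It relies on lookbehind/lookahead, and I haven't confirmed that Transit's regex engine supports those, so treat it as an illustration only.

```python
import re

text = "1.000.000 on 12.05.2017"  # made-up sample

# Greedy groups, like #([0-9]+)0\.#([0-9]+)1: the first match swallows "1.000",
# so the second dot has no digit left in front of it and is skipped.
greedy = re.sub(r"([0-9]+)\.([0-9]+)", r"\1,\2", text)

# Single-digit groups, like #([0-9])0\.#([0-9])1: every dot between two digits
# is replaced, but the dots in the date are replaced as well.
single = re.sub(r"([0-9])\.([0-9])", r"\1,\2", text)

# Lookarounds: replace a dot only if a digit precedes it and exactly three
# digits follow it, so the dots in xx.xx.xxxx dates are left alone.
lookaround = re.sub(r"(?<=[0-9])\.(?=[0-9]{3}(?![0-9]))", ",", text)

print(greedy)      # 1,000.000 on 12,05.2017
print(single)      # 1,000,000 on 12,05,2017
print(lookaround)  # 1,000,000 on 12.05.2017
```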
And another problem springs to mind: in files that contain both thousand separators and decimal points, the new thousand separators will be the same character as the source language's decimal point once they're converted (e.g. 1.000,5 becomes 1,000,5). Then a regex will be required that converts x,x and ignores any variant of x,xxx (and that's assuming there are only one or two digits after the decimal point in the file).
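One way round that collision is to swap both characters in a single pass instead of two separate find/replace runs, so a new separator can never be mistaken for an old decimal point. Below is a rough Python sketch, assuming the source uses dots for thousands and a comma for decimals (as in German) and that the text is processed outside Transit. The function name `convert_numbers` and the sample sentence are made up for illustration.

```python
import re

# A "number token": digits with dots/commas inside, starting and ending with a
# digit, so a full stop or comma right after a number is not included.
TOKEN = re.compile(r"[0-9][0-9.,]*[0-9]")

# Assumed source format: dot-grouped thousands (or plain digits), then an
# optional decimal comma. Dates such as 12.05.2017 don't fit, so they're skipped.
SOURCE_NUMBER = re.compile(r"(?:[0-9]{1,3}(?:\.[0-9]{3})+|[0-9]+)(?:,[0-9]+)?")

# Swap both characters at once: "." -> "," and "," -> ".".
SWAP = str.maketrans(".,", ",.")

def convert_numbers(text: str) -> str:
    def repl(m: re.Match) -> str:
        token = m.group(0)
        if SOURCE_NUMBER.fullmatch(token):
            return token.translate(SWAP)
        return token  # dates, version numbers, lists like 1,2,3 stay as they are
    return TOKEN.sub(repl, text)

print(convert_numbers("Total: 1.234.567,89 EUR on 12.05.2017, rate 3,5 %."))
# Total: 1,234,567.89 EUR on 12.05.2017, rate 3.5 %.
```

Anything that doesn't fit the assumed number format, such as 12.05.2017 or a list like 1,2,3, is left untouched, which also sidesteps the date problem.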
I seem to be spending a whole lot of time chasing my tail here. And I haven't even looked at date conversion, e.g. 12/05/2017 to 12 May 2017 (or May 12, 2017). I can't be the first Transit user to want to do this, so I can only assume that the lack of an existing solution means that this can't be done in Transit, or at least not in a single find/replace action. Looks like automating these changes needs to be done pre-import.
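For the date part, a rough pre-import sketch in Python might look like this. It assumes dates are written day/month/year with a two-digit day and month, so 12/05/2017 is read as 12 May. `convert_dates` and the `us_style` flag are made-up names for illustration. Month names are spelled out in the code so the output doesn't depend on the machine's locale, and anything that isn't a real date is left alone.

```python
import re
from datetime import datetime

# English month names, written out so the result doesn't depend on the locale.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

DATE = re.compile(r"\b[0-9]{2}/[0-9]{2}/[0-9]{4}\b")

def convert_dates(text: str, us_style: bool = False) -> str:
    def repl(m: re.Match) -> str:
        try:
            # Assumes the source writes day/month/year.
            d = datetime.strptime(m.group(0), "%d/%m/%Y")
        except ValueError:
            return m.group(0)  # not a real date (e.g. 31/02/2017): leave it alone
        month = MONTHS[d.month - 1]
        return f"{month} {d.day}, {d.year}" if us_style else f"{d.day} {month} {d.year}"
    return DATE.sub(repl, text)

print(convert_dates("Signed on 12/05/2017."))                 # Signed on 12 May 2017.
print(convert_dates("Signed on 12/05/2017.", us_style=True))  # Signed on May 12, 2017.
```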